Download


Datasets and baselines

Audio dataset

Task 1 - Acoustic scene classification

If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed datasets for you.

Development dataset


Evaluation dataset


In publications using the datasets, cite as:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.

PDF

TUT Database for Acoustic Scene Classification and Sound Event Detection

Abstract

We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.

PDF

Task 2 - Detection of rare sound events

If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed datasets for you.

Development dataset


Evaluation dataset


In publications using the datasets, cite as:

Publication

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.

PDF

DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System

Abstract

DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.

Keywords

Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events

PDF

Task 3 - Sound event detection in real life audio

If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed datasets for you.

Development dataset


Evaluation dataset


In publications using the datasets, cite as:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.

PDF

TUT Database for Acoustic Scene Classification and Sound Event Detection

Abstract

We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.

PDF

Task 4 - Large-scale weakly supervised sound event detection for smart cars

Development dataset

Evaluation dataset


Baseline system

The baseline system implements a basic approach to acoustic scene classification and sound event detection, and provides a comparison point for participants while they develop their systems.

The system is implemented in Python (versions 2.7 and 3.6). Participants are allowed to build their systems on top of the given baseline system. The system provides all the needed functionality for dataset handling, storing and accessing features and models, and evaluating the results, making it easy to adapt to one's own needs. The baseline system is also a good starting point for entry-level researchers.
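
As a point of reference, the sketch below illustrates the general approach used by the baseline (log mel-energies fed to a multilayer perceptron, as described in the DCASE 2017 paper above). It is not the actual baseline code; the file paths, labels, and hyperparameters are illustrative only.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def log_mel_energies(audio_path, n_mels=40):
    # Frame-wise log mel-energies, shape (n_frames, n_mels).
    y, sr = librosa.load(audio_path, sr=44100)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return np.log(mel + 1e-10).T

# Hypothetical training material: (audio file, scene label) pairs.
train_items = [('audio/park_001.wav', 'park'),
               ('audio/home_001.wav', 'home')]

features = [(log_mel_energies(path), label) for path, label in train_items]
X = np.vstack([f for f, _ in features])
y = np.concatenate([[label] * len(f) for f, label in features])

clf = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=200)
clf.fit(X, y)

# Classify a test file by majority vote over its frame-wise predictions.
frames = log_mel_energies('audio/test_001.wav')
labels, counts = np.unique(clf.predict(frames), return_counts=True)
print('Predicted scene:', labels[np.argmax(counts)])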


Evaluation metric code

sed_eval is used for the evaluation. To align with the baseline system results, use sed_eval.sound_event.EventBasedMetrics with parameters t_collar=0.5, percentage_of_length=0.5, evaluate_onset=True, evaluate_offset=False for Task 2, and sed_eval.sound_event.SegmentBasedMetrics with parameter time_resolution=1 for Task 3.
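
For example, the following sketch instantiates both metric classes with these parameters. The toy event lists are illustrative and assume the list-of-dicts input format with event_label, event_onset and event_offset fields (in seconds) accepted by sed_eval.

import sed_eval

# Toy reference and system-output event lists (times in seconds).
reference = [
    {'event_label': 'babycry', 'event_onset': 1.0, 'event_offset': 3.5},
]
estimated = [
    {'event_label': 'babycry', 'event_onset': 1.2, 'event_offset': 3.0},
]

# Task 2: event-based metrics, onset-only matching with a 500 ms collar.
event_based = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=['babycry'],
    t_collar=0.5,
    percentage_of_length=0.5,
    evaluate_onset=True,
    evaluate_offset=False
)
event_based.evaluate(
    reference_event_list=reference,
    estimated_event_list=estimated
)
print(event_based)  # prints the full metric report

# Task 3: segment-based metrics on a one-second grid.
segment_based = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=['babycry'],
    time_resolution=1
)
segment_based.evaluate(
    reference_event_list=reference,
    estimated_event_list=estimated
)
print(segment_based)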

Install the toolbox with:

pip install sed_eval

Or clone directly from the repository:

git clone https://github.com/TUT-ARG/sed_eval.git
For detailed information on the metrics implemented in the toolbox, see:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Metrics for polyphonic sound event detection. Applied Sciences, 6(6):162, 2016. URL: http://www.mdpi.com/2076-3417/6/6/162, doi:10.3390/app6060162.

PDF

Metrics for Polyphonic Sound Event Detection

Abstract

This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics.

Toolboxes

sed_eval - Evaluation toolbox for Sound Event Detection

sed_eval is an open-source Python toolbox that provides a standardized and transparent way to evaluate sound event detection systems.


sed_vis - Visualization toolbox for Sound Event Detection

sed_vis is a toolbox for visually inspecting sound event annotations and playing back the audio while following the annotations. The annotations are visualized with an event-roll.
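
A minimal usage sketch, following the sed_vis README; the file names are placeholders and the exact API may vary between versions.

import sed_vis

# Load audio and annotation files (placeholder file names).
audio, fs = sed_vis.io.load_audio('a001.wav')
event_lists = {
    'reference': sed_vis.io.load_event_list('a001.ann'),
    'estimated': sed_vis.io.load_event_list('a001_system_output.ann')
}

# Show the event-roll on top of the audio waveform.
vis = sed_vis.visualization.EventListVisualizer(
    event_lists=event_lists,
    audio_signal=audio,
    sampling_rate=fs
)
vis.show()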