Audio dataset
Task 1 - Acoustic scene classification
If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed dataset for you.
Development dataset
Evaluation dataset
In publications using the datasets, cite as:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In Proceedings of the 24th European Signal Processing Conference (EUSIPCO 2016). Budapest, Hungary, 2016.
TUT Database for Acoustic Scene Classification and Sound Event Detection
Abstract
We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.
Task 2 - Detection of rare sound events
If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed dataset for you.
Development dataset
Evaluation dataset
In publications using the datasets, cite as:
Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.
DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System
Abstract
DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.
Keywords
Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events
Task 3 - Sound event detection in real life audio
If you are using the provided baseline system, there is no need to download the dataset; the system will automatically download the needed dataset for you.
Development dataset
Evaluation dataset
In publications using the datasets, cite as:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In Proceedings of the 24th European Signal Processing Conference (EUSIPCO 2016). Budapest, Hungary, 2016.
Task 4 - Large-scale weakly supervised sound event detection for smart cars
Development dataset
Evaluation dataset (.zip, password "DCASE_2017_evaluation_set")
Baseline system
The baseline system implements a basic approach to acoustic scene classification and sound event detection, and provides a comparison point for participants while they develop their own systems.
The system is implemented in Python (versions 2.7 and 3.6 are supported). Participants are allowed to build their systems on top of the given baseline system. The system provides all the functionality needed for dataset handling, storing and accessing features and models, and evaluating the results, which makes adapting it to one's needs rather easy. The baseline system is also a good starting point for entry-level researchers.
Evaluation metric code
sed_eval is used for the evaluation. To align with the baseline system results, use sed_eval.sound_event.EventBasedMetrics with parameters t_collar=0.5, percentage_of_length=0.5, evaluate_onset=True, evaluate_offset=False for Task 2, and sed_eval.sound_event.SegmentBasedMetrics with parameter time_resolution=1 for Task 3.
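As a quick illustration, the minimal sketch below shows how these two metric classes can be instantiated with the parameters above and applied to a pair of event lists. The event lists and label list here are hypothetical placeholders; in practice they would be loaded from annotation files, for example with sed_eval.io.load_event_list.

import sed_eval

# Hypothetical event lists, for illustration only; in practice these would
# be loaded from annotation files (e.g. with sed_eval.io.load_event_list()).
reference_event_list = [
    {'event_label': 'glassbreak', 'event_onset': 1.20, 'event_offset': 2.01},
]
estimated_event_list = [
    {'event_label': 'glassbreak', 'event_onset': 1.26, 'event_offset': 2.12},
]
event_labels = ['glassbreak']

# Task 2 metric: event-based, onset-only matching with a 500 ms collar.
event_based = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=event_labels,
    t_collar=0.5,
    percentage_of_length=0.5,
    evaluate_onset=True,
    evaluate_offset=False,
)
event_based.evaluate(
    reference_event_list=reference_event_list,
    estimated_event_list=estimated_event_list,
)
print(event_based)  # prints the full metric report

# Task 3 metric: segment-based, evaluated on a one-second grid.
segment_based = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=event_labels,
    time_resolution=1.0,
)
segment_based.evaluate(
    reference_event_list=reference_event_list,
    estimated_event_list=estimated_event_list,
)
print(segment_based)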
Install the toolbox with:
pip install sed_eval
Or clone directly from the repository: https://github.com/TUT-ARG/sed_eval
For detailed information on the metrics implemented in the toolbox, see:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Metrics for polyphonic sound event detection. Applied Sciences, 6(6):162, 2016. URL: http://www.mdpi.com/2076-3417/6/6/162, doi:10.3390/app6060162.
Metrics for Polyphonic Sound Event Detection
Abstract
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics.
Toolboxes
sed_eval - Evaluation toolbox for Sound Event Detection
sed_eval is an open source Python toolbox that provides a standardized and transparent way to evaluate sound event detection systems.
sed_vis - Visualization toolbox for Sound Event Detection
sed_vis is a toolbox for visually inspecting sound event annotations and playing back the audio while following the annotations. The annotations are visualized with an event-roll.
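A minimal usage sketch, following the toolbox's typical workflow, is shown below; the file names are placeholders, and dcase_util is assumed to be available for loading the audio.

import sed_vis
import dcase_util

# File names here are hypothetical placeholders.
audio_container = dcase_util.containers.AudioContainer().load('a001.wav')
reference_event_list = sed_vis.io.load_event_list('a001.ann')
estimated_event_list = sed_vis.io.load_event_list('a001_system_output.ann')

event_lists = {
    'reference': reference_event_list,
    'estimated': estimated_event_list,
}

# Event-roll visualization of reference vs. estimated annotations,
# with synchronized audio playback.
vis = sed_vis.visualization.EventListVisualizer(
    event_lists=event_lists,
    audio_signal=audio_container.data,
    sampling_rate=audio_container.fs,
)
vis.show()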