Download

Audio dataset

Development datasets are currently available. Evaluation datasets without ground truth will be released shortly before the submission deadline.

1. Acoustic scene classification


If you are using the provided baseline system, there is no need to download the dataset: the system will automatically download the needed dataset for you.

In publications using the datasets, cite as:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.

PDF

TUT Database for Acoustic Scene Classification and Sound Event Detection

Abstract

We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.

PDF


2. Sound event detection in synthetic audio


3. Sound event detection in real life audio


If you are using the provided baseline system, there is no need to download the datasets: the system will automatically download the needed datasets for you.

In publications using the datasets, cite as:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.



4. Domestic audio tagging


If you are using the provided baseline system, there is no need to download the datasets: the system will automatically download the needed datasets for you.

Submissions

All submissions (system outputs and technical reports describing the systems) are published as one package. This package is meant to archive the DCASE2016 Challenge outcome and to enable later evaluation with additional evaluation metrics.


Baseline systems

The baseline systems implement a basic approach to each task and provide a comparison point for participants while they develop their own systems.

1+3. Acoustic scene classification and Sound event detection in real life audio

The baseline systems for task 1 and task 3 share the same code base and implement a similar approach for both tasks. They automatically download the needed datasets and produce the reported baseline results when run with the default parameters.

Baseline systems are provided for both Python and Matlab. The Python implementation is regarded as the main one; the Matlab implementation replicates its code structure to allow easy switching between platforms. The two implementations are not intended to produce exactly the same results: the differences stem from the libraries used for MFCC extraction (RASTAMAT vs. librosa) and for GMM modeling (VOICEBOX vs. scikit-learn).

Participants are allowed to build their systems on top of the given baseline systems. The systems provide all the needed functionality for dataset handling, storing and accessing features and models, and evaluating results, making adaptation to one's needs rather easy. The baseline systems are also a good starting point for entry-level researchers.

The baseline systems also provide reference implementations of the evaluation metrics.
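For intuition, the MFCC + GMM approach the baselines take can be sketched as below: one Gaussian mixture is trained per class, and a test clip is assigned to the class whose mixture gives the highest frame log-likelihood. This is an illustrative sketch, not the baseline's actual code; feature extraction (MFCCs via RASTAMAT or librosa) is replaced by random stand-in vectors, and the class names and model sizes are made up for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in "MFCC" frames: each class drawn from a different distribution
# (a real system would extract MFCCs from audio instead).
train = {
    "home": rng.normal(0.0, 1.0, size=(500, 20)),
    "office": rng.normal(3.0, 1.0, size=(500, 20)),
}

# One GMM per class; 4 components here purely for illustration.
models = {
    label: GaussianMixture(n_components=4, random_state=0).fit(frames)
    for label, frames in train.items()
}

def classify(frames):
    """Pick the class whose GMM maximises the mean frame log-likelihood."""
    scores = {label: gmm.score(frames) for label, gmm in models.items()}
    return max(scores, key=scores.get)

test_clip = rng.normal(3.0, 1.0, size=(100, 20))  # "office"-like frames
print(classify(test_clip))
```

The per-class-GMM structure is why the Python/Matlab library swap (scikit-learn vs. VOICEBOX) can produce slightly different results: the models are trained with different initializations and stopping criteria.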

In publications using the datasets, cite as:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.



Python implementation


Matlab implementation


2. Synthetic audio sound event detection

Matlab implementation


4. Domestic audio tagging

Python implementation


Evaluation metric code

1. Acoustic scene classification

Code is available with the baseline system:

  • Python implementation: use from src.evaluation import DCASE2016_SceneClassification_Metrics.
  • Matlab implementation: use the class src/evaluation/DCASE2016_SceneClassification_Metrics.m.
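The scene classification metric is class-wise accuracy, which can be sketched as follows. The function below is a hypothetical simplification for illustration, not the DCASE2016_SceneClassification_Metrics class itself.

```python
from collections import defaultdict

def scene_accuracy(reference, estimated):
    """Per-class and overall accuracy from parallel lists of scene labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ref, est in zip(reference, estimated):
        total[ref] += 1
        correct[ref] += int(ref == est)
    per_class = {label: correct[label] / total[label] for label in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_class, overall

ref = ["home", "home", "office", "office"]
est = ["home", "office", "office", "office"]
print(scene_accuracy(ref, est))  # ({'home': 0.5, 'office': 1.0}, 0.75)
```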

2. Sound event detection in synthetic audio

Code is available with the baseline system. Use classes:

  • metrics/DCASE2016_EventDetection_SegmentBasedMetrics.m
  • metrics/DCASE2016_EventDetection_EventBasedMetrics.m

3. Sound event detection in real life audio

Code is available with the baseline system:

  • Python implementation: use from src.evaluation import DCASE2016_EventDetection_SegmentBasedMetrics and from src.evaluation import DCASE2016_EventDetection_EventBasedMetrics.
  • Matlab implementation: use the classes src/evaluation/DCASE2016_EventDetection_SegmentBasedMetrics.m and src/evaluation/DCASE2016_EventDetection_EventBasedMetrics.m.
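The idea behind the segment-based metric can be sketched as follows: the timeline is cut into fixed-length segments (1 s in the baseline setting), and the event labels active in the reference and in the system output are compared segment by segment. This is a simplified illustration, not the baseline or sed_eval code; event tuples and function names are invented for the example.

```python
import math

def active_labels(events, seg_start, seg_end):
    """Labels of events overlapping the segment [seg_start, seg_end)."""
    return {label for onset, offset, label in events
            if onset < seg_end and offset > seg_start}

def segment_based_f1(reference, estimated, duration, time_resolution=1.0):
    """F-score from segment-level true/false positives and misses."""
    tp = fp = fn = 0
    for i in range(math.ceil(duration / time_resolution)):
        start = i * time_resolution
        ref = active_labels(reference, start, start + time_resolution)
        est = active_labels(estimated, start, start + time_resolution)
        tp += len(ref & est)
        fp += len(est - ref)
        fn += len(ref - est)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Events as (onset, offset, label) tuples, times in seconds.
reference = [(0.0, 2.0, "speech"), (3.0, 5.0, "car")]
estimated = [(0.0, 2.0, "speech"), (4.0, 5.0, "car")]
print(segment_based_f1(reference, estimated, duration=5.0))  # ≈ 0.857
```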

sed_eval - Evaluation toolbox for Sound Event Detection

sed_eval contains the same metrics as the baseline system, and they are tested to give the same values.


4. Domestic audio tagging

The evaluation metric is the equal error rate (EER).
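EER is the error rate at the decision threshold where the false-positive rate equals the miss (false-negative) rate. A rough sketch of computing it from per-clip tag scores is below; the challenge's exact computation may differ in interpolation details, and the function is illustrative only.

```python
def eer(scores, labels):
    """Approximate EER by sweeping thresholds from high to low and
    taking the point where false-positive and miss rates cross."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    best = 1.0
    for score, label in pairs:
        tp += label          # one more item accepted at this threshold
        fp += 1 - label
        fpr = fp / n_neg     # false-positive rate
        fnr = 1 - tp / n_pos # miss rate
        best = min(best, max(fpr, fnr))
    return best

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]   # 1 = tag present in reference
print(eer(scores, labels))    # ≈ 0.333
```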


Toolboxes

sed_eval - Evaluation toolbox for Sound Event Detection

sed_eval contains the same metrics as the baseline system, and they are tested to give the same values. Use the parameters time_resolution=1 and t_collar=0.250 to align it with the baseline system results.
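The t_collar parameter controls event-based matching: an output event counts as correct only if a same-label reference event starts within the collar (250 ms here). A simplified sketch of that matching is below; real implementations (sed_eval, the baselines) also check offsets and handle edge cases more carefully, so this is an illustration, not their actual logic.

```python
def event_based_counts(reference, estimated, t_collar=0.250):
    """Greedy onset matching: an estimated event is a true positive if an
    unmatched same-label reference event has an onset within t_collar s."""
    unmatched = list(reference)          # (onset, offset, label) tuples
    tp = fp = 0
    for onset, offset, label in estimated:
        match = next((r for r in unmatched
                      if r[2] == label and abs(r[0] - onset) <= t_collar),
                     None)
        if match is not None:
            unmatched.remove(match)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)        # (TP, FP, FN)

reference = [(1.00, 2.0, "speech"), (4.00, 5.0, "car")]
estimated = [(1.10, 2.1, "speech"), (4.40, 5.2, "car")]
print(event_based_counts(reference, estimated))  # (1, 1, 1)
```

Here the speech onset is off by 0.10 s (within the collar, so a hit), while the car onset is off by 0.40 s (outside the collar, so one false positive plus one missed reference event).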


sed_vis - Visualization toolbox for Sound Event Detection

sed_vis is a toolbox for visually inspecting sound event annotations and playing back the audio while following the annotations. The annotations are visualized with an event-roll.