Challenge has ended. Full results for this task can be found here

Description

This task focuses on detecting overlapping office sound events in synthetic mixtures. Using synthetic mixtures for testing makes it possible to study how the tested algorithms behave at different levels of complexity (noise, polyphony), with the added benefit of a very accurate ground truth.

Figure 1: Overview of sound event detection system.

Audio dataset

Training material for this task consists of isolated sound events for each class and synthetic mixtures of the same examples in multiple SNR and event density conditions. The participants are allowed to use any combination of them for training their system. The test data will consist of synthetic mixtures of (source-independent) sound examples at various SNR levels, event density conditions and polyphony.

The 11 provided sound event categories are:

Clearing throat
Coughing
Door knock
Door slam
Drawer
Human laughter
Keyboard
Keys (put on table)
Page turning
Phone ringing
Speech

The training set provides 20 samples for each sound event class, plus a development set consisting of 18 minutes of synthetic mixture material in 2-minute audio files. The test set will be provided close to the challenge deadline.

Recording and annotation procedure

Audio is provided by IRCCYN, École Centrale de Nantes. The material was recorded in a calm environment using an AT8035 shotgun microphone connected to a ZOOM H4n recorder. Audio files are monophonic and sampled at 44.1 kHz. Parameters controlling the synthesized material include the event-to-background ratio (EBR), with values of -6, 0 and 6 dB; the presence or absence of overlapping events (monophonic or polyphonic scene); and the number of events per class. Isolated examples in the training set are annotated with start time, end time and event label for all sound events, while annotations for the synthetic mixtures are produced automatically by the event sequence synthesizer.
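As an illustration of what an event-to-background ratio means in practice, the sketch below scales an isolated event so that its level sits a chosen number of dB above or below the background, then adds it to the background at a given position. It assumes that EBR is the ratio of RMS levels of the event and the background; the exact definition used by the event sequence synthesizer is not stated here, and the function names are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def scale_event_to_ebr(event, background, ebr_db):
    """Scale `event` so that 20*log10(rms(event)/rms(background)) == ebr_db.

    Assumption: EBR is defined on RMS levels over the whole signals.
    """
    gain = (rms(background) / rms(event)) * 10 ** (ebr_db / 20.0)
    return [gain * v for v in event]


def mix_at(background, event, onset_sample):
    """Add `event` to a copy of `background`, starting at `onset_sample`.

    Samples of the event that fall past the end of the background are dropped.
    """
    mixture = list(background)
    for i, v in enumerate(event):
        position = onset_sample + i
        if 0 <= position < len(mixture):
            mixture[position] += v
    return mixture


# Toy signals standing in for real recordings.
background = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
event = [1.0, -1.0, 1.0, -1.0]
scaled = scale_event_to_ebr(event, background, ebr_db=-6.0)
mixture = mix_at(background, scaled, onset_sample=2)
```

With the three EBR values listed above, the same event would be mixed at roughly half, equal and double the background amplitude.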

Challenge setup

Task 2 consists of two public subsets: a training dataset and a development dataset. The training dataset consists of 20 isolated sound segments per event class. The development dataset consists of 18 two-minute recordings in various noise and event density conditions (see the README.txt file in the dataset folder for more details).

Participants are not allowed to use external data for system development. Manipulation of provided data is allowed. Participants are allowed to use any combination of the training and development datasets for training their systems.

Download

Development dataset

Task 2, train and development datasets (120 MB)

Task 2, evaluation dataset (314 MB)

Submission

Detailed information on the challenge submission can be found on the submission page. Submit a single .txt file per evaluated audio recording. The output file should contain a list of detected events, each specified by its onset, offset and event ID, separated by tabs. Format:

[event onset in seconds (float)][tab][event offset in seconds (float)][tab][event ID (string)]

Example file

1.387392290	3.262403627	pageturn
5.073560090	5.793378684	knock
...

There should be no additional tab characters anywhere, and there should be no whitespace added after the label, just the newline. The 11 event IDs to be used for the .txt output are: clearthroat, cough, doorslam, drawer, keyboard, keys, knock, laughter, pageturn, phone, speech.
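A minimal sketch of producing output in this format is shown below. The function name and the validation checks are illustrative assumptions; only the tab-separated layout and the 11 event IDs come from the description above. Nine decimal places are used to mirror the example file, not because the rules require it.

```python
# The 11 event IDs listed in the submission instructions.
VALID_EVENT_IDS = {
    "clearthroat", "cough", "doorslam", "drawer", "keyboard", "keys",
    "knock", "laughter", "pageturn", "phone", "speech",
}


def format_events(events):
    """Return the text of one submission file.

    `events` is an iterable of (onset_seconds, offset_seconds, event_id).
    Each line is onset<TAB>offset<TAB>event_id with no trailing whitespace.
    """
    lines = []
    for onset, offset, event_id in events:
        if event_id not in VALID_EVENT_IDS:
            raise ValueError("unknown event ID: %r" % (event_id,))
        if offset <= onset:
            raise ValueError("offset must be later than onset")
        lines.append("%.9f\t%.9f\t%s" % (onset, offset, event_id))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


text = format_events([
    (1.387392290, 3.262403627, "pageturn"),
    (5.073560090, 5.793378684, "knock"),
])
# Writing one file per evaluated recording could then look like:
# with open("some_recording.txt", "w") as handle:  # hypothetical file name
#     handle.write(text)
```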

Task rules

Only the provided development dataset can be used to train the submitted system.
The development dataset can be augmented only by mixing data sampled from a probability density function (pdf); use of real recordings is forbidden.
The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation dataset in the decision making is also forbidden.
A technical report with a sufficient description of the system has to be submitted along with the system outputs.

More information on submission process.

Evaluation

Tasks 2 and 3 will use the same metrics. The main metric for the challenge will be Total error rate ER. Error rate will be evaluated in one-second segments over the entire test set. Ranking of submitted systems will be done using this metric. We will also use the onset-only event-based F-measure (with 200ms tolerance) as an additional metric.
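The segment-based error rate can be sketched as follows. This is an illustrative re-implementation, not the official evaluation code. It assumes the commonly used definition in which, for each one-second segment, substitutions, deletions and insertions are counted from the sets of active event classes and the total is divided by the number of active reference events. The function names are hypothetical.

```python
import math


def segment_activity(events, n_segments, segment_length=1.0):
    """For each segment, collect the set of event labels active in it."""
    active = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        first = int(math.floor(onset / segment_length))
        last = int(math.ceil(offset / segment_length))
        for k in range(max(first, 0), min(last, n_segments)):
            active[k].add(label)
    return active


def segment_based_error_rate(reference, estimated, duration, segment_length=1.0):
    """ER = (S + D + I) / N, accumulated over all segments.

    Per segment: FN = reference labels missed, FP = extra labels output,
    S = min(FN, FP), D = max(0, FN - FP), I = max(0, FP - FN),
    N = number of active reference labels.
    """
    n_segments = int(math.ceil(duration / segment_length))
    ref = segment_activity(reference, n_segments, segment_length)
    est = segment_activity(estimated, n_segments, segment_length)
    substitutions = deletions = insertions = n_reference = 0
    for ref_labels, est_labels in zip(ref, est):
        false_negatives = len(ref_labels - est_labels)
        false_positives = len(est_labels - ref_labels)
        substitutions += min(false_negatives, false_positives)
        deletions += max(0, false_negatives - false_positives)
        insertions += max(0, false_positives - false_negatives)
        n_reference += len(ref_labels)
    if n_reference == 0:
        raise ValueError("reference contains no active events")
    return (substitutions + deletions + insertions) / float(n_reference)


reference = [(0.0, 2.0, "speech"), (2.5, 3.0, "knock")]
estimated = [(0.0, 1.0, "speech"), (2.5, 3.0, "cough")]
er = segment_based_error_rate(reference, estimated, duration=3.0)
```

In this toy case the second segment contributes one deletion and the third one substitution, against three active reference segments. Because insertions are counted in the numerator, a system that outputs many spurious events can score an ER above 1.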

Detailed description of metrics used can be found here.

Code for evaluation is available with the baseline system. Use classes:

metrics/DCASE2016_EventDetection_SegmentBasedMetrics.m
metrics/DCASE2016_EventDetection_EventBasedMetrics.m
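
For orientation, the onset-only event-based F-measure can be approximated in a few lines. The sketch below is not a port of the Matlab classes above. It assumes that a detected event counts as correct when its label matches a reference event whose onset lies within the tolerance, and that each reference event can be matched at most once. The greedy matching strategy is an assumption, so its results may differ from the official implementation in ambiguous cases.

```python
def onset_only_f_measure(reference, estimated, tolerance=0.2):
    """Onset-only event-based F-measure with greedy one-to-one matching.

    `reference` and `estimated` are lists of (onset, offset, label) tuples;
    offsets are ignored. `tolerance` is in seconds.
    """
    matched_reference = set()
    true_positives = 0
    for est_onset, _est_offset, est_label in sorted(estimated):
        for index, (ref_onset, _ref_offset, ref_label) in enumerate(reference):
            if index in matched_reference or ref_label != est_label:
                continue
            if abs(ref_onset - est_onset) <= tolerance:
                matched_reference.add(index)
                true_positives += 1
                break
    precision = true_positives / float(len(estimated)) if estimated else 0.0
    recall = true_positives / float(len(reference)) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


reference = [(1.0, 2.0, "knock"), (5.0, 6.0, "speech")]
estimated = [(1.1, 1.5, "knock"), (5.5, 6.0, "speech"), (8.0, 9.0, "cough")]
f_measure = onset_only_f_measure(reference, estimated)
```

Here only the knock is matched: the speech onset is 0.5 s late and the cough has no reference counterpart, giving a precision of 1/3 and a recall of 1/2.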

Results

Rank	Code	Author	Affiliation	Technical Report	Segment-based ER (overall)	Segment-based F1 (overall, %)
	Choi_task2_1	Inkyu Choi	Department of Electrical and Computer Engineering and INMC, Seoul National University, Seoul, South Korea	task-sound-event-detection-in-synthetic-audio-results#Choi2016	0.3660	78.7
	DCASE2016 baseline	Emmanouil Benetos	Queen Mary University of London, London, United Kingdom	task-sound-event-detection-in-synthetic-audio-results#Benetos2016	0.8933	37.0
	Giannoulis_task2_1	Panagiotis Giannoulis	School of ECE, National Technical University of Athens, Athens, Greece; Athena Research and Innovation Center, Maroussi, Greece	task-sound-event-detection-in-synthetic-audio-results#Giannoulis2016	0.6774	55.8
	Gutierrez_task2_1	J.M. Gutiérrez-Arriola	Escuela Técnica Superior de Ingeniería y Sistemas de Telecomunicacíon, Universidad Politécnica de Madrid, Madrid, Spain	task-sound-event-detection-in-synthetic-audio-results#Gutirrez-Arriola2016	2.0870	25.0
	Hayashi_task2_1	Tomoki Hayashi	Nagoya University, Nagoya, Japan	task-sound-event-detection-in-synthetic-audio-results#Hayashi2016	0.4082	78.1
	Hayashi_task2_2	Tomoki Hayashi	Nagoya University, Nagoya, Japan	task-sound-event-detection-in-synthetic-audio-results#Hayashi2016	0.4958	76.0
	Komatsu_task2_1	Tatsuya Komatsu	Data Science Research Laboratories, NEC Corporation, Kawasaki, Japan	task-sound-event-detection-in-synthetic-audio-results#Komatsu2016	0.3307	80.2
	Kong_task2_1	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, United Kingdom	task-sound-event-detection-in-synthetic-audio-results#Kong2016	3.5464	12.6
	Phan_task2_1	Huy Phan	Institute for Signal Processing, University of Luebeck, Luebeck, Germany; Graduate School for Computing in Medicine and Life Sciences, University of Luebeck, Luebeck, Germany	task-sound-event-detection-in-synthetic-audio-results#Phan2016	0.5901	64.8
	Pikrakis_task2_1	Aggelos Pikrakis	Department of Informatics, University of Piraeus, Piraeus, Greece	task-sound-event-detection-in-synthetic-audio-results#Pikrakis2016	0.7499	37.4
	Vu_task2_1	Toan H. Vu	Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan	task-sound-event-detection-in-synthetic-audio-results#Vu2016	0.8979	52.8

Complete results and technical reports can be found on the Task 2 results page.

Baseline system

A baseline system for the task is provided. The system implements a basic approach to detecting overlapping acoustic events and gives participants a point of comparison while developing their own systems.

The baseline system is based on supervised non-negative matrix factorization (NMF), and uses a dictionary of spectral templates for performing detection, which is extracted during the training phase. The output of the NMF system is a non-binary matrix denoting event activation, which is post-processed into a list of detected events.

The baseline system also provides a reference implementation of the evaluation metrics (provided by Toni Heittola). The baseline system is implemented in Matlab.

Matlab implementation

DCASE2016 Task 2 Matlab baseline, repository
version 1.0.2 (.zip)

Baseline results for development set

System parameters

Input: variable-Q transform spectrogram (60 bins/octave, 10ms step)
NMF with beta-divergence (30 iterations, beta=0.6, activation threshold=1.0)
Postprocessing: 90ms median filter span, up to 5 concurrent events, 60ms minimum event duration
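
To make the post-processing parameters concrete, the sketch below turns one class's activation curve into a list of events. It is written in Python for illustration and is not the Matlab baseline code. The order of operations (median filter, then threshold, then minimum duration) and the function names are assumptions, and the limit of 5 concurrent events is not implemented. With a 10 ms step, a 90 ms filter span corresponds to 9 frames and a 60 ms minimum duration to 6 frames.

```python
from statistics import median


def median_filter(values, span):
    """Sliding median with an odd `span`; windows are truncated at the edges."""
    half = span // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def activation_to_events(activation, label, threshold=1.0, hop=0.010,
                         filter_span=9, min_duration=0.060):
    """Convert a per-frame activation curve into (onset, offset, label) tuples."""
    smoothed = median_filter(list(activation), filter_span)
    active = [value > threshold for value in smoothed]
    events = []
    start = None
    # A trailing False closes any event still open at the end of the curve.
    for index, is_active in enumerate(active + [False]):
        if is_active and start is None:
            start = index
        elif not is_active and start is not None:
            onset, offset = start * hop, index * hop
            if offset - onset >= min_duration - 1e-9:
                events.append((onset, offset, label))
            start = None
    return events


# 20 active frames (kept) followed later by a 3-frame burst (filtered out).
activation = [0.0] * 10 + [2.0] * 20 + [0.0] * 10 + [2.0] * 3 + [0.0] * 10
events = activation_to_events(activation, "knock")
```

The 3-frame burst never fills more than a third of any 9-frame window, so the median filter removes it before thresholding, while the 20-frame run survives with its boundaries intact.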

Sound event detection results.
Segment-based ER (overall)	Segment-based F-score (overall)	Event-based F-score (onset-only, overall)
0.7859	41.6 %	30.3 %

Citation

When citing the challenge task and results, please cite the following papers:

A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley. Detection and classification of acoustic scenes and events: outcome of the DCASE 2016 challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(2):379–393, Feb 2018. doi:10.1109/TASLP.2017.2778423.

Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge

Abstract

Public evaluation campaigns and datasets promote active development in target research areas, allowing direct comparison of algorithms. The second edition of the challenge on detection and classification of acoustic scenes and events (DCASE 2016) has offered such an opportunity for development of the state-of-the-art methods, and succeeded in drawing together a large number of participants from academic and industrial backgrounds. In this paper, we report on the tasks and outcomes of the DCASE 2016 challenge. The challenge comprised four tasks: acoustic scene classification, sound event detection in synthetic audio, sound event detection in real-life audio, and domestic audio tagging. We present each task in detail and analyze the submitted systems in terms of design and performance. We observe the emergence of deep learning as the most popular classification method, replacing the traditional approaches based on Gaussian mixture models and support vector machines. By contrast, feature representations have not changed substantially throughout the years, as mel frequency-based representations predominate in all tasks. The datasets created for and used in DCASE 2016 are publicly available and are a valuable resource for further research.

Keywords

Acoustics;Event detection;Hidden Markov models;Speech;Speech processing;Tagging;Acoustic scene classification;audio datasets;pattern recognition;sound event detection

G. Lafay, E. Benetos, and M. Lagrange. Sound event detection in synthetic audio: analysis of the DCASE 2016 task results. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 11–15. Oct 2017. doi:10.1109/WASPAA.2017.8169985.

Sound event detection in synthetic audio: Analysis of the DCASE 2016 task results

Abstract

As part of the 2016 public evaluation challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), the second task focused on evaluating sound event detection systems using synthetic mixtures of office sounds. This task, which follows the 'Event Detection-Office Synthetic' task of DCASE 2013, studies the behaviour of tested algorithms when facing controlled levels of audio complexity with respect to background noise and polyphony/density, with the added benefit of a very accurate ground truth. This paper presents the task formulation, evaluation metrics, submitted systems, and provides a statistical analysis of the results achieved, with respect to various aspects of the evaluation dataset.

Keywords

acoustic signal detection;acoustic signal processing;audio signal processing;signal classification;statistical analysis;synthetic audio;dcase 2016 task results;2016 public evaluation challenge;Acoustic Scenes;sound event detection systems;synthetic mixtures;office sounds;Event Detection-Office Synthetic task;DCASE 2013;audio complexity;background noise;polyphony/density;task formulation;evaluation metrics;evaluation dataset;submitted systems;statistical analysis;Acoustics;Analysis of variance;Event detection;Image analysis;Measurement;Training;Sound event detection;experimental validation;DCASE;acoustic scene analysis;sound scene analysis

Coordinators

Emmanouil Benetos, Queen Mary University of London
Mathieu Lagrange, IRCCYN
Grégoire Lafay, IRCCYN
