Sound event detection


Task description

Challenge has ended. Full results for this task can be found here
This page collects information from the original DCASE2013 Challenge website to document DCASE challenge tasks in a uniform way.

Description

The event detection challenge will address the problem of identifying individual sound events that are prominent in an acoustic scene. Two distinct experiments will take place: one for simple acoustic scenes without overlapping sounds, and the other for complex scenes in a polyphonic scenario. Three datasets will be used for the task.

Figure 1: Overview of sound event detection system.

Task setup

Subtask OL - Office live

The first dataset for event detection will consist of 3 subsets (for development, training, and testing). The training set will contain instantiations of individual events for every class. The development and testing datasets, denoted as office live (OL), will consist of 1 min recordings of everyday audio events in a number of office environments.

The test data consists of 11 stereo recordings (WAV, 44.1 kHz, 24-bit), lasting between 1 and 3 minutes, of scripted sequences containing non-overlapping acoustic events in an office environment. Recordings were made using a Soundfield microphone system, model SPS422B. The test dataset contains events from 16 different classes, which are as follows:

  • alarm (short alert (beep) sound)
  • clearthroat (clearing throat)
  • cough
  • doorslam (door slam)
  • drawer
  • keyboard (keyboard clicks)
  • keys (keys put on table)
  • knock (door knock)
  • laughter
  • mouse (mouse click)
  • pageturn (page turning)
  • pendrop (pen, pencil, or marker touching table surfaces)
  • phone
  • printer
  • speech
  • switch

Submitted event detection systems can be tuned and trained using the publicly released training and development datasets.

Datasets

Isolated events:

Event sequences:

Event sequences:


Subtask OS - Office synthetic

The second dataset will contain artificially sequenced sounds provided by the Analysis-Synthesis team of IRCAM, termed Office Synthetic (OS). The training set will be identical to the one for the first dataset. The development and testing sets will consist of artificial scenes built by sequencing recordings of individual events (different recordings from the ones used for the training dataset) and background recordings provided by C4DM.

The test data consists of mono recordings (WAV, 44.1 kHz) of sequences created by artificially concatenating overlapping acoustic events in an office environment. Original recordings of isolated acoustic events were made using a Soundfield microphone system, model SPS422B. The dataset contains various SNRs of events over background noise (+6, 0, and -6 dB) and different levels of event "density" (low, medium, and high). The distribution of events in the scene is random, following high-level directives that specify the desired event density. The average SNR of events over the background noise is also specified upon synthesis and, unlike in the natural scenes, is the same for all event types. The synthesized scenes are mixed down to mono in order to avoid spatialization inconsistencies between successive occurrences of the same event. The test dataset contains events from 16 different classes, which are as follows:

  • alarm (short alert (beep) sound)
  • clearthroat (clearing throat)
  • cough
  • doorslam (door slam)
  • drawer
  • keyboard (keyboard clicks)
  • keys (keys put on table)
  • knock (door knock)
  • laughter
  • mouse (mouse click)
  • pageturn (page turning)
  • pendrop (pen, pencil, or marker touching table surfaces)
  • phone
  • printer
  • speech
  • switch

Submitted event detection systems can be tuned and trained using the publicly released training and development datasets.
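
To give an intuition of the mixing described above, the sketch below scales an isolated event to a chosen SNR over the background and adds it at a given onset. This is not the IRCAM synthesis tool; the signal arrays, onset position, and target_snr_db are illustrative placeholders.

import numpy as np

def mix_event(background, event, onset_sample, target_snr_db):
    # Illustrative sketch only, not the actual OS synthesis code.
    # Scale the event so that its power relative to the overlapped background
    # segment matches the requested SNR, then add it at the given onset.
    out = background.astype(float).copy()
    segment = out[onset_sample:onset_sample + len(event)]
    event = np.asarray(event, dtype=float)[:len(segment)]  # clip if it overruns the scene
    bg_power = np.mean(segment ** 2) + 1e-12
    ev_power = np.mean(event ** 2) + 1e-12
    gain = np.sqrt(bg_power / ev_power * 10.0 ** (target_snr_db / 10.0))
    out[onset_sample:onset_sample + len(event)] += gain * event
    return out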

Datasets

Isolated events:

Event sequences:

Event sequences:


Submission

The challenge participants submit an executable for both subtasks.

Submission format

Command line calling format

Executables must accept command-line parameters which specify:

  • A path to an input .wav file.
  • A path to an output .txt file.

For example:

>./eventdetection /path/to/input.wav /path/to/output.txt

If parameters need to be set for the program, this can be done, provided the manner in which the parameters are set is well documented by the submitter. If, for example, your program needs a specified frame rate, set by a -fr flag, an example calling format could be of the form:

>./eventdetection -fr 1024 /path/to/input.wav /path/to/output.txt

where the calling format, and the desired parameter values to be used, are specified upon submission and in the corresponding README file bundled with the algorithm. Programs can use their working directory if they need to keep temporary cache files or internal debugging info.
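
As a purely illustrative sketch (not part of the official challenge materials), a Python submission could accept the required paths and an optional algorithm-specific flag as follows; the -fr flag and the detect_events helper are hypothetical placeholders.

import argparse

def detect_events(wav_path, frame_size):
    # Placeholder for the submitted detection algorithm; should return a list
    # of (onset, offset, label) tuples with times in seconds.
    return []

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Sound event detection submission")
    parser.add_argument("input_wav", help="path to the input .wav file")
    parser.add_argument("output_txt", help="path to the output .txt file")
    parser.add_argument("-fr", type=int, default=1024,
                        help="example algorithm-specific parameter (document any such flag in the README)")
    args = parser.parse_args()
    events = detect_events(args.input_wav, args.fr)
    # Writing the output file in the required format is shown in the next section.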

Output file

The output ASCII file should contain the onset, offset and event ID separated by tabs, with one event per line, ordered by onset time (onset/offset times in seconds):

<onset1>\t<offset1>\t<EventID1>
<onset2>\t<offset2>\t<EventID2>
...

E.g.

1.387392290 3.262403627 pageturn
5.073560090 5.793378684 knock
...

There should be no additional tab characters anywhere, and there should be no whitespace added after the label, just the newline.
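
A minimal sketch of producing this output from Python, assuming the detections are held as (onset, offset, label) tuples in seconds (the values below are the example ones from above):

detections = [(5.073560090, 5.793378684, "knock"),
              (1.387392290, 3.262403627, "pageturn")]

# Sort by onset time and write one tab-separated line per event,
# with nothing after the label except the newline.
with open("/path/to/output.txt", "w") as f:
    for onset, offset, label in sorted(detections):
        f.write("{:.9f}\t{:.9f}\t{}\n".format(onset, offset, label))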

Packaging submissions

For Python/R/C/C++/etc submissions, please ensure that the submission can run on the Linux disk image we provide, WITHOUT any additional configuration. You may have modified the virtual machine after downloading it, but we will not be using your modified disk image - we will be running your submission on the standard disk image. This means:

  • if you have used additional Python/R script libraries, they must be included in your submission bundle, and your script should be able to use them without installing them systemwide.
  • if you have used any additional C/C++ libraries, they must be statically-linked to your executable.

For Matlab submissions, ensure that the submission can run with the toolboxes and system that the organisers have specified. If you need any particular toolboxes or configuration, please contact the organisers as soon as you can. Please aim to make Matlab submissions compatible across operating systems (the usual problems concern file/path separators). All Matlab submissions should be written in the form of a function, e.g. eventdetection(input, output), so that the script can easily be called from the command line. Please provide some console output, which gives the challenge team a sanity check when running the code; this can be as simple as printing a line at each stage of your algorithm.

All submissions should include a README file with the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space the submission will need to store any feature/cache files
  • Any special notice regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

Time and Hardware limits

Due to the potentially high resource requirements across all participants, hard limits on the runtime of submissions will be imposed: each submission is limited to 48 hours of runtime.

Evaluation

Participating algorithms will be evaluated using frame-based, event-based, and class-wise event-based metrics. The computed metrics will consist of the AEER, precision, recall, and F-measure for the frame-based, event-based, and class-wise event-based evaluations. For the event-based evaluations, both onset-based and onset-offset-based metrics will be computed. In addition, computation times of each participating algorithm will be measured.

Frame-based evaluation uses a 10 ms step, and metrics are averaged over the duration of the recordings. The main metric is the acoustic event error rate (AEER), also used in the CLEAR evaluations:

\begin{equation*} AEER=\frac{D+I+S} {N} \end{equation*}

where \(N\) is the number of events to detect in the current frame, \(D\) is the number of deletions (missed events), \(I\) is the number of insertions (extra events), and \(S\) is the number of event substitutions, defined as \(S=\min(D,I)\).
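
As a rough illustration of this metric (the Matlab implementation linked below is the authoritative version), frame-based AEER could be accumulated over 10 ms frames as follows; ref_frames and det_frames are assumed to be per-frame sets of active event labels.

def frame_based_aeer(ref_frames, det_frames):
    # ref_frames / det_frames: one set of event labels per 10 ms frame.
    # A matched deletion/insertion pair is counted once as a substitution,
    # which is one common reading of S = min(D, I).
    D = I = S = N = 0
    for ref, det in zip(ref_frames, det_frames):
        N += len(ref)
        missed = len(ref - det)    # reference events not detected in this frame
        extra = len(det - ref)     # detected events not in the reference
        subs = min(missed, extra)
        D += missed - subs
        I += extra - subs
        S += subs
    return (D + I + S) / max(N, 1)  # guard against recordings with no events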

For a more detailed description of the metrics used, see:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.


Detection and Classification of Acoustic Scenes and Events

Abstract

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.

Keywords

acoustic signal processing;knowledge based systems;speech recognition;acoustic scenes detection;acoustic scenes classification;intelligent systems;audio modality;speech recognition;music;IEEE Audio and Acoustic Signal Processing Technical Committee;DCASE;Event detection;Speech;Speech recognition;Music;Microphones;Licenses;Audio databases;event detection;machine intelligence;pattern recognition



Matlab implementation of metrics:


Results

Subtask OL

Code | Author | Affiliation | Technical report | Frame-based AEER | Frame-based F1 (%)
DCASE2013 baseline | Dimitrios Giannoulis | Centre for Digital Music, Queen Mary University of London, London, UK | task-sound-event-detection-results-ol#Giannoulis2013 | 2.5900 | 10.7
CPS | Sameer Chauhan | Electrical Engineering, Cooper Union for the Advancement of Science and Art, New York, USA | task-sound-event-detection-results-ol#Chauhan2013 | 2.1160 | 3.8
DHV | Aleksandr Diment | Tampere University of Technology, Tampere, Finland | task-sound-event-detection-results-ol#Diment2013 | 3.1280 | 26.0
GVV | Jort F Gemmeke | ESAT-PSI, KU Leuven, Heverlee, Belgium | task-sound-event-detection-results-ol#Gemmeke2013 | 1.0840 | 31.9
NR2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-sound-event-detection-results-ol#Nogueira2013 | 1.8850 | 34.7
NVM_1 | Maria E. Niessen | AGT International, Darmstadt, Germany | task-sound-event-detection-results-ol#Niessen2013 | 1.1150 | 40.9
NVM_2 | Maria E. Niessen | AGT International, Darmstadt, Germany | task-sound-event-detection-results-ol#Niessen2013 | 1.1020 | 42.8
NVM_3 | Maria E. Niessen | AGT International, Darmstadt, Germany | task-sound-event-detection-results-ol#Niessen2013 | 1.2120 | 45.5
NVM_4 | Maria E. Niessen | AGT International, Darmstadt, Germany | task-sound-event-detection-results-ol#Niessen2013 | 1.3600 | 42.9
SCS_1 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | task-sound-event-detection-results-ol#Schroeder2013 | 1.1670 | 53.0
SCS_2 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | task-sound-event-detection-results-ol#Schroeder2013 | 1.0160 | 61.5
VVK | Lode Vuegen | ESAT-PSI, KU Leuven, Heverlee, Belgium; Future Health Department, iMinds, Heverlee, Belgium; MOBILAB, TM Kempen, Geel, Belgium | task-sound-event-detection-results-ol#Vuegen2013 | 1.0010 | 43.4


Complete results and technical reports can be found on the Subtask OL results page.

Subtask OS

Code | Author | Affiliation | Technical report | Frame-based AEER | Frame-based F1 (%)
DCASE2013 baseline | Dimitrios Giannoulis | Centre for Digital Music, Queen Mary University of London, London, UK | task-sound-event-detection-results-os#Giannoulis2013 | 2.8040 | 12.8
DHV | Aleksandr Diment | Tampere University of Technology, Tampere, Finland | task-sound-event-detection-results-os#Diment2013 | 7.9800 | 18.7
GVV | Jort F Gemmeke | ESAT-PSI, KU Leuven, Heverlee, Belgium | task-sound-event-detection-results-os#Gemmeke2013 | 1.3180 | 21.3
VVK | Lode Vuegen | ESAT-PSI, KU Leuven, Heverlee, Belgium; Future Health Department, iMinds, Heverlee, Belgium; MOBILAB, TM Kempen, Geel, Belgium | task-sound-event-detection-results-os#Vuegen2013 | 1.8880 | 13.5


Complete results and technical reports can be found on the Subtask OS results page.

Baseline system

Audio Event Detection baseline system using NMF (MATLAB).

This is an event detection system that can be trained on a set of labelled audio files containing isolated sound events of various classes; it can then detect and classify activity related to these events in audio files containing a series of different events and background noise. It is designed with two main aims:

  1. to provide a baseline against which to test more advanced systems;
  2. to provide a simple code example of a system which people are free to build on.

It follows a training/testing framework. A dictionary of spectral basis vectors is learned with NMF on the training data. This dictionary is subsequently kept fixed and used to obtain an activation matrix for unlabelled audio files from the development set via NMF decomposition. The activation vectors belonging to each class are summed and thresholded to give the activity of the different classes.
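
The sketch below gives a rough idea of this pipeline in Python, assuming magnitude spectrograms (frequency x time) have already been computed; it is not the released Matlab baseline, and the rank, threshold, and input variables are illustrative placeholders.

import numpy as np

def nmf(V, rank, n_iter=200, W=None):
    # Basic multiplicative-update NMF, V ~ W @ H; if W is given it is kept fixed.
    rng = np.random.default_rng(0)
    fixed_W = W is not None
    if W is None:
        W = rng.random((V.shape[0], rank)) + 1e-6
    H = rng.random((W.shape[1], V.shape[1])) + 1e-6
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Training: learn a small dictionary per class from its isolated events and
# stack the per-class dictionaries into one fixed dictionary.
components_per_class = 5
dictionaries = [nmf(spec, components_per_class)[0]
                for spec in training_spectrograms_per_class]   # list of (freq x time) arrays
W_fixed = np.hstack(dictionaries)

# Testing: keep the dictionary fixed, estimate activations for a recording,
# then sum the activations of each class and threshold them.
_, H = nmf(test_spectrogram, W_fixed.shape[1], W=W_fixed)
per_class = H.reshape(len(dictionaries), components_per_class, -1).sum(axis=1)
activity = per_class > threshold    # boolean (class x time) activity matrix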

In publications using the baseline, cite as:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.


A database and challenge for acoustic scene classification and event detection

Abstract

An increasing number of researchers work in computational auditory scene analysis (CASA). However, a set of tasks, each with a well-defined evaluation framework and commonly used datasets do not yet exist. Thus, it is difficult for results and algorithms to be compared fairly, which hinders research on the field. In this paper we will introduce a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection. We give an overview of the tasks involved; describe the processes of creating the dataset; and define the evaluation metrics. Finally, illustrations on results for both tasks using baseline methods applied on this dataset are presented, accompanied by open-source code.

Keywords

acoustic signal processing;feature extraction;Gaussian processes;mixture models;signal classification;computational auditory scene analysis;CASA;public evaluation challenge;acoustic scene classification;event detection;dataset creation;evaluation metrics;baseline methods;open-source code;Event detection;Measurement;Music;Speech;Educational institutions;Hidden Markov models;Computational auditory scene analysis;acoustic scene classification;acoustic event detection


Matlab implementation


Citation

If you are using the dataset or baseline code please cite the following paper:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.



When citing challenge task and results please cite the following paper:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.
