Acoustic scene classification


Task description

The challenge has ended. Full results for this task can be found here
This page collects information from the original DCASE2013 Challenge website to document DCASE challenge tasks in a uniform way.

Description

The scene classification (SC) challenge will address the problem of identifying and classifying acoustic scenes and soundscapes.

The dataset for the scene classification task will consist of 30-second recordings of various acoustic scenes. The dataset will consist of two parts, each made up of 10 audio recordings for each scene (class). One part will be sent out to the participants as a development set, and the second will be kept secret and used for the train/test scene classification task. The list of scenes is: busy street, quiet street, park, open-air market, bus, subway train, restaurant, shop/supermarket, office, subway station.

The recording device used for the task is a set of Soundman binaural microphones, specifically made to imitate a pair of in-ear headphones that the user can wear. The specifications for the recordings are: PCM, 44100 Hz, 16 bit (CD quality).

Figure 1: Overview of acoustic scene classification system.

Audio dataset

The data consists of 30-second audio files (WAV, stereo, 44.1 kHz, 16-bit), recorded using binaural microphones (worn like in-ear headphones) in locations around London at various times in 2012, by three different people. Locations were selected to represent instances of the following 10 classes:

  • bus
  • busystreet
  • office
  • openairmarket
  • park
  • quietstreet
  • restaurant
  • supermarket
  • tube
  • tubestation

The train/test (private) dataset consists of 10 recordings of each class, making 100 recordings total. This is similar to the publicly-released development dataset.

Download

**Development dataset (public)**


**Train/Test dataset (private)**


In publications using the datasets, cite as:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.


Detection and Classification of Acoustic Scenes and Events

Abstract

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.

Keywords

acoustic signal processing;knowledge based systems;speech recognition;acoustic scenes detection;acoustic scenes classification;intelligent systems;audio modality;speech recognition;music;IEEE Audio and Acoustic Signal Processing Technical Committee;DCASE;Event detection;Speech;Speech recognition;Music;Microphones;Licenses;Audio databases;event detection;machine intelligence;pattern recognition


Submission

Challenge participants submit an executable which accepts a training list file and a test list file as command-line parameters and writes its classification results to a specified output file.

Submission calling formats

Executables must accept command-line parameters which specify:

  • A path to a training list file
  • A path to a test list file
  • A path to specify where the classification output file will be written
  • A path to a scratch folder which the executable can optionally use to write temporary data

Executables must NOT write data anywhere except the classification output file and the scratch folder.

A typical entry point for your submission would be for us to run a command such as one of these:

TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

python smacpy.py -q --trainlist /path/to/trainListFile.txt --testlist /path/to/testListFile.txt --outlist /path/to/outputListFile.txt
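
A minimal Python skeleton consistent with the second calling format above is sketched below. It is purely illustrative: the argument names follow the smacpy example, but the helper call is a hypothetical placeholder, not part of any official challenge code.

```python
# entry_point.py -- illustrative skeleton only; train_and_classify is a
# hypothetical placeholder, not part of any official challenge code.
import argparse

def main():
    parser = argparse.ArgumentParser(description="DCASE2013 scene classification submission")
    parser.add_argument("--trainlist", required=True, help="path to the training list file")
    parser.add_argument("--testlist", required=True, help="path to the test list file")
    parser.add_argument("--outlist", required=True, help="path for the classification output file")
    parser.add_argument("--scratch", default=None, help="optional scratch folder for temporary data")
    parser.add_argument("-q", "--quiet", action="store_true", help="reduce console output")
    args = parser.parse_args()
    # train_and_classify(args.trainlist, args.testlist, args.outlist, args.scratch)

if __name__ == "__main__":
    main()
```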

Input and output file formats

The audio files to be used in these tasks will be specified in simple ASCII list files. The formats for the list files are specified below:

Training list file

The list file passed for model training will be a simple ASCII list file with no header line. It will contain one path per line, followed by a tab character and the class label. I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav tubestation
/path/to/track2.wav park
...

Test (classification) list file

The list file passed for testing (classification) will be a simple ASCII list file with one path per line, with no header line and no class label. I.e.

<example path and filename>

E.g.

/path/to/track1.wav
/path/to/track2.wav
...
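
Both list formats are trivial to parse. The following sketch (not part of the official challenge code) reads them into Python data structures, assuming the tab-separated layout described above.

```python
def read_train_list(path):
    """Parse lines of the form '<path><TAB><label>' into (path, label) pairs."""
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                wav_path, label = line.split("\t")
                pairs.append((wav_path, label))
    return pairs

def read_test_list(path):
    """Parse one wav path per line, skipping empty lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```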

Classification output file

Participating algorithms should produce a simple ASCII list file identical in format to the training list file: one path per line, followed by a tab character and the scene label, with no header line. I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav tubestation
/path/to/track2.wav park
...

There should be no additional tab characters anywhere, and there should be no whitespace added after the label, just the newline.
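
A corresponding writer is equally simple; the sketch below (again illustrative, not official code) emits exactly one tab per line and no trailing whitespace, as required.

```python
def write_output_list(results, out_path):
    """Write (wav_path, label) pairs as '<path><TAB><label>' lines."""
    with open(out_path, "w") as f:
        for wav_path, label in results:
            f.write(wav_path + "\t" + label + "\n")
```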

Packaging submissions

For Python/R/C/C++/etc submissions, please ensure that the submission can run on the Linux disk image we provide, WITHOUT any additional configuration. You may have modified the virtual machine after downloading it, but we will not be using your modified disk image - we will be running your submission on the standard disk image. This means:

  • if you have used additional Python/R script libraries, they must be included in your submission bundle, and your script should be able to use them without installing them systemwide.
  • if you have used any additional C/C++ libraries, they must be statically-linked to your executable.

For Matlab submissions, ensure that the submission can run with the toolboxes and system that the organisers have specified. If you need any particular toolboxes or configuration, please contact the organisers as soon as you can. Please aim to make Matlab submissions compatible across operating systems (file/path separators are a common source of problems). All Matlab submissions should be written in the form of a function, e.g. eventdetection(input, output), so that the code can easily be called from the command line.

Please provide some console output to give the challenge team a sanity check when running the code; simply writing out a line at each stage of your algorithm is enough. All submissions should include a README file with the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space the submission will need to store any feature/cache files
  • Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

Time and Hardware limits

Due to the potentially high resource requirements across all participants, hard limits on the runtime of submissions will be imposed: a hard limit of 48 hours for each submission.

Evaluation

Participating algorithms will be evaluated with 5-fold stratified cross validation.

The raw classification (identification) accuracy, its standard deviation, and a confusion matrix will be computed for each algorithm.

In addition, the computation time of each participating algorithm will be measured.
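
As a rough illustration of this protocol (the official metrics were provided as a Matlab implementation, linked below), the following scikit-learn sketch computes per-fold accuracies, their mean and standard deviation, and an aggregate confusion matrix. The classifier factory and feature matrix are assumptions, not challenge specifics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold

def evaluate(features, labels, make_classifier):
    """5-fold stratified cross-validation over numpy arrays `features` (n, d)
    and `labels` (n,); returns mean/std accuracy and a summed confusion matrix."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accuracies, matrices = [], []
    for train_idx, test_idx in skf.split(features, labels):
        clf = make_classifier()
        clf.fit(features[train_idx], labels[train_idx])
        predicted = clf.predict(features[test_idx])
        accuracies.append(accuracy_score(labels[test_idx], predicted))
        matrices.append(confusion_matrix(labels[test_idx], predicted, labels=np.unique(labels)))
    return np.mean(accuracies), np.std(accuracies), np.sum(matrices, axis=0)
```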

Matlab implementation of metrics:


Results

| Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval |
| ---- | ------ | ----------- | ---------------- | ------------------------------------- |
| DCASE2013 baseline | Dan Stowell | Centre for Digital Music, Queen Mary University of London, London, UK | task-acoustic-scene-classification-results#Stowell2013 | 55.0 (45.2 - 64.8) |
| CHR_1 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 63.0 (53.5 - 72.5) |
| CHR_2 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 65.0 (55.7 - 74.3) |
| ELF | Benjamin Elizalde | International Computer Science Institute, Berkeley, USA | task-acoustic-scene-classification-results#Elizalde2013 | 55.0 (45.2 - 64.8) |
| GSR | Jürgen T. Geiger | Institute for Human-Machine Communication, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Geiger2013 | 69.0 (59.9 - 78.1) |
| KH | Johannes D. Krijnders | INCAS3, Assen, Netherlands | task-acoustic-scene-classification-results#Krijnders2013 | 55.0 (45.2 - 64.8) |
| LTT_1 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 72.0 (63.2 - 80.8) |
| LTT_2 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 70.0 (61.0 - 79.0) |
| LTT_3 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 67.0 (57.8 - 76.2) |
| NHL | Juhan Nam | Stanford University, Stanford, USA | task-acoustic-scene-classification-results#Nam2013 | 60.0 (50.4 - 69.6) |
| NR1_1 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6) |
| NR1_2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6) |
| NR1_3 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 59.0 (49.4 - 68.6) |
| OE | Emanuele Olivetti | NeuroInformatics Laboratory, Bruno Kessler Foundation, Trento, Italy; Center for Mind and Brain Sciences, Trento, Italy | task-acoustic-scene-classification-results#Olivetti2013 | 14.0 (7.2 - 20.8) |
| PE | Kailash Patil | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA | task-acoustic-scene-classification-results#Patil2013 | 58.0 (48.3 - 67.7) |
| RG | Alain Rakotomamonjy | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Normandie Universite, Rouen, France | task-acoustic-scene-classification-results#Rakotomamonjy2013 | 69.0 (59.9 - 78.1) |
| RNH_1 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 71.0 (62.1 - 79.9) |
| RNH_2 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 76.0 (67.6 - 84.4) |
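
The confidence intervals in the table appear consistent with a normal-approximation binomial interval over the 100 evaluation recordings; a quick check (illustrative only):

```python
import math

def binomial_ci(accuracy_percent, n=100, z=1.96):
    """Normal-approximation 95% confidence interval for an accuracy in percent."""
    p = accuracy_percent / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n) * 100.0
    return accuracy_percent - half_width, accuracy_percent + half_width

print(binomial_ci(76.0))  # about (67.6, 84.4), matching the RNH_2 row above
```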


Complete results and technical reports can be found on the Task 1 results page.

Baseline system

A baseline system is provided for the task. The system is based on an MFCC+GMM bag-of-frames approach, and is described in detail in the publication below:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.


A database and challenge for acoustic scene classification and event detection

Abstract

An increasing number of researchers work in computational auditory scene analysis (CASA). However, a set of tasks, each with a well-defined evaluation framework and commonly used datasets do not yet exist. Thus, it is difficult for results and algorithms to be compared fairly, which hinders research on the field. In this paper we will introduce a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection. We give an overview of the tasks involved; describe the processes of creating the dataset; and define the evaluation metrics. Finally, illustrations on results for both tasks using baseline methods applied on this dataset are presented, accompanied by open-source code.

Keywords

acoustic signal processing;feature extraction;Gaussian processes;mixture models;signal classification;computational auditory scene analysis;CASA;public evaluation challenge;acoustic scene classification;event detection;dataset creation;evaluation metrics;baseline methods;open-source code;Event detection;Measurement;Music;Speech;Educational institutions;Hidden Markov models;Computational auditory scene analysis;acoustic scene classification;acoustic event detection


Python implementation
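
The baseline's Python implementation (smacpy) was distributed as a download here. As a rough sketch of the MFCC+GMM bag-of-frames idea, assuming librosa for feature extraction and scikit-learn for the GMMs (neither is necessarily the dependency set of the actual baseline):

```python
# Minimal MFCC+GMM bag-of-frames sketch; NOT the actual baseline code.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, n_mfcc=13):
    """Load audio as mono and return a (frames, n_mfcc) MFCC matrix."""
    audio, sr = librosa.load(wav_path, sr=None, mono=True)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T

def train(train_pairs):
    """Fit one GMM per class over all MFCC frames of that class's recordings."""
    frames_by_class = {}
    for wav_path, label in train_pairs:
        frames_by_class.setdefault(label, []).append(mfcc_frames(wav_path))
    models = {}
    for label, frame_list in frames_by_class.items():
        gmm = GaussianMixture(n_components=8, covariance_type="diag")
        gmm.fit(np.vstack(frame_list))
        models[label] = gmm
    return models

def classify(wav_path, models):
    """Return the class whose GMM gives the highest average frame log-likelihood."""
    frames = mfcc_frames(wav_path)
    return max(models, key=lambda label: models[label].score(frames))
```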


Citation

If you are using the dataset or baseline code, please cite the following paper:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.



When citing the challenge task and results, please cite the following paper:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.
