Acoustic scene classification


Task description

The challenge has ended. Full results for this task can be found here
This page collects information from the original DCASE2013 Challenge website to document DCASE challenge tasks in a uniform way.

Description

The scene classification (SC) challenge will address the problem of identifying and classifying acoustic scenes and soundscapes.

The dataset for the scene classification task will consist of 30-second recordings of various acoustic scenes. The dataset will consist of two parts, each made up of 10 audio recordings for each scene (class). One part will be sent out to the participants as a development set, and the second will be kept secret and used for the train/test scene classification task. The list of scenes is: busy street, quiet street, park, open-air market, bus, subway train, restaurant, shop/supermarket, office, subway station.

The recording device used for the task is a set of Soundman binaural microphones, specifically made to imitate a pair of in-ear headphones that the user can wear. The specifications for the recordings are: PCM, 44100 Hz, 16 bit (CD quality).

Figure 1: Overview of acoustic scene classification system.

Audio dataset

The data consists of 30-second audio files (WAV, stereo, 44.1 kHz, 16-bit), recorded using binaural microphones (worn like in-ear headphones) in locations around London at various times in 2012, by three different people. Locations were selected to represent instances of the following 10 classes:

  • bus
  • busystreet
  • office
  • openairmarket
  • park
  • quietstreet
  • restaurant
  • supermarket
  • tube
  • tubestation

The train/test (private) dataset consists of 10 recordings of each class, making 100 recordings total. This is similar to the publicly-released development dataset.

Download

**Development dataset (public)**


**Train/Test dataset (private)**


In publications using the datasets, cite as:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.


Detection and Classification of Acoustic Scenes and Events

Abstract

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.

Keywords

acoustic signal processing;knowledge based systems;speech recognition;acoustic scenes detection;acoustic scenes classification;intelligent systems;audio modality;speech recognition;music;IEEE Audio and Acoustic Signal Processing Technical Committee;DCASE;Event detection;Speech;Speech recognition;Music;Microphones;Licenses;Audio databases;event detection;machine intelligence;pattern recognition


Submission

Challenge participants submit an executable which accepts a training list file and a test list file as command-line parameters and writes its classification results to a specified output file.

Submission calling formats

Executables must accept command-line parameters which specify:

  • A path to a training list file
  • A path to a test list file
  • A path to specify where the classification output file will be written
  • A path to a scratch folder which the executable can optionally use to write temporary data

Executables must NOT write data anywhere except the classification output file and the scratch folder.

A typical entry point for your submission would be for us to run a command such as one of these:

TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

python smacpy.py -q --trainlist /path/to/trainListFile.txt --testlist /path/to/testListFile.txt --outlist /path/to/outputListFile.txt
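
A minimal Python skeleton consistent with the second calling format above is sketched below. It is purely illustrative: the argument names follow the smacpy example, but the helper call is a hypothetical placeholder, not part of any official challenge code.

```python
# entry_point.py -- illustrative skeleton only; train_and_classify is a
# hypothetical placeholder, not part of any official challenge code.
import argparse

def main():
    parser = argparse.ArgumentParser(description="DCASE2013 scene classification submission")
    parser.add_argument("--trainlist", required=True, help="path to the training list file")
    parser.add_argument("--testlist", required=True, help="path to the test list file")
    parser.add_argument("--outlist", required=True, help="path for the classification output file")
    parser.add_argument("--scratch", default=None, help="optional scratch folder for temporary data")
    parser.add_argument("-q", "--quiet", action="store_true", help="reduce console output")
    args = parser.parse_args()
    # train_and_classify(args.trainlist, args.testlist, args.outlist, args.scratch)

if __name__ == "__main__":
    main()
```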

Input and output file formats

The audio files to be used in these tasks will be specified in simple ASCII list files. The formats for the list files are specified below:

Training list file

The list file passed for model training will be a simple ASCII list file with no header line. It will contain one path per line, followed by a tab character and the class label. I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav tubestation
/path/to/track2.wav park
...

Test (classification) list file

The list file passed for testing (classification) will be a simple ASCII list file with one path per line, with no header line and no class label. I.e.

<example path and filename>

E.g.

/path/to/track1.wav
/path/to/track2.wav
...
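
Both list formats are trivial to parse. The following sketch (not part of the official challenge code) reads them into Python data structures, assuming the tab-separated layout described above.

```python
def read_train_list(path):
    """Parse lines of the form '<path><TAB><label>' into (path, label) pairs."""
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                wav_path, label = line.split("\t")
                pairs.append((wav_path, label))
    return pairs

def read_test_list(path):
    """Parse one wav path per line, skipping empty lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```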

Classification output file

Participating algorithms should produce a simple ASCII list file identical in format to the training list file: one path per line, followed by a tab character and the scene label, with no header line. I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav tubestation
/path/to/track2.wav park
...

There should be no additional tab characters anywhere, and there should be no whitespace added after the label, just the newline.
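
A corresponding writer is equally simple; the sketch below (again illustrative, not official code) emits exactly one tab per line and no trailing whitespace, as required.

```python
def write_output_list(results, out_path):
    """Write (wav_path, label) pairs as '<path><TAB><label>' lines."""
    with open(out_path, "w") as f:
        for wav_path, label in results:
            f.write(wav_path + "\t" + label + "\n")
```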

Packaging submissions

For Python/R/C/C++/etc submissions, please ensure that the submission can run on the Linux disk image we provide, WITHOUT any additional configuration. You may have modified the virtual machine after downloading it, but we will not be using your modified disk image - we will be running your submission on the standard disk image. This means:

  • if you have used additional Python/R script libraries, they must be included in your submission bundle, and your script should be able to use them without installing them systemwide.
  • if you have used any additional C/C++ libraries, they must be statically-linked to your executable.

For Matlab submissions, ensure that the submission can run with the toolboxes and system that the organisers have specified. If you need any particular toolboxes or configuration, please contact the organisers as soon as you can. Please aim to make Matlab submissions compatible across operating systems (file/path separators are a common source of problems). All Matlab submissions should be written in the form of a function, e.g. eventdetection(input, output), so that the code can easily be called from the command line.

Please provide some console output to give the challenge team a sanity check when running the code; simply writing out a line at each stage of your algorithm is enough. All submissions should include a README file with the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space the submission will need to store any feature/cache files
  • Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

Time and Hardware limits

Due to the potentially high resource requirements across all participants, hard limits on the runtime of submissions will be imposed: a hard limit of 48 hours for each submission.

Evaluation

Participating algorithms will be evaluated with 5-fold stratified cross validation.

The raw classification (identification) accuracy, its standard deviation, and a confusion matrix will be computed for each algorithm.

In addition, the computation time of each participating algorithm will be measured.
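
As a rough illustration of this protocol (the official metrics were provided as a Matlab implementation, linked below), the following scikit-learn sketch computes per-fold accuracies, their mean and standard deviation, and an aggregate confusion matrix. The classifier factory and feature matrix are assumptions, not challenge specifics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold

def evaluate(features, labels, make_classifier):
    """5-fold stratified cross-validation over numpy arrays `features` (n, d)
    and `labels` (n,); returns mean/std accuracy and a summed confusion matrix."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accuracies, matrices = [], []
    for train_idx, test_idx in skf.split(features, labels):
        clf = make_classifier()
        clf.fit(features[train_idx], labels[train_idx])
        predicted = clf.predict(features[test_idx])
        accuracies.append(accuracy_score(labels[test_idx], predicted))
        matrices.append(confusion_matrix(labels[test_idx], predicted, labels=np.unique(labels)))
    return np.mean(accuracies), np.std(accuracies), np.sum(matrices, axis=0)
```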

Matlab implementation of metrics:


Results

| Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval |
| ---- | ------ | ----------- | ---------------- | ------------------------------------- |
| DCASE2013 baseline | Dan Stowell | Centre for Digital Music, Queen Mary University of London, London, UK | task-acoustic-scene-classification-results#Stowell2013 | 55.0 (45.2 - 64.8) |
| CHR_1 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 63.0 (53.5 - 72.5) |
| CHR_2 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 65.0 (55.7 - 74.3) |
| ELF | Benjamin Elizalde | International Computer Science Institute, Berkeley, USA | task-acoustic-scene-classification-results#Elizalde2013 | 55.0 (45.2 - 64.8) |
| GSR | Jürgen T. Geiger | Institute for Human-Machine Communication, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Geiger2013 | 69.0 (59.9 - 78.1) |
| KH | Johannes D. Krijnders | INCAS3, Assen, Netherlands | task-acoustic-scene-classification-results#Krijnders2013 | 55.0 (45.2 - 64.8) |
| LTT_1 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 72.0 (63.2 - 80.8) |
| LTT_2 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 70.0 (61.0 - 79.0) |
| LTT_3 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 67.0 (57.8 - 76.2) |
| NHL | Juhan Nam | Stanford University, Stanford, USA | task-acoustic-scene-classification-results#Nam2013 | 60.0 (50.4 - 69.6) |
| NR1_1 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6) |
| NR1_2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6) |
| NR1_3 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 59.0 (49.4 - 68.6) |
| OE | Emanuele Olivetti | NeuroInformatics Laboratory, Bruno Kessler Foundation, Trento, Italy; Center for Mind and Brain Sciences, Trento, Italy | task-acoustic-scene-classification-results#Olivetti2013 | 14.0 (7.2 - 20.8) |
| PE | Kailash Patil | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA | task-acoustic-scene-classification-results#Patil2013 | 58.0 (48.3 - 67.7) |
| RG | Alain Rakotomamonjy | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Normandie Universite, Rouen, France | task-acoustic-scene-classification-results#Rakotomamonjy2013 | 69.0 (59.9 - 78.1) |
| RNH_1 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 71.0 (62.1 - 79.9) |
| RNH_2 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 76.0 (67.6 - 84.4) |
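
The confidence intervals in the table appear consistent with a normal-approximation binomial interval over the 100 evaluation recordings; a quick check (illustrative only):

```python
import math

def binomial_ci(accuracy_percent, n=100, z=1.96):
    """Normal-approximation 95% confidence interval for an accuracy in percent."""
    p = accuracy_percent / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n) * 100.0
    return accuracy_percent - half_width, accuracy_percent + half_width

print(binomial_ci(76.0))  # about (67.6, 84.4), matching the RNH_2 row above
```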


Complete results and technical reports can be found on the Task 1 results page.

Baseline system

A baseline system is provided for the task. The system is based on an MFCC+GMM bag-of-frames approach, and is described in detail in the publication below:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.


A database and challenge for acoustic scene classification and event detection

Abstract

An increasing number of researchers work in computational auditory scene analysis (CASA). However, a set of tasks, each with a well-defined evaluation framework and commonly used datasets do not yet exist. Thus, it is difficult for results and algorithms to be compared fairly, which hinders research on the field. In this paper we will introduce a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection. We give an overview of the tasks involved; describe the processes of creating the dataset; and define the evaluation metrics. Finally, illustrations on results for both tasks using baseline methods applied on this dataset are presented, accompanied by open-source code.

Keywords

acoustic signal processing;feature extraction;Gaussian processes;mixture models;signal classification;computational auditory scene analysis;CASA;public evaluation challenge;acoustic scene classification;event detection;dataset creation;evaluation metrics;baseline methods;open-source code;Event detection;Measurement;Music;Speech;Educational institutions;Hidden Markov models;Computational auditory scene analysis;acoustic scene classification;acoustic event detection


Python implementation
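
The baseline's Python implementation (smacpy) was distributed as a download here. As a rough sketch of the MFCC+GMM bag-of-frames idea, assuming librosa for feature extraction and scikit-learn for the GMMs (neither is necessarily the dependency set of the actual baseline):

```python
# Minimal MFCC+GMM bag-of-frames sketch; NOT the actual baseline code.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, n_mfcc=13):
    """Load audio as mono and return a (frames, n_mfcc) MFCC matrix."""
    audio, sr = librosa.load(wav_path, sr=None, mono=True)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T

def train(train_pairs):
    """Fit one GMM per class over all MFCC frames of that class's recordings."""
    frames_by_class = {}
    for wav_path, label in train_pairs:
        frames_by_class.setdefault(label, []).append(mfcc_frames(wav_path))
    models = {}
    for label, frame_list in frames_by_class.items():
        gmm = GaussianMixture(n_components=8, covariance_type="diag")
        gmm.fit(np.vstack(frame_list))
        models[label] = gmm
    return models

def classify(wav_path, models):
    """Return the class whose GMM gives the highest average frame log-likelihood."""
    frames = mfcc_frames(wav_path)
    return max(models, key=lambda label: models[label].score(frames))
```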


Citation

If you are using the dataset or baseline code, please cite the following paper:

Publication

D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.



When citing the challenge task and results, please cite the following paper:

Publication

D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.
