The challenge has ended. Full results for this task can be found here
This page collects information from the original DCASE2013 Challenge website to document the DCASE challenge tasks in a uniform way.
Description
The scene classification (SC) task addresses the problem of identifying and classifying acoustic scenes and soundscapes.
The dataset for the scene classification task consists of 30-second recordings of various acoustic scenes. It is made up of two parts, each containing 10 audio recordings per scene (class): one part was released to participants as a development set, while the other was kept private and used for the train/test evaluation. The list of scenes is: busy street, quiet street, park, open-air market, bus, subway train, restaurant, shop/supermarket, office, subway station.
The recording device used for the task was a pair of Soundman binaural microphones, built to resemble in-ear headphones that the wearer can use unobtrusively. The recording specifications are: PCM, 44100 Hz, 16 bit (CD quality).
Audio dataset
The data consists of 30-second audio files (WAV, stereo, 44.1 kHz, 16-bit), recorded using binaural headphones in locations around London at various times in 2012, by three different people. Locations were selected to represent instances of the following 10 classes:
bus
busystreet
office
openairmarket
park
quietstreet
restaurant
supermarket
tube
tubestation
The train/test (private) dataset consists of 10 recordings of each class, making 100 recordings total. This is similar to the publicly-released development dataset.
Download
**Development dataset (public)**
**Train/Test dataset (private)**
In publications using the datasets, cite as:
D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.
Detection and Classification of Acoustic Scenes and Events
Abstract
For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.
Keywords
acoustic signal processing;knowledge based systems;speech recognition;acoustic scenes detection;acoustic scenes classification;intelligent systems;audio modality;speech recognition;music;IEEE Audio and Acoustic Signal Processing Technical Committee;DCASE;Event detection;Speech;Speech recognition;Music;Microphones;Licenses;Audio databases;event detection;machine intelligence;pattern recognition
Submission
Challenge participants submit an executable which accepts a training list file and a test list file as command-line parameters and writes its classification results to a specified output file.
Submission calling formats
Executables must accept command-line parameters which specify:
- A path to a training list file
- A path to a test list file
- A path to specify where the classification output file will be written
- A path to a scratch folder which the executable can optionally use to write temporary data
Executables must NOT write data anywhere except the classification output file and the scratch folder.
A typical entry point for a submission would allow us to run a command such as one of these:
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
python smacpy.py -q --trainlist /path/to/trainListFile.txt --testlist /path/to/testListFile.txt --outlist /path/to/outputListFile.txt
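For illustration, a minimal Python entry point matching the smacpy-style calling convention above might look like the following sketch. The argument names mirror the example command; everything else is a placeholder and not part of any official submission:

```python
# run_task1.py -- sketch of a submission entry point (hypothetical, not official challenge code)
import argparse

parser = argparse.ArgumentParser(description="DCASE2013 scene classification submission (sketch)")
parser.add_argument("--trainlist", required=True, help="training list file: <path><TAB><label> per line")
parser.add_argument("--testlist", required=True, help="test list file: one <path> per line")
parser.add_argument("--outlist", required=True, help="where the classification output file is written")
parser.add_argument("--scratch", default=".", help="scratch folder for temporary data")
args = parser.parse_args()

# Training and classification would go here; per the rules, only the output
# file and the scratch folder may be written to.
print(f"train: {args.trainlist}  test: {args.testlist}  out: {args.outlist}  scratch: {args.scratch}")
```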
Input and output file formats
The audio files to be used in these tasks will be specified in simple ASCII list files. The formats for the list files are specified below:
Training list file
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class label, again with no header line. I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav tubestation
/path/to/track2.wav park
...
Test (classification) list file
The list file passed for testing classification will be a simple ASCII list file with one path per line with no header line, and no class label. I.e.
<example path and filename>
E.g.
/path/to/track1.wav
/path/to/track2.wav
...
Classification output file
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the scene label, again with no header line. I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav tubestation
/path/to/track2.wav park
...
There should be no additional tab characters anywhere, and there should be no whitespace added after the label, just the newline.
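As an illustration only (not official challenge code), the three list-file formats described above can be read and written with a few lines of Python:

```python
# Minimal helpers for the list-file formats described above (a sketch).

def read_train_list(path):
    """Return a list of (wav_path, label) pairs from a training list file."""
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            wav_path, label = line.split("\t")   # exactly one tab per line, no header
            pairs.append((wav_path, label))
    return pairs

def read_test_list(path):
    """Return the list of wav paths from a test list file (one path per line)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def write_output_list(path, predictions):
    """Write (wav_path, label) predictions in the same format as the training list."""
    with open(path, "w") as f:
        for wav_path, label in predictions:
            f.write(f"{wav_path}\t{label}\n")   # tab-separated, no trailing whitespace
```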
Packaging submissions
For Python/R/C/C++/etc submissions, please ensure that the submission can run on the Linux disk image we provide, WITHOUT any additional configuration. You may have modified the virtual machine after downloading it, but we will not be using your modified disk image - we will be running your submission on the standard disk image. This means:
- if you have used additional Python/R script libraries, they must be included in your submission bundle, and your script should be able to use them without installing them systemwide.
- if you have used any additional C/C++ libraries, they must be statically-linked to your executable.
For MATLAB submissions, ensure that the submission can run with the toolboxes and system configuration that the organisers have specified. If you need any particular toolboxes or configuration, please contact the organisers as soon as possible. Please aim to make MATLAB submissions compatible across operating systems (the usual problems arise from file/path separators). All MATLAB submissions should be written as a function, e.g. eventdetection(input, output), so that the script can easily be called from the command line.
Please provide some console output, which gives the challenge team a sanity check when running the code; this can be as simple as writing out a line at each stage of your algorithm. All submissions should include a README file with the following information:
- Command line calling format for all executables including examples
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Approximately how much scratch disk space will the submission need to store any feature/cache files?
- Any special notes regarding running your algorithm
Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.
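For example, a README fragment covering these points might look like the following; all figures here are placeholders, not recommendations:

```
Calling format: TrainAndClassify.sh <scratch> <trainlist> <testlist> <outlist>
Example:        TrainAndClassify.sh /tmp/scratch train.txt test.txt out.txt
Threads/cores:  1 (single-threaded)
Memory:         ~2 GB peak
Runtime:        ~1 hour for 100 training + 100 test files
Scratch space:  ~500 MB for cached features
Notes:          none
```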
Time and Hardware limits
Due to the potentially high resource requirements across all participants, a hard limit of 48 hours will be imposed on the runtime of each submission.
Evaluation
Participating algorithms will be evaluated with 5-fold stratified cross validation.
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
In addition, the computation time of each participating algorithm will be measured.
Matlab implementation of metrics:
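Besides the Matlab implementation linked above, the same quantities can be sketched in Python with scikit-learn; the snippet below is purely illustrative, not the official evaluation code, and uses random placeholder features and a stand-in classifier. The confidence interval uses the normal approximation to the binomial, which reproduces the intervals in the results table below (e.g. 55.0% accuracy over 100 files gives 45.2 - 64.8):

```python
# Sketch of the evaluation procedure: 5-fold stratified cross-validation with
# accuracy, its standard deviation over folds, a confusion matrix, and a 95%
# confidence interval from the normal approximation.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))            # placeholder features (e.g. per-file mean MFCCs)
y = np.repeat(np.arange(10), 10)          # 10 classes x 10 recordings

fold_acc, all_true, all_pred = [], [], []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = GaussianNB().fit(X[train_idx], y[train_idx])   # stand-in classifier
    pred = clf.predict(X[test_idx])
    fold_acc.append(accuracy_score(y[test_idx], pred))
    all_true.extend(y[test_idx])
    all_pred.extend(pred)

acc = np.mean(fold_acc)
n = len(y)
ci = 1.96 * np.sqrt(acc * (1 - acc) / n)   # e.g. 0.55 +/- 0.0975 -> "55.0 (45.2 - 64.8)"
print(f"accuracy: {acc:.3f} +/- {np.std(fold_acc):.3f} (std over folds)")
print(f"95% CI:   {100 * (acc - ci):.1f} - {100 * (acc + ci):.1f}")
print(confusion_matrix(all_true, all_pred))
```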
Results
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval
---|---|---|---|---
DCASE2013 baseline | Dan Stowell | Centre for Digital Music, Queen Mary University of London, London, UK | task-acoustic-scene-classification-results#Stowell2013 | 55.0 (45.2 - 64.8)
CHR_1 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 63.0 (53.5 - 72.5)
CHR_2 | May Chum | Electrical Engineering Department, The Cooper Union, New York, USA | task-acoustic-scene-classification-results#Chum2013 | 65.0 (55.7 - 74.3)
ELF | Benjamin Elizalde | International Computer Science Institute, Berkeley, USA | task-acoustic-scene-classification-results#Elizalde2013 | 55.0 (45.2 - 64.8)
GSR | Jürgen T. Geiger | Institute for Human-Machine Communication, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Geiger2013 | 69.0 (59.9 - 78.1)
KH | Johannes D. Krijnders | INCAS3, Assen, Netherlands | task-acoustic-scene-classification-results#Krijnders2013 | 55.0 (45.2 - 64.8)
LTT_1 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 72.0 (63.2 - 80.8)
LTT_2 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 70.0 (61.0 - 79.0)
LTT_3 | David Li | Cooper Union, New York, USA | task-acoustic-scene-classification-results#Li2013 | 67.0 (57.8 - 76.2)
NHL | Juhan Nam | Stanford University, Stanford, USA | task-acoustic-scene-classification-results#Nam2013 | 60.0 (50.4 - 69.6)
NR1_1 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6)
NR1_2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 60.0 (50.4 - 69.6)
NR1_3 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Nogueira2013 | 59.0 (49.4 - 68.6)
OE | Emanuele Olivetti | NeuroInformatics Laboratory, Bruno Kessler Foundation, Trento, Italy; Center for Mind and Brain Sciences, Trento, Italy | task-acoustic-scene-classification-results#Olivetti2013 | 14.0 (7.2 - 20.8)
PE | Kailash Patil | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA | task-acoustic-scene-classification-results#Patil2013 | 58.0 (48.3 - 67.7)
RG | Alain Rakotomamonjy | Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Normandie Universite, Rouen, France | task-acoustic-scene-classification-results#Rakotomamonjy2013 | 69.0 (59.9 - 78.1)
RNH_1 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 71.0 (62.1 - 79.9)
RNH_2 | Gerard Roma | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Roma2013 | 76.0 (67.6 - 84.4)
Complete results and technical reports can be found on the Task 1 results page.
Baseline system
A baseline system is provided for the task. It is based on an MFCC+GMM approach with a bag-of-frames model, and is described in detail in:
D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.
A database and challenge for acoustic scene classification and event detection
Abstract
An increasing number of researchers work in computational auditory scene analysis (CASA). However, a set of tasks, each with a well-defined evaluation framework and commonly used datasets do not yet exist. Thus, it is difficult for results and algorithms to be compared fairly, which hinders research on the field. In this paper we will introduce a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection. We give an overview of the tasks involved; describe the processes of creating the dataset; and define the evaluation metrics. Finally, illustrations on results for both tasks using baseline methods applied on this dataset are presented, accompanied by open-source code.
Keywords
acoustic signal processing;feature extraction;Gaussian processes;mixture models;signal classification;computational auditory scene analysis;CASA;public evaluation challenge;acoustic scene classification;event detection;dataset creation;evaluation metrics;baseline methods;open-source code;Event detection;Measurement;Music;Speech;Educational institutions;Hidden Markov models;Computational auditory scene analysis;acoustic scene classification;acoustic event detection
Python implementation
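The Python implementation linked above is the official baseline. As a rough sketch of the same MFCC+GMM bag-of-frames idea (not the actual smacpy code; it assumes librosa for MFCC extraction and scikit-learn for the GMMs):

```python
# Minimal MFCC + GMM bag-of-frames classifier in the spirit of the baseline
# (a sketch, not the actual smacpy implementation).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for one recording."""
    y, sr = librosa.load(wav_path, sr=44100, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train(train_pairs, n_components=8):
    """Fit one GMM per class on the pooled MFCC frames of its training files."""
    models = {}
    for label in set(lbl for _, lbl in train_pairs):
        frames = np.vstack([mfcc_frames(p) for p, lbl in train_pairs if lbl == label])
        models[label] = GaussianMixture(n_components=n_components).fit(frames)
    return models

def classify(wav_path, models):
    """Pick the class whose GMM gives the highest average frame log-likelihood."""
    frames = mfcc_frames(wav_path)
    return max(models, key=lambda label: models[label].score(frames))
```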
Citation
If you are using the dataset or baseline code, please cite the following paper:
D. Giannoulis, D. Stowell, E. Benetos, M. Rossignol, M. Lagrange, and M. D. Plumbley. A database and challenge for acoustic scene classification and event detection. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), 1–5, Sep. 2013.
When citing the challenge task and results, please cite the following paper:
D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. doi:10.1109/TMM.2015.2428998.