Official results are shown on the original DCASE2013 Challenge website. The purpose of this page is to present the results in a uniform way, consistent with more recent editions of the DCASE Challenge.
Task description
The event detection challenge addresses the problem of identifying individual sound events that are prominent in an acoustic scene. Two distinct experiments take place: one for simple acoustic scenes without overlapping sounds, and the other for complex scenes in a polyphonic scenario.
A more detailed task description can be found on the task description page.
Systems
Frame-based results
Code | Author | Affiliation | Technical Report | AEER (frame-based) | F1 (frame-based, %) | Precision (frame-based, %) | Recall (frame-based, %)
---|---|---|---|---|---|---|---
DCASE2013 baseline | Dimitrios Giannoulis | Centre for Digital Music, Queen Mary University of London, London, UK | | 2.5900 | 10.7 | 12.1 | 10.6
CPS | Sameer Chauhan | Electrical Engineering, Cooper Union for the Advancement of Science and Art, New York, USA | Chauhan2013 | 2.1160 | 3.8 | 9.2 | 3.0
DHV | Aleksandr Diment | Tampere University of Technology, Tampere, Finland | Diment2013 | 3.1280 | 26.0 | 19.8 | 45.3
GVV | Jort F Gemmeke | ESAT-PSI, KU Leuven, Heverlee, Belgium | Gemmeke2013 | 1.0840 | 31.9 | 61.8 | 22.3
NR2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | Nogueira2013 | 1.8850 | 34.7 | 37.1 | 35.0
NVM_1 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.1150 | 40.9 | 59.9 | 32.9
NVM_2 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.1020 | 42.8 | 61.1 | 34.3
NVM_3 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.2120 | 45.5 | 57.2 | 38.8
NVM_4 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.3600 | 42.9 | 50.8 | 37.8
SCS_1 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.1670 | 53.0 | 59.9 | 48.3
SCS_2 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.0160 | 61.5 | 66.2 | 57.8
VVK | Lode Vuegen | ESAT-PSI, KU Leuven, Heverlee, Belgium; Future Health Department, iMinds, Heverlee, Belgium; MOBILAB, TM Kempen, Geel, Belgium | Vuegen2013 | 1.0010 | 43.4 | 68.1 | 32.6
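The sketch below can help when reading the frame-based columns. It is a minimal Python example that computes precision, recall, F1 and the acoustic event error rate (AEER) from binary frame-activity matrices of shape (frames, classes). It assumes AEER = (D + I + S) / N, where N is the number of active reference labels. It also assumes that substitutions are counted per frame as min(missed, spurious). The function name `frame_metrics` and this per-frame convention are illustrative assumptions. This is not the official DCASE2013 evaluation code.

```python
import numpy as np

def frame_metrics(ref: np.ndarray, est: np.ndarray) -> dict:
    """ref, est: 0/1 arrays of shape (n_frames, n_classes) marking active classes."""
    ref = ref.astype(bool)
    est = est.astype(bool)
    tp = np.logical_and(ref, est).sum()
    fp = np.logical_and(~ref, est).sum()
    fn = np.logical_and(ref, ~est).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Assumed per-frame convention: a missed and a spurious label in the same
    # frame count as one substitution; the remainder are deletions/insertions.
    fn_k = np.logical_and(ref, ~est).sum(axis=1)
    fp_k = np.logical_and(~ref, est).sum(axis=1)
    subs = np.minimum(fn_k, fp_k)
    dels = fn_k - subs
    ins = fp_k - subs
    n_ref = ref.sum()
    aeer = (subs.sum() + dels.sum() + ins.sum()) / n_ref if n_ref else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "aeer": aeer}

# Hypothetical 4-frame, 2-class example.
ref = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
est = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
m = frame_metrics(ref, est)
print(m)
```

Insertions are not bounded by the number of reference labels, so AEER can exceed 1. This is why several systems in the tables score above 1.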
Event-based results (onset-only)
Code | Author | Affiliation | Technical Report | AEER (event-based) | F1 (event-based, %) | Precision (event-based, %) | Recall (event-based, %) | AEER (class-wise) | F1 (class-wise, %) | Precision (class-wise, %) | Recall (class-wise, %)
---|---|---|---|---|---|---|---|---|---|---|---
DCASE2013 baseline | Dimitrios Giannoulis | Centre for Digital Music, Queen Mary University of London, London, UK | | 5.9000 | 7.4 | 4.8 | 18.2 | 5.9600 | 9.0 | 7.3 | 21.6
CPS | Sameer Chauhan | Electrical Engineering, Cooper Union for the Advancement of Science and Art, New York, USA | Chauhan2013 | 2.2850 | 2.2 | 3.2 | 1.9 | 1.8720 | 0.7 | 0.4 | 2.2
DHV | Aleksandr Diment | Tampere University of Technology, Tampere, Finland | Diment2013 | 2.5190 | 26.7 | 22.8 | 33.5 | 2.1820 | 30.7 | 31.0 | 35.9
GVV | Jort F Gemmeke | ESAT-PSI, KU Leuven, Heverlee, Belgium | Gemmeke2013 | 1.7790 | 15.5 | 61.8 | 22.2 | 1.5560 | 13.2 | 14.2 | 13.8
NR2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | Nogueira2013 | 3.0760 | 19.2 | 14.8 | 27.7 | 2.8570 | 21.5 | 20.9 | 28.5
NVM_1 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.8640 | 32.6 | 33.9 | 32.2 | 1.6390 | 29.4 | 28.9 | 34.2
NVM_2 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.8520 | 34.2 | 34.9 | 34.2 | 1.6020 | 33.0 | 33.1 | 33.3
NVM_3 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.8270 | 34.5 | 36.1 | 33.8 | 1.5750 | 33.5 | 35.1 | 24.6
NVM_4 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 1.9060 | 30.5 | 31.8 | 30.1 | 1.6500 | 28.2 | 30.2 | 30.8
SCS_1 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.6690 | 39.5 | 41.7 | 37.8 | 1.5790 | 36.3 | 40.6 | 39.6
SCS_2 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.6010 | 45.2 | 45.5 | 45.4 | 1.5110 | 41.5 | 43.4 | 46.4
VVK | Lode Vuegen | ESAT-PSI, KU Leuven, Heverlee, Belgium; Future Health Department, iMinds, Heverlee, Belgium; MOBILAB, TM Kempen, Geel, Belgium | Vuegen2013 | 2.0540 | 30.8 | 31.3 | 32.5 | 1.7620 | 24.6 | 22.3 | 33.0
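In the onset-only evaluation, a detected event counts as correct when two conditions hold: its label matches a reference event, and its onset lies close enough to the reference onset. The sketch below assumes a ±100 ms onset tolerance and greedy one-to-one matching. The names `match_onsets` and `event_prf` and the toy events are hypothetical. The official scorer may resolve ambiguous matches differently.

```python
from typing import List, Tuple

Event = Tuple[float, float, str]   # (onset in s, offset in s, class label)

def match_onsets(ref: List[Event], est: List[Event], tol: float = 0.1) -> int:
    """Count detected events whose label matches an unused reference event
    and whose onset lies within +/- tol seconds of it (greedy one-to-one matching)."""
    used = [False] * len(ref)
    hits = 0
    for onset_e, _, label_e in sorted(est):
        for i, (onset_r, _, label_r) in enumerate(ref):
            if not used[i] and label_r == label_e and abs(onset_r - onset_e) <= tol:
                used[i] = True
                hits += 1
                break
    return hits

def event_prf(ref: List[Event], est: List[Event], tol: float = 0.1) -> Tuple[float, float, float]:
    """Event-based precision, recall and F1 as fractions (the tables use percentages)."""
    tp = match_onsets(ref, est, tol)
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical annotations and detections.
ref = [(1.0, 1.5, "door"), (3.0, 3.4, "cough"), (6.0, 6.2, "keys")]
est = [(1.08, 1.9, "door"), (3.3, 3.5, "cough"), (6.05, 6.3, "phone")]
print(event_prf(ref, est))
```

In this toy case, only the "door" detection matches. The "cough" onset is 300 ms late, and the third detection carries the wrong label.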
Event-based results (onset-offset)
Code | Author | Affiliation | Technical Report | AEER (event-based) | F1 (event-based, %) | Precision (event-based, %) | Recall (event-based, %) | AEER (class-wise) | F1 (class-wise, %) | Precision (class-wise, %) | Recall (class-wise, %)
---|---|---|---|---|---|---|---|---|---|---|---
DCASE2013 baseline | Dimitrios Giannoulis | Centre for Digital Music, Queen Mary University of London, London, UK | | 6.3180 | 1.6 | 1.0 | 4.2 | 6.4620 | 1.9 | 1.4 | 4.9
CPS | Sameer Chauhan | Electrical Engineering, Cooper Union for the Advancement of Science and Art, New York, USA | Chauhan2013 | 2.3010 | 1.6 | 2.4 | 1.3 | 1.8910 | 0.5 | 0.3 | 1.6
DHV | Aleksandr Diment | Tampere University of Technology, Tampere, Finland | Diment2013 | 2.6760 | 22.4 | 19.1 | 28.2 | 2.3700 | 25.3 | 25.5 | 29.6
GVV | Jort F Gemmeke | ESAT-PSI, KU Leuven, Heverlee, Belgium | Gemmeke2013 | 1.8310 | 13.5 | 21.9 | 12.9 | 1.6060 | 12.0 | 13.2 | 12.1
NR2 | Waldo Nogueira | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | Nogueira2013 | 3.2440 | 15.3 | 11.8 | 22.0 | 3.0100 | 17.6 | 16.6 | 23.3
NVM_1 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 2.0950 | 24.9 | 26.1 | 24.5 | 1.8990 | 21.8 | 21.3 | 25.6
NVM_2 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 2.0950 | 26.3 | 27.1 | 26.1 | 1.8770 | 24.9 | 24.7 | 28.1
NVM_3 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 2.0520 | 27.0 | 28.4 | 26.3 | 1.8460 | 24.6 | 25.3 | 27.6
NVM_4 | Maria E. Niessen | AGT International, Darmstadt, Germany | Niessen2013 | 2.0830 | 24.7 | 25.9 | 24.2 | 1.8490 | 21.6 | 23.1 | 24.1
SCS_1 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.7490 | 36.7 | 38.9 | 35.1 | 1.6770 | 34.2 | 38.8 | 36.3
SCS_2 | Jens Schröder | Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany | Schroeder2013 | 1.7270 | 41.1 | 41.4 | 41.2 | 1.6460 | 38.3 | 40.6 | 41.9
VVK | Lode Vuegen | ESAT-PSI, KU Leuven, Heverlee, Belgium; Future Health Department, iMinds, Heverlee, Belgium; MOBILAB, TM Kempen, Geel, Belgium | Vuegen2013 | 2.2240 | 25.4 | 25.8 | 26.9 | 1.9490 | 20.4 | 18.8 | 26.8
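The onset-offset evaluation adds a condition on the offset. The class-wise columns average the score over classes instead of pooling all events. The sketch assumes a ±100 ms onset tolerance and an offset tolerance of 50% of the reference event's duration. These tolerances are assumptions, and the values used in 2013 may differ. The helper names are also assumptions, as is the choice to include classes that appear only in the system output.

```python
from typing import List, Tuple

Event = Tuple[float, float, str]   # (onset in s, offset in s, class label)

def is_match(r: Event, e: Event, onset_tol: float = 0.1, offset_frac: float = 0.5) -> bool:
    """Assumed onset-offset criterion: same label, onset within onset_tol seconds,
    offset within offset_frac of the reference event's duration."""
    same_label = r[2] == e[2]
    onset_ok = abs(r[0] - e[0]) <= onset_tol
    offset_ok = abs(r[1] - e[1]) <= offset_frac * (r[1] - r[0])
    return same_label and onset_ok and offset_ok

def classwise_f1(ref: List[Event], est: List[Event]) -> float:
    """Mean of per-class F1 scores, each from greedy one-to-one matching."""
    labels = sorted({ev[2] for ev in ref} | {ev[2] for ev in est})
    scores = []
    for lab in labels:
        ref_c = [ev for ev in ref if ev[2] == lab]
        est_c = [ev for ev in est if ev[2] == lab]
        used = [False] * len(ref_c)
        tp = 0
        for e in est_c:
            for i, r in enumerate(ref_c):
                if not used[i] and is_match(r, e):
                    used[i] = True
                    tp += 1
                    break
        precision = tp / len(est_c) if est_c else 0.0
        recall = tp / len(ref_c) if ref_c else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical example: the "cough" detection has a good onset but ends far too late.
ref = [(1.0, 2.0, "door"), (4.0, 4.4, "cough")]
est = [(1.05, 2.3, "door"), (4.02, 5.0, "cough")]
print(classwise_f1(ref, est))
```

Each class contributes equally to the class-wise average. Rare classes therefore weigh as much as frequent ones, which is one reason the class-wise and overall columns differ.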
System characteristics
Code | Technical Report | Input | Sampling rate | Features | Classifier
---|---|---|---|---|---
DCASE2013 baseline | | mono | 44.1kHz | NMF | NMF
CPS | Chauhan2013 | mono | 44.1kHz | loudness, wavelet decomposition coefficients, autocorrelation, spectral centroid, spectral flux, spectral entropy, short time energy, spectral roll-off, MFCC | LRT
DHV | Diment2013 | mono | 44.1kHz | MFCC | HMM
GVV | Gemmeke2013 | mono | 44.1kHz | NMF | HMM
NR2 | Nogueira2013 | mono | 44.1kHz | MFCC | SVM
NVM_1 | Niessen2013 | mono | 44.1kHz | STE, ZCR, flatness, spectral flux, spectral roll-off, spectral flatness, spectral brightness, MFCC, LPC | hierarchical HMM, random forests
NVM_2 | Niessen2013 | mono | 44.1kHz | STE, ZCR, flatness, spectral flux, spectral roll-off, spectral flatness, spectral brightness, MFCC, LPC | hierarchical HMM, random forests
NVM_3 | Niessen2013 | mono | 44.1kHz | STE, ZCR, flatness, spectral flux, spectral roll-off, spectral flatness, spectral brightness, MFCC, LPC | hierarchical HMM, random forests
NVM_4 | Niessen2013 | mono | 44.1kHz | STE, ZCR, flatness, spectral flux, spectral roll-off, spectral flatness, spectral brightness, MFCC, LPC | hierarchical HMM, random forests
SCS_1 | Schroeder2013 | mono | 44.1kHz | Gabor filterbank | HMM
SCS_2 | Schroeder2013 | mono | 44.1kHz | Gabor filterbank | HMM
VVK | Vuegen2013 | mono | 44.1kHz | MFCC | GMM
Technical reports
Event Detection and Classification
Sameer Chauhan, Sharang Phadke and Christian Sherland
Electrical Engineering, Cooper Union for the Advancement of Science and Art, New York, USA
CPS
Abstract
The IEEE AASP Challenge addresses the problem of acoustic event detection and classification in an office environment. Our system performs segmentation and event classification on a continuous stream of acoustic activity in an office using basic feature extraction techniques and a single layer frame-by-frame classifier. We achieve high classification accuracy in noiseless environments, but performance severely deteriorates in noisy environments.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | loudness, wavelet decomposition coefficients, autocorrelation, spectral centroid, spectral flux, spectral entropy, short time energy, spectral roll-off, MFCC |
Classifier | LRT |
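Several of the frame-level features listed for CPS have standard textbook definitions. The following NumPy sketch computes four of them per frame: short-time energy, spectral centroid, spectral flux and spectral roll-off. The frame length (1024), hop (512), Hann window and 85% roll-off point are assumed values. They are not parameters taken from the Chauhan2013 report.

```python
import numpy as np

def frame_features(x, sr=44100, n_fft=1024, hop=512, rolloff=0.85):
    """Return an (n_frames, 4) array: energy, centroid (Hz), flux, roll-off (Hz)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))            # magnitude spectrum per frame
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    energy = (frames ** 2).sum(axis=1)                    # short-time energy
    centroid = (mag * freqs).sum(axis=1) / np.maximum(mag.sum(axis=1), 1e-12)
    # Spectral flux: L2 norm of the frame-to-frame magnitude increase (0 for the first frame).
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    flux = np.sqrt((np.maximum(diff, 0.0) ** 2).sum(axis=1))
    # Roll-off: first frequency bin at which the cumulative power reaches `rolloff` of the total.
    cum = np.cumsum(mag ** 2, axis=1)
    idx = (cum < rolloff * cum[:, -1:]).sum(axis=1)
    rolloff_hz = freqs[np.minimum(idx, len(freqs) - 1)]
    return np.column_stack([energy, centroid, flux, rolloff_hz])

# Example: one second of a 1 kHz sine; centroid and roll-off should sit near 1 kHz.
sr = 44100
x = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
feats = frame_features(x, sr=sr)
print(feats.shape, np.median(feats[:, 1]))
```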
Sound Event Detection for Office Live and Office Synthetic AASP Challenge
Aleksandr Diment, Toni Heittola and Tuomas Virtanen
Tampere University of Technology, Tampere, Finland
DHV
Abstract
We present a sound event detection system based on hidden Markov models. The system is evaluated with the development material provided in the AASP Challenge on Detection and Classification of Acoustic Scenes and Events. Two approaches using the same basic detection scheme are presented. The first, developed for acoustic scenes with non-overlapping sound events, is evaluated with the Office Live development dataset. The second, developed for acoustic scenes with some degree of overlap between sound events, is evaluated with the Office Synthetic development dataset.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | MFCC |
Classifier | HMM |
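HMM-based detectors typically turn per-frame likelihoods into a smoothed event/background segmentation by Viterbi decoding. This is a minimal log-domain Viterbi sketch with a two-state toy model (0 = background, 1 = event). The transition probabilities and emission values are made up. The report's actual model topology and MFCC-based emission models are not reproduced here.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence.
    log_emit: (T, S) per-frame log-likelihoods, log_trans: (S, S), log_init: (S,)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):                 # follow back-pointers
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical per-frame probability that an event is present.
p_event = np.array([0.2, 0.2, 0.6, 0.2, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2])
log_emit = np.log(np.column_stack([1 - p_event, p_event]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # "sticky" states smooth the output
log_init = np.log(np.array([0.5, 0.5]))
path = viterbi(log_emit, log_trans, log_init)
print(path)
```

The isolated frame with probability 0.6 is absorbed into the background, because two state switches cost more than the small emission gain. Only the sustained run of high-probability frames is labeled as an event.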
An Exemplar-Based NMF Approach for Audio Event Detection
Jort F Gemmeke (1), Lode Vuegen (1, 2, 3), Bart Vanrumste (2, 3, 4) and Hugo Van hamme (1)
(1) ESAT-PSI, KU Leuven, Heverlee, Belgium; (2) Future Health Department, iMinds, Heverlee, Belgium; (3) MOBILAB, TM Kempen, Geel, Belgium; (4) ESAT-SISTA, KU Leuven, Heverlee, Belgium
GVV
Abstract
We present a novel exemplar-based method for audio event detection based on non-negative matrix factorisation (NMF). Building on recent work in noise-robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The exemplar-based dictionary is created from all available training data, artificially augmented by linear time warping at multiple rates. The method is evaluated on the Office Live and Office Synthetic development datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | NMF |
Classifier | HMM |
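In the exemplar-based NMF approach, the dictionary W is fixed to spectrogram exemplars taken from the training data, and only the activations H are estimated. Summing the activations of each class's atoms then gives per-class event evidence over time. The sketch uses the Kullback-Leibler multiplicative update for H. This is an assumption, since the abstract does not name the cost function. The tiny dictionary is a hypothetical example.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 with V ~= W @ H, keeping the exemplar dictionary W fixed.
    Uses the multiplicative update for the KL divergence: H <- H * (W^T (V / WH)) / (W^T 1)."""
    H = np.ones((W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None]            # W^T 1, one value per atom
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + eps)
    return H

def class_evidence(H, atom_labels, n_classes):
    """Sum activations of the atoms belonging to each class: shape (n_classes, n_frames)."""
    E = np.zeros((n_classes, H.shape[1]))
    for k, c in enumerate(atom_labels):
        E[c] += H[k]
    return E

# Hypothetical dictionary: 6 spectral bins, atoms 0-1 belong to class 0, atoms 2-3 to class 1.
W = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]], dtype=float)
# Frame 0 mixes class 0 (2 x atom 0) and class 1 (1 x atom 3); frame 1 is class 1 only.
V = np.column_stack([2 * W[:, 0] + W[:, 3], 3 * W[:, 2]])
H = nmf_activations(V, W)
E = class_evidence(H, atom_labels=[0, 0, 1, 1], n_classes=2)
print(np.round(E, 3))
```

Because the mixture is modeled as a sum of atoms, the overlapping events in frame 0 are both recovered. This is what makes the approach suitable for polyphonic scenes.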
Hierarchical Sound Event Detection
Maria E. Niessen, Tim L. M Van Kasteren and Andreas Merentitis
AGT International, Darmstadt, Germany
NVM_1 NVM_2 NVM_3 NVM_4
Abstract
Environmental sound recognition in real-world conditions is a particularly challenging topic, since it requires significant effort in both feature extraction and classification modeling to achieve satisfactory results. In the presented work we propose a multi-tier method that employs best-of-breed techniques for all relevant tasks. First, feature extraction is performed over a broad range of audio features. Next, a Hierarchical Hidden Markov Model (HHMM) classifier scheme is developed, with explicit modeling of state completion to better detect transitions. Finally, the best result is achieved when ensemble methods are added on top of this scheme. Specifically, a variation of stacking is selected, with a Random Forest and the HHMM as level-1 classifiers and a second HHMM instance as the meta-classifier. Results indicate that this final method delivers the best overall performance. It also makes it possible to explore different trade-offs between classes and metrics, for example by emphasizing specific metrics or classes of higher importance.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | STE, ZCR, flatness, spectral flux, spectral roll-off, spectral flatness, spectral brightness, MFCC, LPC |
Classifier | hierarchical HMM, random forests |
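Stacking feeds the predictions of level-1 classifiers into a meta-classifier. It uses out-of-fold predictions, so that the meta-classifier is trained on outputs for data the level-1 models were not fitted to. NVM uses a Random Forest and an HHMM as level-1 classifiers and another HHMM as the meta-classifier. To stay self-contained, this sketch swaps in two simple NumPy classifiers at level 1 (nearest-centroid and Gaussian naive Bayes) and nearest-centroid as the meta-classifier. It therefore illustrates the stacking scheme only, not the authors' models. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_centroid(X, y, n_classes):
    """Nearest-centroid model: one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def proba_centroid(C, X):
    """Softmax over negative squared distances to the class centroids."""
    d = ((X[:, None, :] - C[None]) ** 2).sum(axis=-1)
    s = np.exp(-(d - d.min(axis=1, keepdims=True)))
    return s / s.sum(axis=1, keepdims=True)

def fit_gnb(X, y, n_classes):
    """Gaussian naive Bayes with equal class priors: per-class means and variances."""
    mu = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    var = np.stack([X[y == c].var(axis=0) + 1e-6 for c in range(n_classes)])
    return mu, var

def proba_gnb(model, X):
    mu, var = model
    ll = -0.5 * ((X[:, None, :] - mu[None]) ** 2 / var[None]
                 + np.log(2 * np.pi * var[None])).sum(axis=-1)
    s = np.exp(ll - ll.max(axis=1, keepdims=True))
    return s / s.sum(axis=1, keepdims=True)

def level1(X_train, y_train, X, n_classes):
    """Fit both level-1 models and concatenate their class probabilities for X."""
    return np.hstack([proba_centroid(fit_centroid(X_train, y_train, n_classes), X),
                      proba_gnb(fit_gnb(X_train, y_train, n_classes), X)])

def stack_predict(X, y, X_test, n_classes, k=4):
    """Train the meta-classifier on out-of-fold level-1 outputs, then predict X_test.
    Assumes every class occurs in each training split."""
    folds = np.arange(len(X)) % k
    Z = np.zeros((len(X), 2 * n_classes))
    for f in range(k):
        train, held_out = folds != f, folds == f
        Z[held_out] = level1(X[train], y[train], X[held_out], n_classes)
    meta = fit_centroid(Z, y, n_classes)
    Z_test = level1(X, y, X_test, n_classes)     # level-1 models refit on all training data
    return proba_centroid(meta, Z_test).argmax(axis=1)

# Toy data: three well-separated 2-D clusters standing in for audio feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.arange(90) % 3
X = centers[y] + rng.normal(scale=0.7, size=(90, 2))
y_test = np.arange(30) % 3
X_test = centers[y_test] + rng.normal(scale=0.7, size=(30, 2))
pred = stack_predict(X, y, X_test, n_classes=3)
print("accuracy:", (pred == y_test).mean())
```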
Automatic Event Classification Using Front End Single Channel Noise Reduction, MFCC Features and a Support Vector Machine Classifier
Waldo Nogueira, Gerard Roma and Perfecto Herrera
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
NR2
Abstract
This submission to the Office Live event classification sub-task of the IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events works in three steps. First, it applies single-channel noise reduction to remove stationary background noise. It then extracts MFCCs. Finally, it classifies the events with a support vector machine. This short paper gives a brief description of the system and explains how to use the implementation.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | MFCC |
Classifier | SVM |
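NR2 places a single-channel noise reduction stage in front of MFCC extraction to suppress stationary background noise. The abstract does not specify the algorithm, so this sketch swaps in basic magnitude spectral subtraction. The noise spectrum is estimated as the mean of the quietest 10% of frames and subtracted from every frame, with a spectral floor. The function names and all parameters are assumptions.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=512):
    """Hann-windowed magnitude spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_subtraction(mag, quiet_frac=0.1, floor=0.05):
    """Subtract a stationary noise estimate from every frame of a magnitude spectrogram."""
    energy = (mag ** 2).sum(axis=1)
    n_quiet = max(1, int(quiet_frac * len(mag)))
    quiet = np.argsort(energy)[:n_quiet]       # assumption: the quietest frames are noise only
    noise = mag[quiet].mean(axis=0)            # per-bin noise magnitude estimate
    return np.maximum(mag - noise, floor * mag)  # floor keeps magnitudes non-negative

# Toy signal: 1 s of white noise with a 2 kHz tone between samples 20000 and 30000.
sr = 44100
rng = np.random.default_rng(1)
x = 0.05 * rng.standard_normal(sr)
t = np.arange(sr) / sr
x[20000:30000] += np.sin(2 * np.pi * 2000 * t[20000:30000])
M = stft_mag(x)
C = spectral_subtraction(M)
```

In a pipeline like NR2's, the cleaned spectrogram would then feed MFCC extraction and the SVM. Those stages are not shown here.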
Acoustic Event Detection Using Signal Enhancement and Spectro-Temporal Feature Extraction
Jens Schröder (1), Benjamin Cauchi (1), Marc René Schädler (2), Niko Moritz (1), Kamil Adiloglu (3), Jörn Anemüller (1, 2), Simon Doclo (1, 2), Birger Kollmeier (1, 2, 3) and Stefan Goetze (1)
(1) Project Group Hearing, Speech and Audio Technology, Fraunhofer IDMT, Oldenburg, Germany; (2) Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; (3) Hörtech gGmbH, Oldenburg, Germany
SCS_1 SCS_2
Abstract
In this paper, an acoustic event detection system is proposed. It consists of three stages. The first is a signal enhancement step for noise reduction, based on the noise power spectral density estimator proposed in [1] and the noise suppression method of [2]. The second is a Gabor filterbank feature extraction stage, and the third is a two-layer hidden Markov model as the back-end classifier. Optimization on the development set yields F-scores of up to 0.73 for the frame-based measure and 0.63 for the onset-offset-based measure.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | Gabor filterbank |
Classifier | HMM |
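Gabor filterbank features are obtained by correlating a spectro-temporal representation (typically a log-mel spectrogram) with 2D Gabor kernels. Each kernel is tuned to particular spectral and temporal modulation frequencies. The sketch builds one complex kernel as a Hann envelope times a complex sinusoid, removes its DC component, and applies it with a "valid" 2D correlation. Kernel sizes, modulation frequencies and function names are illustrative assumptions. They are not the filterbank configuration of the SCS systems.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size_f, size_t, omega_f, omega_t):
    """Complex 2-D Gabor kernel (channels x frames): Hann envelope times a complex sinusoid.
    omega_f and omega_t are modulation frequencies in radians per channel and per frame."""
    f = np.arange(size_f) - (size_f - 1) / 2
    t = np.arange(size_t) - (size_t - 1) / 2
    envelope = np.outer(np.hanning(size_f), np.hanning(size_t))
    kernel = envelope * np.exp(1j * (omega_f * f[:, None] + omega_t * t[None, :]))
    # Remove the DC component so that a constant input produces zero output.
    return kernel - envelope * kernel.sum() / envelope.sum()

def gabor_filter(spec, kernel):
    """'Valid' 2-D correlation of a (channels, frames) spectrogram with the kernel."""
    windows = sliding_window_view(spec, kernel.shape)    # (F-kf+1, T-kt+1, kf, kt)
    return np.einsum("ijkl,kl->ij", windows, kernel.conj())

# Hypothetical example: a purely temporal modulation filter on 40 x 100 toy inputs.
kernel = gabor_kernel(9, 15, omega_f=0.0, omega_t=np.pi / 4)
frames = np.arange(100)
matched = np.tile(np.cos(np.pi / 4 * frames), (40, 1))    # modulation at the filter's rate
mismatched = np.tile(np.cos(2.6 * frames), (40, 1))       # much faster modulation
resp_matched = np.abs(gabor_filter(matched, kernel))
resp_mismatched = np.abs(gabor_filter(mismatched, kernel))
print(resp_matched.mean(), resp_mismatched.mean())
```

A full filterbank would use many such kernels at different spectral and temporal modulation frequencies and stack their outputs as the feature vector.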
An MFCC-GMM Approach for Event Detection and Classification
Lode Vuegen (1, 2, 3), Bert Van Den Broeck (2, 3, 4), Peter Karsmakers (2, 3, 4), Jort F Gemmeke (1), Bart Vanrumste (2, 3, 4) and Hugo Van hamme (1)
(1) ESAT-PSI, KU Leuven, Heverlee, Belgium; (2) Future Health Department, iMinds, Heverlee, Belgium; (3) MOBILAB, TM Kempen, Geel, Belgium; (4) ESAT-SISTA, KU Leuven, Heverlee, Belgium
VVK
Abstract
This abstract explores Gaussian Mixture Models (GMMs) estimated from Mel-Frequency Cepstral Coefficients (MFCCs) for acoustic event detection and classification. To limit the impact of silence, a shared background model is used. An average F-score of 48% is obtained for the Office Live subtask. However, the analysis reveals that the proposed method has difficulty coping with the large intra-class variations in the provided dataset (e.g. in duration, dynamic range and characteristic sounds).
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | MFCC |
Classifier | GMM |
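VVK scores MFCC frames with per-class GMMs and uses a shared background model to absorb silence. Below is a minimal sketch of the scoring step. It assumes diagonal covariances and a decision rule under which a frame is assigned to the best-scoring class only when that class beats the background model. The parameters are hand-set for illustration, EM training is omitted, and the function names are hypothetical.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    X: (T, D); weights: (K,); means, variances: (K, D)."""
    diff = X[:, None, :] - means[None]                                  # (T, K, D)
    log_comp = -0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances)).sum(axis=-1)
    a = log_comp + np.log(weights)                                       # (T, K)
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]  # log-sum-exp

def label_frames(X, class_gmms, background_gmm):
    """Best class index per frame, or -1 where the shared background model wins."""
    class_ll = np.stack([gmm_loglik(X, *g) for g in class_gmms], axis=1)   # (T, C)
    bg_ll = gmm_loglik(X, *background_gmm)
    best = class_ll.argmax(axis=1)
    return np.where(class_ll.max(axis=1) > bg_ll, best, -1)

# Hypothetical 2-D "MFCC" space: class 0 near (3, 0), class 1 near (0, 3)-(0, 5), silence near (0, 0).
class_gmms = [
    (np.array([1.0]), np.array([[3.0, 0.0]]), np.ones((1, 2))),
    (np.array([0.5, 0.5]), np.array([[0.0, 3.0], [0.0, 5.0]]), np.ones((2, 2))),
]
background = (np.array([1.0]), np.array([[0.0, 0.0]]), np.ones((1, 2)))
X = np.array([[0.0, 0.0], [3.0, 0.2], [0.0, 4.1], [0.1, -0.2]])
labels = label_frames(X, class_gmms, background)
print(labels)
```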