Official results are shown on the original DCASE2013 Challenge website. The purpose of this page is to present the results in a format consistent with more recent editions of the DCASE Challenge.
Task description
The scene classification (SC) task addresses the problem of identifying and classifying acoustic scenes and soundscapes.
A more detailed description can be found on the task description page.
Systems ranking
| Rank | Submission code | Submission name | Technical Report | Accuracy with 95% confidence interval (Evaluation dataset) |
| --- | --- | --- | --- | --- |
|  | DCASE2013 baseline | Baseline |  | 55.0 (45.2 - 64.8) |
|  | CHR_1 | CHR_SVM | Chum2013 | 63.0 (53.5 - 72.5) |
|  | CHR_2 | CHR_HMM | Chum2013 | 65.0 (55.7 - 74.3) |
|  | ELF | ELF | Elizalde2013 | 55.0 (45.2 - 64.8) |
|  | GSR | GSR | Geiger2013 | 69.0 (59.9 - 78.1) |
|  | KH | KH | Krijnders2013 | 55.0 (45.2 - 64.8) |
|  | LTT_1 | LTT_1 | Li2013 | 72.0 (63.2 - 80.8) |
|  | LTT_2 | LTT_2 | Li2013 | 70.0 (61.0 - 79.0) |
|  | LTT_3 | LTT_2 | Li2013 | 67.0 (57.8 - 76.2) |
|  | NHL | NHL | Nam2013 | 60.0 (50.4 - 69.6) |
|  | NR1_1 | NR1_1 | Nogueira2013 | 60.0 (50.4 - 69.6) |
|  | NR1_2 | NR1_2 | Nogueira2013 | 60.0 (50.4 - 69.6) |
|  | NR1_3 | NR1_3 | Nogueira2013 | 59.0 (49.4 - 68.6) |
|  | OE | OE | Olivetti2013 | 14.0 (7.2 - 20.8) |
|  | PE | PE | Patil2013 | 58.0 (48.3 - 67.7) |
|  | RG | RG | Rakotomamonjy2013 | 69.0 (59.9 - 78.1) |
|  | RNH_1 | RNH1 | Roma2013 | 71.0 (62.1 - 79.9) |
|  | RNH_2 | RNH2 | Roma2013 | 76.0 (67.6 - 84.4) |
Teams ranking
This table includes only the best-performing system per submitting team.
| Rank | Submission code | Submission name | Technical Report | Accuracy with 95% confidence interval (Evaluation dataset) |
| --- | --- | --- | --- | --- |
|  | DCASE2013 baseline | Baseline |  | 55.0 (45.2 - 64.8) |
|  | CHR_2 | CHR_HMM | Chum2013 | 65.0 (55.7 - 74.3) |
|  | ELF | ELF | Elizalde2013 | 55.0 (45.2 - 64.8) |
|  | GSR | GSR | Geiger2013 | 69.0 (59.9 - 78.1) |
|  | KH | KH | Krijnders2013 | 55.0 (45.2 - 64.8) |
|  | LTT_1 | LTT_1 | Li2013 | 72.0 (63.2 - 80.8) |
|  | NHL | NHL | Nam2013 | 60.0 (50.4 - 69.6) |
|  | NR1_1 | NR1_1 | Nogueira2013 | 60.0 (50.4 - 69.6) |
|  | OE | OE | Olivetti2013 | 14.0 (7.2 - 20.8) |
|  | PE | PE | Patil2013 | 58.0 (48.3 - 67.7) |
|  | RG | RG | Rakotomamonjy2013 | 69.0 (59.9 - 78.1) |
|  | RNH_2 | RNH2 | Roma2013 | 76.0 (67.6 - 84.4) |
Class-wise performance
| Rank | Submission code | Submission name | Technical Report | Accuracy (Evaluation dataset) | Bus | Busy street | Office | Open air market | Park | Quiet street | Restaurant | Supermarket | Tube | Tube station |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | DCASE2013 baseline | Baseline |  | 55.0 | 90.0 | 30.0 | 80.0 | 70.0 | 90.0 | 40.0 | 30.0 | 20.0 | 60.0 | 40.0 |
|  | CHR_1 | CHR_SVM | Chum2013 | 63.0 | 100.0 | 80.0 | 40.0 | 70.0 | 30.0 | 40.0 | 70.0 | 50.0 | 80.0 | 70.0 |
|  | CHR_2 | CHR_HMM | Chum2013 | 65.0 | 90.0 | 90.0 | 80.0 | 60.0 | 90.0 | 50.0 | 40.0 | 30.0 | 80.0 | 40.0 |
|  | ELF | ELF | Elizalde2013 | 55.0 | 50.0 | 70.0 | 60.0 | 100.0 | 40.0 | 50.0 | 70.0 | 30.0 | 30.0 | 50.0 |
|  | GSR | GSR | Geiger2013 | 69.0 | 90.0 | 90.0 | 90.0 | 90.0 | 70.0 | 40.0 | 80.0 | 50.0 | 70.0 | 20.0 |
|  | KH | KH | Krijnders2013 | 55.0 | 60.0 | 90.0 | 70.0 | 50.0 | 30.0 | 50.0 | 50.0 | 20.0 | 80.0 | 50.0 |
|  | LTT_1 | LTT_1 | Li2013 | 72.0 | 100.0 | 100.0 | 70.0 | 80.0 | 70.0 | 50.0 | 70.0 | 70.0 | 70.0 | 40.0 |
|  | LTT_2 | LTT_2 | Li2013 | 70.0 | 100.0 | 100.0 | 70.0 | 80.0 | 70.0 | 60.0 | 70.0 | 40.0 | 70.0 | 40.0 |
|  | LTT_3 | LTT_2 | Li2013 | 67.0 | 100.0 | 100.0 | 70.0 | 80.0 | 70.0 | 50.0 | 60.0 | 30.0 | 70.0 | 40.0 |
|  | NHL | NHL | Nam2013 | 60.0 | 70.0 | 90.0 | 80.0 | 70.0 | 50.0 | 60.0 | 50.0 | 40.0 | 30.0 | 60.0 |
|  | NR1_1 | NR1_1 | Nogueira2013 | 60.0 | 80.0 | 80.0 | 60.0 | 70.0 | 70.0 | 30.0 | 60.0 | 80.0 | 20.0 | 50.0 |
|  | NR1_2 | NR1_2 | Nogueira2013 | 60.0 | 80.0 | 90.0 | 50.0 | 80.0 | 70.0 | 20.0 | 90.0 | 60.0 | 20.0 | 40.0 |
|  | NR1_3 | NR1_3 | Nogueira2013 | 59.0 | 80.0 | 80.0 | 50.0 | 70.0 | 70.0 | 20.0 | 90.0 | 70.0 | 20.0 | 40.0 |
|  | OE | OE | Olivetti2013 | 14.0 | 0.0 | 10.0 | 20.0 | 20.0 | 10.0 | 30.0 | 0.0 | 20.0 | 20.0 | 10.0 |
|  | PE | PE | Patil2013 | 58.0 | 90.0 | 90.0 | 50.0 | 70.0 | 40.0 | 60.0 | 60.0 | 20.0 | 40.0 | 60.0 |
|  | RG | RG | Rakotomamonjy2013 | 69.0 | 100.0 | 100.0 | 80.0 | 80.0 | 80.0 | 30.0 | 50.0 | 50.0 | 80.0 | 40.0 |
|  | RNH_1 | RNH1 | Roma2013 | 71.0 | 80.0 | 100.0 | 60.0 | 40.0 | 70.0 | 60.0 | 100.0 | 70.0 | 70.0 | 60.0 |
|  | RNH_2 | RNH2 | Roma2013 | 76.0 | 80.0 | 100.0 | 80.0 | 70.0 | 70.0 | 50.0 | 100.0 | 80.0 | 70.0 | 60.0 |
System characteristics
| Rank | Code | Technical Report | Accuracy (Eval) | Input | Sampling rate | Features | Classifier |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | DCASE2013 baseline |  | 55.0 | mono | 44.1kHz | MFCC | GMM |
|  | CHR_1 | Chum2013 | 63.0 | mono | 11.025kHz | Magnitude response, Loudness, Spectral sparsity, Temporal sparsity | SVM |
|  | CHR_2 | Chum2013 | 65.0 | mono | 11.025kHz | Magnitude response, Loudness, Spectral sparsity, Temporal sparsity | HMM |
|  | ELF | Elizalde2013 | 55.0 | left, right, difference, average | 44.1kHz | MFCC | i-vector, pLDA |
|  | GSR | Geiger2013 | 69.0 | mono | 44.1kHz | openSMILE / emo_large | SVM |
|  | KH | Krijnders2013 | 55.0 | mono | 44.1kHz | tone-fit representation | SVM |
|  | LTT_1 | Li2013 | 72.0 | mono | 44.1kHz | Wavelet, MFCC | Treebagger, majority vote |
|  | LTT_2 | Li2013 | 70.0 | mono | 44.1kHz | Wavelet, MFCC | Treebagger, majority vote |
|  | LTT_3 | Li2013 | 67.0 | mono | 44.1kHz | Wavelet, MFCC | Treebagger, majority vote |
|  | NHL | Nam2013 | 60.0 | mono | 44.1kHz | Feature learning, max-pooling | SVM |
|  | NR1_1 | Nogueira2013 | 60.0 | mono | 44.1kHz | MFCC, temporal modulation, event density, binaural features | SVM |
|  | NR1_2 | Nogueira2013 | 60.0 | mono | 44.1kHz | MFCC, temporal modulation, event density, binaural features | SVM |
|  | NR1_3 | Nogueira2013 | 59.0 | mono | 44.1kHz | MFCC, temporal modulation, event density, binaural features | SVM |
|  | OE | Olivetti2013 | 14.0 | mono | 44.1kHz | Normalized compression distance, Euclidean embedding | Random Forest |
|  | PE | Patil2013 | 58.0 | mono | 44.1kHz | Spectrotemporal modulation | SVM |
|  | RG | Rakotomamonjy2013 | 69.0 | mono | 44.1kHz | CQT, HOG | SVM |
|  | RNH_1 | Roma2013 | 71.0 | mono | 44.1kHz | MFCC, Recurrence Quantification Analysis | SVM |
|  | RNH_2 | Roma2013 | 76.0 | mono | 44.1kHz | MFCC, Recurrence Quantification Analysis | SVM |
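For reference, the DCASE2013 baseline in the table above is an MFCC+GMM system. The following is a minimal sketch of that kind of classifier, not the official baseline implementation; the MFCC dimensionality and mixture size are illustrative assumptions.

```python
# Minimal MFCC+GMM scene classifier sketch (one GMM per scene class).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=44100, n_mfcc=20):
    """Load a clip and return its MFCC frames as an (n_frames, n_mfcc) array."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train(train_clips):
    """train_clips: dict mapping scene label -> list of audio file paths."""
    models = {}
    for label, paths in train_clips.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[label] = GaussianMixture(n_components=32,
                                        covariance_type="diag").fit(X)
    return models

def classify(models, path):
    """Pick the scene whose GMM gives the highest average log-likelihood."""
    X = mfcc_frames(path)
    return max(models, key=lambda label: models[label].score(X))
```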
Technical reports
IEEE AASP Scene Classification Challenge Using Hidden Markov Models and Frame Based Classification
May Chum, Ariel Habshush, Abrar Rahman and Christopher Sang
Electrical Engineering Department, The Cooper Union, New York, USA
Abstract
The IEEE AASP Challenge involves the detection and classification of acoustic scenes and events. The scene classification (SC) challenge consists of 10 different scenes with 10 audio files of length 30 seconds each, totaling 100 audio clips. The list of scenes is: busy street, quiet street, park, open-air market, bus, subway-train, restaurant, shop/supermarket, office, and subway station. The goal is to test on a development set composed of audio clips of the same scenes as the training set and determine which scene each audio clip originated from. One of the algorithms presented in this paper to discriminate between these scenes uses hidden Markov models (HMMs) and Gaussian mixture models (GMMs). The features used include the short-time Fourier transform, loudness, and spectral sparsity. Using these features yielded 72% correct classification with 10-fold cross-validation. The other algorithm uses the same features plus temporal sparsity to classify individual frames of an audio clip and then votes on the class. This algorithm achieved 62% accuracy.
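A sketch of the frame-level features named in the abstract; the authors' exact definitions of loudness and sparsity are not given here, so max-to-sum ratios and log energy are assumed as stand-ins.

```python
# Hypothetical frame-level sparsity and loudness features from an STFT.
import numpy as np
import librosa

def sparsity_features(y, sr, n_fft=1024, hop=512):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # magnitude STFT
    eps = 1e-10
    # Spectral sparsity: per frame, how dominated the spectrum is by its peak.
    spectral = S.max(axis=0) / (S.sum(axis=0) + eps)
    # Temporal sparsity: per frequency bin, peakiness across time.
    temporal = S.max(axis=1) / (S.sum(axis=1) + eps)
    # Loudness proxy: log energy per frame (assumption, not a psychoacoustic model).
    loudness = np.log(np.sum(S ** 2, axis=0) + eps)
    return spectral, temporal, loudness
```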
System characteristics
Input | mono |
Sampling rate | 11.025kHz |
Features | Magnitude response, Loudness, Spectral sparsity, Temporal sparsity |
Classifier | SVM; HMM |
An I-Vector Based Approach for Audio Scene Detection
Benjamin Elizalde1, Howard Lei1, Gerald Friedland1 and Nils Peters2
1International Computer Science Institute, Berkeley, USA, 2Qualcomm Technologies Inc., San Diego, USA
Abstract
The IEEE AASP Scene Classification challenge on user-generated content (UGC) aims to classify an audio recording as belonging to a specific scene such as busy street, office or supermarket. The difficulty of scene content analysis on UGC lies in the lack of structure and the acoustic variability of the data. The i-vector system is state-of-the-art in speaker verification and scene detection, and outperforms conventional Gaussian mixture model (GMM)-based approaches. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper reports our results in the challenge using a hand-tuned i-vector system and MFCC features. Compared to the MFCC+GMM baseline system, our system increased the classification accuracy by 26.4% to about 65.8%. We discuss our approach and highlight parameters in our system that significantly improved our classification accuracy.
System characteristics
Input | left, right, difference, average |
Sampling rate | 44.1kHz |
Features | MFCC |
Classifier | i-vector, pLDA |
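This is the only entry that uses multiple input channels. A sketch of how the four listed channels could be derived from a stereo recording; the file name and the simple sample-wise combinations are assumptions.

```python
# Derive left / right / difference / average channels from a stereo clip.
import librosa

y, sr = librosa.load("scene.wav", sr=44100, mono=False)  # hypothetical file; shape (2, n)
left, right = y[0], y[1]
difference = left - right
average = 0.5 * (left + right)
```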
Recognising Acoustic Scenes with Large-Scale Audio Feature Extraction and SVM
Jürgen T. Geiger1, Björn Schuller1,2 and Gerhard Rigoll1
1Institute for Human-Machine Communication, Technische Universität München, München, Germany, 2University of Passau, Institute for Sensor Systems, Passau, Germany
Abstract
This work describes our contribution to the IEEE AASP Challenge on classification of acoustic scenes. From the 30-second-long, highly variable recordings, spectral, cepstral, energy and voicing-related audio features are extracted. A sliding window approach is used to obtain statistical functionals of the low-level features on short segments. SVMs are used for classification of these short segments, and a majority voting scheme is employed to get a decision for the whole recording. On the official development set of the challenge, an accuracy of 73% is achieved. A feature analysis using the t-statistic showed that Mel spectra were the most relevant features.
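A sketch of the segment-wise pipeline the abstract describes: sliding-window functionals, per-segment SVM, majority vote per recording. The window sizes and the functional set are assumptions; the actual submission used the openSMILE emo_large feature set.

```python
import numpy as np
from collections import Counter
from sklearn.svm import SVC

def functionals(window):
    """Statistical functionals over an (n_frames, n_feats) window."""
    return np.concatenate([window.mean(0), window.std(0),
                           window.min(0), window.max(0)])

def segment_features(frames, win=100, hop=50):
    """Sliding-window functionals over a whole recording's frame matrix."""
    return np.array([functionals(frames[i:i + win])
                     for i in range(0, len(frames) - win + 1, hop)])

def train(recordings, labels):
    """recordings: list of (n_frames, n_feats) arrays; labels: one scene per recording."""
    segs = [segment_features(r) for r in recordings]
    X = np.vstack(segs)
    y = np.concatenate([[l] * len(s) for l, s in zip(labels, segs)])
    return SVC().fit(X, y)

def classify(clf, frames):
    """Majority vote over per-segment SVM predictions."""
    return Counter(clf.predict(segment_features(frames))).most_common(1)[0][0]
```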
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | openSMILE / emo_large |
Classifier | SVM |
A Tone-Fit Feature Representation for Scene Classification
Johannes D. Krijnders and Gineke A. ten Holt
INCAS3, Assen, Netherlands
Abstract
We present an algorithm that classifies environmental sound recordings using a feature representation based on human hearing. Specifically, we use a mathematical model of the human cochlea to transform a sound (wav) clip into a time-frequency representation called a cochleogram. From the cochleogram, we calculate the tone-fit of each time-frequency region by calculating the fit of the region to a pure tone. This gives us a representation of the 'tone-likeness' of the sound at various moments and frequencies. Finally, to arrive at a summarized representation for the entire clip, we calculate 20 statistical components over the tone-fit matrix. The resulting 20-dimensional feature representation is then classified using a support vector machine. The accuracy of the resulting method is 0.53 (SE = 0.06). Similar results are obtained by using MFCC features and voting by frame (0.60, SE = 0.04). Future directions include separately identifying sound events and representing scenes in terms of component events.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | tone-fit representation |
Classifier | SVM |
Auditory Scene Classification Using Machine Learning Techniques
David Li, Jason Tam and Derek Toub
Cooper Union, New York, USA
Abstract
Audio scene classification will play an important role in context-based organization of audio data in the future. With classified and labeled audio data, it will be possible to set up a searchable database where users can retrieve audio files based on their contents. In this paper, we introduce a system to extract features from such audio scenes and identify the environments in which they were recorded. This system makes use of wavelet and Mel-frequency cepstral coefficient (MFCC) features, and classifies scenes by first classifying segments of the scene, and deciding the overall classification with a vote. The system achieves a classification accuracy of 72% for the training dataset provided for the IEEE AASP CASA challenge [1].
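A rough sketch of the described feature extraction and classifier; MATLAB's TreeBagger is approximated by scikit-learn's RandomForestClassifier, and the wavelet settings are assumptions.

```python
# Wavelet subband log-energies plus mean MFCCs per segment, bagged trees on top.
import numpy as np
import pywt
import librosa
from sklearn.ensemble import RandomForestClassifier

def segment_feature(seg, sr):
    """Feature vector for one audio segment of a 30-second clip."""
    mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13).mean(axis=1)
    coeffs = pywt.wavedec(seg, "db4", level=5)            # multilevel DWT
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return np.concatenate([np.log(energies + 1e-10), mfcc])

# TreeBagger stand-in: a bagged ensemble of decision trees. After fitting on
# per-segment features, a majority vote over segment predictions labels the clip.
forest = RandomForestClassifier(n_estimators=100)
```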
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | Wavelet, MFCC |
Classifier | Treebagger, majority vote |
Acoustic Scene Classification Using Sparse Feature Learning and Selective Max-Pooling by Event Detection
Juhan Nam1, Ziwon Hyung2 and Kyogu Lee2
1Stanford University, Stanford, USA, 2Music and Audio Research Group, Seoul National University, Seoul, South Korea
Abstract
Feature representations obtained by learning algorithms have recently shown promising results in music classification. In this work, we apply the feature learning approach to audio scene classification. Using a previously proposed method, we learn local acoustic features on the mel-frequency spectrogram and perform max-pooling to form a scene-level feature vector. In order to adapt the method to environmental scene classification, where acoustic events occur in an irregular manner, we suggest a new pooling technique that detects events using mean feature activation and then selectively performs max-pooling over those events. Our experiments show that this method is effective in acoustic scene classification.
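The selective max-pooling idea can be sketched as follows; the event criterion (frames whose mean activation exceeds the clip average) is an assumption made for illustration.

```python
# Selective max-pooling: pool only over detected "event" frames.
import numpy as np

def selective_max_pool(H):
    """H: (n_frames, n_features) non-negative feature activations."""
    activation = H.mean(axis=1)                  # mean activation per frame
    events = activation > activation.mean()      # assumed event criterion
    if not events.any():                         # fall back to all frames
        events[:] = True
    return H[events].max(axis=0)                 # scene-level feature vector
```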
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | Feature learning, max-pooling |
Classifier | SVM |
Sound Scene Identification Based on MFCC, Binaural Features and a Support Vector Machine Classifier
Waldo Nogueira, Gerard Roma and Perfecto Herrera
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Abstract
This submission to the scene classification sub-task of the IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events is based on a feature extraction module covering three dimensions (spectral, temporal and spatial). Spectral features are based on Mel-frequency cepstral coefficients, temporal features are based on an event density extractor, and the spatial features are based on the extraction of inter-aural differences (level and temporal) and the coherence between the two channels of stereo recordings. After feature selection, the features are used in conjunction with a support vector machine for the classification of the sound scenes. In this short paper the impact of different features is analyzed.
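A sketch of the binaural (spatial) features the abstract mentions, computed from the two channels of a stereo clip; the exact definitions are assumptions.

```python
# Inter-aural level difference, cross-correlation time lag, and coherence.
import numpy as np

def binaural_features(left, right, eps=1e-10):
    ild = 10 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))
    xcorr = np.correlate(left, right, mode="full")
    itd = np.argmax(xcorr) - (len(left) - 1)     # lag of peak correlation, in samples
    coherence = xcorr.max() / (np.linalg.norm(left) * np.linalg.norm(right) + eps)
    return ild, itd, coherence
```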
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | MFCC, temporal modulation, event density, binaural features |
Classifier | SVM |
The Wonders of the Normalized Compression Dissimilarity Representation
Emanuele Olivetti1,2
1NeuroInformatics Laboratory, Bruno Kessler Foundation, Trento, Italy, 2Center for Mind and Brain Sciences, Trento, Italy
Abstract
We propose a method to effectively embed general objects, like audio samples, into a vectorial feature space suitable for classification problems. From the practical point of view, the researcher adopting the proposed method is just required to provide two ingredients: an efficient compressor for those objects, and a way to combine two objects into a new one. The proposed method is based on two main elements: the dissimilarity representation and the normalized compression distance (NCD). The dissimilarity representation is a Euclidean embedding algorithm, i.e. a procedure to map generic objects into a vector space, which requires the definition of a distance function between the objects. The quality of the resulting embedding is strictly dependent on the choice of this distance. The NCD is a distance between objects based on the concept of Kolmogorov complexity. In practice the NCD is based on two building blocks: a compression function and a method to combine two objects into a new one. We claim that, as soon as a good compressor and a meaningful way to combine two objects are available, it is possible to build an effective feature space in which classification algorithms can be accurate. As our submission to the IEEE AASP Challenge, we show a practical application of the proposed method in the context of acoustic scene classification, where the compressor is the free and open-source Vorbis lossy audio compressor and the combination of two audio samples is their simple concatenation.
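The NCD itself is easy to state in code. The sketch below uses zlib as a stand-in compressor (the submission used the Vorbis codec) and concatenation to combine two objects, as in the paper.

```python
# Normalized compression distance with zlib as an illustrative compressor.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))   # concatenation combines the two objects
    return (cxy - min(cx, cy)) / max(cx, cy)

# Usage: d = ncd(open("a.wav", "rb").read(), open("b.wav", "rb").read())
```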
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | Normalized compression distance, Euclidean embedding |
Classifier | Random Forest |
Multiresolution Auditory Representations for Scene Classification
Kailash Patil and Mounya Elhilali
Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
Abstract
Here, we propose a framework that provides a detailed analysis of the spectrotemporal modulations in the acoustic signal, augmented with a discriminative classifier using support vector machines. We have seen that such a representation is successful at capturing the nontrivial commonalities within a sound class and the differences between classes [1, 2, 3].
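As a simple illustration, spectrotemporal modulations can be exposed by a 2-D Fourier transform of a log spectrogram; this is an assumption made for illustration, not the authors' multiresolution auditory model.

```python
# Rate-scale (modulation) magnitudes via a 2-D FFT of a log-mel spectrogram.
import numpy as np
import librosa

def modulation_spectrum(y, sr):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    logS = np.log(S + 1e-10)
    return np.abs(np.fft.fft2(logS))   # temporal rate vs. spectral scale
```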
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | Spectrotemporal modulation |
Classifier | SVM |
Histogram of Gradients of Time-Frequency Representations for Audio Scene Classification
Alain Rakotomamonjy and Gilles Gasso
Normandie Université, Rouen, France
Abstract
This abstract presents our entry to the Detection and Classification of Acoustic Scenes challenge. The approach we propose for classifying acoustic scenes is based on transforming the audio signal into a time-frequency representation and then extracting relevant features about the shapes and evolution of time-frequency structures. These features are based on histograms of gradients, which are subsequently fed to a multi-class linear support vector machine.
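A sketch of the CQT+HOG feature chain listed for this system; the hop/bin counts and HOG cell sizes are illustrative assumptions.

```python
# Histogram-of-gradients features over a constant-Q time-frequency image.
import numpy as np
import librosa
from skimage.feature import hog

def cqt_hog(y, sr):
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=84))      # constant-Q transform
    img = librosa.amplitude_to_db(C, ref=np.max)      # dB-scaled image
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))                # gradient histograms
```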
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | CQT, HOG |
Classifier | SVM |
Recurrence Quantification Analysis Features for Auditory Scene Classification
Gerard Roma, Waldo Nogueira and Perfecto Herrera
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Abstract
This extended abstract describes our submission for the scene classification task of the IEEE AASP Challenge for Detection and Classification of Acoustic Scenes and Events. We explore the use of Recurrence Quantification Analysis (RQA) features for this task. These features are computed over a thresholded similarity matrix derived from windows of MFCC features. Added to traditional MFCC statistics, they improve accuracy when using a standard SVM classifier.
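A sketch of simple RQA measures over a thresholded self-similarity matrix of MFCC frames; the threshold rule and the two measures shown (recurrence rate and determinism) are illustrative assumptions.

```python
# Recurrence rate and determinism from a thresholded MFCC similarity matrix.
import numpy as np
from scipy.spatial.distance import cdist

def rqa_features(mfcc, eps_quantile=0.1, lmin=2):
    """mfcc: (n_frames, n_coeffs). Returns (recurrence_rate, determinism)."""
    D = cdist(mfcc, mfcc)                          # pairwise frame distances
    R = D < np.quantile(D, eps_quantile)           # recurrence matrix
    rr = R.mean()                                  # recurrence rate
    # Determinism: fraction of recurrent points lying on diagonal lines >= lmin.
    diag_pts, n = 0, len(R)
    for k in range(-(n - 1), n):
        d = np.diagonal(R, k).astype(int)
        runs = np.split(d, np.where(np.diff(d) != 0)[0] + 1)
        diag_pts += sum(len(r) for r in runs if r[0] == 1 and len(r) >= lmin)
    det = diag_pts / max(R.sum(), 1)
    return rr, det
```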
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | MFCC, Recurrence Quantification Analysis |
Classifier | SVM |