Challenge has ended. Full results for this task can be found here
Description
The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterize the environment in which it was recorded, for example "park", "home", or "office".
Audio dataset
The TUT Acoustic Scenes 2017 dataset will be used as development data for the task. The dataset consists of recordings from various acoustic scenes, all captured at distinct recording locations. For each recording location, a 3-5 minute audio recording was captured. The original recordings were then split into segments with a length of 10 seconds, and these audio segments are provided as individual files.
Acoustic scenes for the task (15):
- Bus - traveling by bus in the city (vehicle)
- Cafe / Restaurant - small cafe/restaurant (indoor)
- Car - driving or traveling as a passenger, in the city (vehicle)
- City center (outdoor)
- Forest path (outdoor)
- Grocery store - medium size grocery store (indoor)
- Home (indoor)
- Lakeside beach (outdoor)
- Library (indoor)
- Metro station (indoor)
- Office - multiple persons, typical work day (indoor)
- Residential area (outdoor)
- Train (traveling, vehicle)
- Tram (traveling, vehicle)
- Urban park (outdoor)
A detailed description of the acoustic scenes included in the dataset can be found on the DCASE2016 Task 1 page.
The dataset was collected in Finland by Tampere University of Technology between June 2015 and January 2017. The data collection has received funding from the European Research Council.
Recording and annotation procedure
For all acoustic scenes, each recording was captured in a different location: different streets, different parks, different homes. Recordings were made using a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Roland Edirol R-09 wave recorder, using a 44.1 kHz sampling rate and 24-bit resolution. The microphones are specifically designed to look like headphones and are worn in the ears; as a result, the recorded audio is very similar to the sound that reaches the auditory system of the person wearing the equipment.
Postprocessing of the recorded data addressed the privacy of recorded individuals. For audio material recorded in private places, written consent was obtained from all people involved. Material recorded in public places does not require such consent, but it was screened for content, and privacy-infringing segments were eliminated. Microphone failures and audio distortions were annotated, and the annotations are provided with the data. Based on experiments in DCASE 2016, eliminating the error regions during training does not influence the final classification accuracy. The evaluation set does not contain any such audio errors.
Download
If you are using the provided baseline system, there is no need to download the dataset manually, as the system will automatically download the needed datasets for you.
- Development dataset
- Evaluation dataset
Task setup
The TUT Acoustic Scenes 2017 dataset consists of two subsets: a development dataset and an evaluation dataset. The development dataset consists of the complete TUT Acoustic Scenes 2016 dataset (both the development and evaluation data of the 2016 challenge). The data was partitioned into subsets based on the location of the original recordings, so the evaluation dataset contains recordings of similar acoustic scenes but from different geographical locations. All segments obtained from the same original recording were placed in a single subset, either the development dataset or the evaluation dataset. For each acoustic scene, there are 312 segments (52 minutes of audio) in the development dataset.
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.
TUT Database for Acoustic Scene Classification and Sound Event Detection
Abstract
We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.
Development dataset
A cross-validation setup is provided for the development dataset in order to make results reported with this dataset uniform. The setup consists of four folds distributing the available segments based on location. The folds are provided with the dataset in the directory evaluation setup.
Fold 1 of the provided setup reproduces the DCASE 2016 challenge setup, by using the 2016 development set as training subset and the 2016 evaluation set as test subset.
Important: If you are not using the provided cross-validation setup, pay attention to segments extracted from the same original recordings. Make sure that, for each fold, ALL segments from the same location are placed either in the training subset OR in the test subset, never in both. A sketch of such a location-based split is given below.
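The following is a minimal sketch of such a location-aware split using scikit-learn (an assumption; any equivalent tooling works). The file names and location IDs are hypothetical; in practice, the location of each segment can be read from the dataset metadata.

```python
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical segment file names and their recording-location IDs.
filenames = ["b020_90_100.wav", "b020_100_110.wav", "a010_10_20.wav", "a010_20_30.wav"]
locations = ["b020", "b020", "a010", "a010"]  # one ID per original recording location

# Grouping by location guarantees that all segments from the same location
# end up on the same side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_idx, test_idx = next(splitter.split(filenames, groups=locations))

train_files = [filenames[i] for i in train_idx]
test_files = [filenames[i] for i in test_idx]
```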
Evaluation dataset
The evaluation dataset, without ground truth, will be released one month before the submission deadline. The full ground truth metadata for it will be published after the DCASE 2017 challenge and workshop have concluded.
Submission
Detailed information about the challenge submission can be found on the submission page.
System output should be presented as a single text file (in CSV format) containing a classification result for each audio file in the evaluation set. Result items can be in any order. Format:
[filename (string)][tab][scene label (string)]
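As an illustration, a minimal sketch of writing the output file in this format is shown below; the file names and predicted labels are hypothetical.

```python
# Hypothetical predictions: evaluation file name -> predicted scene label.
predictions = {
    "audio/eval_0001.wav": "park",
    "audio/eval_0002.wav": "home",
}

# Write one tab-separated line per evaluation file.
with open("task1_results.txt", "w") as output_file:
    for filename, scene_label in predictions.items():
        output_file.write("{}\t{}\n".format(filename, scene_label))
```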
Multiple system outputs can be submitted (maximum 4 per participant). If submitting multiple systems, the individual text-files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).
Task rules
These are the general rules valid for all tasks. The same rules, together with additional information on the technical report and submission requirements, can be found here. Task-specific rules are highlighted in green.
- Participants are not allowed to use external data for system development. Data from another task is considered external data.
- Manipulation of the provided training and development data is allowed.
The development dataset can be augmented without the use of external data, e.g. by mixing data sampled from a probability density function or by using techniques such as pitch shifting or time stretching; see the augmentation sketch after this list.
- Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden.
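A minimal augmentation sketch is given below, using librosa for the pitch shifting and time stretching mentioned above (the use of librosa and the file path are assumptions, not part of the official tools).

```python
import librosa

# Load one hypothetical 10-second development segment at its native sampling rate.
y, sr = librosa.load("development/audio/a001_10_20.wav", sr=None)

# Pitch shifting: raise the pitch by two semitones without changing duration.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# Time stretching: slow the signal down to 90% speed without changing pitch.
y_stretched = librosa.effects.time_stretch(y, rate=0.9)

# The augmented signals can be added to the training material of the
# corresponding fold alongside the original segment.
```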
Evaluation
The scoring of acoustic scene classification will be based on classification accuracy: the proportion of correctly classified segments out of the total number of segments. Each segment is considered an independent test sample.
Evaluation is performed automatically in the baseline system, using the sed_eval toolbox.
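For reference, a minimal sketch of the accuracy computation is shown below (plain Python; the official evaluation relies on sed_eval, and the file names here are hypothetical).

```python
def load_labels(path):
    """Read tab-separated [filename][tab][scene label] lines into a dict."""
    labels = {}
    with open(path) as f:
        for line in f:
            parts = line.strip().split("\t")
            labels[parts[0]] = parts[1]
    return labels

reference = load_labels("meta.txt")           # hypothetical ground truth file
estimated = load_labels("task1_results.txt")  # system output

correct = sum(1 for name, label in estimated.items() if reference.get(name) == label)
accuracy = correct / float(len(reference))
print("Accuracy: {:.1%}".format(accuracy))
```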
Results
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval
---|---|---|---|---
Abrol_IITM_task1_1 | Vinayak Abrol | Multimedia Analytics and Systems Lab, SCEE, Indian Institute of Technology Mandi, Mandi, India | task-acoustic-scene-classification-results#Abrol2017 | 65.7 (63.4 - 68.0) | |
Amiriparian_AU_task1_1 | Shahin Amiriparian | Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Amiriparian2017 | 67.5 (65.3 - 69.8) | |
Amiriparian_AU_task1_2 | Shahin Amiriparian | Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Amiriparian2017a | 59.1 (56.7 - 61.5) | |
Biho_Sogang_task1_1 | Biho Kim | Sogang university, Seoul, Korea | task-acoustic-scene-classification-results#Kim2017 | 56.5 (54.1 - 59.0) | |
Biho_Sogang_task1_2 | Biho Kim | Sogang university, Seoul, Korea | task-acoustic-scene-classification-results#Kim2017 | 60.5 (58.1 - 62.9) | |
Bisot_TPT_task1_1 | Victor Bisot | Image Data and Signal, Telecom ParisTech, Paris, France | task-acoustic-scene-classification-results#Bisot2017 | 69.8 (67.6 - 72.1) | |
Bisot_TPT_task1_2 | Victor Bisot | Image Data and Signal, Telecom ParisTech, Paris, France | task-acoustic-scene-classification-results#Bisot2017 | 69.6 (67.3 - 71.8) | |
Chandrasekhar_IIITH_task1_1 | Paseddula Chandrasekhar | Speech Processing Lab, International Institute of Information Technology, Hyderabad, Hyderabad, India | task-acoustic-scene-classification-results#Chandrasekhar2017 | 45.9 (43.4 - 48.3) | |
Chou_SINICA_task1_1 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 57.1 (54.7 - 59.5) | |
Chou_SINICA_task1_2 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 61.5 (59.2 - 63.9) | |
Chou_SINICA_task1_3 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 59.8 (57.4 - 62.1) | |
Chou_SINICA_task1_4 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 57.1 (54.7 - 59.5) | |
Dang_NCU_task1_1 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 62.7 (60.4 - 65.1) | |
Dang_NCU_task1_2 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 62.7 (60.4 - 65.1) | |
Dang_NCU_task1_3 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 63.7 (61.4 - 66.0) | |
Duppada_Seernet_task1_1 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 57.0 (54.6 - 59.4) | |
Duppada_Seernet_task1_2 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 59.9 (57.5 - 62.3) | |
Duppada_Seernet_task1_3 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 64.1 (61.7 - 66.4) | |
Duppada_Seernet_task1_4 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 63.0 (60.7 - 65.4) | |
Foleiss_UTFPR_task1_1 | Juliano Foleiss | Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil | task-acoustic-scene-classification-results#Foleiss2017 | 64.5 (62.2 - 66.8) | |
Foleiss_UTFPR_task1_2 | Juliano Foleiss | Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil | task-acoustic-scene-classification-results#Foleiss2017 | 66.9 (64.6 - 69.2) | |
Fonseca_MTG_task1_1 | Eduardo Fonseca | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Fonseca2017 | 67.3 (65.1 - 69.6) | |
Fraile_UPM_task1_1 | Ruben Fraile | Group on Acoustics and Multimedia Applicationa, Universidad Politecnica de Madrid, Madrid, Spain | task-acoustic-scene-classification-results#Fraile2017 | 58.3 (55.9 - 60.7) | |
Gong_MTG_task1_1 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.2 (58.8 - 63.5) | |
Gong_MTG_task1_2 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.5 (59.1 - 63.9) | |
Gong_MTG_task1_3 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.9 (59.5 - 64.2) | |
Han_COCAI_task1_1 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 79.9 (78.0 - 81.9) | |
Han_COCAI_task1_2 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 79.6 (77.7 - 81.6) | |
Han_COCAI_task1_3 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 80.4 (78.4 - 82.3) | |
Han_COCAI_task1_4 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 80.3 (78.4 - 82.2) | |
Hasan_BUET_task1_1 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 74.1 (72.0 - 76.3) | |
Hasan_BUET_task1_2 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 72.2 (70.0 - 74.3) | |
Hasan_BUET_task1_3 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 68.6 (66.3 - 70.8) | |
Hasan_BUET_task1_4 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 72.0 (69.8 - 74.2) | |
DCASE2017 baseline | Toni Heittola | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Heittola2017 | 61.0 (58.7 - 63.4) | |
Huang_THU_task1_1 | Taoan Huang | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China | task-acoustic-scene-classification-results#Huang2017 | 65.5 (63.2 - 67.8) | |
Huang_THU_task1_2 | Taoan Huang | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China | task-acoustic-scene-classification-results#Huang2017 | 65.4 (63.1 - 67.7) | |
Hussain_NUCES_task1_1 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 56.7 (54.3 - 59.1) | |
Hussain_NUCES_task1_2 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 59.5 (57.1 - 61.9) | |
Hussain_NUCES_task1_3 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 59.9 (57.5 - 62.3) | |
Hussain_NUCES_task1_4 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 55.4 (52.9 - 57.8) | |
Jallet_TUT_task1_1 | Hugo Jallet | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Jallet2017 | 60.7 (58.4 - 63.1) | |
Jallet_TUT_task1_2 | Hugo Jallet | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Jallet2017 | 61.2 (58.8 - 63.5) | |
Jimenez_CMU_task1_1 | Abelino Jimenez | Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA | task-acoustic-scene-classification-results#Jimenez2017 | 59.9 (57.6 - 62.3) | |
Kukanov_UEF_task1_1 | Ivan Kukanov | School of Computing, University of Eastern Finland, Joensuu, Finland; Institute for Infocomm Research, A*Star, Singapore | task-acoustic-scene-classification-results#Kukanov2017 | 71.7 (69.5 - 73.9) | |
Kun_TUM_UAU_UP_task1_1 | Qian Kun | MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Kun2017 | 64.2 (61.9 - 66.5) | |
Kun_TUM_UAU_UP_task1_2 | Qian Kun | MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Kun2017 | 64.0 (61.7 - 66.3) | |
Lehner_JKU_task1_1 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 68.7 (66.4 - 71.0) | |
Lehner_JKU_task1_2 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 66.8 (64.5 - 69.1) | |
Lehner_JKU_task1_3 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 64.8 (62.5 - 67.1) | |
Lehner_JKU_task1_4 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 73.8 (71.7 - 76.0) | |
Li_SCUT_task1_1 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 53.7 (51.3 - 56.1) | |
Li_SCUT_task1_2 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 63.6 (61.3 - 66.0) | |
Li_SCUT_task1_3 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 61.7 (59.4 - 64.1) | |
Li_SCUT_task1_4 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 57.8 (55.4 - 60.2) | |
Maka_ZUT_task1_1 | Tomasz Maka | Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Szczecin, Poland | task-acoustic-scene-classification-results#Maka2017 | 47.5 (45.1 - 50.0) | |
Mun_KU_task1_1 | Seongkyu Mun | Intelligent Signal Processing Laboratory, Korea University, Seoul, South Korea | task-acoustic-scene-classification-results#Mun2017 | 83.3 (81.5 - 85.1) | |
Park_ISPL_task1_1 | Hanseok Ko | School of Electrical Engineering, Korea University, Seoul, Republic of Korea | task-acoustic-scene-classification-results#Park2017 | 72.6 (70.4 - 74.8) | |
Phan_UniLuebeck_task1_1 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 59.0 (56.6 - 61.4) | |
Phan_UniLuebeck_task1_2 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 55.9 (53.5 - 58.3) | |
Phan_UniLuebeck_task1_3 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 58.3 (55.9 - 60.7) | |
Phan_UniLuebeck_task1_4 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 58.0 (55.6 - 60.4) | |
Piczak_WUT_task1_1 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 70.6 (68.4 - 72.8) | |
Piczak_WUT_task1_2 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 69.6 (67.3 - 71.8) | |
Piczak_WUT_task1_3 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 67.7 (65.4 - 69.9) | |
Piczak_WUT_task1_4 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 62.0 (59.6 - 64.3) | |
Rakotomamonjy_UROUEN_task1_1 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 61.5 (59.2 - 63.9) | |
Rakotomamonjy_UROUEN_task1_2 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 62.7 (60.3 - 65.0) | |
Rakotomamonjy_UROUEN_task1_3 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 62.8 (60.4 - 65.1) | |
Schindler_AIT_task1_1 | Alexander Schindler | Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria | task-acoustic-scene-classification-results#Schindler2017 | 61.7 (59.4 - 64.1) | |
Schindler_AIT_task1_2 | Alexander Schindler | Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria | task-acoustic-scene-classification-results#Schindler2017 | 61.7 (59.4 - 64.1) | |
Vafeiadis_CERTH_task1_1 | Anastasios Vafeiadis | Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece | task-acoustic-scene-classification-results#Vafeiadis2017 | 61.0 (58.6 - 63.4) | |
Vafeiadis_CERTH_task1_2 | Anastasios Vafeiadis | Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece | task-acoustic-scene-classification-results#Vafeiadis2017 | 49.5 (47.1 - 51.9) | |
Vij_UIET_task1_1 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 61.2 (58.9 - 63.6) | |
Vij_UIET_task1_2 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 57.5 (55.1 - 59.9) | |
Vij_UIET_task1_3 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 59.6 (57.2 - 62.0) | |
Vij_UIET_task1_4 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 65.0 (62.7 - 67.3) | |
Waldekar_IITKGP_task1_1 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results#Waldekar2017 | 67.0 (64.7 - 69.3) | |
Waldekar_IITKGP_task1_2 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results#Waldekar2017 | 64.9 (62.6 - 67.2) | |
Xing_SCNU_task1_1 | Xing Xiaotao | School of Computer, South China Normal University, Guangzhou, China | task-acoustic-scene-classification-results#Weiping2017 | 74.8 (72.6 - 76.9) | |
Xing_SCNU_task1_2 | Xing Xiaotao | School of Computer, South China Normal University, Guangzhou, China | task-acoustic-scene-classification-results#Weiping2017 | 77.7 (75.7 - 79.7) | |
Xu_NUDT_task1_1 | Jinwei Xu | Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China | task-acoustic-scene-classification-results#Xu2017 | 68.5 (66.2 - 70.7) | |
Xu_NUDT_task1_2 | Jinwei Xu | Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China | task-acoustic-scene-classification-results#Xu2017 | 67.5 (65.3 - 69.8) | |
Xu_PKU_task1_1 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 65.9 (63.6 - 68.2) | |
Xu_PKU_task1_2 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 66.7 (64.4 - 69.0) | |
Xu_PKU_task1_3 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 64.6 (62.3 - 67.0) | |
Yang_WHU_TASK1_1 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 61.5 (59.2 - 63.9) | |
Yang_WHU_TASK1_2 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 65.2 (62.9 - 67.6) | |
Yang_WHU_TASK1_3 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 62.8 (60.5 - 65.2) | |
Yang_WHU_TASK1_4 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 63.6 (61.3 - 66.0) | |
Yu_UOS_task1_1 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 67.0 (64.7 - 69.3) | |
Yu_UOS_task1_2 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 66.2 (63.9 - 68.5) | |
Yu_UOS_task1_3 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 67.3 (65.1 - 69.6) | |
Yu_UOS_task1_4 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 70.6 (68.3 - 72.8) | |
Zhao_ADSC_task1_1 | Shengkui Zhao | Illinois at Singapore, Advanced Digital Sciences Center, Singapore | task-acoustic-scene-classification-results#Zhao2017 | 70.0 (67.8 - 72.2) | |
Zhao_ADSC_task1_2 | Shengkui Zhao | Illinois at Singapore, Advanced Digital Sciences Center, Singapore | task-acoustic-scene-classification-results#Zhao2017 | 67.9 (65.6 - 70.2) | |
Zhao_UAU_UP_task1_1 | Ren Zhao | Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Zhao2017a | 63.8 (61.5 - 66.2) |
Complete results and technical reports can be found here.
Baseline system
A baseline system for the task is provided. The system implements a basic approach for acoustic scene classification and provides a comparison point for participants while they develop their systems. The baseline systems for all tasks share the same code base and implement a similar approach for each task. The baseline system will download the needed datasets and produce the results below when run with the default parameters.
The baseline system is based on a multilayer perceptron (MLP) architecture using log mel-band energies as features. A 5-frame context is used, resulting in a feature vector of length 200. Using these features, a neural network containing two dense layers of 50 hidden units per layer and 20% dropout is trained for 200 epochs. The classification decision is based on the softmax output layer of the network. A detailed description is available in the baseline system documentation. The baseline system includes evaluation of results using accuracy as the metric.
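As a rough illustration of the architecture described above, a minimal Keras sketch is shown below. The layer sizes, dropout, and output follow the description; the optimizer and activation functions are assumptions, and details may differ from the actual baseline implementation.

```python
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    # Input: 40 log mel-band energies x 5-frame context = 200 values.
    layers.Dense(50, activation="relu", input_shape=(200,)),
    layers.Dropout(0.2),                      # 20% dropout
    layers.Dense(50, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(15, activation="softmax"),   # one output per acoustic scene
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_features, train_labels, epochs=200, ...)
```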
The baseline system is implemented in Python (versions 2.7 and 3.6). Participants are allowed to build their systems on top of the given baseline system. The system provides all the needed functionality for dataset handling, storing and accessing features and models, and evaluating results, which makes adapting it to one's needs rather easy. The baseline system is also a good starting point for entry-level researchers.
Python implementation
Results for TUT Acoustic scenes 2017, development dataset
Evaluation setup
- 4-fold cross-validation, average classification accuracy over folds
- 15 acoustic scene classes
- Classification unit: one file (10 seconds of audio).
- Python 2.7.13 used
System parameters
- Frame size: 40 ms (with 50% hop size)
- Feature vector: 40 log mel-band energies in 5 consecutive frames = 200 values (see the feature-extraction sketch after this list)
- MLP: 2 layers x 50 hidden units, 20% dropout, 200 epochs (with early stopping: monitoring starts after epoch 100, 10-epoch patience), learning rate 0.001, softmax output layer
- Trained and tested on full audio
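The feature-extraction parameters above could be reproduced with librosa roughly as follows; this is a sketch under assumptions (the baseline uses its own feature code, and the file path is hypothetical).

```python
import numpy as np
import librosa

y, sr = librosa.load("development/audio/a001_0_10.wav", sr=44100)

n_fft = int(0.040 * sr)   # 40 ms frame -> 1764 samples at 44.1 kHz
hop_length = n_fft // 2   # 50% hop size

# 40 log mel-band energies per frame.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                     hop_length=hop_length, n_mels=40)
log_mel = librosa.power_to_db(mel)            # shape: (40, n_frames)

# Stack 5 consecutive frames into one 200-dimensional feature vector.
context = 5
features = np.array([log_mel[:, i:i + context].flatten()
                     for i in range(log_mel.shape[1] - context + 1)])
print(features.shape)  # (n_frames - 4, 200)
```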
Acoustic scene | Accuracy |
---|---|
Beach | 75.3 % |
Bus | 71.8 % |
Cafe / Restaurant | 57.7 % |
Car | 97.1 % |
City center | 90.7 % |
Forest path | 79.5 % |
Grocery store | 58.7 % |
Home | 68.6 % |
Library | 57.1 % |
Metro station | 91.7 % |
Office | 99.7 % |
Park | 70.2 % |
Residential area | 64.1 % |
Train | 58.0 % |
Tram | 81.7 % |
Overall accuracy | 74.8 % |
Citation
If you are using the dataset or the baseline code, or want to refer to the challenge task, please cite the following paper:
A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.
DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System
Abstract
DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.
Keywords
Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events
When citing challenge results, please cite the following paper:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Acoustic scene classification: an overview of DCASE 2017 challenge entries. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 411–415. September 2018. doi:10.1109/IWAENC.2018.8521242.
Acoustic Scene Classification: An Overview of DCASE 2017 Challenge Entries
Abstract
We present an overview of the challenge entries for the Acoustic Scene Classification task of DCASE 2017 Challenge. Being the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel-frequency representations. Statistical analysis indicates that the top ranked system performed significantly better than the others, and that combinations of top systems are capable of reaching close to perfect performance on the given data.
Keywords
acoustic scene classification, audio classification, DCASE challenge