Acoustic scene classification


Task description

The challenge has ended. Full results for this task can be found in the Results section below.

Description

The goal of acoustic scene classification is to classify a test recording into one of the predefined classes that characterize the environment in which it was recorded — for example "park", "home", or "office".

Figure 1: Overview of acoustic scene classification system.

Audio dataset

The TUT Acoustic Scenes 2017 dataset will be used as development data for the task. The dataset consists of recordings from various acoustic scenes, all with distinct recording locations. For each recording location, a 3-5 minute long audio recording was captured. The original recordings were then split into 10-second segments, which are provided as individual files.

Acoustic scenes for the task (15):

  • Bus - traveling by bus in the city (vehicle)
  • Cafe / Restaurant - small cafe/restaurant (indoor)
  • Car - driving or traveling as a passenger, in the city (vehicle)
  • City center (outdoor)
  • Forest path (outdoor)
  • Grocery store - medium size grocery store (indoor)
  • Home (indoor)
  • Lakeside beach (outdoor)
  • Library (indoor)
  • Metro station (indoor)
  • Office - multiple persons, typical work day (indoor)
  • Residential area (outdoor)
  • Train (traveling, vehicle)
  • Tram (traveling, vehicle)
  • Urban park (outdoor)

A detailed description of the acoustic scenes included in the dataset can be found on the DCASE2016 Task 1 page.

The dataset was collected in Finland by Tampere University of Technology between 06/2015 - 01/2017. The data collection has received funding from the European Research Council.


Recording and annotation procedure

For all acoustic scenes, the recordings were each captured in a different location: different streets, different parks, different homes. Recordings were made with a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Roland Edirol R-09 wave recorder, using a 44.1 kHz sampling rate and 24-bit resolution. The microphones are specifically made to look like headphones and are worn in the ears; as a result, the recorded audio is very similar to the sound that reaches the auditory system of the person wearing the equipment.

Postprocessing of the recorded data addressed the privacy of the recorded individuals. For audio material recorded in private places, written consent was obtained from all people involved. Material recorded in public places does not require such consent, but it was screened for content, and privacy-infringing segments were eliminated. Microphone failures and audio distortions were annotated, and the annotations are provided with the data. Based on experiments in DCASE 2016, eliminating the error regions during training does not influence the final classification accuracy. The evaluation set does not contain any such audio errors.

Download

If you are using the provided baseline system, there is no need to download the dataset manually; the system will automatically download the needed datasets for you.


Task setup

The TUT Acoustic Scenes 2017 dataset consists of two subsets: a development dataset and an evaluation dataset. The development dataset consists of the complete TUT Acoustic Scenes 2016 dataset (both the development and evaluation data of the 2016 challenge). The partitioning of the data into subsets was done based on the location of the original recordings, so the evaluation dataset contains recordings of similar acoustic scenes but from different geographical locations. All segments obtained from the same original recording were placed in a single subset — either the development dataset or the evaluation dataset. For each acoustic scene, there are 312 segments (52 minutes of audio) in the development dataset.

A detailed description of the data recording and annotation procedure is available in:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.

Abstract

We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.


Development dataset

A cross-validation setup is provided for the development dataset in order to make results reported with this dataset uniform. The setup consists of four folds that distribute the available segments based on recording location. The folds are provided with the dataset in the evaluation setup directory.

Fold 1 of the provided setup reproduces the DCASE 2016 challenge setup, by using the 2016 development set as training subset and the 2016 evaluation set as test subset.

Important: If you are not using the provided cross-validation setup, pay attention to segments extracted from the same original recordings. Make sure that, for each fold, all segments from the same location are placed either in the training subset or in the test subset — never both.
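
A location-disjoint split can be verified programmatically. The sketch below is illustrative only: the helper name and the (filename, location) pairs are hypothetical, and in practice the location identifiers come from the dataset metadata.

```python
# Sketch: verify that no recording location appears in both the
# training and test subsets of a fold. The example entries are
# hypothetical; real location ids come from the dataset metadata.
def check_fold(train, test):
    """train/test: lists of (filename, location_id) tuples."""
    train_locations = {loc for _, loc in train}
    test_locations = {loc for _, loc in test}
    overlap = train_locations & test_locations
    if overlap:
        raise ValueError("Locations in both subsets: %s" % sorted(overlap))
    return True

train = [("a010_10_20.wav", "a010"), ("a011_0_10.wav", "a011")]
test = [("b020_0_10.wav", "b020")]
assert check_fold(train, test)
```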

Evaluation dataset

Evaluation dataset without ground truth will be released one month before the submission deadline. Full ground truth meta data for it will be published after the DCASE 2017 challenge and workshop are concluded.

Submission

Detailed information for the challenge submission can be found on the submission page.

System output should be presented as a single text file (in CSV format) containing the classification result for each audio file in the evaluation set. Result items can be in any order. Format:

[filename (string)][tab][scene label (string)]
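
For illustration, such an output file can be written with Python's standard csv module; the result list and output filename below are placeholders, not part of the challenge specification.

```python
import csv

# Hypothetical classification results: (filename, predicted scene label).
results = [
    ("audio/1.wav", "beach"),
    ("audio/2.wav", "office"),
]

# Write one tab-separated line per evaluation file, as required.
with open("task1_results.txt", "w") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(results)
```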

Multiple system outputs can be submitted (maximum 4 per participant). If submitting multiple systems, the individual text files should be packaged into a single zip file for submission. Please clearly mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text files appropriately).

Task rules

These general rules are valid for all tasks. The same rules, together with additional information on the technical report and submission requirements, can be found on the challenge rules page. Task-specific rules are highlighted below.

  • Participants are not allowed to use external data for system development. Data from another task is considered external data.
  • Manipulation of provided training and development data is allowed.

    The development dataset can be augmented without use of external data (e.g. by mixing data sampled from a pdf or using techniques such as pitch shifting or time stretching).

  • Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden.
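
As one illustration of augmentation without external data, the sketch below mixes noise sampled from a Gaussian pdf into existing training audio. The waveform and noise level are made up for the example; this is not an officially endorsed recipe.

```python
import random

def augment_with_noise(samples, noise_std=0.01, seed=0):
    """Return a noisy copy of a waveform (list of floats in [-1, 1]),
    mixing in values drawn from a zero-mean Gaussian pdf."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in samples]

clean = [0.0, 0.5, -0.5, 0.25]  # dummy waveform
noisy = augment_with_noise(clean)
assert len(noisy) == len(clean)
```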

Evaluation

The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample.

The evaluation is done automatically in the baseline system, using the sed_eval toolbox.
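
Classification accuracy as defined above reduces to a few lines of code; in this sketch the reference and estimated label lists are hypothetical.

```python
def classification_accuracy(reference, estimated):
    """Fraction of segments whose estimated scene label matches the
    reference label; each segment is one independent test sample."""
    assert len(reference) == len(estimated)
    correct = sum(1 for r, e in zip(reference, estimated) if r == e)
    return correct / float(len(reference))

ref = ["beach", "office", "car", "home"]
est = ["beach", "office", "car", "park"]
print(classification_accuracy(ref, est))  # 0.75
```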


Results

Submission code | Author | Affiliation | Technical report | Accuracy (with 95% confidence interval)
Abrol_IITM_task1_1 Vinayak Abrol Multimedia Analytics and Systems Lab, SCEE, Indian Institute of Technology Mandi, Mandi, India task-acoustic-scene-classification-results#Abrol2017 65.7 (63.4 - 68.0)
Amiriparian_AU_task1_1 Shahin Amiriparian Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany task-acoustic-scene-classification-results#Amiriparian2017 67.5 (65.3 - 69.8)
Amiriparian_AU_task1_2 Shahin Amiriparian Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany task-acoustic-scene-classification-results#Amiriparian2017a 59.1 (56.7 - 61.5)
Biho_Sogang_task1_1 Biho Kim Sogang University, Seoul, Korea task-acoustic-scene-classification-results#Kim2017 56.5 (54.1 - 59.0)
Biho_Sogang_task1_2 Biho Kim Sogang University, Seoul, Korea task-acoustic-scene-classification-results#Kim2017 60.5 (58.1 - 62.9)
Bisot_TPT_task1_1 Victor Bisot Image Data and Signal, Telecom ParisTech, Paris, France task-acoustic-scene-classification-results#Bisot2017 69.8 (67.6 - 72.1)
Bisot_TPT_task1_2 Victor Bisot Image Data and Signal, Telecom ParisTech, Paris, France task-acoustic-scene-classification-results#Bisot2017 69.6 (67.3 - 71.8)
Chandrasekhar_IIITH_task1_1 Paseddula Chandrasekhar Speech Processing Lab, International Institute of Information Technology, Hyderabad, Hyderabad, India task-acoustic-scene-classification-results#Chandrasekhar2017 45.9 (43.4 - 48.3)
Chou_SINICA_task1_1 Szu-Yu Chou Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan task-acoustic-scene-classification-results#Chou2017 57.1 (54.7 - 59.5)
Chou_SINICA_task1_2 Szu-Yu Chou Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan task-acoustic-scene-classification-results#Chou2017 61.5 (59.2 - 63.9)
Chou_SINICA_task1_3 Szu-Yu Chou Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan task-acoustic-scene-classification-results#Chou2017 59.8 (57.4 - 62.1)
Chou_SINICA_task1_4 Szu-Yu Chou Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan task-acoustic-scene-classification-results#Chou2017 57.1 (54.7 - 59.5)
Dang_NCU_task1_1 Jia-Ching Wang Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan task-acoustic-scene-classification-results#Dang2017 62.7 (60.4 - 65.1)
Dang_NCU_task1_2 Jia-Ching Wang Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan task-acoustic-scene-classification-results#Dang2017 62.7 (60.4 - 65.1)
Dang_NCU_task1_3 Jia-Ching Wang Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan task-acoustic-scene-classification-results#Dang2017 63.7 (61.4 - 66.0)
Duppada_Seernet_task1_1 Venkatesh Duppada Data Science, Seernet Technologies, LLC, Mumbai, India task-acoustic-scene-classification-results#Duppada2017 57.0 (54.6 - 59.4)
Duppada_Seernet_task1_2 Venkatesh Duppada Data Science, Seernet Technologies, LLC, Mumbai, India task-acoustic-scene-classification-results#Duppada2017 59.9 (57.5 - 62.3)
Duppada_Seernet_task1_3 Venkatesh Duppada Data Science, Seernet Technologies, LLC, Mumbai, India task-acoustic-scene-classification-results#Duppada2017 64.1 (61.7 - 66.4)
Duppada_Seernet_task1_4 Venkatesh Duppada Data Science, Seernet Technologies, LLC, Mumbai, India task-acoustic-scene-classification-results#Duppada2017 63.0 (60.7 - 65.4)
Foleiss_UTFPR_task1_1 Juliano Foleiss Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil task-acoustic-scene-classification-results#Foleiss2017 64.5 (62.2 - 66.8)
Foleiss_UTFPR_task1_2 Juliano Foleiss Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil task-acoustic-scene-classification-results#Foleiss2017 66.9 (64.6 - 69.2)
Fonseca_MTG_task1_1 Eduardo Fonseca Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain task-acoustic-scene-classification-results#Fonseca2017 67.3 (65.1 - 69.6)
Fraile_UPM_task1_1 Ruben Fraile Group on Acoustics and Multimedia Applications, Universidad Politecnica de Madrid, Madrid, Spain task-acoustic-scene-classification-results#Fraile2017 58.3 (55.9 - 60.7)
Gong_MTG_task1_1 Rong Gong Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain task-acoustic-scene-classification-results#Gong2017 61.2 (58.8 - 63.5)
Gong_MTG_task1_2 Rong Gong Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain task-acoustic-scene-classification-results#Gong2017 61.5 (59.1 - 63.9)
Gong_MTG_task1_3 Rong Gong Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain task-acoustic-scene-classification-results#Gong2017 61.9 (59.5 - 64.2)
Han_COCAI_task1_1 Yoonchang Han Cochlear.ai, Seoul, Korea task-acoustic-scene-classification-results#Han2017 79.9 (78.0 - 81.9)
Han_COCAI_task1_2 Yoonchang Han Cochlear.ai, Seoul, Korea task-acoustic-scene-classification-results#Han2017 79.6 (77.7 - 81.6)
Han_COCAI_task1_3 Yoonchang Han Cochlear.ai, Seoul, Korea task-acoustic-scene-classification-results#Han2017 80.4 (78.4 - 82.3)
Han_COCAI_task1_4 Yoonchang Han Cochlear.ai, Seoul, Korea task-acoustic-scene-classification-results#Han2017 80.3 (78.4 - 82.2)
Hasan_BUET_task1_1 Taufiq Hasan Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh task-acoustic-scene-classification-results#Hyder2017 74.1 (72.0 - 76.3)
Hasan_BUET_task1_2 Taufiq Hasan Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh task-acoustic-scene-classification-results#Hyder2017 72.2 (70.0 - 74.3)
Hasan_BUET_task1_3 Taufiq Hasan Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh task-acoustic-scene-classification-results#Hyder2017 68.6 (66.3 - 70.8)
Hasan_BUET_task1_4 Taufiq Hasan Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh task-acoustic-scene-classification-results#Hyder2017 72.0 (69.8 - 74.2)
DCASE2017 baseline Toni Heittola Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland task-acoustic-scene-classification-results#Heittola2017 61.0 (58.7 - 63.4)
Huang_THU_task1_1 Taoan Huang Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China task-acoustic-scene-classification-results#Huang2017 65.5 (63.2 - 67.8)
Huang_THU_task1_2 Taoan Huang Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China task-acoustic-scene-classification-results#Huang2017 65.4 (63.1 - 67.7)
Hussain_NUCES_task1_1 Khalid Hussain Department of Electrical Engineering, National University of Computer and Emerging Sciences, Pakistan task-acoustic-scene-classification-results#Hussain2017 56.7 (54.3 - 59.1)
Hussain_NUCES_task1_2 Khalid Hussain Department of Electrical Engineering, National University of Computer and Emerging Sciences, Pakistan task-acoustic-scene-classification-results#Hussain2017 59.5 (57.1 - 61.9)
Hussain_NUCES_task1_3 Khalid Hussain Department of Electrical Engineering, National University of Computer and Emerging Sciences, Pakistan task-acoustic-scene-classification-results#Hussain2017 59.9 (57.5 - 62.3)
Hussain_NUCES_task1_4 Khalid Hussain Department of Electrical Engineering, National University of Computer and Emerging Sciences, Pakistan task-acoustic-scene-classification-results#Hussain2017 55.4 (52.9 - 57.8)
Jallet_TUT_task1_1 Hugo Jallet Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland task-acoustic-scene-classification-results#Jallet2017 60.7 (58.4 - 63.1)
Jallet_TUT_task1_2 Hugo Jallet Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland task-acoustic-scene-classification-results#Jallet2017 61.2 (58.8 - 63.5)
Jimenez_CMU_task1_1 Abelino Jimenez Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA task-acoustic-scene-classification-results#Jimenez2017 59.9 (57.6 - 62.3)
Kukanov_UEF_task1_1 Ivan Kukanov School of Computing, University of Eastern Finland, Joensuu, Finland; Institute for Infocomm Research, A*Star, Singapore task-acoustic-scene-classification-results#Kukanov2017 71.7 (69.5 - 73.9)
Kun_TUM_UAU_UP_task1_1 Qian Kun MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany task-acoustic-scene-classification-results#Kun2017 64.2 (61.9 - 66.5)
Kun_TUM_UAU_UP_task1_2 Qian Kun MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany task-acoustic-scene-classification-results#Kun2017 64.0 (61.7 - 66.3)
Lehner_JKU_task1_1 Bernhard Lehner Department of Computational Perception, Johannes Kepler University, Linz, Austria task-acoustic-scene-classification-results#Lehner2017 68.7 (66.4 - 71.0)
Lehner_JKU_task1_2 Bernhard Lehner Department of Computational Perception, Johannes Kepler University, Linz, Austria task-acoustic-scene-classification-results#Lehner2017 66.8 (64.5 - 69.1)
Lehner_JKU_task1_3 Bernhard Lehner Department of Computational Perception, Johannes Kepler University, Linz, Austria task-acoustic-scene-classification-results#Lehner2017 64.8 (62.5 - 67.1)
Lehner_JKU_task1_4 Bernhard Lehner Department of Computational Perception, Johannes Kepler University, Linz, Austria task-acoustic-scene-classification-results#Lehner2017 73.8 (71.7 - 76.0)
Li_SCUT_task1_1 Yanxiong Li School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China task-acoustic-scene-classification-results#Li2017 53.7 (51.3 - 56.1)
Li_SCUT_task1_2 Yanxiong Li School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China task-acoustic-scene-classification-results#Li2017 63.6 (61.3 - 66.0)
Li_SCUT_task1_3 Yanxiong Li School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China task-acoustic-scene-classification-results#Li2017 61.7 (59.4 - 64.1)
Li_SCUT_task1_4 Yanxiong Li School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China task-acoustic-scene-classification-results#Li2017 57.8 (55.4 - 60.2)
Maka_ZUT_task1_1 Tomasz Maka Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Szczecin, Poland task-acoustic-scene-classification-results#Maka2017 47.5 (45.1 - 50.0)
Mun_KU_task1_1 Seongkyu Mun Intelligent Signal Processing Laboratory, Korea University, Seoul, South Korea task-acoustic-scene-classification-results#Mun2017 83.3 (81.5 - 85.1)
Park_ISPL_task1_1 Hanseok Ko School of Electrical Engineering, Korea University, Seoul, Republic of Korea task-acoustic-scene-classification-results#Park2017 72.6 (70.4 - 74.8)
Phan_UniLuebeck_task1_1 Huy Phan Institute for Signal Processing, University of Luebeck, Luebeck, Germany task-acoustic-scene-classification-results#Phan2017 59.0 (56.6 - 61.4)
Phan_UniLuebeck_task1_2 Huy Phan Institute for Signal Processing, University of Luebeck, Luebeck, Germany task-acoustic-scene-classification-results#Phan2017 55.9 (53.5 - 58.3)
Phan_UniLuebeck_task1_3 Huy Phan Institute for Signal Processing, University of Luebeck, Luebeck, Germany task-acoustic-scene-classification-results#Phan2017 58.3 (55.9 - 60.7)
Phan_UniLuebeck_task1_4 Huy Phan Institute for Signal Processing, University of Luebeck, Luebeck, Germany task-acoustic-scene-classification-results#Phan2017 58.0 (55.6 - 60.4)
Piczak_WUT_task1_1 Karol Piczak Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland task-acoustic-scene-classification-results#Piczak2017 70.6 (68.4 - 72.8)
Piczak_WUT_task1_2 Karol Piczak Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland task-acoustic-scene-classification-results#Piczak2017 69.6 (67.3 - 71.8)
Piczak_WUT_task1_3 Karol Piczak Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland task-acoustic-scene-classification-results#Piczak2017 67.7 (65.4 - 69.9)
Piczak_WUT_task1_4 Karol Piczak Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland task-acoustic-scene-classification-results#Piczak2017 62.0 (59.6 - 64.3)
Rakotomamonjy_UROUEN_task1_1 Alain Rakotomamonjy LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France task-acoustic-scene-classification-results#Rakotomamonjy2017 61.5 (59.2 - 63.9)
Rakotomamonjy_UROUEN_task1_2 Alain Rakotomamonjy LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France task-acoustic-scene-classification-results#Rakotomamonjy2017 62.7 (60.3 - 65.0)
Rakotomamonjy_UROUEN_task1_3 Alain Rakotomamonjy LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France task-acoustic-scene-classification-results#Rakotomamonjy2017 62.8 (60.4 - 65.1)
Schindler_AIT_task1_1 Alexander Schindler Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria task-acoustic-scene-classification-results#Schindler2017 61.7 (59.4 - 64.1)
Schindler_AIT_task1_2 Alexander Schindler Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria task-acoustic-scene-classification-results#Schindler2017 61.7 (59.4 - 64.1)
Vafeiadis_CERTH_task1_1 Anastasios Vafeiadis Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece task-acoustic-scene-classification-results#Vafeiadis2017 61.0 (58.6 - 63.4)
Vafeiadis_CERTH_task1_2 Anastasios Vafeiadis Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece task-acoustic-scene-classification-results#Vafeiadis2017 49.5 (47.1 - 51.9)
Vij_UIET_task1_1 Dinesh Vij Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India task-acoustic-scene-classification-results#Vij2017 61.2 (58.9 - 63.6)
Vij_UIET_task1_2 Dinesh Vij Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India task-acoustic-scene-classification-results#Vij2017 57.5 (55.1 - 59.9)
Vij_UIET_task1_3 Dinesh Vij Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India task-acoustic-scene-classification-results#Vij2017 59.6 (57.2 - 62.0)
Vij_UIET_task1_4 Dinesh Vij Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India task-acoustic-scene-classification-results#Vij2017 65.0 (62.7 - 67.3)
Waldekar_IITKGP_task1_1 Shefali Waldekar Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India task-acoustic-scene-classification-results#Waldekar2017 67.0 (64.7 - 69.3)
Waldekar_IITKGP_task1_2 Shefali Waldekar Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India task-acoustic-scene-classification-results#Waldekar2017 64.9 (62.6 - 67.2)
Xing_SCNU_task1_1 Xing Xiaotao School of Computer, South China Normal University, Guangzhou, China task-acoustic-scene-classification-results#Weiping2017 74.8 (72.6 - 76.9)
Xing_SCNU_task1_2 Xing Xiaotao School of Computer, South China Normal University, Guangzhou, China task-acoustic-scene-classification-results#Weiping2017 77.7 (75.7 - 79.7)
Xu_NUDT_task1_1 Jinwei Xu Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China task-acoustic-scene-classification-results#Xu2017 68.5 (66.2 - 70.7)
Xu_NUDT_task1_2 Jinwei Xu Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China task-acoustic-scene-classification-results#Xu2017 67.5 (65.3 - 69.8)
Xu_PKU_task1_1 Xiaoshuo Xu Institute of Computer Science and Technology, Peking University, Beijing, China task-acoustic-scene-classification-results#Xu2017a 65.9 (63.6 - 68.2)
Xu_PKU_task1_2 Xiaoshuo Xu Institute of Computer Science and Technology, Peking University, Beijing, China task-acoustic-scene-classification-results#Xu2017a 66.7 (64.4 - 69.0)
Xu_PKU_task1_3 Xiaoshuo Xu Institute of Computer Science and Technology, Peking University, Beijing, China task-acoustic-scene-classification-results#Xu2017a 64.6 (62.3 - 67.0)
Yang_WHU_TASK1_1 Yuhong Yang National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China task-acoustic-scene-classification-results#Lu2017 61.5 (59.2 - 63.9)
Yang_WHU_TASK1_2 Yuhong Yang National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China task-acoustic-scene-classification-results#Lu2017 65.2 (62.9 - 67.6)
Yang_WHU_TASK1_3 Yuhong Yang National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China task-acoustic-scene-classification-results#Lu2017 62.8 (60.5 - 65.2)
Yang_WHU_TASK1_4 Yuhong Yang National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China task-acoustic-scene-classification-results#Lu2017 63.6 (61.3 - 66.0)
Yu_UOS_task1_1 Yu Ha-Jin School of Computer Science, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results#Jee-Weon2017 67.0 (64.7 - 69.3)
Yu_UOS_task1_2 Yu Ha-Jin School of Computer Science, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results#Jee-Weon2017 66.2 (63.9 - 68.5)
Yu_UOS_task1_3 Yu Ha-Jin School of Computer Science, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results#Jee-Weon2017 67.3 (65.1 - 69.6)
Yu_UOS_task1_4 Yu Ha-Jin School of Computer Science, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results#Jee-Weon2017 70.6 (68.3 - 72.8)
Zhao_ADSC_task1_1 Shengkui Zhao Illinois at Singapore, Advanced Digital Sciences Center, Singapore task-acoustic-scene-classification-results#Zhao2017 70.0 (67.8 - 72.2)
Zhao_ADSC_task1_2 Shengkui Zhao Illinois at Singapore, Advanced Digital Sciences Center, Singapore task-acoustic-scene-classification-results#Zhao2017 67.9 (65.6 - 70.2)
Zhao_UAU_UP_task1_1 Ren Zhao Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany task-acoustic-scene-classification-results#Zhao2017a 63.8 (61.5 - 66.2)


Complete results and technical reports can be found on the challenge results page.

Baseline system

A baseline system for the task is provided. The system implements a basic approach to acoustic scene classification and gives participants a comparison point while they develop their systems. The baseline systems for all tasks share the same code base, implementing a similar approach for each task. The baseline system downloads the needed datasets and produces the results below when run with the default parameters.

The baseline system is based on a multilayer perceptron architecture using log mel-band energies as features. A 5-frame context is used, resulting in a feature vector of length 200. Using these features, a neural network with two dense layers of 50 hidden units each and 20% dropout is trained for 200 epochs. The classification decision is based on the softmax output layer. A detailed description is available in the baseline system documentation. The baseline system includes evaluation of results using accuracy as the metric.
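
The 200-value feature vector follows from concatenating 40 log mel-band energies over a 5-frame context. The sketch below illustrates only that stacking step with dummy frame values; it is not the baseline's actual implementation.

```python
N_MELS = 40   # log mel-band energies per frame
CONTEXT = 5   # frames of context

def stack_context(frames, context=CONTEXT):
    """Concatenate each run of `context` consecutive feature frames
    into one flat vector of length context * n_mels."""
    vectors = []
    for i in range(len(frames) - context + 1):
        flat = [v for frame in frames[i:i + context] for v in frame]
        vectors.append(flat)
    return vectors

# Dummy "log mel energy" frames: 8 frames x 40 bands.
frames = [[0.0] * N_MELS for _ in range(8)]
vectors = stack_context(frames)
print(len(vectors), len(vectors[0]))  # 4 200
```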

The baseline system is implemented in Python (versions 2.7 and 3.6). Participants are allowed to build their systems on top of the baseline. The system provides all needed functionality for dataset handling, storing and accessing features and models, and evaluating results, making adaptation to one's needs straightforward. The baseline system is also a good starting point for entry-level researchers.


Results for TUT Acoustic scenes 2017, development dataset

Evaluation setup

  • 4-fold cross-validation, average classification accuracy over folds
  • 15 acoustic scene classes
  • Classification unit: one file (10 seconds of audio).
  • Python 2.7.13 used

System parameters

  • Frame size: 40 ms (with 50% hop size)
  • Feature vector: 40 log mel-band energies in 5 consecutive frames = 200 values
  • MLP: 2 layers x 50 hidden units, 20% dropout, learning rate 0.001, softmax output layer; trained for up to 200 epochs with early stopping (monitoring started after epoch 100, patience 10 epochs)
  • Trained and tested on full audio
Acoustic scene classification results, averaged over evaluation folds.
Acoustic scene Accuracy
Beach 75.3 %
Bus 71.8 %
Cafe / Restaurant 57.7 %
Car 97.1 %
City center 90.7 %
Forest path 79.5 %
Grocery store 58.7 %
Home 68.6 %
Library 57.1 %
Metro station 91.7 %
Office 99.7 %
Park 70.2 %
Residential area 64.1 %
Train 58.0 %
Tram 81.7 %
Overall accuracy 74.8 %

Citation

If you are using the dataset or baseline code, or want to refer to the challenge task, please cite the following paper:

Publication

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.

Abstract

DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.

Keywords

Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events



When citing challenge results, please cite the following paper:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Acoustic scene classification: an overview of DCASE 2017 challenge entries. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 411–415. September 2018. doi:10.1109/IWAENC.2018.8521242.

Abstract

We present an overview of the challenge entries for the Acoustic Scene Classification task of DCASE 2017 Challenge. Being the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel-frequency representations. Statistical analysis indicates that the top ranked system performed significantly better than the others, and that combinations of top systems are capable of reaching close to perfect performance on the given data.

Keywords

acoustic scene classification, audio classification, DCASE challenge