Challenge has ended. Full results for this task can be found here
Description
The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterize the environment in which it was recorded, for example "park", "home", or "office".
Audio dataset
The TUT Acoustic Scenes 2017 dataset will be used as development data for the task. The dataset consists of recordings from various acoustic scenes, all captured at distinct recording locations. For each recording location, a 3-5 minute audio recording was captured. The original recordings were then split into segments with a length of 10 seconds, and these audio segments are provided as individual files.
Acoustic scenes for the task (15):
- Bus - traveling by bus in the city (vehicle)
- Cafe / Restaurant - small cafe/restaurant (indoor)
- Car - driving or traveling as a passenger, in the city (vehicle)
- City center (outdoor)
- Forest path (outdoor)
- Grocery store - medium size grocery store (indoor)
- Home (indoor)
- Lakeside beach (outdoor)
- Library (indoor)
- Metro station (indoor)
- Office - multiple persons, typical work day (indoor)
- Residential area (outdoor)
- Train (traveling, vehicle)
- Tram (traveling, vehicle)
- Urban park (outdoor)
A detailed description of the acoustic scenes included in the dataset can be found on the DCASE2016 Task 1 page.
The dataset was collected in Finland by Tampere University of Technology between June 2015 and January 2017. The data collection has received funding from the European Research Council.
Recording and annotation procedure
For all acoustic scenes, each recording was captured in a different location: different streets, different parks, different homes. Recordings were made using a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Roland Edirol R-09 wave recorder, using a 44.1 kHz sampling rate and 24-bit resolution. The microphones are specifically designed to look like headphones and are worn in the ears; as a result, the recorded audio is very similar to the sound that reaches the auditory system of the person wearing the equipment.
Postprocessing of the recorded data addressed the privacy of recorded individuals. For audio material recorded in private places, written consent was obtained from all people involved. Material recorded in public places does not require such consent, but it was screened for content, and privacy-infringing segments were eliminated. Microphone failures and audio distortions were annotated, and the annotations are provided with the data. Based on experiments in DCASE 2016, eliminating the error regions during training does not influence the final classification accuracy. The evaluation set does not contain any such audio errors.
Download
If you are using the provided baseline system, there is no need to download the dataset manually, as the system will automatically download the needed datasets for you.
- Development dataset
- Evaluation dataset
Task setup
The TUT Acoustic Scenes 2017 dataset consists of two subsets: a development dataset and an evaluation dataset. The development dataset consists of the complete TUT Acoustic Scenes 2016 dataset (both the development and evaluation data of the 2016 challenge). The data was partitioned into subsets based on the location of the original recordings, so the evaluation dataset contains recordings of similar acoustic scenes but from different geographical locations. All segments obtained from the same original recording were placed in a single subset, either the development dataset or the evaluation dataset. For each acoustic scene, there are 312 segments (52 minutes of audio) in the development dataset.
A detailed description of the data recording and annotation procedure is available in:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016). Budapest, Hungary, 2016.
TUT Database for Acoustic Scene Classification and Sound Event Detection
Abstract
We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.
Development dataset
A cross-validation setup is provided for the development dataset in order to make results reported with this dataset uniform. The setup consists of four folds distributing the available segments based on location. The folds are provided with the dataset in the directory evaluation setup.
Fold 1 of the provided setup reproduces the DCASE 2016 challenge setup, by using the 2016 development set as training subset and the 2016 evaluation set as test subset.
Important: If you are not using the provided cross-validation setup, pay attention to segments extracted from the same original recordings. Make sure that, for each fold, ALL segments from the same location are placed either in the training subset OR in the test subset, never in both. A sketch of such a location-based split is given below.
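The following is a minimal sketch of such a location-aware split using scikit-learn (an assumption; any equivalent tooling works). The file names and location IDs are hypothetical; in practice, the location of each segment can be read from the dataset metadata.

```python
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical segment file names and their recording-location IDs.
filenames = ["b020_90_100.wav", "b020_100_110.wav", "a010_10_20.wav", "a010_20_30.wav"]
locations = ["b020", "b020", "a010", "a010"]  # one ID per original recording location

# Grouping by location guarantees that all segments from the same location
# end up on the same side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_idx, test_idx = next(splitter.split(filenames, groups=locations))

train_files = [filenames[i] for i in train_idx]
test_files = [filenames[i] for i in test_idx]
```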
Evaluation dataset
The evaluation dataset, without ground truth, will be released one month before the submission deadline. The full ground truth metadata for it will be published after the DCASE 2017 challenge and workshop have concluded.
Submission
Detailed information about the challenge submission can be found on the submission page.
System output should be presented as a single text file (in CSV format) containing a classification result for each audio file in the evaluation set. Result items can be in any order. Format:
[filename (string)][tab][scene label (string)]
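As an illustration, a minimal sketch of writing the output file in this format is shown below; the file names and predicted labels are hypothetical.

```python
# Hypothetical predictions: evaluation file name -> predicted scene label.
predictions = {
    "audio/eval_0001.wav": "park",
    "audio/eval_0002.wav": "home",
}

# Write one tab-separated line per evaluation file.
with open("task1_results.txt", "w") as output_file:
    for filename, scene_label in predictions.items():
        output_file.write("{}\t{}\n".format(filename, scene_label))
```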
Multiple system outputs can be submitted (maximum 4 per participant). If submitting multiple systems, the individual text-files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).
Task rules
These are the general rules valid for all tasks. The same rules, together with additional information on the technical report and submission requirements, can be found here. Task-specific rules are highlighted in green.
- Participants are not allowed to use external data for system development. Data from another task is considered external data.
- Manipulation of the provided training and development data is allowed.
The development dataset can be augmented without the use of external data, e.g. by mixing data sampled from a probability density function or by using techniques such as pitch shifting or time stretching; see the augmentation sketch after this list.
- Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden.
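A minimal augmentation sketch is given below, using librosa for the pitch shifting and time stretching mentioned above (the use of librosa and the file path are assumptions, not part of the official tools).

```python
import librosa

# Load one hypothetical 10-second development segment at its native sampling rate.
y, sr = librosa.load("development/audio/a001_10_20.wav", sr=None)

# Pitch shifting: raise the pitch by two semitones without changing duration.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# Time stretching: slow the signal down to 90% speed without changing pitch.
y_stretched = librosa.effects.time_stretch(y, rate=0.9)

# The augmented signals can be added to the training material of the
# corresponding fold alongside the original segment.
```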
Evaluation
The scoring of acoustic scene classification will be based on classification accuracy: the proportion of correctly classified segments out of the total number of segments. Each segment is considered an independent test sample.
Evaluation is performed automatically in the baseline system, using the sed_eval toolbox.
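For reference, a minimal sketch of the accuracy computation is shown below (plain Python; the official evaluation relies on sed_eval, and the file names here are hypothetical).

```python
def load_labels(path):
    """Read tab-separated [filename][tab][scene label] lines into a dict."""
    labels = {}
    with open(path) as f:
        for line in f:
            parts = line.strip().split("\t")
            labels[parts[0]] = parts[1]
    return labels

reference = load_labels("meta.txt")           # hypothetical ground truth file
estimated = load_labels("task1_results.txt")  # system output

correct = sum(1 for name, label in estimated.items() if reference.get(name) == label)
accuracy = correct / float(len(reference))
print("Accuracy: {:.1%}".format(accuracy))
```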
Results
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval
---|---|---|---|---
Abrol_IITM_task1_1 | Vinayak Abrol | Multimedia Analytics and Systems Lab, SCEE, Indian Institute of Technology Mandi, Mandi, India | task-acoustic-scene-classification-results#Abrol2017 | 65.7 (63.4 - 68.0) | |
Amiriparian_AU_task1_1 | Shahin Amiriparian | Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Amiriparian2017 | 67.5 (65.3 - 69.8) | |
Amiriparian_AU_task1_2 | Shahin Amiriparian | Chair of Complex & Intelligent Systems, Universität Passau, Passau, Germany; Chair of Embedded Intelligence for Health Care, Augsburg University, Augsburg, Germany; Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany | task-acoustic-scene-classification-results#Amiriparian2017a | 59.1 (56.7 - 61.5) | |
Biho_Sogang_task1_1 | Biho Kim | Sogang university, Seoul, Korea | task-acoustic-scene-classification-results#Kim2017 | 56.5 (54.1 - 59.0) | |
Biho_Sogang_task1_2 | Biho Kim | Sogang university, Seoul, Korea | task-acoustic-scene-classification-results#Kim2017 | 60.5 (58.1 - 62.9) | |
Bisot_TPT_task1_1 | Victor Bisot | Image Data and Signal, Telecom ParisTech, Paris, France | task-acoustic-scene-classification-results#Bisot2017 | 69.8 (67.6 - 72.1) | |
Bisot_TPT_task1_2 | Victor Bisot | Image Data and Signal, Telecom ParisTech, Paris, France | task-acoustic-scene-classification-results#Bisot2017 | 69.6 (67.3 - 71.8) | |
Chandrasekhar_IIITH_task1_1 | Paseddula Chandrasekhar | Speech Processing Lab, International Institute of Information Technology, Hyderabad, Hyderabad, India | task-acoustic-scene-classification-results#Chandrasekhar2017 | 45.9 (43.4 - 48.3) | |
Chou_SINICA_task1_1 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 57.1 (54.7 - 59.5) | |
Chou_SINICA_task1_2 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 61.5 (59.2 - 63.9) | |
Chou_SINICA_task1_3 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 59.8 (57.4 - 62.1) | |
Chou_SINICA_task1_4 | Szu-Yu Chou | Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; Research Center for IT innovation, Academia Sinica, Taipei, Taiwan | task-acoustic-scene-classification-results#Chou2017 | 57.1 (54.7 - 59.5) | |
Dang_NCU_task1_1 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 62.7 (60.4 - 65.1) | |
Dang_NCU_task1_2 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 62.7 (60.4 - 65.1) | |
Dang_NCU_task1_3 | Jia-Ching Wang | Computer Sciene and Information Engineering, National Central University, Taoyuan, Taiwan | task-acoustic-scene-classification-results#Dang2017 | 63.7 (61.4 - 66.0) | |
Duppada_Seernet_task1_1 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 57.0 (54.6 - 59.4) | |
Duppada_Seernet_task1_2 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 59.9 (57.5 - 62.3) | |
Duppada_Seernet_task1_3 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 64.1 (61.7 - 66.4) | |
Duppada_Seernet_task1_4 | Venkatesh Duppada | Data Science, Seernet Technologies, LLC, Mumbai, India | task-acoustic-scene-classification-results#Duppada2017 | 63.0 (60.7 - 65.4) | |
Foleiss_UTFPR_task1_1 | Juliano Foleiss | Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil | task-acoustic-scene-classification-results#Foleiss2017 | 64.5 (62.2 - 66.8) | |
Foleiss_UTFPR_task1_2 | Juliano Foleiss | Computing Department, Universidade Tecnologica Federal do Parana, Campo Mourao, Brazil | task-acoustic-scene-classification-results#Foleiss2017 | 66.9 (64.6 - 69.2) | |
Fonseca_MTG_task1_1 | Eduardo Fonseca | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Fonseca2017 | 67.3 (65.1 - 69.6) | |
Fraile_UPM_task1_1 | Ruben Fraile | Group on Acoustics and Multimedia Applicationa, Universidad Politecnica de Madrid, Madrid, Spain | task-acoustic-scene-classification-results#Fraile2017 | 58.3 (55.9 - 60.7) | |
Gong_MTG_task1_1 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.2 (58.8 - 63.5) | |
Gong_MTG_task1_2 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.5 (59.1 - 63.9) | |
Gong_MTG_task1_3 | Rong Gong | Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain | task-acoustic-scene-classification-results#Gong2017 | 61.9 (59.5 - 64.2) | |
Han_COCAI_task1_1 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 79.9 (78.0 - 81.9) | |
Han_COCAI_task1_2 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 79.6 (77.7 - 81.6) | |
Han_COCAI_task1_3 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 80.4 (78.4 - 82.3) | |
Han_COCAI_task1_4 | Yoonchang Han | Cochlear.ai, Seoul, Korea | task-acoustic-scene-classification-results#Han2017 | 80.3 (78.4 - 82.2) | |
Hasan_BUET_task1_1 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 74.1 (72.0 - 76.3) | |
Hasan_BUET_task1_2 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 72.2 (70.0 - 74.3) | |
Hasan_BUET_task1_3 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 68.6 (66.3 - 70.8) | |
Hasan_BUET_task1_4 | Taufiq Hasan | Department of Biomedical Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh | task-acoustic-scene-classification-results#Hyder2017 | 72.0 (69.8 - 74.2) | |
DCASE2017 baseline | Toni Heittola | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Heittola2017 | 61.0 (58.7 - 63.4) | |
Huang_THU_task1_1 | Taoan Huang | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China | task-acoustic-scene-classification-results#Huang2017 | 65.5 (63.2 - 67.8) | |
Huang_THU_task1_2 | Taoan Huang | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China | task-acoustic-scene-classification-results#Huang2017 | 65.4 (63.1 - 67.7) | |
Hussain_NUCES_task1_1 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 56.7 (54.3 - 59.1) | |
Hussain_NUCES_task1_2 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 59.5 (57.1 - 61.9) | |
Hussain_NUCES_task1_3 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 59.9 (57.5 - 62.3) | |
Hussain_NUCES_task1_4 | Khalid Hussain | Department of electrical engineering, National University of computer and emerging sciences, Pakistan | task-acoustic-scene-classification-results#Hussain2017 | 55.4 (52.9 - 57.8) | |
Jallet_TUT_task1_1 | Hugo Jallet | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Jallet2017 | 60.7 (58.4 - 63.1) | |
Jallet_TUT_task1_2 | Hugo Jallet | Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland | task-acoustic-scene-classification-results#Jallet2017 | 61.2 (58.8 - 63.5) | |
Jimenez_CMU_task1_1 | Abelino Jimenez | Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA | task-acoustic-scene-classification-results#Jimenez2017 | 59.9 (57.6 - 62.3) | |
Kukanov_UEF_task1_1 | Ivan Kukanov | School of Computing, University of Eastern Finland, Joensuu, Finland; Institute for Infocomm Research, A*Star, Singapore | task-acoustic-scene-classification-results#Kukanov2017 | 71.7 (69.5 - 73.9) | |
Kun_TUM_UAU_UP_task1_1 | Qian Kun | MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Kun2017 | 64.2 (61.9 - 66.5) | |
Kun_TUM_UAU_UP_task1_2 | Qian Kun | MISP group, Technische Universität München, Munich, Germany; Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Kun2017 | 64.0 (61.7 - 66.3) | |
Lehner_JKU_task1_1 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 68.7 (66.4 - 71.0) | |
Lehner_JKU_task1_2 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 66.8 (64.5 - 69.1) | |
Lehner_JKU_task1_3 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 64.8 (62.5 - 67.1) | |
Lehner_JKU_task1_4 | Bernhard Lehner | Department of Computational Perception, Johannes Kepler University, Linz, Austria | task-acoustic-scene-classification-results#Lehner2017 | 73.8 (71.7 - 76.0) | |
Li_SCUT_task1_1 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 53.7 (51.3 - 56.1) | |
Li_SCUT_task1_2 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 63.6 (61.3 - 66.0) | |
Li_SCUT_task1_3 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 61.7 (59.4 - 64.1) | |
Li_SCUT_task1_4 | Yanxiong Li | School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China | task-acoustic-scene-classification-results#Li2017 | 57.8 (55.4 - 60.2) | |
Maka_ZUT_task1_1 | Tomasz Maka | Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Szczecin, Poland | task-acoustic-scene-classification-results#Maka2017 | 47.5 (45.1 - 50.0) | |
Mun_KU_task1_1 | Seongkyu Mun | Intelligent Signal Processing Laboratory, Korea University, Seoul, South Korea | task-acoustic-scene-classification-results#Mun2017 | 83.3 (81.5 - 85.1) | |
Park_ISPL_task1_1 | Hanseok Ko | School of Electrical Engineering, Korea University, Seoul, Republic of Korea | task-acoustic-scene-classification-results#Park2017 | 72.6 (70.4 - 74.8) | |
Phan_UniLuebeck_task1_1 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 59.0 (56.6 - 61.4) | |
Phan_UniLuebeck_task1_2 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 55.9 (53.5 - 58.3) | |
Phan_UniLuebeck_task1_3 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 58.3 (55.9 - 60.7) | |
Phan_UniLuebeck_task1_4 | Huy Phan | Institute for Signal Processing, University of Luebeck, Luebeck, Germany | task-acoustic-scene-classification-results#Phan2017 | 58.0 (55.6 - 60.4) | |
Piczak_WUT_task1_1 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 70.6 (68.4 - 72.8) | |
Piczak_WUT_task1_2 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 69.6 (67.3 - 71.8) | |
Piczak_WUT_task1_3 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 67.7 (65.4 - 69.9) | |
Piczak_WUT_task1_4 | Karol Piczak | Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland | task-acoustic-scene-classification-results#Piczak2017 | 62.0 (59.6 - 64.3) | |
Rakotomamonjy_UROUEN_task1_1 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 61.5 (59.2 - 63.9) | |
Rakotomamonjy_UROUEN_task1_2 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 62.7 (60.3 - 65.0) | |
Rakotomamonjy_UROUEN_task1_3 | Alain Rakotomamonjy | LITIS EA4108, Université de Rouen, Saint Etienne du Rouvray, France | task-acoustic-scene-classification-results#Rakotomamonjy2017 | 62.8 (60.4 - 65.1) | |
Schindler_AIT_task1_1 | Alexander Schindler | Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria | task-acoustic-scene-classification-results#Schindler2017 | 61.7 (59.4 - 64.1) | |
Schindler_AIT_task1_2 | Alexander Schindler | Center for Digital Safety and Security, Austrian Institute of Technology, Vienna, Austria | task-acoustic-scene-classification-results#Schindler2017 | 61.7 (59.4 - 64.1) | |
Vafeiadis_CERTH_task1_1 | Anastasios Vafeiadis | Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece | task-acoustic-scene-classification-results#Vafeiadis2017 | 61.0 (58.6 - 63.4) | |
Vafeiadis_CERTH_task1_2 | Anastasios Vafeiadis | Information Technologies Institute, Center for Research & Technology Hellas, Thessaloniki, Greece | task-acoustic-scene-classification-results#Vafeiadis2017 | 49.5 (47.1 - 51.9) | |
Vij_UIET_task1_1 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 61.2 (58.9 - 63.6) | |
Vij_UIET_task1_2 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 57.5 (55.1 - 59.9) | |
Vij_UIET_task1_3 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 59.6 (57.2 - 62.0) | |
Vij_UIET_task1_4 | Dinesh Vij | Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India | task-acoustic-scene-classification-results#Vij2017 | 65.0 (62.7 - 67.3) | |
Waldekar_IITKGP_task1_1 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results#Waldekar2017 | 67.0 (64.7 - 69.3) | |
Waldekar_IITKGP_task1_2 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results#Waldekar2017 | 64.9 (62.6 - 67.2) | |
Xing_SCNU_task1_1 | Xing Xiaotao | School of Computer, South China Normal University, Guangzhou, China | task-acoustic-scene-classification-results#Weiping2017 | 74.8 (72.6 - 76.9) | |
Xing_SCNU_task1_2 | Xing Xiaotao | School of Computer, South China Normal University, Guangzhou, China | task-acoustic-scene-classification-results#Weiping2017 | 77.7 (75.7 - 79.7) | |
Xu_NUDT_task1_1 | Jinwei Xu | Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China | task-acoustic-scene-classification-results#Xu2017 | 68.5 (66.2 - 70.7) | |
Xu_NUDT_task1_2 | Jinwei Xu | Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China | task-acoustic-scene-classification-results#Xu2017 | 67.5 (65.3 - 69.8) | |
Xu_PKU_task1_1 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 65.9 (63.6 - 68.2) | |
Xu_PKU_task1_2 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 66.7 (64.4 - 69.0) | |
Xu_PKU_task1_3 | Xiaoshuo Xu | Institute of Computer Science and Technology, Peking University, Beijing, China | task-acoustic-scene-classification-results#Xu2017a | 64.6 (62.3 - 67.0) | |
Yang_WHU_TASK1_1 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 61.5 (59.2 - 63.9) | |
Yang_WHU_TASK1_2 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 65.2 (62.9 - 67.6) | |
Yang_WHU_TASK1_3 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 62.8 (60.5 - 65.2) | |
Yang_WHU_TASK1_4 | Yuhong Yang | National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China; Collaborative Innovation Center of Geospatial Technology, Wuhan, China | task-acoustic-scene-classification-results#Lu2017 | 63.6 (61.3 - 66.0) | |
Yu_UOS_task1_1 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 67.0 (64.7 - 69.3) | |
Yu_UOS_task1_2 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 66.2 (63.9 - 68.5) | |
Yu_UOS_task1_3 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 67.3 (65.1 - 69.6) | |
Yu_UOS_task1_4 | Yu Ha-Jin | School of Computer Science, University of Seoul, Seoul, Republic of South Korea | task-acoustic-scene-classification-results#Jee-Weon2017 | 70.6 (68.3 - 72.8) | |
Zhao_ADSC_task1_1 | Shengkui Zhao | Illinois at Singapore, Advanced Digital Sciences Center, Singapore | task-acoustic-scene-classification-results#Zhao2017 | 70.0 (67.8 - 72.2) | |
Zhao_ADSC_task1_2 | Shengkui Zhao | Illinois at Singapore, Advanced Digital Sciences Center, Singapore | task-acoustic-scene-classification-results#Zhao2017 | 67.9 (65.6 - 70.2) | |
Zhao_UAU_UP_task1_1 | Ren Zhao | Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany; Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany | task-acoustic-scene-classification-results#Zhao2017a | 63.8 (61.5 - 66.2) |
Complete results and technical reports can be found here.
Baseline system
A baseline system for the task is provided. The system implements a basic approach for acoustic scene classification and provides a comparison point for participants while they develop their systems. The baseline systems for all tasks share the same code base and implement a similar approach for each task. The baseline system will download the needed datasets and produce the results below when run with the default parameters.
The baseline system is based on a multilayer perceptron (MLP) architecture using log mel-band energies as features. A 5-frame context is used, resulting in a feature vector of length 200. Using these features, a neural network containing two dense layers of 50 hidden units per layer and 20% dropout is trained for 200 epochs. The classification decision is based on the softmax output layer of the network. A detailed description is available in the baseline system documentation. The baseline system includes evaluation of results using accuracy as the metric.
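As a rough illustration of the architecture described above, a minimal Keras sketch is shown below. The layer sizes, dropout, and output follow the description; the optimizer and activation functions are assumptions, and details may differ from the actual baseline implementation.

```python
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    # Input: 40 log mel-band energies x 5-frame context = 200 values.
    layers.Dense(50, activation="relu", input_shape=(200,)),
    layers.Dropout(0.2),                      # 20% dropout
    layers.Dense(50, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(15, activation="softmax"),   # one output per acoustic scene
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_features, train_labels, epochs=200, ...)
```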
The baseline system is implemented in Python (versions 2.7 and 3.6). Participants are allowed to build their systems on top of the given baseline system. The system provides all the needed functionality for dataset handling, storing and accessing features and models, and evaluating results, which makes adapting it to one's needs rather easy. The baseline system is also a good starting point for entry-level researchers.
Python implementation
Results for TUT Acoustic scenes 2017, development dataset
Evaluation setup
- 4-fold cross-validation, average classification accuracy over folds
- 15 acoustic scene classes
- Classification unit: one file (10 seconds of audio).
- Python 2.7.13 used
System parameters
- Frame size: 40 ms (with 50% hop size)
- Feature vector: 40 log mel-band energies in 5 consecutive frames = 200 values (see the feature-extraction sketch after this list)
- MLP: 2 layers x 50 hidden units, 20% dropout, 200 epochs (with early stopping: monitoring starts after epoch 100, 10-epoch patience), learning rate 0.001, softmax output layer
- Trained and tested on full audio
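The feature-extraction parameters above could be reproduced with librosa roughly as follows; this is a sketch under assumptions (the baseline uses its own feature code, and the file path is hypothetical).

```python
import numpy as np
import librosa

y, sr = librosa.load("development/audio/a001_0_10.wav", sr=44100)

n_fft = int(0.040 * sr)   # 40 ms frame -> 1764 samples at 44.1 kHz
hop_length = n_fft // 2   # 50% hop size

# 40 log mel-band energies per frame.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                     hop_length=hop_length, n_mels=40)
log_mel = librosa.power_to_db(mel)            # shape: (40, n_frames)

# Stack 5 consecutive frames into one 200-dimensional feature vector.
context = 5
features = np.array([log_mel[:, i:i + context].flatten()
                     for i in range(log_mel.shape[1] - context + 1)])
print(features.shape)  # (n_frames - 4, 200)
```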
Acoustic scene | Accuracy |
---|---|
Beach | 75.3 % |
Bus | 71.8 % |
Cafe / Restaurant | 57.7 % |
Car | 97.1 % |
City center | 90.7 % |
Forest path | 79.5 % |
Grocery store | 58.7 % |
Home | 68.6 % |
Library | 57.1 % |
Metro station | 91.7 % |
Office | 99.7 % |
Park | 70.2 % |
Residential area | 64.1 % |
Train | 58.0 % |
Tram | 81.7 % |
Overall accuracy | 74.8 % |
Citation
If you are using the dataset or the baseline code, or want to refer to the challenge task, please cite the following paper:
A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.
DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System
Abstract
DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.
Keywords
Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events
When citing challenge results, please cite the following paper:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Acoustic scene classification: an overview of DCASE 2017 challenge entries. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 411–415. September 2018. doi:10.1109/IWAENC.2018.8521242.
Acoustic Scene Classification: An Overview of DCASE 2017 Challenge Entries
Abstract
We present an overview of the challenge entries for the Acoustic Scene Classification task of DCASE 2017 Challenge. Being the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel-frequency representations. Statistical analysis indicates that the top ranked system performed significantly better than the others, and that combinations of top systems are capable of reaching close to perfect performance on the given data.
Keywords
acoustic scene classification, audio classification, DCASE challenge