Acoustic scene classification


Task description

The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded.

Challenge has ended. Full results for this task can be found in subtask specific result pages: Task1A Task1B Task1C

This task comprises three different subtasks that involve system development for three different situations:

A Match Task 1

Acoustic Scene Classification
Subtask A

Classification of data from the same device as the available training data.

B Mismatch Task 1

Acoustic Scene Classification with mismatched recording devices
Subtask B

Classification of data recorded with devices different from those used for the training data.

C OpenSet Task 1

Open set Acoustic Scene Classification
Subtask C

Classification on data that includes classes not encountered in the training data.

Description

The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded — for example "park", "pedestrian street", "metro station" — or to indicate it is from a different, unknown environment.

Figure 1: Overview of acoustic scene classification system.


Audio dataset

The dataset for this task is the TAU Urban Acoustic Scenes 2019 dataset, consisting of recordings from various acoustic scenes. This dataset extends the TUT Urban Acoustic Scenes 2018 dataset with 6 additional cities, to a total of 12 large European cities. For each scene class, recordings were made in different locations; for each recording location there are 5-6 minutes of audio. The original recordings were split into segments with a length of 10 seconds that are provided in individual files. Available information about the recordings includes the following: acoustic scene class, city, and recording location.

Acoustic scenes (10):

  • Airport - airport
  • Indoor shopping mall - shopping_mall
  • Metro station - metro_station
  • Pedestrian street - street_pedestrian
  • Public square - public_square
  • Street with medium level of traffic - street_traffic
  • Travelling by a tram - tram
  • Travelling by a bus - bus
  • Travelling by an underground metro - metro
  • Urban park - park

Data was recorded in the following cities:

  • Amsterdam
  • Barcelona
  • Helsinki
  • Lisbon
  • London
  • Lyon
  • Madrid
  • Milan
  • Prague
  • Paris
  • Stockholm
  • Vienna

Recording procedure

Recordings were made using four devices that captured audio simultaneously.

The main recording device consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution. The microphones are specifically made to look like headphones and are worn in the ears. As a result, the recorded audio is very similar to the sound that reaches the auditory system of the person wearing the equipment. This equipment is further referred to as device A.

The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session. All simultaneous recordings are time synchronized.

The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.


Development and evaluation datasets

Different versions of the dataset are provided depending on the task.

TAU Urban Acoustic Scenes 2019 development dataset contains only material recorded with device A, containing 40 hours of audio, balanced between classes. The data comes from 10 of the 12 cities. TAU Urban Acoustic Scenes 2019 evaluation dataset contains data from all 12 cities.

TAU Urban Acoustic Scenes 2019 Mobile development dataset contains material recorded with devices A, B and C. It is composed of TAU Urban Acoustic Scenes 2019 data recorded with device A, and some amount of parallel audio recorded with devices B and C. Data from device A was resampled and averaged into a single channel, to align with the properties of the data recorded with devices B and C. The dataset contains in total 46 hours of audio (40h + 3h + 3h). TAU Urban Acoustic Scenes 2019 Mobile evaluation dataset also contains data from device D.

TAU Urban Acoustic Scenes 2019 Open set development dataset contains only material recorded with device A, being composed of TAU Urban Acoustic Scenes 2019 and additional audio examples for the open classification problem. The "open" data consists of the "beach" and "office" classes of TUT Acoustic Scenes 2017 dataset and other material recorded in 2019. The dataset contains in total 46 hours of audio (40h + 6h). TAU Urban Acoustic Scenes 2019 Open set evaluation dataset contains data from the 10 known classes, and other unknown ones.

Reference labels

Reference labels are provided only for the development datasets. Reference labels for the evaluation dataset or leaderboard dataset will not be released. For publications based on the DCASE challenge data, please use the provided training/test setup of the development set, to allow comparisons. After the challenge, if you want to evaluate your proposed system with the official challenge evaluation setup, contact the task coordinators. Task coordinators can provide unofficial scoring for a limited number of system outputs.

Download

Subtask A




Subtask B




Subtask C


The dataset was updated on 12 March 2019 to include the train/test setup (version 2). To update an already downloaded version 1 of the dataset, replace only the TAU-urban-acoustic-scenes-2019-openset-development.meta.zip file.



Task setup

For each subtask, a development set is provided, together with a training/test partitioning for system development. Participants are required to report performance of their system using this train/test setup in order to allow comparison of systems on the development set.

Subtask A

A Match Task 1 Acoustic Scene Classification

This subtask is concerned with the basic problem of acoustic scene classification, in which all data (development and evaluation) are recorded with the same device, in this case device A, and contain only the 10 known acoustic scene classes. The subtask uses the TAU Urban Acoustic Scenes 2019 dataset.

Development dataset

The development dataset consists of recordings from ten cities; the training subset contains recordings from only 9 of the cities, to test the generalization properties of the systems. The training/test subsets are created based on the recording location such that the training subset contains approximately 70% of recording locations from each city. The test subset contains recordings from the rest of the locations, and a few locations from the tenth city. Full data from the tenth city is provided, but partly unused in this setup, to reflect the final evaluation setup.
The development set contains 40 hours of data, with 14400 segments (144 per city per acoustic scene class). In the training/test setup, segments from Milan are included only in the test subset. There are 9185 segments in the training set, 4185 in the test set, and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.
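The segment counts and durations quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and reading the 1030 remaining Milan segments as the part left out of the training/test setup is an interpretation of the paragraph above.

```python
SEGMENT_SECONDS = 10  # segment length stated in the dataset description


def segments_to_hours(n_segments, segment_seconds=SEGMENT_SECONDS):
    """Convert a number of fixed-length segments to hours of audio."""
    return n_segments * segment_seconds / 3600


# Counts quoted in the text for the Subtask A development set.
cities, scenes, per_city_per_scene = 10, 10, 144
total_segments = cities * scenes * per_city_per_scene

train_segments = 9185
test_segments = 4185
remaining_milan_segments = 1030  # read here as not used in the provided setup

print(total_segments)                                              # 14400
print(segments_to_hours(total_segments))                           # 40.0
print(train_segments + test_segments + remaining_milan_segments)   # 14400
```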

Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to the segments recorded at the same location. The location identifier can be found in the metadata file provided in the dataset or in the audio file names:

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files having the same location id are placed on the same side of the evaluation. In this subtask, device id is always a.
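One way to build such a split is to group files by location before dividing them. The following is a minimal sketch, not the official tooling: the function names, the 30% validation fraction, and the example file names are hypothetical, and it assumes that none of the five file-name fields contains a hyphen. Files are grouped by (scene, city, location id), which keeps a location together whether or not location ids are unique across cities.

```python
import random
from collections import defaultdict


def parse_filename(filename):
    """Split '[scene label]-[city]-[location id]-[segment id]-[device id].wav' into fields."""
    stem = filename.rsplit("/", 1)[-1]
    if stem.endswith(".wav"):
        stem = stem[: -len(".wav")]
    # Assumption: no field contains a hyphen, so a plain split yields five parts.
    scene, city, location_id, segment_id, device_id = stem.split("-")
    return {
        "scene": scene,
        "city": city,
        "location_id": location_id,
        "segment_id": segment_id,
        "device_id": device_id,
    }


def location_key(filename):
    """Key identifying one recording location."""
    info = parse_filename(filename)
    return (info["scene"], info["city"], info["location_id"])


def split_by_location(filenames, validation_fraction=0.3, seed=0):
    """Split files so that every location ends up entirely on one side."""
    groups = defaultdict(list)
    for name in filenames:
        groups[location_key(name)].append(name)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_val = max(1, round(len(keys) * validation_fraction))
    val = [f for key in keys[:n_val] for f in groups[key]]
    train = [f for key in keys[n_val:] for f in groups[key]]
    return train, val


# Hypothetical file names following the documented pattern.
files = [
    "airport-barcelona-0-0-a.wav",
    "airport-barcelona-0-1-a.wav",
    "airport-barcelona-1-2-a.wav",
    "park-helsinki-5-10-a.wav",
    "park-helsinki-5-11-a.wav",
    "tram-lyon-7-20-a.wav",
]
train_files, val_files = split_by_location(files)
print(len(train_files), len(val_files))
```

The same grouping idea extends to cross-validation by assigning whole location groups, rather than individual segments, to folds.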

Evaluation dataset

The evaluation dataset contains 20 hours of audio data from 12 cities (2 cities not encountered in the development set), and it is provided without ground truth. Participants should run their system on this dataset, and submit the classification results (system output) to the DCASE2019 Challenge.

Subtask B

B Mismatch Task 1 Acoustic Scene Classification with mismatched recording devices

This subtask is concerned with the situation in which an application will be tested with different devices, possibly not the same as the ones used to record the development data. In this case, evaluation data contains more devices than the development data. The subtask uses TAU Urban Acoustic Scenes 2019 Mobile dataset.

Development dataset

The development set consists of data recorded with 3 devices: A, B and C. This includes all data from the development set of subtask A (40 hours), partitioned in the same way. In addition, parallel recordings are provided from devices B and C, amounting to 3 hours for each. From devices B and C, half of the data is included in the training subset and half in the test subset. The development set contains in total 46 hours of data, with 16560 segments, of which 14400 are from device A, 1080 from device B, and 1080 from device C. There are 10265 segments in the training set (9185 for device A, 540 for device B, and 540 for device C), 5265 in the test set (4185 for device A, 540 for device B, and 540 for device C), and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.

Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to the segments recorded at the same location. The location identifier can be found in the metadata file provided in the dataset or in the audio file names:

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files having the same location id are placed on the same side of the evaluation. In this subtask, device id can be a, b or c.

Evaluation dataset

The evaluation dataset contains data from all 4 devices, including device D, which was not available in the development set. It contains 30 hours of audio and it is provided without ground truth. Participants should run their system on this dataset, and submit the classification results (system output) to the DCASE2019 Challenge.

Subtask C

C OpenSet Task 1 Open set Acoustic Scene Classification

This subtask is concerned with acoustic scene classification where the test recording may be from a different environment than the 10 target classes, in which case it should be classified as "unknown", in a so-called open-set classification setup. The subtask uses TAU Urban Acoustic Scenes 2019 Openset dataset and some additional data providing examples of "unknown" acoustic scenes.

Participants should make good use of external data in order to model the case of scenes not encountered within the training data. The provided examples allow only limited generalization, and may overfit to their original dataset due to lack of sufficient variety.
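A simple way to produce an "unknown" decision from a closed-set classifier is to reject low-confidence predictions. The sketch below illustrates that idea only; it is not presented as the method used by any challenge system. The function name and the 0.5 threshold are hypothetical, and the input is assumed to be a list of per-class probabilities in the order of the ten class labels listed earlier.

```python
KNOWN_CLASSES = [
    "airport", "shopping_mall", "metro_station", "street_pedestrian",
    "public_square", "street_traffic", "tram", "bus", "metro", "park",
]


def classify_open_set(probabilities, threshold=0.5, classes=KNOWN_CLASSES):
    """Return the most probable known class, or 'unknown' if its probability is too low."""
    if len(probabilities) != len(classes):
        raise ValueError("expected one probability per known class")
    best_index = max(range(len(classes)), key=lambda i: probabilities[i])
    # Assumption: a fixed threshold on the top probability decides rejection.
    if probabilities[best_index] < threshold:
        return "unknown"
    return classes[best_index]


confident = [0.02] * 9 + [0.82]   # mass concentrated on "park"
uncertain = [0.1] * 10            # no class stands out

print(classify_open_set(confident))   # park
print(classify_open_set(uncertain))   # unknown
```

The threshold trades accuracy on known classes against accuracy on unknown ones, so in practice it would be tuned on the development set's test subset.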

Figure 1: Overview of an acoustic scene classification system capable of recognizing an unknown scene class.

Development dataset

The development dataset consists of data from the 10 target classes and additional "unknown" class examples. The dataset includes all data from the development set of Subtask A (40 hours), partitioned in the same way. In addition, recordings are provided for modeling and testing the open-set classification task. The unknown class consists of audio examples from TUT Acoustic Scenes 2017 dataset and new material recorded during the collection of TAU Urban Acoustic Scenes 2019 dataset. The development set contains 44 hours of data (40+4), with 15850 segments (14400 of ten scene classes + 1450 unknown class). Complete details on the dataset are provided in the readme file. In addition, correspondence of "unknown" class examples with their original acoustic scenes and file names is provided in meta_unknown.csv.

Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to the segments recorded at the same location. The location identifier can be found in the metadata file provided in the dataset or in the audio file names:

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files having the same location id are placed on the same side of the evaluation. In this subtask, device id is always a.

Evaluation dataset

The evaluation dataset contains 20 hours of audio data, of which part is recorded in one of the 10 known classes, and part in other, unknown environments, different from the ones in the development set. The evaluation dataset is provided without ground truth. Participants should run their system on this dataset, and submit the classification results (system output) to the DCASE2019 Challenge.

External data resources

Use of external data is allowed in all subtasks under the following conditions:

  • The used external resource is clearly referenced and freely accessible to any other research group in the world. External data refers to public datasets or trained models. The dataset/models must be public and freely available before 1st of April 2019.
  • Participants submit at least one system without external training data so that we can study the contribution of such resources. The list of external data sources used in training must be clearly indicated in the technical report.
  • Participants inform the organizers in advance about such data sources, so that all competitors know about them and have equal opportunity to use them; please send an email to the task coordinators; we will update the list of external datasets on the webpage accordingly. Once the evaluation set is published, the list of allowed external data resources is locked (no further external sources allowed).
  • It is not allowed to use TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017 and TUT Urban Acoustic Scenes 2018. These datasets are partially included in the current setup, and additional usage will lead to overfitting.

List of external datasets allowed:

Dataset name | Type | Added | Link
LITIS Rouen audio scene dataset | audio | 04.03.2019 | https://sites.google.com/site/alainrakotomamonjy/home/audio-scene
DCASE2013 Challenge - Public Dataset for Scene Classification Task | audio | 04.03.2019 | https://archive.org/details/dcase2013_scene_classification
DCASE2013 Challenge - Private Dataset for Scene Classification Task | audio | 04.03.2019 | https://archive.org/details/dcase2013_scene_classification_testset
Dares G1 | audio | 04.03.2019 | http://www.daresounds.org/
AudioSet | audio | 04.03.2019 | https://research.google.com/audioset/


Participants cannot suggest data to this list anymore (list locked 27th of May 2019).

Submission

Participants can choose which subtasks to participate in; there is no requirement to participate in all of them. An official challenge submission consists of a technical report and system output for the evaluation data.

System output should be presented as a single text file (in CSV format, without a header row) containing the classification result for each audio file in the evaluation set. Result items can be in any order. Format:

[filename (string)][tab][scene label (string)]
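The format above can be produced with Python's standard csv module. This is a minimal sketch: the function name and the example file names are hypothetical, and the exact file-name convention expected in the first column should be taken from the submission instructions.

```python
import csv
import io


def write_system_output(results, stream):
    """Write (filename, scene label) pairs as tab-separated rows without a header."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for filename, scene_label in results:
        writer.writerow([filename, scene_label])


# Hypothetical results for two evaluation files.
results = [("audio/0.wav", "airport"), ("audio/1.wav", "tram")]

buffer = io.StringIO()
write_system_output(results, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead of a buffer, open the file with `open(path, "w", newline="")` and pass the file object as `stream`.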

Multiple system outputs can be submitted (maximum 4 per participant per subtask). For each system, meta information should be provided in a separate file, containing the task specific information as given in the example here. All files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).

When training the final system for submission, participants can of course use the entire development set. In the technical report, participants should include system results on the training/test setup provided with the development set.

Detailed information for the submission can be found on the Submission page.

Public leaderboards

During the challenge, a public leaderboard will be provided using a separate public evaluation dataset for each subtask. The leaderboards are organized through Kaggle InClass competitions. Leaderboards are meant to serve as a development tool for participants, and do not have an official role in the challenge.

Due to Kaggle / US Government policy, people who are residents of certain countries (Cuba, Iran, Syria, North Korea, and Sudan) are unable to participate in the Kaggle competitions (see Kaggle terms, section 7 What are the rules for competitions on Kaggle?). As DCASE is committed to open science open to everybody, if these Kaggle restrictions prevent you from using the Kaggle-based leaderboard during development, please contact the task 1 organizers and we will provide a similar service outside Kaggle.

A Match Task 1 Subtask A Leaderboard

B Mismatch Task 1 Subtask B Leaderboard

C OpenSet Task 1 Subtask C Leaderboard

The official DCASE challenge submission will not be done through these Kaggle InClass competitions.

Datasets

For public leaderboard submissions, participants should use the official challenge development datasets to train their system, as in the DCASE challenge. Separate datasets, called leaderboard datasets, are released to be used as evaluation datasets in the competitions. These leaderboard datasets consist of a small subset of the official evaluation dataset, with similar properties (distribution). The amount of material in the leaderboard dataset is considerably smaller than the official evaluation material in the DCASE challenge.

It is not allowed to use the leaderboard datasets to train the systems in any DCASE challenge subtasks or leaderboard competitions.




Task rules

There are general rules valid for all tasks; these, along with information on technical report and submission requirements can be found here.

Task specific rules:

  • Use of external data is allowed, except TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017, TUT Urban Acoustic Scenes 2018 and leaderboard datasets (DCASE2018 and DCASE2019).
  • Manipulation of provided training and development data is allowed (e.g. by mixing data sampled from a probability density function (pdf) or using techniques such as pitch shifting or time stretching).
  • Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden. Separately published leaderboard data is considered as evaluation data as well.
  • The classification decision must be made independently for each test sample.

Evaluation

The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample. Accuracy will be calculated as the average of the class-wise accuracies.
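Averaging class-wise accuracies gives every scene class the same weight regardless of how many segments it has. The sketch below shows that computation on a toy example; the function names and the labels are illustrative, and it is not a replacement for the evaluation toolbox.

```python
from collections import Counter


def class_wise_accuracy(reference, predicted):
    """Return {class label: fraction of its segments classified correctly}."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    totals = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {label: correct[label] / totals[label] for label in totals}


def macro_average_accuracy(reference, predicted):
    """Average of the class-wise accuracies, each class weighted equally."""
    per_class = class_wise_accuracy(reference, predicted)
    return sum(per_class.values()) / len(per_class)


# Toy example with an unbalanced class distribution.
reference = ["park", "park", "park", "park", "tram", "tram"]
predicted = ["park", "park", "park", "tram", "tram", "bus"]

print(class_wise_accuracy(reference, predicted))     # {'park': 0.75, 'tram': 0.5}
print(macro_average_accuracy(reference, predicted))  # 0.625
```

In this toy example the plain fraction of correct segments is 4/6, while the class-wise average is 0.625; the two coincide when all classes have the same number of segments.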

Participants can use the sed_eval toolbox for the evaluation.


Ranking

  • Subtask A will use the overall accuracy on the evaluation data.
  • Subtask B will use the overall accuracy on data from devices B and C.
  • Subtask C will use the weighted average of the known classes and unknown class:
\begin{equation} \mathrm{ACC}_{\mathrm{weighted}} = 0.5 \cdot \mathrm{ACC}_{\mathrm{known~classes}} + 0.5 \cdot \mathrm{ACC}_{\mathrm{unknown~classes}} \end{equation}
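The Subtask C score can be sketched as follows. The function name and the "unknown" label string are illustrative, and one detail is an assumption: each of the two accuracies is computed here as the plain fraction of correctly classified segments in its group, since the text does not say whether the known-class term is additionally averaged class-wise.

```python
def weighted_open_set_accuracy(reference, predicted, unknown_label="unknown"):
    """0.5 * accuracy on known-class segments + 0.5 * accuracy on unknown segments."""
    pairs = list(zip(reference, predicted))
    known = [(r, p) for r, p in pairs if r != unknown_label]
    unknown = [(r, p) for r, p in pairs if r == unknown_label]
    if not known or not unknown:
        raise ValueError("need at least one known and one unknown reference segment")
    # Assumption: plain segment-level accuracy within each group.
    acc_known = sum(r == p for r, p in known) / len(known)
    acc_unknown = sum(r == p for r, p in unknown) / len(unknown)
    return 0.5 * acc_known + 0.5 * acc_unknown


# Toy example: 3 of 4 known segments and 1 of 2 unknown segments are correct.
reference = ["park", "tram", "bus", "bus", "unknown", "unknown"]
predicted = ["park", "tram", "unknown", "bus", "unknown", "park"]

print(weighted_open_set_accuracy(reference, predicted))  # 0.625
```

Because the two terms are weighted equally, a system that never outputs "unknown" cannot exceed 0.5 under this measure.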

Results

Subtask A

Submission information (code, author, affiliation, technical report) and accuracy (%) with 95% confidence interval:
Bilot_IDG_task1a_1 Valentin Bilot Audio R&D, InterDigital R&D, Rennes, France task-acoustic-scene-classification-results-a#Bilot2019 66.1 (65.0 - 67.2)
Bilot_IDG_task1a_2 Valentin Bilot Audio R&D, InterDigital R&D, Rennes, France task-acoustic-scene-classification-results-a#Bilot2019 67.3 (66.3 - 68.4)
Bilot_IDG_task1a_3 Valentin Bilot Audio R&D, InterDigital R&D, Rennes, France task-acoustic-scene-classification-results-a#Bilot2019 64.5 (63.4 - 65.6)
Bilot_IDG_task1a_4 Valentin Bilot Audio R&D, InterDigital R&D, Rennes, France task-acoustic-scene-classification-results-a#Bilot2019 68.3 (67.3 - 69.4)
Chandrasekhar_IIITH_task1a_1 Chandrasekhar Paseddula International Institute of Information Technology, Hyderabad department:Electronics and Communication Engineering, Hyderabad, India task-acoustic-scene-classification-results-a#Paseddula2019 52.6 (51.4 - 53.7)
DSPLAB_TJU_task1a_1 Jinhua Liang School of Electrical and Information Engineering, TianJin University, Tianjin, China task-acoustic-scene-classification-results-a#Ding2019 66.5 (65.4 - 67.6)
DSPLAB_TJU_task1a_2 Jinhua Liang School of Electrical and Information Engineering, TianJin University, Tianjin, China task-acoustic-scene-classification-results-a#Ding2019 69.6 (68.5 - 70.6)
DSPLAB_TJU_task1a_3 Jinhua Liang School of Electrical and Information Engineering, TianJin University, Tianjin, China task-acoustic-scene-classification-results-a#Ding2019 65.0 (63.9 - 66.1)
DSPLAB_TJU_task1a_4 Jinhua Liang School of Electrical and Information Engineering, TianJin University, Tianjin, China task-acoustic-scene-classification-results-a#Ding2019 69.5 (68.4 - 70.5)
Fmta91_KNToosi_task1a_1 fateme Arabnezhad Computer Engineering Department, Khaje Nasir Toosi, Tehran, Iran task-acoustic-scene-classification-results-a#Arabnezhad2019 76.2 (75.2 - 77.2)
Fraile_UPM_task1a_1 Ruben Fraile CITSEM, Universidad Politecnica de Madrid, Madrid, Spain task-acoustic-scene-classification-results-a#Fraile2019 58.7 (57.6 - 59.9)
DCASE2019 baseline Toni Heittola Computing Sciences, Tampere University, Tampere, Finland task-acoustic-scene-classification-results-a#Heittola2019 63.3 (62.2 - 64.5)
Huang_IL_task1a_1 Paulo Lopez Meyer Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico task-acoustic-scene-classification-results-a#Huang2019 80.5 (79.6 - 81.4)
Huang_IL_task1a_2 Paulo Lopez Meyer Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico task-acoustic-scene-classification-results-a#Huang2019 81.1 (80.2 - 82.0)
Huang_IL_task1a_3 Paulo Lopez Meyer Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico task-acoustic-scene-classification-results-a#Huang2019 81.3 (80.4 - 82.2)
Huang_IL_task1a_4 Paulo Lopez Meyer Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico task-acoustic-scene-classification-results-a#Huang2019 79.5 (78.6 - 80.5)
Huang_SCNU_task1a_1 Zhenyi Huang School of Computer, South China Normal University, Guangzhou, China task-acoustic-scene-classification-results-a#Huang2019a 79.2 (78.3 - 80.1)
JSNU_WDXY_task1a_1 Xinixn Ma School of Physics and Electronic, Jiangsu Normal University, Xuzhou, China task-acoustic-scene-classification-results-a#Ma2019 72.2 (71.1 - 73.2)
Jung_UOS_task1a_1 Ha-Jin Yu Computing Sciences, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results-a#Jung2019 81.1 (80.2 - 82.0)
Jung_UOS_task1a_2 Ha-Jin Yu Computing Sciences, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results-a#Jung2019 81.2 (80.3 - 82.1)
Jung_UOS_task1a_3 Ha-Jin Yu Computing Sciences, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results-a#Jung2019 81.0 (80.1 - 81.9)
Jung_UOS_task1a_4 Ha-Jin Yu Computing Sciences, University of Seoul, Seoul, Republic of Korea task-acoustic-scene-classification-results-a#Jung2019 81.2 (80.3 - 82.1)
KK_I2R_task1a_1 Teh KK I2R, A-star, Singapore task-acoustic-scene-classification-results-a#KK2019 76.6 (75.6 - 77.6)
KK_I2R_task1a_2 Teh KK I2R, A-star, Singapore task-acoustic-scene-classification-results-a#KK2019 77.7 (76.7 - 78.6)
KK_I2R_task1a_3 Teh KK I2R, A-star, Singapore task-acoustic-scene-classification-results-a#KK2019 77.2 (76.2 - 78.2)
Kong_SURREY_task1a_1 Qiuqiang Kong Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England task-acoustic-scene-classification-results-a#Kong2019 70.5 (69.5 - 71.6)
Koutini_CPJKU_task1a_1 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-a#Koutini2019 82.8 (82.0 - 83.7)
Koutini_CPJKU_task1a_2 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-a#Koutini2019 83.7 (82.9 - 84.6)
Koutini_CPJKU_task1a_3 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-a#Koutini2019 83.5 (82.6 - 84.4)
Koutini_CPJKU_task1a_4 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-a#Koutini2019 83.8 (82.9 - 84.6)
LamPham_HCMGroup_task1a_1 Lam Pham School of Computing, University of Kent, Chatham, United Kingdom task-acoustic-scene-classification-results-a#Pham2019 73.9 (72.9 - 74.9)
LamPham_KentGroup_task1a_1 Lam Pham School of Computing, University of Kent, Chatham, United Kingdom task-acoustic-scene-classification-results-a#Pham2019a 76.8 (75.8 - 77.7)
Lei_CQU_task1a_1 Chongqin Lei Intelligent Information Technology and System Lab, CHONGQING UNIVERSITY, Chongqing, China task-acoustic-scene-classification-results-a#Lei2019 75.5 (74.5 - 76.5)
Li_NPU_task1a_1 Ning FangLi Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China task-acoustic-scene-classification-results-a#FangLi2019 59.9 (58.8 - 61.0)
Li_NPU_task1a_2 Ning FangLi Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China task-acoustic-scene-classification-results-a#FangLi2019 61.8 (60.7 - 62.9)
Liang_HUST_task1a_1 Han Liang Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China task-acoustic-scene-classification-results-a#Liang2019 68.2 (67.1 - 69.2)
Liang_HUST_task1a_2 Han Liang Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China task-acoustic-scene-classification-results-a#Liang2019 66.4 (65.3 - 67.5)
Liu_SCUT_task1a_1 Liu Mingle School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province task-acoustic-scene-classification-results-a#Mingle2019 78.3 (77.4 - 79.3)
Liu_SCUT_task1a_2 Liu Mingle School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province task-acoustic-scene-classification-results-a#Mingle2019 79.9 (79.0 - 80.8)
Liu_SCUT_task1a_3 Liu Mingle School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province task-acoustic-scene-classification-results-a#Mingle2019 78.3 (77.3 - 79.2)
Liu_SCUT_task1a_4 Liu Mingle School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province task-acoustic-scene-classification-results-a#Mingle2019 78.4 (77.4 - 79.3)
MaLiu_BIT_task1a_1 Sifan Ma Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China task-acoustic-scene-classification-results-a#Ma2019a 72.8 (71.8 - 73.8)
MaLiu_BIT_task1a_2 Wei Liu Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China task-acoustic-scene-classification-results-a#Liu2019 76.0 (75.1 - 77.0)
MaLiu_BIT_task1a_3 Sifan Ma Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China task-acoustic-scene-classification-results-a#Ma2019a 73.3 (72.3 - 74.3)
Mars_PRDCSG_task1a_1 Rohith Mars Core Technology Group, Panasonic R&D Center, Singapore, Singapore task-acoustic-scene-classification-results-a#Mars2019 79.3 (78.3 - 80.2)
McDonnell_USA_task1a_1 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-a#Gao2019 80.0 (79.0 - 80.9)
McDonnell_USA_task1a_2 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-a#Gao2019 80.5 (79.6 - 81.4)
McDonnell_USA_task1a_3 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-a#Gao2019 80.4 (79.5 - 81.3)
McDonnell_USA_task1a_4 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-a#Gao2019 80.3 (79.4 - 81.2)
Naranjo-Alcazar_VfyAI_task1a_1 Javier Naranjo-Alcazar Visualfy AI, Visualfy, Benisano, Spain task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 74.1 (73.1 - 75.2)
Naranjo-Alcazar_VfyAI_task1a_2 Javier Naranjo-Alcazar Visualfy AI, Visualfy, Benisano, Spain task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 74.2 (73.2 - 75.2)
Naranjo-Alcazar_VfyAI_task1a_3 Javier Naranjo-Alcazar Visualfy AI, Visualfy, Benisano, Spain task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 74.0 (73.0 - 75.0)
Naranjo-Alcazar_VfyAI_task1a_4 Javier Naranjo-Alcazar Visualfy AI, Visualfy, Benisano, Spain task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 74.1 (73.1 - 75.1)
Plata_SRPOL_task1a_1 Marcin Plata Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-a#Plata2019 78.8 (77.9 - 79.8)
Plata_SRPOL_task1a_2 Marcin Plata Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-a#Plata2019 79.2 (78.3 - 80.1)
Plata_SRPOL_task1a_3 Marcin Plata Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-a#Plata2019 77.2 (76.3 - 78.2)
Plata_SRPOL_task1a_4 Marcin Plata Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-a#Plata2019 77.9 (77.0 - 78.9)
SSW_ETRI_task1a_1 Suh Sangwon Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea task-acoustic-scene-classification-results-a#Sangwon2019 66.7 (65.6 - 67.8)
SSW_ETRI_task1a_2 Suh Sangwon Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea task-acoustic-scene-classification-results-a#Sangwon2019 67.0 (65.9 - 68.1)
SSW_ETRI_task1a_3 Suh Sangwon Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea task-acoustic-scene-classification-results-a#Sangwon2019 67.6 (66.5 - 68.7)
SSW_ETRI_task1a_4 Suh Sangwon Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea task-acoustic-scene-classification-results-a#Sangwon2019 67.6 (66.5 - 68.7)
Salvati_DMIF_task1a_1 Daniele Salvati Mathematics, Computer Science and Physics, University of Udine, Udine, Italy task-acoustic-scene-classification-results-a#Salvati2019 68.5 (67.5 - 69.6)
Seo_LGE_task1a_1 Seo Hyeji Advanced Robotics Lab, LG Electronics, Seoul, Korea task-acoustic-scene-classification-results-a#Hyeji2019 81.6 (80.7 - 82.5)
Seo_LGE_task1a_2 Seo Hyeji Advanced Robotics Lab, LG Electronics, Seoul, Korea task-acoustic-scene-classification-results-a#Hyeji2019 82.5 (81.6 - 83.4)
Seo_LGE_task1a_3 Seo Hyeji Advanced Robotics Lab, LG Electronics, Seoul, Korea task-acoustic-scene-classification-results-a#Hyeji2019 81.1 (80.2 - 82.0)
Seo_LGE_task1a_4 Seo Hyeji Advanced Robotics Lab, LG Electronics, Seoul, Korea task-acoustic-scene-classification-results-a#Hyeji2019 82.5 (81.7 - 83.4)
Waldekar_IITKGP_task1a_1 Shefali Waldekar Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India task-acoustic-scene-classification-results-a#Waldekar2019 65.9 (64.8 - 67.0)
Wang_BTBU_task1a_1 Zhuhe Wang Noise and Vibration Laboratory, Beijing Technology and Business University, Beijing, China task-acoustic-scene-classification-results-a#Wang2019 32.2 (31.1 - 33.3)
Wang_NWPU_task1a_1 Mou Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-a#Wang2019a_t1 80.6 (79.7 - 81.5)
Wang_NWPU_task1a_2 Mou Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-a#Wang2019a 80.1 (79.1 - 81.0)
Wang_NWPU_task1a_3 Mou Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-a#Wang2019a 76.6 (75.6 - 77.6)
Wang_NWPU_task1a_4 Mou Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-a#Wang2019a 76.8 (75.8 - 77.8)
Wang_SCUT_task1a_1 Wucheng Wang School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong Province task-acoustic-scene-classification-results-a#Wang2019b 76.4 (75.4 - 77.4)
Wang_SCUT_task1a_2 Wucheng Wang School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong Province task-acoustic-scene-classification-results-a#Wang2019b 76.6 (75.6 - 77.5)
Wang_SCUT_task1a_3 Wucheng Wang School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong Province task-acoustic-scene-classification-results-a#Wang2019b 75.9 (74.9 - 76.9)
Wang_SCUT_task1a_4 Wucheng Wang School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong Province task-acoustic-scene-classification-results-a#Wang2019b 76.5 (75.5 - 77.5)
Wilkinghoff_FKIE_task1a_1 Kevin Wilkinghoff Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany task-acoustic-scene-classification-results-a#Wilkinghoff2019 74.6 (73.6 - 75.6)
Wilkinghoff_FKIE_task1a_2 Kevin Wilkinghoff Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany task-acoustic-scene-classification-results-a#Wilkinghoff2019 76.2 (75.2 - 77.2)
Wu_CUHK_task1a_1 Yuzhong Wu Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China task-acoustic-scene-classification-results-a#Wu2019 80.1 (79.1 - 81.0)
Yang_UESTC_task1a_1 Yang Haocong Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-a#Haocong2019 79.9 (78.9 - 80.8)
Yang_UESTC_task1a_2 Yang Haocong Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-a#Haocong2019 81.6 (80.7 - 82.5)
Yang_UESTC_task1a_3 Yang Haocong Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-a#Haocong2019 81.2 (80.3 - 82.1)
Zeinali_BUT_task1a_1 Hossein Zeinali Information Technology, Brno University of Technology, Brno, Czech Republic task-acoustic-scene-classification-results-a#Zeinali2019 78.9 (78.0 - 79.9)
Zeinali_BUT_task1a_2 Hossein Zeinali Information Technology, Brno University of Technology, Brno, Czech Republic task-acoustic-scene-classification-results-a#Zeinali2019 78.9 (77.9 - 79.8)
Zeinali_BUT_task1a_3 Hossein Zeinali Information Technology, Brno University of Technology, Brno, Czech Republic task-acoustic-scene-classification-results-a#Zeinali2019 79.1 (78.1 - 80.0)
Zhang_IOA_task1a_1 Pengyuan Zhang Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China task-acoustic-scene-classification-results-a#Chen2019 84.9 (84.1 - 85.7)
Zhang_IOA_task1a_2 Pengyuan Zhang Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China task-acoustic-scene-classification-results-a#Chen2019 84.9 (84.1 - 85.8)
Zhang_IOA_task1a_3 Pengyuan Zhang Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China task-acoustic-scene-classification-results-a#Chen2019 85.2 (84.4 - 86.0)
Zhang_IOA_task1a_4 Pengyuan Zhang Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China task-acoustic-scene-classification-results-a#Chen2019 84.8 (83.9 - 85.6)
Zheng_USTC_task1a_1 Xu Zheng Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China task-acoustic-scene-classification-results-a#Zheng2019 75.7 (74.7 - 76.7)
Zheng_USTC_task1a_2 Xu Zheng Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China task-acoustic-scene-classification-results-a#Zheng2019 71.3 (70.3 - 72.4)
Zheng_USTC_task1a_3 Xu Zheng Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China task-acoustic-scene-classification-results-a#Zheng2019 78.9 (77.9 - 79.8)
Zhou_Kuaiyu_task1a_1 Nai Zhou Beijing Kuaiyu Electronics Co., Ltd., Beijing, China task-acoustic-scene-classification-results-a#Zhou2019_t1 79.8 (78.8 - 80.7)
Zhou_Kuaiyu_task1a_2 Nai Zhou Beijing Kuaiyu Electronics Co., Ltd., Beijing, China task-acoustic-scene-classification-results-a#Zhou2019_t1 79.4 (78.5 - 80.4)
Zhou_Kuaiyu_task1a_3 Nai Zhou Beijing Kuaiyu Electronics Co., Ltd., Beijing, China task-acoustic-scene-classification-results-a#Zhou2019_t1 78.7 (77.7 - 79.6)
Zhu_SSLabBUPT_task1a_1 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-a#Zhu2019 79.2 (78.3 - 80.1)
Zhu_SSLabBUPT_task1a_2 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-a#Zhu2019 78.8 (77.9 - 79.7)
Zhu_SSLabBUPT_task1a_3 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-a#Zhu2019 79.1 (78.2 - 80.1)
Zhu_SSLabBUPT_task1a_4 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-a#Zhu2019 78.8 (77.8 - 79.7)


Complete results and technical reports can be found on the subtask A results page.

Subtask B

Submission code Author Affiliation Technical report Accuracy with 95% confidence interval
Eghbal-zadeh_CPJKU_task1b_1 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 74.5 (73.5 - 75.5)
Eghbal-zadeh_CPJKU_task1b_2 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 74.5 (73.5 - 75.5)
Eghbal-zadeh_CPJKU_task1b_3 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 73.4 (72.4 - 74.5)
Eghbal-zadeh_CPJKU_task1b_4 Khaled Koutini Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 73.4 (72.3 - 74.4)
DCASE2019 baseline Toni Heittola Computing Sciences, Tampere University, Tampere, Finland task-acoustic-scene-classification-results-b#Heittola2019 47.7 (46.5 - 48.8)
Jiang_UESTC_task1b_1 Shengwang Jiang School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-b#Jiang2019 70.3 (69.2 - 71.3)
Jiang_UESTC_task1b_2 Shengwang Jiang School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-b#Jiang2019 69.9 (68.9 - 71.0)
Jiang_UESTC_task1b_3 Shengwang Jiang School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-b#Jiang2019 69.0 (68.0 - 70.1)
Jiang_UESTC_task1b_4 Shengwang Jiang School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China task-acoustic-scene-classification-results-b#Jiang2019 69.6 (68.6 - 70.7)
Kong_SURREY_task1b_1 Qiuqiang Kong Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England task-acoustic-scene-classification-results-b#Kong2019 61.6 (60.4 - 62.7)
Kosmider_SRPOL_task1b_1 Michał Kośmider Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-b#Komider2019 75.1 (74.1 - 76.1)
Kosmider_SRPOL_task1b_2 Michał Kośmider Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-b#Komider2019 75.3 (74.3 - 76.3)
Kosmider_SRPOL_task1b_3 Michał Kośmider Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-b#Komider2019 74.9 (73.9 - 75.9)
Kosmider_SRPOL_task1b_4 Michał Kośmider Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-b#Komider2019 75.2 (74.3 - 76.2)
LamPham_KentGroup_task1b_1 Lam Pham School of Computing, University of Kent, Chatham, United Kingdom task-acoustic-scene-classification-results-b#Pham2019 72.8 (71.8 - 73.8)
McDonnell_USA_task1b_1 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-b#Gao2019 74.2 (73.2 - 75.2)
McDonnell_USA_task1b_2 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-b#Gao2019 74.1 (73.1 - 75.2)
McDonnell_USA_task1b_3 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-b#Gao2019 74.9 (73.9 - 75.9)
McDonnell_USA_task1b_4 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-b#Gao2019 74.4 (73.4 - 75.4)
Primus_CPJKU_task1b_1 Paul Primus Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Primus2019 71.3 (70.2 - 72.3)
Primus_CPJKU_task1b_2 Paul Primus Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Primus2019 73.4 (72.4 - 74.4)
Primus_CPJKU_task1b_3 Paul Primus Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Primus2019 71.6 (70.6 - 72.7)
Primus_CPJKU_task1b_4 Paul Primus Computational Perception, Johannes Kepler University Linz, Linz, Austria task-acoustic-scene-classification-results-b#Primus2019 74.2 (73.2 - 75.2)
Song_HIT_task1b_1 Hongwei Song Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China task-acoustic-scene-classification-results-b#Song2019 67.3 (66.2 - 68.3)
Song_HIT_task1b_2 Hongwei Song Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China task-acoustic-scene-classification-results-b#Song2019 72.2 (71.2 - 73.3)
Song_HIT_task1b_3 Hongwei Song Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China task-acoustic-scene-classification-results-b#Song2019 72.1 (71.1 - 73.1)
Waldekar_IITKGP_task1b_1 Shefali Waldekar Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India task-acoustic-scene-classification-results-b#Waldekar2019 62.1 (60.9 - 63.2)
Wang_NWPU_task1b_1 Rui Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-b#Wang2019 65.7 (64.6 - 66.8)
Wang_NWPU_task1b_2 Rui Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-b#Wang2019 68.5 (67.4 - 69.6)
Wang_NWPU_task1b_3 Rui Wang School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China task-acoustic-scene-classification-results-b#Wang2019 70.3 (69.3 - 71.4)


Complete results and technical reports can be found on the subtask B results page.

Subtask C

Submission code Author Affiliation Technical report Accuracy with 95% confidence interval
DCASE2019 baseline Toni Heittola Computing Sciences, Tampere University, Tampere, Finland task-acoustic-scene-classification-results-c#Heittola2019 47.6 (47.1 - 48.0)
Kong_SURREY_task1c_1 Qiuqiang Kong Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England task-acoustic-scene-classification-results-c#Kong2019 50.7 (50.2 - 51.2)
Lehner_SAL_task1c_1 Bernhard Lehner Silicon Austria Labs, JKU, Linz, Austria task-acoustic-scene-classification-results-c#Lehner2019 58.7 (58.1 - 59.2)
Lehner_SAL_task1c_2 Bernhard Lehner Silicon Austria Labs, JKU, Linz, Austria task-acoustic-scene-classification-results-c#Lehner2019 61.3 (60.7 - 61.9)
Lehner_SAL_task1c_3 Bernhard Lehner Silicon Austria Labs, JKU, Linz, Austria task-acoustic-scene-classification-results-c#Lehner2019 60.9 (60.3 - 61.5)
Lehner_SAL_task1c_4 Bernhard Lehner Silicon Austria Labs, JKU, Linz, Austria task-acoustic-scene-classification-results-c#Lehner2019 60.5 (59.9 - 61.1)
McDonnell_USA_task1c_1 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-c#Gao2019 58.2 (57.6 - 58.7)
McDonnell_USA_task1c_2 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-c#Gao2019 58.0 (57.5 - 58.6)
McDonnell_USA_task1c_3 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-c#Gao2019 58.8 (58.2 - 59.4)
McDonnell_USA_task1c_4 Mark McDonnell School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia task-acoustic-scene-classification-results-c#Gao2019 58.4 (57.9 - 59.0)
Rakowski_SRPOL_task1c_1 Alexander Rakowski Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-c#Rakowski2019_t1 57.2 (56.6 - 57.8)
Rakowski_SRPOL_task1c_2 Alexander Rakowski Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-c#Rakowski2019_t1 57.2 (56.6 - 57.8)
Rakowski_SRPOL_task1c_3 Alexander Rakowski Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-c#Rakowski2019_t1 61.6 (61.0 - 62.2)
Rakowski_SRPOL_task1c_4 Michał Kośmider Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland task-acoustic-scene-classification-results-c#Rakowski2019_t1 64.4 (63.8 - 65.1)
Wilkinghoff_FKIE_task1c_1 Kevin Wilkinghoff Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany task-acoustic-scene-classification-results-c#Wilkinghoff2019 61.9 (61.3 - 62.5)
Wilkinghoff_FKIE_task1c_2 Kevin Wilkinghoff Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany task-acoustic-scene-classification-results-c#Wilkinghoff2019 62.1 (61.5 - 62.7)
Zhu_SRCBBUPT_task1c_1 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-c#Zhu2019 67.2 (66.6 - 67.9)
Zhu_SRCBBUPT_task1c_2 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-c#Zhu2019 67.4 (66.8 - 68.1)
Zhu_SRCBBUPT_task1c_3 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-c#Zhu2019 66.3 (65.7 - 67.0)
Zhu_SRCBBUPT_task1c_4 Houwei Zhu Speech Lab, Samsung Research China-Beijing, Beijing, China task-acoustic-scene-classification-results-c#Zhu2019 67.1 (66.4 - 67.8)


Complete results and technical reports can be found on the subtask C results page.

Submissions

Subtask Teams Entries Authors Affiliations
Subtask A 38 98 111 40
Subtask B 10 29 25 10
Subtask C 6 19 19 8
Overall 46 146 120 44

Awards

This task will offer two awards, not necessarily based on the evaluation set performance ranking. These awards aim to encourage contestants to openly publish their code, and to use novel and problem-specific approaches which leverage knowledge of the audio domain. We also highly encourage student authorship.

Reproducible system award

A reproducible system award of 500 USD will be offered for the highest scoring method that is open-source and fully reproducible. For full reproducibility, the authors must provide all the information needed to run the system and achieve the reported performance. The choice of licence is left to the author, but should ideally be selected among the ones approved by the Open Source Initiative.

Judges’ award

A judges’ award of 500 USD will be offered for the method considered by the judges to be the most interesting or innovative. Criteria considered for this award include, but are not limited to, originality, complexity, student participation, and open-source availability. Single-model approaches are strongly preferred over ensembles; occasionally, small ensembles of different models can be considered if the approach is innovative.

More information can be found on the Award page.


The awards are sponsored by

Gold sponsor: Sonos
Silver sponsor: Harman
Bronze sponsors: Cochlear.ai, Oticon, Sound Intelligence
Technical sponsor: Inria

Baseline system

The baseline system provides a simple, entry-level, state-of-the-art approach that gives reasonable results in the subtasks of Task 1. The baseline system is built on the dcase_util toolbox.

The system has all the functionality needed for dataset handling, acoustic feature storage and access, acoustic model training and storage, and evaluation. The modular structure of the system enables participants to modify the system to their needs. The baseline system is a good starting point, especially for entry-level researchers who want to familiarize themselves with the acoustic scene classification problem.

System description

The baseline system implements a convolutional neural network (CNN) based approach, where log mel-band energies are first extracted for each 10-second signal, and a network consisting of two CNN layers and one fully connected layer is trained to assign scene labels to the audio signals.

The baseline system is built on the dcase_util toolbox. The machine learning part of the code is built on Keras (v2.2.2), using TensorFlow (v1.9.0) as the backend.

Parameters

Acoustic features

  • Analysis frame 40 ms (50% hop size)
  • Log mel-band energies (40 bands)
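
As a rough check of how these settings relate to the 500-frame input used by the network, the frame arithmetic can be sketched as follows. This is a minimal sketch, not the baseline's feature extractor: the function name is hypothetical, and it simply counts one frame per hop, whereas the exact count in a real extractor depends on its padding convention.

```python
def approx_frame_count(duration_s=10.0, frame_ms=40.0, hop_fraction=0.5):
    """Approximate number of analysis frames, counting one frame per hop.

    Assumption: edge effects (padding or truncation at the signal ends)
    are ignored, so this is only an approximation of a real extractor.
    """
    hop_ms = frame_ms * hop_fraction  # 40 ms frame with 50% hop -> 20 ms hop
    return int(round(duration_s * 1000.0 / hop_ms))


n_frames = approx_frame_count()
print(n_frames)  # 500
```

With 40 mel bands per frame, this gives the 40 * 500 feature matrix for each 10-second segment.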

Neural network

  • Input shape: 40 * 500 (10 seconds)
  • Architecture:

    • CNN layer #1
      • 2D Convolutional layer (filters: 32, kernel size: 7) + Batch normalization + ReLU activation
      • 2D max pooling (pool size: (5, 5)) + Dropout (rate: 30%)
    • CNN layer #2
      • 2D Convolutional layer (filters: 64, kernel size: 7) + Batch normalization + ReLU activation
      • 2D max pooling (pool size: (4, 100)) + Dropout (rate: 30%)
    • Flatten
    • Dense layer #1
      • Dense layer (units: 100, activation: ReLU)
      • Dropout (rate: 30%)
    • Output layer (activation: softmax/sigmoid)
  • Learning (epochs: 200, batch size: 16, data shuffling between epochs)

    • Optimizer: Adam (learning rate: 0.001)
  • Model selection:

    • Approximately 30% of the original training data is assigned to validation set, split done such that training and validation sets do not have segments from the same location and both sets have data from each city
    • Model performance after each epoch is evaluated on the validation set, and best performing model is selected
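
To see how the pooling sizes reduce the 40 * 500 input to a small vector before the dense layer, the tensor shapes can be traced through the architecture listed above. This is a minimal sketch, not the baseline code: the helper names are hypothetical, and it assumes 'same' padding with stride 1 for the convolutions (which the list above does not state), non-overlapping pooling, and ten output classes. Batch normalization and dropout are omitted because they do not change shapes.

```python
def conv2d_same(shape, filters):
    """Assumed 'same' padding, stride 1: height and width are unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool2d(shape, pool_size):
    """Non-overlapping max pooling: each axis is divided by its pool size."""
    height, width, channels = shape
    return (height // pool_size[0], width // pool_size[1], channels)


def baseline_shapes(n_bands=40, n_frames=500, n_classes=10):
    """Return the output shape of each layer, keyed by a descriptive name."""
    shapes = {}
    x = (n_bands, n_frames, 1)          # one-channel log mel-band energy "image"
    shapes["input"] = x
    x = conv2d_same(x, 32)              # CNN layer #1 convolution
    shapes["conv1"] = x
    x = max_pool2d(x, (5, 5))           # CNN layer #1 pooling
    shapes["pool1"] = x
    x = conv2d_same(x, 64)              # CNN layer #2 convolution
    shapes["conv2"] = x
    x = max_pool2d(x, (4, 100))         # CNN layer #2 pooling
    shapes["pool2"] = x
    x = (x[0] * x[1] * x[2],)           # Flatten
    shapes["flatten"] = x
    shapes["dense1"] = (100,)           # Dense layer #1
    shapes["output"] = (n_classes,)     # Output layer
    return shapes


for name, shape in baseline_shapes().items():
    print(name, shape)
```

Under these assumptions the second pooling layer collapses the whole time axis (100 frames pooled by 100), so the flattened vector has 2 * 1 * 64 = 128 values.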

For Task 1A and 1B systems, the activation function for the output layer is softmax, and the decision is made based on the maximum output. For Task 1C, the activation function for the output layer is sigmoid, and the decision is made based on a threshold value (0.5): if at least one of the class values is over the threshold, the most probable target scene class is chosen; if all values are under the threshold, the unknown scene class is chosen.
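
The two decision rules can be sketched in a few lines. This is an illustrative sketch rather than the baseline implementation: the function names, the label strings, and the example scores are hypothetical, and the scores are assumed to be the per-class outputs of the network for one segment.

```python
def decide_closed_set(scores, labels):
    """Task 1A/1B rule: choose the class with the maximum output."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def decide_open_set(scores, labels, threshold=0.5, unknown_label="unknown"):
    """Task 1C rule: most probable class if any score exceeds the threshold,
    otherwise the unknown class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    # "At least one value over the threshold" is equivalent to
    # "the maximum value is over the threshold".
    if scores[best] > threshold:
        return labels[best]
    return unknown_label


labels = ["airport", "park", "tram"]  # shortened, illustrative label list
print(decide_closed_set([0.2, 0.7, 0.1], labels))  # park
print(decide_open_set([0.2, 0.7, 0.1], labels))    # park
print(decide_open_set([0.2, 0.4, 0.1], labels))    # unknown
```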

Results for the development dataset

Results are calculated using TensorFlow in GPU mode (using an Nvidia Titan XP GPU card). Because results produced with a GPU card are generally non-deterministic, the system was trained and tested 10 times; the mean and standard deviation of the performance from these 10 independent trials are shown in the results tables.

Subtask A

Scene label Accuracy
Airport 48.4 %
Bus 62.3 %
Metro 65.1 %
Metro station 54.5 %
Park 83.1 %
Public square 40.7 %
Shopping mall 59.4 %
Street, pedestrian 60.9 %
Street, traffic 86.7 %
Tram 64.0 %
Average 62.5 % (± 0.6)

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.

Subtask B

Material from all three devices (A, B and C) is used for training and testing. Results are calculated the same way as for subtask A, with the mean and standard deviation of the performance from 10 independent trials shown in the results table.

Remember that ranking in this subtask is based on the performance on devices B and C (third column in this table).

Scene label Device B Device C Average (B,C) Device A
Airport 18.3 % 24.1 % 21.2 % 51.2 %
Bus 40.4 % 70.0 % 55.2 % 68.0 %
Metro 50.7 % 36.1 % 43.4 % 62.4 %
Metro station 28.7 % 36.1 % 30.0 % 54.4 %
Park 45.2 % 57.0 % 51.1 % 80.4 %
Public square 22.8 % 11.3 % 17.0 % 35.4 %
Shopping mall 63.5 % 64.8 % 64.2 % 64.4 %
Street, pedestrian 37.0 % 37.6 % 37.3 % 63.3 %
Street, traffic 77.0 % 86.5 % 81.8 % 85.8 %
Tram 12.0 % 12.6 % 12.3 % 52.2 %
Average 39.6 % (± 2.7) 43.1 % (± 2.2) 41.4 % (± 1.7) 61.9 % (± 0.8)

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.

Subtask C

Scene label Accuracy
Airport 44.1 %
Bus 59.2 %
Metro 51.5 %
Metro station 41.3 %
Park 74.0 %
Public square 34.7 %
Shopping mall 50.9 %
Street, pedestrian 47.5 %
Street, traffic 78.4 %
Tram 60.7 %
Class Average 54.2 %
Unknown 43.1 %
Accuracy (Class Average | Unknown) 48.7 % (± 3.2)
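
The last three rows of the table can be related to each other with a short calculation. This is a sketch under an assumption suggested by the row label: the overall accuracy is taken to be the plain average of the known-class average and the unknown-class accuracy. The variable names are illustrative, and because the table values are themselves rounded means over 10 trials, the match is only expected to hold to within rounding.

```python
# Per-class accuracies (%) for the ten known scene classes, from the table above.
known_accuracy = {
    "Airport": 44.1,
    "Bus": 59.2,
    "Metro": 51.5,
    "Metro station": 41.3,
    "Park": 74.0,
    "Public square": 34.7,
    "Shopping mall": 50.9,
    "Street, pedestrian": 47.5,
    "Street, traffic": 78.4,
    "Tram": 60.7,
}
unknown_accuracy = 43.1  # accuracy (%) on the unknown class

# Average over the ten known classes.
class_average = sum(known_accuracy.values()) / len(known_accuracy)

# Assumed combination: equal weight for known-class average and unknown class.
overall = (class_average + unknown_accuracy) / 2

print(f"{class_average:.1f}")  # 54.2
print(f"{overall:.1f}")        # 48.7
```

Weighting the unknown class equally with the average of all known classes means that a system which never predicts "unknown" loses half of the achievable score.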

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.

Citation

If you are participating in this task or using the dataset or baseline code, please cite the following paper:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 9–13. November 2018. URL: https://arxiv.org/abs/1807.09840.

A multi-device dataset for urban acoustic scene classification

Abstract

This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined for classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities, therefore it has a higher acoustic variability than the previous datasets used for this task, and in addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system consisting of a convolutional neural network and its performance in the subtasks using the recommended cross-validation setup.

Keywords

Acoustic scene classification, DCASE challenge, public datasets, multi-device data
