The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded.
The challenge has ended. Full results for this task can be found on the subtask-specific result pages: Task1A, Task1B, Task1C
This task comprises three subtasks, each addressing a different situation:
- Subtask A, Acoustic Scene Classification: classification of data from the same device as the available training data.
- Subtask B, Acoustic Scene Classification with mismatched recording devices: classification of data recorded with devices different from those used for the training data.
- Subtask C, Open set Acoustic Scene Classification: classification of data that includes classes not encountered in the training data.
Description
The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded — for example "park", "pedestrian street", "metro station" — or to indicate it is from a different, unknown environment.
Audio dataset
The dataset for this task is the TAU Urban Acoustic Scenes 2019 dataset, consisting of recordings from various acoustic scenes. It extends the TUT Urban Acoustic Scenes 2018 dataset with 6 additional cities, for a total of 12 large European cities. For each scene class, recordings were made in different locations; for each recording location there are 5-6 minutes of audio. The original recordings were split into 10-second segments that are provided as individual files. Available information about the recordings includes the acoustic scene class, city, and recording location.
Acoustic scenes (10):
- Airport (airport)
- Indoor shopping mall (shopping_mall)
- Metro station (metro_station)
- Pedestrian street (street_pedestrian)
- Public square (public_square)
- Street with medium level of traffic (street_traffic)
- Travelling by a tram (tram)
- Travelling by a bus (bus)
- Travelling by an underground metro (metro)
- Urban park (park)
Data was recorded in the following cities:
- Amsterdam
- Barcelona
- Helsinki
- Lisbon
- London
- Lyon
- Madrid
- Milan
- Prague
- Paris
- Stockholm
- Vienna
Recording procedure
Recordings were made using four devices that captured audio simultaneously.
The main recording device consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. The microphones are specifically designed to look like headphones and are worn in the ears. As a result, the recorded audio is very similar to the sound that reaches the human auditory system of the person wearing the equipment. This equipment is further referred to as device A.
The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session. All simultaneous recordings are time synchronized.
The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.
Development and evaluation datasets
Different versions of the dataset are provided depending on the task.
TAU Urban Acoustic Scenes 2019 development dataset contains 40 hours of audio recorded with device A only, balanced between classes. The data comes from 10 of the 12 cities. TAU Urban Acoustic Scenes 2019 evaluation dataset contains data from all 12 cities.
TAU Urban Acoustic Scenes 2019 Mobile development dataset contains material recorded with devices A, B and C. It is composed of TAU Urban Acoustic Scenes 2019 data recorded with device A, plus some amount of parallel audio recorded with devices B and C. Data from device A was resampled and averaged into a single channel to align with the properties of the data recorded with devices B and C. The dataset contains 46 hours of audio in total (40 h + 3 h + 3 h). TAU Urban Acoustic Scenes 2019 Mobile evaluation dataset also contains data from device D.
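As an illustration of this channel and sample-rate alignment, a minimal sketch is given below. It assumes the mobile-device recordings are single-channel 44.1 kHz audio (the target rate and the file names are assumptions, not taken from the dataset documentation), and it is not part of the official data preparation.

```python
import librosa
import soundfile as sf

# Hypothetical file name; loading with mono=True averages the binaural channels,
# and sr=44100 resamples from the original 48 kHz (target rate is an assumption).
y, sr = librosa.load('device_a_clip.wav', sr=44100, mono=True)
sf.write('device_a_clip_mobile_like.wav', y, sr)
```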
TAU Urban Acoustic Scenes 2019 Open set development dataset contains only material recorded with device A, and is composed of TAU Urban Acoustic Scenes 2019 data plus additional audio examples for the open-set classification problem. The "open" data consists of the "beach" and "office" classes of the TUT Acoustic Scenes 2017 dataset and other material recorded in 2019. The dataset contains 46 hours of audio in total (40 h + 6 h). TAU Urban Acoustic Scenes 2019 Open set evaluation dataset contains data from the 10 known classes as well as other, unknown ones.
Reference labels
Reference labels are provided only for the development datasets. Reference labels for the evaluation and leaderboard datasets will not be released. For publications based on the DCASE challenge data, please use the provided training/test setup of the development set to allow comparisons. After the challenge, if you want to evaluate your proposed system with the official challenge evaluation setup, contact the task coordinators. The task coordinators can provide unofficial scoring for a limited number of system outputs.
Download
Subtask A
Subtask B
Subtask C
The dataset was updated on 12 March 2019 to include the train/test setup (version 2). If you have already downloaded version 1 of the dataset, you only need to update the TAU-urban-acoustic-scenes-2019-openset-development.meta.zip file.
Task setup
For each subtask, a development set is provided, together with a training/test partitioning for system development. Participants are required to report performance of their system using this train/test setup in order to allow comparison of systems on the development set.
Subtask A
Acoustic Scene Classification
This subtask is concerned with the basic acoustic scene classification problem, in which all data (development and evaluation) are recorded with the same device, in this case device A, and include only the 10 known acoustic scene classes. The subtask uses the TAU Urban Acoustic Scenes 2019 dataset.
Development dataset
The development dataset consists of recordings from ten cities; the training subset contains recordings from only nine of them, to test the generalization properties of the systems. The training/test subsets are created based on the recording location, such that the training subset contains approximately 70% of the recording locations from each city. The test subset contains recordings from the remaining locations, plus a few locations from the tenth city. Full data from the tenth city is provided, but partly unused in this setup, to reflect the final evaluation setup.
The development set contains 40 hours of data, with 14400 segments (144 per city per acoustic scene class). In the provided training/test setup, segments from Milan are included only in the test subset. There are 9185 segments in the training set, 4185 segments in the test set, and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.
Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to segments recorded at the same location. The location identifier can be found in the metadata file provided with the dataset or in the audio file names:
[scene label]-[city]-[location id]-[segment id]-[device id].wav
Make sure that all files with the same location id are placed on the same side of the evaluation split. In this subtask, the device id is always a.
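As a concrete illustration, the minimal Python sketch below groups segments by recording location before splitting, so that no location ends up in both the training and validation data. The helper names, the example file name, and the 30% validation fraction are illustrative assumptions, not part of the official setup.

```python
import random
from collections import defaultdict
from pathlib import Path

def location_key(filename):
    # e.g. 'airport-barcelona-0-0-a.wav' (hypothetical name following the format above)
    scene, city, location_id, segment_id, device_id = Path(filename).stem.split('-')
    return (scene, city, location_id)

def split_by_location(filenames, val_fraction=0.3, seed=0):
    """Keep all segments of one recording location on the same side of the split."""
    groups = defaultdict(list)
    for name in filenames:
        groups[location_key(name)].append(name)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_val = int(len(keys) * val_fraction)
    val_keys = set(keys[:n_val])
    train = [f for k in keys if k not in val_keys for f in groups[k]]
    val = [f for k in val_keys for f in groups[k]]
    return train, val
```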
Evaluation dataset
The evaluation dataset contains 20 hours of audio data from 12 cities (2 cities are not encountered in the development set), and it is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.
Subtask B
Acoustic Scene Classification with mismatched recording devices
This subtask is concerned with the situation in which an application is used with different devices, possibly not the same as the ones used to record the development data. In this case, the evaluation data contains more devices than the development data. The subtask uses the TAU Urban Acoustic Scenes 2019 Mobile dataset.
Development dataset
The development set consists of data recorded with 3 devices: A, B and C. This includes all data from the development set of subtask A (40 hours), partitioned in the same way. In addition, parallel recordings are provided from devices B and C, amounting to 3 hours for each. From devices B and C, half of the data is included in the training subset and half in the test subset. The development set contains in total 46 hours of data, with 16560 segments, of which 14400 are from device A, 1080 from device B, and 1080 from device C. There are 10265 segments in the training set (9185 from device A, 540 from device B, and 540 from device C), 5265 in the test set (4185 from device A, 540 from device B, and 540 from device C), and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.
Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to segments recorded at the same location. The location identifier can be found in the metadata file provided with the dataset or in the audio file names:
[scene label]-[city]-[location id]-[segment id]-[device id].wav
Make sure that all files with the same location id are placed on the same side of the evaluation split. In this subtask, the device id can be a, b, or c.
Evaluation dataset
The evaluation dataset contains data from all four devices, including device D, which was not available in the development set. It contains 30 hours of audio and is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.
Subtask C
Open set Acoustic Scene Classification
This subtask is concerned with acoustic scene classification where the test recording may come from an environment outside the 10 target classes, in which case it should be classified as "unknown"; this is a so-called open-set classification setup. The subtask uses the TAU Urban Acoustic Scenes 2019 Open set dataset, together with some additional data providing examples of "unknown" acoustic scenes.
Participants are encouraged to make good use of external data in order to model scenes not encountered in the training data: the provided "unknown" examples allow only limited generalization, and systems may overfit to their original datasets due to the lack of variety.
Development dataset
The development dataset consists of data from the 10 target classes and additional "unknown" class examples. The dataset includes all data from the development set of Subtask A (40 hours), partitioned in the same way. In addition, recordings are provided for modeling and testing the open-set classification task. The unknown class consists of audio examples from the TUT Acoustic Scenes 2017 dataset and new material recorded during the collection of the TAU Urban Acoustic Scenes 2019 dataset. The development set contains 44 hours of data (40 + 4), with 15850 segments (14400 from the ten scene classes + 1450 from the unknown class). Complete details on the dataset are provided in the readme file. In addition, the correspondence of "unknown" class examples with their original acoustic scenes and file names is provided in meta_unknown.csv.
Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to segments recorded at the same location. The location identifier can be found in the metadata file provided with the dataset or in the audio file names:
[scene label]-[city]-[location id]-[segment id]-[device id].wav
Make sure that all files with the same location id are placed on the same side of the evaluation split. In this subtask, the device id is always a.
Evaluation dataset
The evaluation dataset contains 20 hours of audio data, of which part is recorded in the 10 known classes and part in other, unknown environments, different from the ones in the development set. The evaluation dataset is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.
External data resources
Use of external data is allowed in all subtasks under the following conditions:
- The used external resource is clearly referenced and freely accessible to any other research group in the world. External data refers to public datasets or trained models. The dataset/models must be public and freely available before 1st of April 2019.
- Participants submit at least one system without external training data so that we can study the contribution of such resources. The list of external data sources used in training must be clearly indicated in the technical report.
- Participants inform the organizers in advance about such data sources, so that all competitors know about them and have an equal opportunity to use them; please send an email to the task coordinators, and we will update the list of external datasets on the webpage accordingly. Once the evaluation set is published, the list of allowed external data resources is locked (no further external sources allowed).
- It is not allowed to use TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017 and TUT Urban Acoustic Scenes 2018. These datasets are partially included in the current setup, and additional usage will lead to overfitting.
List of external datasets allowed:
Dataset name | Type | Added | Link |
---|---|---|---|
LITIS Rouen audio scene dataset | audio | 04.03.2019 | https://sites.google.com/site/alainrakotomamonjy/home/audio-scene |
DCASE2013 Challenge - Public Dataset for Scene Classification Task | audio | 04.03.2019 | https://archive.org/details/dcase2013_scene_classification |
DCASE2013 Challenge - Private Dataset for Scene Classification Task | audio | 04.03.2019 | https://archive.org/details/dcase2013_scene_classification_testset |
Dares G1 | audio | 04.03.2019 | http://www.daresounds.org/ |
AudioSet | audio | 04.03.2019 | https://research.google.com/audioset/ |
Participants cannot suggest data to this list anymore (list locked 27th of May 2019).
Submission
Participants can choose which subtasks to participate in; there is no requirement to participate in all of them. The official challenge submission consists of a technical report and the system output for the evaluation data.
The system output should be presented as a single text file (in CSV format, without a header row) containing the classification result for each audio file in the evaluation set. Result items can be in any order. Format:
[filename (string)][tab][scene label (string)]
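For illustration, a few lines of a system output file could look as follows; the file names are hypothetical placeholders, and the two fields are separated by a tab character as in the format above:
0.wav	airport
1.wav	street_traffic
2.wav	park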
Multiple system outputs can be submitted (maximum 4 per participant per subtask). For each system, meta information should be provided in a separate file, containing the task specific information as given in the example here. All files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).
When training the final system for submission, participants can of course use the entire development set. In the technical report, participants should include system results on the training/test setup provided with the development set.
Detailed information for the submission can be found on the Submission page.
Public leaderboards
During the challenge, a public leaderboard is provided for each subtask, using a separate public evaluation dataset. The leaderboards are organized through Kaggle InClass competitions. Leaderboards are meant to serve as a development tool for participants and do not have an official role in the challenge.
Due to Kaggle / US Government policy, people who are residents of certain countries (Cuba, Iran, Syria, North Korea, and Sudan) are unable to participate in the Kaggle competitions (see Kaggle terms, section 7, "What are the rules for competitions on Kaggle?"). As DCASE is committed to open science that is open to everybody, if these Kaggle restrictions prevent you from using the Kaggle-based leaderboard during development, please contact the task 1 organizers and we will provide a similar service outside Kaggle.
Subtask B Leaderboard
Subtask C Leaderboard
The official DCASE challenge submission will not be done through these Kaggle InClass competitions.
Datasets
For public leaderboard submissions, participants should use the official challenge development datasets to train their systems, as in the DCASE challenge. Separate leaderboard datasets are released to be used as evaluation datasets in these competitions. The leaderboard datasets consist of a small subset of the official evaluation dataset, with similar properties (distribution). The amount of material in the leaderboard datasets is considerably smaller than in the official evaluation material of the DCASE challenge.
It is not allowed to use the leaderboard datasets to train the systems in any DCASE challenge subtasks or leaderboard competitions.
Task rules
There are general rules valid for all tasks; these, along with information on technical report and submission requirements can be found here.
Task specific rules:
- Use of external data is allowed, except TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017, TUT Urban Acoustic Scenes 2018 and leaderboard datasets (DCASE2018 and DCASE2019).
- Manipulation of the provided training and development data is allowed (e.g., mixing data sampled from a probability density function, or using techniques such as pitch shifting or time stretching); a brief augmentation sketch is given after this list.
- Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden. Separately published leaderboard data is considered as evaluation data as well.
- Classification decisions must be made independently for each test sample.
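As a minimal illustration of the kind of data manipulation allowed by the rule above, the sketch below produces pitch-shifted and time-stretched copies of a clip with librosa. The parameter values are arbitrary examples, the file name is hypothetical, and the keyword-argument usage assumes a recent librosa release.

```python
import librosa

def augment(y, sr, n_steps=2.0, rate=0.9):
    """Return pitch-shifted and time-stretched copies of an audio clip."""
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)   # shift up by 2 semitones
    stretched = librosa.effects.time_stretch(y, rate=rate)             # slow down by ~10%
    stretched = librosa.util.fix_length(stretched, size=len(y))        # keep original clip length
    return shifted, stretched

# Usage (hypothetical file name):
# y, sr = librosa.load('airport-barcelona-0-0-a.wav', sr=None, mono=True)
# y_shift, y_stretch = augment(y, sr)
```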
Evaluation
The scoring of acoustic scene classification will be based on classification accuracy: the proportion of correctly classified segments out of the total number of segments. Each segment is considered an independent test sample. The reported accuracy is calculated as the average of the class-wise accuracies.
Participants can use the sed_eval toolbox for the evaluation.
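The class-averaged accuracy itself is straightforward to compute; the short sketch below is a minimal illustration of the metric, not the sed_eval API, and the label names are only examples.

```python
import numpy as np

def class_average_accuracy(reference, estimated, class_labels):
    """Average of per-class accuracies (macro-average over scene classes)."""
    reference = np.asarray(reference)
    estimated = np.asarray(estimated)
    per_class = []
    for label in class_labels:
        mask = reference == label
        if mask.any():
            per_class.append(np.mean(estimated[mask] == label))
    return float(np.mean(per_class))

# Toy example:
# class_average_accuracy(['park', 'park', 'bus'], ['park', 'bus', 'bus'],
#                        class_labels=['park', 'bus'])  # -> 0.75
```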
Ranking
- Subtask A will use the overall accuracy on the evaluation data.
- Subtask B will use the overall accuracy on data from devices B and C.
- Subtask C will use a weighted average of the class-average accuracy over the known classes and the accuracy of the unknown class; a sketch is given below.
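The exact weighting is not reproduced here; the minimal sketch below assumes equal weights for the known-class average and the unknown-class accuracy, which is consistent with the baseline subtask C results reported further down (the average of 54.2% and 43.1% gives roughly 48.7%).

```python
def subtask_c_score(known_class_accuracies, unknown_accuracy,
                    known_weight=0.5, unknown_weight=0.5):
    """Weighted combination of known-class average accuracy and unknown-class accuracy.
    Equal weights are an assumption based on the reported baseline numbers."""
    known_average = sum(known_class_accuracies) / len(known_class_accuracies)
    return known_weight * known_average + unknown_weight * unknown_accuracy

# Baseline-like example: 10 known classes averaging 0.542, unknown accuracy 0.431
# subtask_c_score([0.542] * 10, 0.431)  # -> ~0.4865
```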
Results
Subtask A
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval |
---|---|---|---|---|
Bilot_IDG_task1a_1 | Valentin Bilot | Audio R&D, InterDigital R&D, Rennes, France | task-acoustic-scene-classification-results-a#Bilot2019 | 66.1 (65.0 - 67.2) | |
Bilot_IDG_task1a_2 | Valentin Bilot | Audio R&D, InterDigital R&D, Rennes, France | task-acoustic-scene-classification-results-a#Bilot2019 | 67.3 (66.3 - 68.4) | |
Bilot_IDG_task1a_3 | Valentin Bilot | Audio R&D, InterDigital R&D, Rennes, France | task-acoustic-scene-classification-results-a#Bilot2019 | 64.5 (63.4 - 65.6) | |
Bilot_IDG_task1a_4 | Valentin Bilot | Audio R&D, InterDigital R&D, Rennes, France | task-acoustic-scene-classification-results-a#Bilot2019 | 68.3 (67.3 - 69.4) | |
Chandrasekhar_IIITH_task1a_1 | Chandrasekhar Paseddula | International Institute of Information Technology, Hyderabad department:Electronics and Communication Engineering, Hyderabad, India | task-acoustic-scene-classification-results-a#Paseddula2019 | 52.6 (51.4 - 53.7) | |
DSPLAB_TJU_task1a_1 | Jinhua Liang | School of Electrical and Information Engineering, TianJin University, Tianjin, China | task-acoustic-scene-classification-results-a#Ding2019 | 66.5 (65.4 - 67.6) | |
DSPLAB_TJU_task1a_2 | Jinhua Liang | School of Electrical and Information Engineering, TianJin University, Tianjin, China | task-acoustic-scene-classification-results-a#Ding2019 | 69.6 (68.5 - 70.6) | |
DSPLAB_TJU_task1a_3 | Jinhua Liang | School of Electrical and Information Engineering, TianJin University, Tianjin, China | task-acoustic-scene-classification-results-a#Ding2019 | 65.0 (63.9 - 66.1) | |
DSPLAB_TJU_task1a_4 | Jinhua Liang | School of Electrical and Information Engineering, TianJin University, Tianjin, China | task-acoustic-scene-classification-results-a#Ding2019 | 69.5 (68.4 - 70.5) | |
Fmta91_KNToosi_task1a_1 | fateme Arabnezhad | Computer Engineering Department, Khaje Nasir Toosi, Tehran, Iran | task-acoustic-scene-classification-results-a#Arabnezhad2019 | 76.2 (75.2 - 77.2) | |
Fraile_UPM_task1a_1 | Ruben Fraile | CITSEM, Universidad Politecnica de Madrid, Madrid, Spain | task-acoustic-scene-classification-results-a#Fraile2019 | 58.7 (57.6 - 59.9) | |
DCASE2019 baseline | Toni Heittola | Computing Sciences, Tampere University, Tampere, Finland | task-acoustic-scene-classification-results-a#Heittola2019 | 63.3 (62.2 - 64.5) | |
Huang_IL_task1a_1 | Paulo Lopez Meyer | Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico | task-acoustic-scene-classification-results-a#Huang2019 | 80.5 (79.6 - 81.4) | |
Huang_IL_task1a_2 | Paulo Lopez Meyer | Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico | task-acoustic-scene-classification-results-a#Huang2019 | 81.1 (80.2 - 82.0) | |
Huang_IL_task1a_3 | Paulo Lopez Meyer | Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico | task-acoustic-scene-classification-results-a#Huang2019 | 81.3 (80.4 - 82.2) | |
Huang_IL_task1a_4 | Paulo Lopez Meyer | Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico | task-acoustic-scene-classification-results-a#Huang2019 | 79.5 (78.6 - 80.5) | |
Huang_SCNU_task1a_1 | Zhenyi Huang | School of Computer, South China Normal University, Guangzhou, China | task-acoustic-scene-classification-results-a#Huang2019a | 79.2 (78.3 - 80.1) | |
JSNU_WDXY_task1a_1 | Xinixn Ma | School of Physics and Electronic, Jiangsu Normal University, Xuzhou, China | task-acoustic-scene-classification-results-a#Ma2019 | 72.2 (71.1 - 73.2) | |
Jung_UOS_task1a_1 | Ha-Jin Yu | Computing Sciences, Univerisity of Seoul, Seoul, Republic of Korea | task-acoustic-scene-classification-results-a#Jung2019 | 81.1 (80.2 - 82.0) | |
Jung_UOS_task1a_2 | Ha-jin Yu | Computing Sciences, Univerisity of Seoul, Seoul, Republic of Korea | task-acoustic-scene-classification-results-a#Jung2019 | 81.2 (80.3 - 82.1) | |
Jung_UOS_task1a_3 | Ha-jin Yu | Computing Sciences, Univerisity of Seoul, Seoul, Republic of Korea | task-acoustic-scene-classification-results-a#Jung2019 | 81.0 (80.1 - 81.9) | |
Jung_UOS_task1a_4 | Ha-jin Yu | Computing Sciences, Univerisity of Seoul, Seoul, Republic of Korea | task-acoustic-scene-classification-results-a#Jung2019 | 81.2 (80.3 - 82.1) | |
KK_I2R_task1a_1 | Teh KK | I2R, A-star, Singapore | task-acoustic-scene-classification-results-a#KK2019 | 76.6 (75.6 - 77.6) | |
KK_I2R_task1a_2 | Teh KK | I2R, A-star, Singapore | task-acoustic-scene-classification-results-a#KK2019 | 77.7 (76.7 - 78.6) | |
KK_I2R_task1a_3 | Teh KK | I2R, A-star, Singapore | task-acoustic-scene-classification-results-a#KK2019 | 77.2 (76.2 - 78.2) | |
Kong_SURREY_task1a_1 | Qiuqiang Kong | Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England | task-acoustic-scene-classification-results-a#Kong2019 | 70.5 (69.5 - 71.6) | |
Koutini_CPJKU_task1a_1 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-a#Koutini2019 | 82.8 (82.0 - 83.7) | |
Koutini_CPJKU_task1a_2 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-a#Koutini2019 | 83.7 (82.9 - 84.6) | |
Koutini_CPJKU_task1a_3 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-a#Koutini2019 | 83.5 (82.6 - 84.4) | |
Koutini_CPJKU_task1a_4 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-a#Koutini2019 | 83.8 (82.9 - 84.6) | |
LamPham_HCMGroup_task1a_1 | Lam Pham | School of Computing, University of Kent, Chatham, United Kingdom | task-acoustic-scene-classification-results-a#Pham2019 | 73.9 (72.9 - 74.9) | |
LamPham_KentGroup_task1a_1 | Lam Pham | School of Computing, University of Kent, Chatham, United Kingdom | task-acoustic-scene-classification-results-a#Pham2019a | 76.8 (75.8 - 77.7) | |
Lei_CQU_task1a_1 | Chongqin Lei | Intelligent Information Technology and System Lab, CHONGQING UNIVERSITY, Chongqing, China | task-acoustic-scene-classification-results-a#Lei2019 | 75.5 (74.5 - 76.5) | |
Li_NPU_task1a_1 | Ning FangLi | Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China | task-acoustic-scene-classification-results-a#FangLi2019 | 59.9 (58.8 - 61.0) | |
Li_NPU_task1a_2 | Ning FangLi | Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China | task-acoustic-scene-classification-results-a#FangLi2019 | 61.8 (60.7 - 62.9) | |
Liang_HUST_task1a_1 | Han Liang | Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China | task-acoustic-scene-classification-results-a#Liang2019 | 68.2 (67.1 - 69.2) | |
Liang_HUST_task1a_2 | Han Liang | Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China | task-acoustic-scene-classification-results-a#Liang2019 | 66.4 (65.3 - 67.5) | |
Liu_SCUT_task1a_1 | Liu Mingle | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Mingle2019 | 78.3 (77.4 - 79.3) | |
Liu_SCUT_task1a_2 | Liu Mingle | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Mingle2019 | 79.9 (79.0 - 80.8) | |
Liu_SCUT_task1a_3 | Liu Mingle | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Mingle2019 | 78.3 (77.3 - 79.2) | |
Liu_SCUT_task1a_4 | Liu Mingle | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Mingle2019 | 78.4 (77.4 - 79.3) | |
MaLiu_BIT_task1a_1 | Sifan Ma | Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China | task-acoustic-scene-classification-results-a#Ma2019a | 72.8 (71.8 - 73.8) | |
MaLiu_BIT_task1a_2 | Wei Liu | Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China | task-acoustic-scene-classification-results-a#Liu2019 | 76.0 (75.1 - 77.0) | |
MaLiu_BIT_task1a_3 | Sifan Ma | Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China | task-acoustic-scene-classification-results-a#Ma2019a | 73.3 (72.3 - 74.3) | |
Mars_PRDCSG_task1a_1 | Rohith Mars | Core Technology Group, Panasonic R&D Center, Singapore, Singapore | task-acoustic-scene-classification-results-a#Mars2019 | 79.3 (78.3 - 80.2) | |
McDonnell_USA_task1a_1 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-a#Gao2019 | 80.0 (79.0 - 80.9) | |
McDonnell_USA_task1a_2 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-a#Gao2019 | 80.5 (79.6 - 81.4) | |
McDonnell_USA_task1a_3 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-a#Gao2019 | 80.4 (79.5 - 81.3) | |
McDonnell_USA_task1a_4 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-a#Gao2019 | 80.3 (79.4 - 81.2) | |
Naranjo-Alcazar_VfyAI_task1a_1 | Javier Naranjo-Alcazar | Visualfy AI, Visualfy, Benisano, Spain | task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 | 74.1 (73.1 - 75.2) | |
Naranjo-Alcazar_VfyAI_task1a_2 | Javier Naranjo-Alcazar | Visualfy AI, Visualfy, Benisano, Spain | task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 | 74.2 (73.2 - 75.2) | |
Naranjo-Alcazar_VfyAI_task1a_3 | Javier Naranjo-Alcazar | Visualfy AI, Visualfy, Benisano, Spain | task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 | 74.0 (73.0 - 75.0) | |
Naranjo-Alcazar_VfyAI_task1a_4 | Javier Naranjo-Alcazar | Visualfy AI, Visualfy, Benisano, Spain | task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019 | 74.1 (73.1 - 75.1) | |
Plata_SRPOL_task1a_1 | Marcin Plata | Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-a#Plata2019 | 78.8 (77.9 - 79.8) | |
Plata_SRPOL_task1a_2 | Marcin Plata | Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-a#Plata2019 | 79.2 (78.3 - 80.1) | |
Plata_SRPOL_task1a_3 | Marcin Plata | Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-a#Plata2019 | 77.2 (76.3 - 78.2) | |
Plata_SRPOL_task1a_4 | Marcin Plata | Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-a#Plata2019 | 77.9 (77.0 - 78.9) | |
SSW_ETRI_task1a_1 | Suh Sangwon | Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea | task-acoustic-scene-classification-results-a#Sangwon2019 | 66.7 (65.6 - 67.8) | |
SSW_ETRI_task1a_2 | Suh Sangwon | Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea | task-acoustic-scene-classification-results-a#Sangwon2019 | 67.0 (65.9 - 68.1) | |
SSW_ETRI_task1a_3 | Suh Sangwon | Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea | task-acoustic-scene-classification-results-a#Sangwon2019 | 67.6 (66.5 - 68.7) | |
SSW_ETRI_task1a_4 | Suh Sangwon | Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea | task-acoustic-scene-classification-results-a#Sangwon2019 | 67.6 (66.5 - 68.7) | |
Salvati_DMIF_task1a_1 | Daniele Salvati | Mathematics, Computer Science and Physics, University of Udine, Udine, Italy | task-acoustic-scene-classification-results-a#Salvati2019 | 68.5 (67.5 - 69.6) | |
Seo_LGE_task1a_1 | Seo Hyeji | Advanced Robotics Lab, LG Electronics, Seoul, Korea | task-acoustic-scene-classification-results-a#Hyeji2019 | 81.6 (80.7 - 82.5) | |
Seo_LGE_task1a_2 | Seo Hyeji | Advanced Robotics Lab, LG Electronics, Seoul, Korea | task-acoustic-scene-classification-results-a#Hyeji2019 | 82.5 (81.6 - 83.4) | |
Seo_LGE_task1a_3 | Seo Hyeji | Advanced Robotics Lab, LG Electronics, Seoul, Korea | task-acoustic-scene-classification-results-a#Hyeji2019 | 81.1 (80.2 - 82.0) | |
Seo_LGE_task1a_4 | Seo Hyeji | Advanced Robotics Lab, LG Electronics, Seoul, Korea | task-acoustic-scene-classification-results-a#Hyeji2019 | 82.5 (81.7 - 83.4) | |
Waldekar_IITKGP_task1a_1 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results-a#Waldekar2019 | 65.9 (64.8 - 67.0) | |
Wang_BTBU_task1a_1 | Zhuhe Wang | Noise and Vibration Laboratory, Beijing Technology and Business University, Beijing, China | task-acoustic-scene-classification-results-a#Wang2019 | 32.2 (31.1 - 33.3) | |
Wang_NWPU_task1a_1 | Mou Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-a#Wang2019a_t1 | 80.6 (79.7 - 81.5) | |
Wang_NWPU_task1a_2 | Mou Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-a#Wang2019a | 80.1 (79.1 - 81.0) | |
Wang_NWPU_task1a_3 | Mou Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-a#Wang2019a | 76.6 (75.6 - 77.6) | |
Wang_NWPU_task1a_4 | Mou Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-a#Wang2019a | 76.8 (75.8 - 77.8) | |
Wang_SCUT_task1a_1 | Wucheng Wang | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Wang2019b | 76.4 (75.4 - 77.4) | |
Wang_SCUT_task1a_2 | Wucheng Wang | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Wang2019b | 76.6 (75.6 - 77.5) | |
Wang_SCUT_task1a_3 | Wucheng Wang | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Wang2019b | 75.9 (74.9 - 76.9) | |
Wang_SCUT_task1a_4 | Wucheng Wang | School of Electronic and Information Enginnering, South China University of Technology, GuangZhou, GuangDong Province | task-acoustic-scene-classification-results-a#Wang2019b | 76.5 (75.5 - 77.5) | |
Wilkinghoff_FKIE_task1a_1 | Kevin Wilkinghoff | Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany | task-acoustic-scene-classification-results-a#Wilkinghoff2019 | 74.6 (73.6 - 75.6) | |
Wilkinghoff_FKIE_task1a_2 | Kevin Wilkinghoff | Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany | task-acoustic-scene-classification-results-a#Wilkinghoff2019 | 76.2 (75.2 - 77.2) | |
Wu_CUHK_task1a_1 | Yuzhong Wu | Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China | task-acoustic-scene-classification-results-a#Wu2019 | 80.1 (79.1 - 81.0) | |
Yang_UESTC_task1a_1 | Yang Haocong | Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-a#Haocong2019 | 79.9 (78.9 - 80.8) | |
Yang_UESTC_task1a_2 | Yang Haocong | Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-a#Haocong2019 | 81.6 (80.7 - 82.5) | |
Yang_UESTC_task1a_3 | Yang Haocong | Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-a#Haocong2019 | 81.2 (80.3 - 82.1) | |
Zeinali_BUT_task1a_1 | Hossein Zeinali | Information Technology, Brno University of Technology, Brno, Czech Republic | task-acoustic-scene-classification-results-a#Zeinali2019 | 78.9 (78.0 - 79.9) | |
Zeinali_BUT_task1a_2 | Hossein Zeinali | Information Technology, Brno University of Technology, Brno, Czech Republic | task-acoustic-scene-classification-results-a#Zeinali2019 | 78.9 (77.9 - 79.8) | |
Zeinali_BUT_task1a_3 | Hossein Zeinali | Information Technology, Brno University of Technology, Brno, Czech Republic | task-acoustic-scene-classification-results-a#Zeinali2019 | 79.1 (78.1 - 80.0) | |
Zhang_IOA_task1a_1 | Pengyuan Zhang | Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China | task-acoustic-scene-classification-results-a#Chen2019 | 84.9 (84.1 - 85.7) | |
Zhang_IOA_task1a_2 | Pengyuan Zhang | Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China | task-acoustic-scene-classification-results-a#Chen2019 | 84.9 (84.1 - 85.8) | |
Zhang_IOA_task1a_3 | Pengyuan Zhang | Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China | task-acoustic-scene-classification-results-a#Chen2019 | 85.2 (84.4 - 86.0) | |
Zhang_IOA_task1a_4 | Pengyuan Zhang | Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China | task-acoustic-scene-classification-results-a#Chen2019 | 84.8 (83.9 - 85.6) | |
Zheng_USTC_task1a_1 | Xu Zheng | Computing Sciences, University of Science of Techonology of China, Hefei,Anhui,China | task-acoustic-scene-classification-results-a#Zheng2019 | 75.7 (74.7 - 76.7) | |
Zheng_USTC_task1a_2 | Xu Zheng | Computing Sciences, University of Science of Techonology of China, Hefei,Anhui,China | task-acoustic-scene-classification-results-a#Zheng2019 | 71.3 (70.3 - 72.4) | |
Zheng_USTC_task1a_3 | Xu Zheng | Computing Sciences, University of Science of Techonology of China, Hefei,Anhui,China | task-acoustic-scene-classification-results-a#Zheng2019 | 78.9 (77.9 - 79.8) | |
Zhou_Kuaiyu_task1a_1 | Nai Zhou | Beijing Kuaiyu Electronics Co., Ltd., Beijing, China | task-acoustic-scene-classification-results-a#Zhou2019_t1 | 79.8 (78.8 - 80.7) | |
Zhou_Kuaiyu_task1a_2 | Nai Zhou | Beijing Kuaiyu Electronics Co., Ltd., Beijing, China | task-acoustic-scene-classification-results-a#Zhou2019_t1 | 79.4 (78.5 - 80.4) | |
Zhou_Kuaiyu_task1a_3 | Nai Zhou | Beijing Kuaiyu Electronics Co., Ltd., Beijing, China | task-acoustic-scene-classification-results-a#Zhou2019_t1 | 78.7 (77.7 - 79.6) | |
Zhu_SSLabBUPT_task1a_1 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-a#Zhu2019 | 79.2 (78.3 - 80.1) | |
Zhu_SSLabBUPT_task1a_2 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-a#Zhu2019 | 78.8 (77.9 - 79.7) | |
Zhu_SSLabBUPT_task1a_3 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-a#Zhu2019 | 79.1 (78.2 - 80.1) | |
Zhu_SSLabBUPT_task1a_4 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-a#Zhu2019 | 78.8 (77.8 - 79.7) |
Complete results and technical reports can be found on the subtask A results page.
Subtask B
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval |
---|---|---|---|---|
Eghbal-zadeh_CPJKU_task1b_1 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 | 74.5 (73.5 - 75.5) | |
Eghbal-zadeh_CPJKU_task1b_2 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 | 74.5 (73.5 - 75.5) | |
Eghbal-zadeh_CPJKU_task1b_3 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 | 73.4 (72.4 - 74.5) | |
Eghbal-zadeh_CPJKU_task1b_4 | Khaled Koutini | Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Eghbal-zadeh2019 | 73.4 (72.3 - 74.4) | |
DCASE2019 baseline | Toni Heittola | Computing Sciences, Tampere University, Tampere, Finland | task-acoustic-scene-classification-results-b#Heittola2019 | 47.7 (46.5 - 48.8) | |
Jiang_UESTC_task1b_1 | Shengwang Jiang | School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-b#Jiang2019 | 70.3 (69.2 - 71.3) | |
Jiang_UESTC_task1b_2 | Shengwang Jiang | School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-b#Jiang2019 | 69.9 (68.9 - 71.0) | |
Jiang_UESTC_task1b_3 | Shengwang Jiang | School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-b#Jiang2019 | 69.0 (68.0 - 70.1) | |
Jiang_UESTC_task1b_4 | Shengwang Jiang | School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China | task-acoustic-scene-classification-results-b#Jiang2019 | 69.6 (68.6 - 70.7) | |
Kong_SURREY_task1b_1 | Qiuqiang Kong | Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England | task-acoustic-scene-classification-results-b#Kong2019 | 61.6 (60.4 - 62.7) | |
Kosmider_SRPOL_task1b_1 | Michał Kośmider | Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-b#Komider2019 | 75.1 (74.1 - 76.1) | |
Kosmider_SRPOL_task1b_2 | Michał Kośmider | Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-b#Komider2019 | 75.3 (74.3 - 76.3) | |
Kosmider_SRPOL_task1b_3 | Michał Kośmider | Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-b#Komider2019 | 74.9 (73.9 - 75.9) | |
Kosmider_SRPOL_task1b_4 | Michał Kośmider | Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-b#Komider2019 | 75.2 (74.3 - 76.2) | |
LamPham_KentGroup_task1b_1 | Lam Pham | School of Computing, University of Kent, Chatham, United Kingdom | task-acoustic-scene-classification-results-b#Pham2019 | 72.8 (71.8 - 73.8) | |
McDonnell_USA_task1b_1 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-b#Gao2019 | 74.2 (73.2 - 75.2) | |
McDonnell_USA_task1b_2 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-b#Gao2019 | 74.1 (73.1 - 75.2) | |
McDonnell_USA_task1b_3 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-b#Gao2019 | 74.9 (73.9 - 75.9) | |
McDonnell_USA_task1b_4 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-b#Gao2019 | 74.4 (73.4 - 75.4) | |
Primus_CPJKU_task1b_1 | Paul Primus | Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Primus2019 | 71.3 (70.2 - 72.3) | |
Primus_CPJKU_task1b_2 | Paul Primus | Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Primus2019 | 73.4 (72.4 - 74.4) | |
Primus_CPJKU_task1b_3 | Paul Primus | Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Primus2019 | 71.6 (70.6 - 72.7) | |
Primus_CPJKU_task1b_4 | Paul Primus | Computational Perception, Johannes Kepler University Linz, Linz, Austria | task-acoustic-scene-classification-results-b#Primus2019 | 74.2 (73.2 - 75.2) | |
Song_HIT_task1b_1 | Hongwei Song | Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China | task-acoustic-scene-classification-results-b#Song2019 | 67.3 (66.2 - 68.3) | |
Song_HIT_task1b_2 | Hongwei Song | Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China | task-acoustic-scene-classification-results-b#Song2019 | 72.2 (71.2 - 73.3) | |
Song_HIT_task1b_3 | Hongwei Song | Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China | task-acoustic-scene-classification-results-b#Song2019 | 72.1 (71.1 - 73.1) | |
Waldekar_IITKGP_task1b_1 | Shefali Waldekar | Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India | task-acoustic-scene-classification-results-b#Waldekar2019 | 62.1 (60.9 - 63.2) | |
Wang_NWPU_task1b_1 | Rui Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-b#Wang2019 | 65.7 (64.6 - 66.8) | |
Wang_NWPU_task1b_2 | Rui Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-b#Wang2019 | 68.5 (67.4 - 69.6) | |
Wang_NWPU_task1b_3 | Rui Wang | School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China | task-acoustic-scene-classification-results-b#Wang2019 | 70.3 (69.3 - 71.4) |
Complete results and technical reports can be found on the subtask B results page.
Subtask C
Code | Author | Affiliation | Technical Report | Accuracy with 95% confidence interval |
---|---|---|---|---|
DCASE2019 baseline | Toni Heittola | Computing Sciences, Tampere University, Tampere, Finland | task-acoustic-scene-classification-results-c#Heittola2019 | 47.6 (47.1 - 48.0) | |
Kong_SURREY_task1c_1 | Qiuqiang Kong | Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England | task-acoustic-scene-classification-results-c#Kong2019 | 50.7 (50.2 - 51.2) | |
Lehner_SAL_task1c_1 | Bernhard Lehner | Silicon Austria Labs, JKU, Linz, Austria | task-acoustic-scene-classification-results-c#Lehner2019 | 58.7 (58.1 - 59.2) | |
Lehner_SAL_task1c_2 | Bernhard Lehner | Silicon Austria Labs, JKU, Linz, Austria | task-acoustic-scene-classification-results-c#Lehner2019 | 61.3 (60.7 - 61.9) | |
Lehner_SAL_task1c_3 | Bernhard Lehner | Silicon Austria Labs, JKU, Linz, Austria | task-acoustic-scene-classification-results-c#Lehner2019 | 60.9 (60.3 - 61.5) | |
Lehner_SAL_task1c_4 | Bernhard Lehner | Silicon Austria Labs, JKU, Linz, Austria | task-acoustic-scene-classification-results-c#Lehner2019 | 60.5 (59.9 - 61.1) | |
McDonnell_USA_task1c_1 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-c#Gao2019 | 58.2 (57.6 - 58.7) | |
McDonnell_USA_task1c_2 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-c#Gao2019 | 58.0 (57.5 - 58.6) | |
McDonnell_USA_task1c_3 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-c#Gao2019 | 58.8 (58.2 - 59.4) | |
McDonnell_USA_task1c_4 | Mark McDonnell | School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia | task-acoustic-scene-classification-results-c#Gao2019 | 58.4 (57.9 - 59.0) | |
Rakowski_SRPOL_task1c_1 | Alexander Rakowski | Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-c#Rakowski2019_t1 | 57.2 (56.6 - 57.8) | |
Rakowski_SRPOL_task1c_2 | Alexander Rakowski | Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-c#Rakowski2019_t1 | 57.2 (56.6 - 57.8) | |
Rakowski_SRPOL_task1c_3 | Alexander Rakowski | Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-c#Rakowski2019_t1 | 61.6 (61.0 - 62.2) | |
Rakowski_SRPOL_task1c_4 | Michał Kośmider | Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland | task-acoustic-scene-classification-results-c#Rakowski2019_t1 | 64.4 (63.8 - 65.1) | |
Wilkinghoff_FKIE_task1c_1 | Kevin Wilkinghoff | Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany | task-acoustic-scene-classification-results-c#Wilkinghoff2019 | 61.9 (61.3 - 62.5) | |
Wilkinghoff_FKIE_task1c_2 | Kevin Wilkinghoff | Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany | task-acoustic-scene-classification-results-c#Wilkinghoff2019 | 62.1 (61.5 - 62.7) | |
Zhu_SRCBBUPT_task1c_1 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-c#Zhu2019 | 67.2 (66.6 - 67.9) | |
Zhu_SRCBBUPT_task1c_2 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-c#Zhu2019 | 67.4 (66.8 - 68.1) | |
Zhu_SRCBBUPT_task1c_3 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-c#Zhu2019 | 66.3 (65.7 - 67.0) | |
Zhu_SRCBBUPT_task1c_4 | Houwei Zhu | Speech Lab, Samsung Research China-Beijing, Beijing, China | task-acoustic-scene-classification-results-c#Zhu2019 | 67.1 (66.4 - 67.8) |
Complete results and technical reports can be found on the subtask C results page.
Submissions
Subtask | Teams | Entries | Authors | Affiliations |
---|---|---|---|---|
Subtask A | 38 | 98 | 111 | 40 |
Subtask B | 10 | 29 | 25 | 10 |
Subtask C | 6 | 19 | 19 | 8 |
Overall | 46 | 146 | 120 | 44 |
Awards
This task will offer two awards, not necessarily based on the evaluation set performance ranking. These awards aim to encourage contestants to openly publish their code, and to use novel and problem-specific approaches which leverage knowledge of the audio domain. We also highly encourage student authorship.
Reproducible system award
A reproducible system award of 500 USD will be offered for the highest scoring method that is open source and fully reproducible. For full reproducibility, the authors must provide all the information needed to run the system and achieve the reported performance. The choice of licence is left to the author, but it should ideally be one approved by the Open Source Initiative.
Judges’ award
A judges' award of 500 USD will be offered for the method considered by the judges to be the most interesting or innovative. Criteria considered for this award include, but are not limited to, originality, complexity, student participation, and open-source availability. Single-model approaches are strongly preferred over ensembles; occasionally, small ensembles of different models can be considered if the approach is innovative.
More information can be found on the Award page.
Baseline system
The baseline system provides a simple entry-level approach that gives reasonable results in all subtasks of Task 1. The baseline system is built on the dcase_util toolbox.
The system provides all the functionality needed for dataset handling, storing and accessing acoustic features, training and storing acoustic models, and evaluation. Its modular structure enables participants to modify the system to their needs. The baseline system is a good starting point, especially for entry-level researchers, to familiarize themselves with the acoustic scene classification problem.
Repository
System description
The baseline system implements a convolutional neural network (CNN) based approach, where log mel-band energies are first extracted for each 10-second signal, and a network consisting of two CNN layers and one fully connected layer is trained to assign scene labels to the audio signals.
The baseline system is built on the dcase_util toolbox. The machine learning part of the code is built on Keras (v2.2.2), using TensorFlow (v1.9.0) as the backend.
Parameters
Acoustic features
- Analysis frame: 40 ms (50% hop size)
- Log mel-band energies (40 bands)
Neural network
- Input shape: 40 * 500 (10 seconds)
- Architecture:
  - CNN layer #1
    - 2D convolutional layer (filters: 32, kernel size: 7) + batch normalization + ReLU activation
    - 2D max pooling (pool size: (5, 5)) + dropout (rate: 30%)
  - CNN layer #2
    - 2D convolutional layer (filters: 64, kernel size: 7) + batch normalization + ReLU activation
    - 2D max pooling (pool size: (4, 100)) + dropout (rate: 30%)
  - Flatten
  - Dense layer #1
    - Dense layer (units: 100, activation: ReLU)
    - Dropout (rate: 30%)
  - Output layer (activation: softmax/sigmoid)
- Learning (epochs: 200, batch size: 16, data shuffled between epochs)
  - Optimizer: Adam (learning rate: 0.001)
- Model selection:
  - Approximately 30% of the original training data is assigned to a validation set; the split is done such that the training and validation sets do not contain segments from the same location, and both sets have data from each city
  - Model performance is evaluated on the validation set after each epoch, and the best-performing model is selected
For the Task 1A and 1B systems, the activation function of the output layer is softmax and the decision is made based on the maximum output. For Task 1C, the activation function of the output layer is sigmoid and the decision is made based on a threshold value (0.5): if at least one of the class values is over the threshold, the most probable target scene class is chosen; if all values are under the threshold, the unknown scene class is chosen.
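For orientation, the sketch below reconstructs a model with the parameters listed above using tf.keras and librosa; the official baseline is implemented with dcase_util and standalone Keras 2.2.2 on TensorFlow 1.9, so treat this as a rough approximation. The 'same' padding, the loss functions, the 48 kHz feature-extraction settings, and all function names here are assumptions rather than the actual baseline code.

```python
import numpy as np
import librosa
from tensorflow.keras import layers, models, optimizers

def extract_logmel(path, sr=48000, n_mels=40):
    """Log mel-band energies: 40 ms frames with 50% hop (settings assumed from the list above)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    n_fft = int(0.040 * sr)                                   # 40 ms analysis frame
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=n_fft // 2, n_mels=n_mels)
    return librosa.power_to_db(mel)                           # shape ~ (40, ~500) for a 10 s clip

def build_baseline(num_classes=10, openset=False):
    """CNN matching the listed parameters; padding and loss choices are assumptions."""
    model = models.Sequential([
        layers.Input(shape=(40, 500, 1)),
        # CNN layer #1
        layers.Conv2D(32, kernel_size=7, padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(pool_size=(5, 5)),
        layers.Dropout(0.3),
        # CNN layer #2
        layers.Conv2D(64, kernel_size=7, padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(pool_size=(4, 100)),
        layers.Dropout(0.3),
        layers.Flatten(),
        # Dense layer #1
        layers.Dense(100, activation='relu'),
        layers.Dropout(0.3),
        # Output layer: softmax for Task 1A/1B, sigmoid for Task 1C
        layers.Dense(num_classes, activation='sigmoid' if openset else 'softmax'),
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy' if openset else 'categorical_crossentropy',
                  metrics=['accuracy'])
    return model

def openset_decision(probs, scene_labels, threshold=0.5):
    """Task 1C decision rule described above: 'unknown' if no class exceeds the threshold."""
    probs = np.asarray(probs)
    if probs.max() > threshold:
        return scene_labels[int(probs.argmax())]
    return 'unknown'

# Training would then follow the listed learning parameters, e.g.:
# model = build_baseline()
# model.fit(x_train, y_train, epochs=200, batch_size=16, shuffle=True,
#           validation_data=(x_val, y_val))
```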
Results for the development dataset
Results are calculated using TensorFlow in GPU mode (using an Nvidia Titan XP GPU card). Because results produced with a GPU are generally non-deterministic, the system was trained and tested 10 times; the mean and standard deviation of the performance from these 10 independent trials are shown in the results tables.
Subtask A
Scene label | Accuracy |
---|---|
Airport | 48.4 % |
Bus | 62.3 % |
Metro | 65.1 % |
Metro station | 54.5 % |
Park | 83.1 % |
Public square | 40.7 % |
Shopping mall | 59.4 % |
Street, pedestrian | 60.9 % |
Street, traffic | 86.7 % |
Tram | 64.0 % |
Average | 62.5 % (± 0.6) |
Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
Subtask B
Material from all three devices (A, B, and C) is used for training and testing. Results are calculated the same way as for subtask A, with the mean and standard deviation of the performance from 10 independent trials shown in the results table.
Remember that ranking in this subtask is based on devices B and C only (the Average (B,C) column in this table).
Scene label | Device B | Device C | Average (B,C) | Device A |
---|---|---|---|---|
Airport | 18.3 % | 24.1 % | 21.2 % | 51.2 % |
Bus | 40.4 % | 70.0 % | 55.2 % | 68.0 % |
Metro | 50.7 % | 36.1 % | 43.4 % | 62.4 % |
Metro station | 28.7 % | 36.1 % | 30.0 % | 54.4 % |
Park | 45.2 % | 57.0 % | 51.1 % | 80.4 % |
Public square | 22.8 % | 11.3 % | 17.0 % | 35.4 % |
Shopping mall | 63.5 % | 64.8 % | 64.2 % | 64.4 % |
Street, pedestrian | 37.0 % | 37.6 % | 37.3 % | 63.3 % |
Street, traffic | 77.0 % | 86.5 % | 81.8 % | 85.8 % |
Tram | 12.0 % | 12.6 % | 12.3 % | 52.2 % |
Average | 39.6 % (± 2.7) | 43.1 % (± 2.2) | 41.4 % (± 1.7) | 61.9 % (± 0.8) |
Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
Subtask C
Scene label | Accuracy |
---|---|
Airport | 44.1 % |
Bus | 59.2 % |
Metro | 51.5 % |
Metro station | 41.3 % |
Park | 74.0 % |
Public square | 34.7 % |
Shopping mall | 50.9 % |
Street, pedestrian | 47.5 % |
Street, traffic | 78.4 % |
Tram | 60.7 % |
Class Average | 54.2 % |
Unknown | 43.1 % |
Accuracy (average of class average and unknown) | 48.7 % (± 3.2) |
Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
Citation
If you are participating in this task or using the dataset or baseline code, please cite the following paper:
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 9–13. November 2018. URL: https://arxiv.org/abs/1807.09840.
A multi-device dataset for urban acoustic scene classification
Abstract
This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined for classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities, therefore it has a higher acoustic variability than the previous datasets used for this task, and in addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system consisting of a convolutional neural network and its performance in the subtasks using the recommended cross-validation setup.
Keywords
Acoustic scene classification, DCASE challenge, public datasets, multi-device data