The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded.

The challenge has ended. Full results for this task can be found on the subtask-specific result pages: Task1A, Task1B, Task1C

This task comprises three different subtasks that involve system development for three different situations:

Subtask A: Acoustic Scene Classification
Classification of data from the same device as the available training data.

Subtask B: Acoustic Scene Classification with mismatched recording devices
Classification of data recorded with devices different from those used for the training data.

Subtask C: Open set Acoustic Scene Classification
Classification of data that includes classes not encountered in the training data.

Description

The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded — for example "park", "pedestrian street", "metro station" — or to indicate it is from a different, unknown environment.

Figure 1: Overview of acoustic scene classification system.

Audio dataset

The dataset for this task is the TAU Urban Acoustic Scenes 2019 dataset, consisting of recordings from various acoustic scenes. This dataset extends the TUT Urban Acoustic Scenes 2018 dataset with 6 additional cities, for a total of 12 large European cities. For each scene class, recordings were made in different locations; for each recording location there are 5-6 minutes of audio. The original recordings were split into 10-second segments that are provided as individual files. Available information about the recordings includes the following: acoustic scene class, city, and recording location.

Acoustic scenes (10):

Airport - airport
Indoor shopping mall - shopping_mall
Metro station - metro_station
Pedestrian street - street_pedestrian
Public square - public_square
Street with medium level of traffic - street_traffic
Travelling by a tram - tram
Travelling by a bus - bus
Travelling by an underground metro - metro
Urban park - park

Data was recorded in the following cities:

Amsterdam
Barcelona
Helsinki
Lisbon
London
Lyon
Madrid
Milan
Prague
Paris
Stockholm
Vienna

Recording procedure

Recordings were made using four devices that captured audio simultaneously.

The main recording device consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution. The microphones are specifically made to look like headphones and are worn in the ears. As a result, the recorded audio is very similar to the sound that reaches the auditory system of the person wearing the equipment. This equipment is further referred to as device A.

The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session. All simultaneous recordings are time synchronized.

The dataset was collected by Tampere University of Technology between 05/2018 and 11/2018. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

Development and evaluation datasets

Different versions of the dataset are provided depending on the task.

The TAU Urban Acoustic Scenes 2019 development dataset contains only material recorded with device A: 40 hours of audio, balanced between classes. The data comes from 10 of the 12 cities. The TAU Urban Acoustic Scenes 2019 evaluation dataset contains data from all 12 cities.

The TAU Urban Acoustic Scenes 2019 Mobile development dataset contains material recorded with devices A, B and C. It is composed of the TAU Urban Acoustic Scenes 2019 data recorded with device A and some parallel audio recorded with devices B and C. Data from device A was resampled and averaged into a single channel to align with the properties of the data recorded with devices B and C. The dataset contains in total 46 hours of audio (40h + 3h + 3h). The TAU Urban Acoustic Scenes 2019 Mobile evaluation dataset also contains data from device D.

TAU Urban Acoustic Scenes 2019 Open set development dataset contains only material recorded with device A, being composed of TAU Urban Acoustic Scenes 2019 and additional audio examples for the open classification problem. The "open" data consists of the "beach" and "office" classes of TUT Acoustic Scenes 2017 dataset and other material recorded in 2019. The dataset contains in total 46 hours of audio (40h + 6h). TAU Urban Acoustic Scenes 2019 Open set evaluation dataset contains data from the 10 known classes, and other unknown ones.

Reference labels

Reference labels are provided only for the development datasets. Reference labels for the evaluation dataset or leaderboard dataset will not be released. For publications based on the DCASE challenge data, please use the provided training/test setup of the development set to allow comparisons. After the challenge, if you want to evaluate your proposed system with the official challenge evaluation setup, contact the task coordinators. Task coordinators can provide unofficial scoring for a limited number of system outputs.

Download

version 2

The dataset was updated on 12 March 2019 to include the train/test setup (version 2). To update an already downloaded version 1 of the dataset, update only the TAU-urban-acoustic-scenes-2019-openset-development.meta.zip file.

TAU Urban Acoustic Scenes 2019 Openset, Leaderboard dataset (1.4 GB)

TAU Urban Acoustic Scenes 2019 Openset, Evaluation dataset (8.2 GB)

Task setup

For each subtask, a development set is provided, together with a training/test partitioning for system development. Participants are required to report performance of their system using this train/test setup in order to allow comparison of systems on the development set.

Subtask A

Acoustic Scene Classification

This subtask is concerned with the basic problem of acoustic scene classification, in which all data (development and evaluation) are recorded with the same device, in this case device A, and contains only data from the 10 known acoustic scene classes. The subtask uses TAU Urban Acoustic Scenes 2019 dataset.

Development dataset

The development dataset consists of recordings from ten cities; the training subset contains recordings from only 9 of the cities, to test the generalization properties of the systems. The training/test subsets are created based on the recording location such that the training subset contains approximately 70% of the recording locations from each city. The test subset contains recordings from the rest of the locations, and from a few locations in the tenth city. Full data from the tenth city is provided, but partly unused in this setup, to reflect the final evaluation setup.
The development set contains 40 hours of data, with 14400 segments (144 per city per acoustic scene class). The training/test setup includes segments from Milan only in the test subset. There are 9185 segments in the training set, 4185 in the test set, and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.

Participants are allowed to create their own cross-validation folds or a separate validation set. In this case, please pay attention to the segments recorded at the same location. The location identifier can be found in the metadata file provided with the dataset or in the audio file names:

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files with the same location id are placed on the same side of the split (training or validation). In this subtask, device id is always a.
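The location constraint above can be sketched as a group-wise split. This is a minimal illustration rather than the official partitioning: the helper names, the example file names, the validation fraction, and the choice of (city, location id) as the grouping key are all assumptions made for the example.

```python
import os
import random
from collections import defaultdict


def parse_filename(name):
    """Split '[scene label]-[city]-[location id]-[segment id]-[device id].wav' into fields."""
    stem = os.path.splitext(os.path.basename(name))[0]
    scene, city, location, segment, device = stem.split("-")
    return {"scene": scene, "city": city, "location": location,
            "segment": segment, "device": device}


def split_by_location(filenames, validation_fraction=0.3, seed=0):
    """Split files so that every (city, location id) group lands on one side only."""
    groups = defaultdict(list)
    for name in filenames:
        fields = parse_filename(name)
        # Assumption: location ids are only unique within a city, so group by both.
        groups[(fields["city"], fields["location"])].append(name)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_validation = max(1, round(len(keys) * validation_fraction))
    validation_keys = set(keys[:n_validation])

    train, validation = [], []
    for key in keys:
        target = validation if key in validation_keys else train
        target.extend(groups[key])
    return train, validation


# Hypothetical file names following the pattern above.
example_files = [f"park-helsinki-{loc}-{seg}-a.wav" for loc in range(4) for seg in range(3)]
train_files, validation_files = split_by_location(example_files, validation_fraction=0.25)
```

Splitting on groups instead of individual segments keeps the segments cut from one recording location out of both subsets at once.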

Evaluation dataset

The evaluation dataset contains 20 hours of audio data from 12 cities (2 cities not encountered in the development set), and it is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.

Subtask B

Acoustic Scene Classification with mismatched recording devices

This subtask is concerned with the situation in which an application will be tested with different devices, possibly not the same as the ones used to record the development data. In this case, evaluation data contains more devices than the development data. The subtask uses TAU Urban Acoustic Scenes 2019 Mobile dataset.

Development dataset

The development set consists of data recorded with 3 devices: A, B and C. This includes all data from the development set of subtask A (40 hours), partitioned in the same way. In addition, parallel recordings are provided from devices B and C, amounting to 3 hours for each. From devices B and C, half of the data is included in the training subset and half in the test subset. The development set contains in total 46 hours of data, with 16560 segments, of which 14400 are from device A, 1080 from device B, and 1080 from device C. There are 10265 segments in the training set (9185 for device A, 540 for device B, and 540 for device C), 5265 in the test set (4185 for device A, 540 for device B, and 540 for device C), and an additional 1030 segments from Milan. For complete details on the dataset, check the readme file provided with the data.
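The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given above, together with the 10-second segment length from the dataset description; the variable names are illustrative.

```python
SEGMENT_SECONDS = 10  # segment length stated in the dataset description


def hours(segments):
    """Convert a number of 10-second segments into hours of audio."""
    return segments * SEGMENT_SECONDS / 3600


device_a, device_b, device_c = 14400, 1080, 1080
train = 9185 + 540 + 540   # device A + device B + device C
test = 4185 + 540 + 540    # device A + device B + device C
milan_extra = 1030         # additional Milan segments outside the train/test setup

total_segments = device_a + device_b + device_c
total_hours = hours(total_segments)
```

With these values, the segment totals add up to 16560 and the durations to 40 + 3 + 3 = 46 hours, matching the figures stated for the development set.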

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files with the same location id are placed on the same side of the split (training or validation). In this subtask, device id can be a, b or c.

Evaluation dataset

The evaluation dataset contains data from all 4 devices, including device D, which was not available in the development set. It contains 30 hours of audio and is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.

Subtask C

Open set Acoustic Scene Classification

This subtask is concerned with acoustic scene classification where the test recording may be from a different environment than the 10 target classes, in which case it should be classified as "unknown", in a so-called open-set classification setup. The subtask uses TAU Urban Acoustic Scenes 2019 Openset dataset and some additional data providing examples of "unknown" acoustic scenes.

Participants should make good use of external data in order to model the case of scenes not encountered within the training data. The provided examples allow only limited generalization, and may overfit to their original dataset due to lack of sufficient variety.

Figure 1: Overview of an acoustic scene classification system capable of recognizing the unknown scene class.
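One simple way to produce the "unknown" decision is to reject a segment when the classifier is not confident in any of the known classes. The sketch below is only an illustration of that idea, not a method prescribed by the task; the function name, the threshold value, and the example probabilities are assumptions.

```python
def classify_open_set(probabilities, threshold=0.5, unknown_label="unknown"):
    """Return the most probable known class, or `unknown_label` if it is below the threshold.

    `probabilities` maps each known scene label to the classifier's probability.
    The threshold of 0.5 is an illustrative default, to be tuned on development data.
    """
    best_label = max(probabilities, key=probabilities.get)
    if probabilities[best_label] < threshold:
        return unknown_label
    return best_label


# Hypothetical classifier outputs for two segments.
confident = classify_open_set({"park": 0.8, "tram": 0.2})
uncertain = classify_open_set({"park": 0.4, "tram": 0.35, "bus": 0.25})
```

Here the first segment keeps its "park" label, while the second is rejected as "unknown" because no known class reaches the threshold.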

Development dataset

The development dataset consists of data from the 10 target classes and additional "unknown" class examples. The dataset includes all data from the development set of Subtask A (40 hours), partitioned in the same way. In addition, recordings are provided for modeling and testing the open-set classification task. The unknown class consists of audio examples from TUT Acoustic Scenes 2017 dataset and new material recorded during the collection of TAU Urban Acoustic Scenes 2019 dataset. The development set contains 44 hours of data (40+4), with 15850 segments (14400 of ten scene classes + 1450 unknown class). Complete details on the dataset are provided in the readme file. In addition, correspondence of "unknown" class examples with their original acoustic scenes and file names is provided in meta_unknown.csv.

[scene label]-[city]-[location id]-[segment id]-[device id].wav

Make sure that all files with the same location id are placed on the same side of the split (training or validation). In this subtask, device id is always a.

Evaluation dataset

The evaluation dataset contains 20 hours of audio data, part of which is recorded in one of the 10 known classes, and part in other, unknown environments that differ from the ones in the development set. The evaluation dataset is provided without ground truth. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2019 Challenge.

External data resources

Use of external data is allowed in all subtasks under the following conditions:

The used external resource is clearly referenced and freely accessible to any other research group in the world. External data refers to public datasets or trained models. The dataset/models must be public and freely available before 1st of April 2019.
Participants submit at least one system without external training data so that we can study the contribution of such resources. The list of external data sources used in training must be clearly indicated in the technical report.
Participants inform the organizers in advance about such data sources, so that all competitors know about them and have equal opportunity to use them; please send an email to the task coordinators, and we will update the list of external datasets on the webpage accordingly. Once the evaluation set is published, the list of allowed external data resources is locked (no further external sources allowed).
It is not allowed to use TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017 and TUT Urban Acoustic Scenes 2018. These datasets are partially included in the current setup, and additional usage will lead to overfitting.

List of external datasets allowed:

Dataset name	Type	Added	Link
LITIS Rouen audio scene dataset	audio	04.03.2019	https://sites.google.com/site/alainrakotomamonjy/home/audio-scene
DCASE2013 Challenge - Public Dataset for Scene Classification Task	audio	04.03.2019	https://archive.org/details/dcase2013_scene_classification
DCASE2013 Challenge - Private Dataset for Scene Classification Task	audio	04.03.2019	https://archive.org/details/dcase2013_scene_classification_testset
Dares G1	audio	04.03.2019	http://www.daresounds.org/
AudioSet	audio	04.03.2019	https://research.google.com/audioset/

Participants cannot suggest data to this list anymore (list locked 27th of May 2019).

Submission

Participants can choose which subtasks they participate in; there is no requirement to participate in all of them. The official challenge submission consists of a technical report and system output for the evaluation data.

System output should be presented as a single text-file (in CSV format, without header row) containing classification result for each audio file in the evaluation set. Result items can be in any order. Format:

[filename (string)][tab][scene label (string)]
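A minimal sketch of writing system output in this format (tab-separated, one row per audio file, no header row) follows. The function name and the example file names are hypothetical; the exact file names expected in the first column are defined by the evaluation dataset.

```python
import csv
import io


def write_system_output(results, handle):
    """Write (filename, scene label) pairs as tab-separated rows without a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for filename, scene_label in results:
        writer.writerow([filename, scene_label])


# Hypothetical results; an in-memory buffer stands in for the output text file.
example_results = [("audio/0.wav", "airport"), ("audio/1.wav", "tram")]
buffer = io.StringIO()
write_system_output(example_results, buffer)
output_text = buffer.getvalue()
```

To write to disk instead, the same function can be called with a file opened as `open(path, "w", newline="")`.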

Multiple system outputs can be submitted (maximum 4 per participant per subtask). For each system, meta information should be provided in a separate file, containing the task specific information as given in the example here. All files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).

When training the final system for submission, participants can of course use the entire development set. In the technical report, participants should include system results on the training/test setup provided with the development set.

Detailed information for the submission can be found on the Submission page.

Public leaderboards

During the challenge, a public leaderboard will be provided using a separate public evaluation dataset for each subtask. The leaderboards are organized through Kaggle InClass competitions. Leaderboards are meant to serve as a development tool for participants, and do not have an official role in the challenge.

Due to Kaggle / US Government policy, people who are residents of certain countries (Cuba, Iran, Syria, North Korea, and Sudan) are unable to participate in the Kaggle competitions (see Kaggle terms, section 7 What are the rules for competitions on Kaggle?). As DCASE is committed to open science open to everybody, in case these Kaggle restrictions are preventing you from using the Kaggle based leaderboard during the development, please contact task 1 organizers and we will provide similar service outside Kaggle.

Subtask A Leaderboard

Subtask B Leaderboard

Subtask C Leaderboard

The official DCASE challenge submission will not be done through these Kaggle InClass competitions.

Datasets

For public leaderboard submissions, participants should use the official challenge development datasets to train their system, as in the DCASE challenge. Separate leaderboard datasets are released to be used as evaluation datasets in the competitions. These leaderboard datasets consist of a small subset of the official evaluation dataset, with similar properties (distribution). The amount of material in the leaderboard dataset is considerably smaller than the official evaluation material in the DCASE challenge.

It is not allowed to use the leaderboard datasets to train the systems in any DCASE challenge subtasks or leaderboard competitions.

TAU Urban Acoustic Scenes 2019, Leaderboard dataset (3.0 GB)

TAU Urban Acoustic Scenes 2019 Mobile, Leaderboard dataset (1.4 GB)

TAU Urban Acoustic Scenes 2019 Openset, Leaderboard dataset (1.4 GB)

Task rules

There are general rules valid for all tasks; these, along with information on technical report and submission requirements can be found here.

Task specific rules:

Use of external data is allowed, except TUT Acoustic Scenes 2016, TUT Acoustic Scenes 2017, TUT Urban Acoustic Scenes 2018 and leaderboard datasets (DCASE2018 and DCASE2019).
Manipulation of provided training and development data is allowed (e.g. by mixing data sampled from a probability density function or using techniques such as pitch shifting or time stretching).
Participants are not allowed to make subjective judgments of the evaluation data, nor to annotate it. The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden. Separately published leaderboard data is considered as evaluation data as well.
The classification decision must be made independently for each test sample.

Evaluation

The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample. Accuracy will be calculated as the average of the class-wise accuracies.
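The averaging of class-wise accuracies can be sketched as follows. This is an illustrative helper, not the official scoring code (sed_eval is the toolbox referenced for that); the function names and the toy labels are assumptions.

```python
from collections import defaultdict


def class_wise_accuracy(reference, predicted):
    """Return a dict mapping each reference class to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ref, pred in zip(reference, predicted):
        total[ref] += 1
        if ref == pred:
            correct[ref] += 1
    return {label: correct[label] / total[label] for label in total}


def average_accuracy(reference, predicted):
    """Average the class-wise accuracies so that every class has equal weight."""
    per_class = class_wise_accuracy(reference, predicted)
    return sum(per_class.values()) / len(per_class)


# Toy example: "park" has 2 of 3 segments correct, "tram" has 1 of 1 correct.
reference = ["park", "park", "park", "tram"]
predicted = ["park", "park", "tram", "tram"]
score = average_accuracy(reference, predicted)  # (2/3 + 1/1) / 2
```

In this toy example the class-averaged score is about 0.833, whereas the plain fraction of correct segments would be 0.75; the two agree only when all classes have the same number of segments.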

Participants can use sed_eval toolbox for the evaluation:

sed_eval - Evaluation toolbox for Sound Event Detection

Ranking

Subtask A will use the overall accuracy on the evaluation data.
Subtask B will use the overall accuracy on data from devices B and C.
Subtask C will use the weighted average of the known classes and unknown class:

\begin{equation} ACC_{weighted} = 0.5 \cdot ACC_{known~classes} + 0.5 \cdot ACC_{unknown~classes} \end{equation}
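The weighted accuracy for Subtask C can be sketched as below. Two points are assumptions of this example rather than statements from the task description: the known-class term is taken as the average of the class-wise accuracies over the known classes, and the unknown term as the fraction of "unknown" segments that were labelled "unknown". The function name and toy labels are also illustrative.

```python
def weighted_accuracy(reference, predicted, unknown_label="unknown"):
    """Compute 0.5 * ACC(known classes) + 0.5 * ACC(unknown class)."""
    correct = {}
    total = {}
    for ref, pred in zip(reference, predicted):
        total[ref] = total.get(ref, 0) + 1
        correct[ref] = correct.get(ref, 0) + (1 if ref == pred else 0)
    per_class = {label: correct[label] / total[label] for label in total}

    # Assumption: known-class accuracy is the mean of the per-class accuracies.
    known = [acc for label, acc in per_class.items() if label != unknown_label]
    acc_known = sum(known) / len(known)
    acc_unknown = per_class[unknown_label]
    return 0.5 * acc_known + 0.5 * acc_unknown


# Toy example: park 1/2, tram 2/2 (known average 0.75), unknown 1/2.
reference = ["park", "park", "tram", "tram", "unknown", "unknown"]
predicted = ["park", "unknown", "tram", "tram", "unknown", "park"]
acc_weighted = weighted_accuracy(reference, predicted)  # 0.5 * 0.75 + 0.5 * 0.5
```

Because the unknown class carries half of the total weight, a system that never outputs "unknown" cannot exceed 0.5 under this measure.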

Results

Subtask A

Rank	Code	Author	Affiliation	Technical Report	Accuracy with 95% confidence interval (%)
	Bilot_IDG_task1a_1	Valentin Bilot	Audio R&D, InterDigital R&D, Rennes, France	task-acoustic-scene-classification-results-a#Bilot2019	66.1 (65.0 - 67.2)
	Bilot_IDG_task1a_2	Valentin Bilot	Audio R&D, InterDigital R&D, Rennes, France	task-acoustic-scene-classification-results-a#Bilot2019	67.3 (66.3 - 68.4)
	Bilot_IDG_task1a_3	Valentin Bilot	Audio R&D, InterDigital R&D, Rennes, France	task-acoustic-scene-classification-results-a#Bilot2019	64.5 (63.4 - 65.6)
	Bilot_IDG_task1a_4	Valentin Bilot	Audio R&D, InterDigital R&D, Rennes, France	task-acoustic-scene-classification-results-a#Bilot2019	68.3 (67.3 - 69.4)
	Chandrasekhar_IIITH_task1a_1	Chandrasekhar Paseddula	International Institute of Information Technology, Hyderabad department:Electronics and Communication Engineering, Hyderabad, India	task-acoustic-scene-classification-results-a#Paseddula2019	52.6 (51.4 - 53.7)
	DSPLAB_TJU_task1a_1	Jinhua Liang	School of Electrical and Information Engineering, TianJin University, Tianjin, China	task-acoustic-scene-classification-results-a#Ding2019	66.5 (65.4 - 67.6)
	DSPLAB_TJU_task1a_2	Jinhua Liang	School of Electrical and Information Engineering, TianJin University, Tianjin, China	task-acoustic-scene-classification-results-a#Ding2019	69.6 (68.5 - 70.6)
	DSPLAB_TJU_task1a_3	Jinhua Liang	School of Electrical and Information Engineering, TianJin University, Tianjin, China	task-acoustic-scene-classification-results-a#Ding2019	65.0 (63.9 - 66.1)
	DSPLAB_TJU_task1a_4	Jinhua Liang	School of Electrical and Information Engineering, TianJin University, Tianjin, China	task-acoustic-scene-classification-results-a#Ding2019	69.5 (68.4 - 70.5)
	Fmta91_KNToosi_task1a_1	fateme Arabnezhad	Computer Engineering Department, Khaje Nasir Toosi, Tehran, Iran	task-acoustic-scene-classification-results-a#Arabnezhad2019	76.2 (75.2 - 77.2)
	Fraile_UPM_task1a_1	Ruben Fraile	CITSEM, Universidad Politecnica de Madrid, Madrid, Spain	task-acoustic-scene-classification-results-a#Fraile2019	58.7 (57.6 - 59.9)
	DCASE2019 baseline	Toni Heittola	Computing Sciences, Tampere University, Tampere, Finland	task-acoustic-scene-classification-results-a#Heittola2019	63.3 (62.2 - 64.5)
	Huang_IL_task1a_1	Paulo Lopez Meyer	Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico	task-acoustic-scene-classification-results-a#Huang2019	80.5 (79.6 - 81.4)
	Huang_IL_task1a_2	Paulo Lopez Meyer	Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico	task-acoustic-scene-classification-results-a#Huang2019	81.1 (80.2 - 82.0)
	Huang_IL_task1a_3	Paulo Lopez Meyer	Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico	task-acoustic-scene-classification-results-a#Huang2019	81.3 (80.4 - 82.2)
	Huang_IL_task1a_4	Paulo Lopez Meyer	Intel Labs, Intel Corporation, Zapopan, Jalisco, Mexico	task-acoustic-scene-classification-results-a#Huang2019	79.5 (78.6 - 80.5)
	Huang_SCNU_task1a_1	Zhenyi Huang	School of Computer, South China Normal University, Guangzhou, China	task-acoustic-scene-classification-results-a#Huang2019a	79.2 (78.3 - 80.1)
	JSNU_WDXY_task1a_1	Xinixn Ma	School of Physics and Electronic, Jiangsu Normal University, Xuzhou, China	task-acoustic-scene-classification-results-a#Ma2019	72.2 (71.1 - 73.2)
	Jung_UOS_task1a_1	Ha-Jin Yu	Computing Sciences, University of Seoul, Seoul, Republic of Korea	task-acoustic-scene-classification-results-a#Jung2019	81.1 (80.2 - 82.0)
	Jung_UOS_task1a_2	Ha-jin Yu	Computing Sciences, University of Seoul, Seoul, Republic of Korea	task-acoustic-scene-classification-results-a#Jung2019	81.2 (80.3 - 82.1)
	Jung_UOS_task1a_3	Ha-jin Yu	Computing Sciences, University of Seoul, Seoul, Republic of Korea	task-acoustic-scene-classification-results-a#Jung2019	81.0 (80.1 - 81.9)
	Jung_UOS_task1a_4	Ha-jin Yu	Computing Sciences, University of Seoul, Seoul, Republic of Korea	task-acoustic-scene-classification-results-a#Jung2019	81.2 (80.3 - 82.1)
	KK_I2R_task1a_1	Teh KK	I2R, A-star, Singapore	task-acoustic-scene-classification-results-a#KK2019	76.6 (75.6 - 77.6)
	KK_I2R_task1a_2	Teh KK	I2R, A-star, Singapore	task-acoustic-scene-classification-results-a#KK2019	77.7 (76.7 - 78.6)
	KK_I2R_task1a_3	Teh KK	I2R, A-star, Singapore	task-acoustic-scene-classification-results-a#KK2019	77.2 (76.2 - 78.2)
	Kong_SURREY_task1a_1	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England	task-acoustic-scene-classification-results-a#Kong2019	70.5 (69.5 - 71.6)
	Koutini_CPJKU_task1a_1	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-a#Koutini2019	82.8 (82.0 - 83.7)
	Koutini_CPJKU_task1a_2	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-a#Koutini2019	83.7 (82.9 - 84.6)
	Koutini_CPJKU_task1a_3	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-a#Koutini2019	83.5 (82.6 - 84.4)
	Koutini_CPJKU_task1a_4	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-a#Koutini2019	83.8 (82.9 - 84.6)
	LamPham_HCMGroup_task1a_1	Lam Pham	School of Computing, University of Kent, Chatham, United Kingdom	task-acoustic-scene-classification-results-a#Pham2019	73.9 (72.9 - 74.9)
	LamPham_KentGroup_task1a_1	Lam Pham	School of Computing, University of Kent, Chatham, United Kingdom	task-acoustic-scene-classification-results-a#Pham2019a	76.8 (75.8 - 77.7)
	Lei_CQU_task1a_1	Chongqin Lei	Intelligent Information Technology and System Lab, CHONGQING UNIVERSITY, Chongqing, China	task-acoustic-scene-classification-results-a#Lei2019	75.5 (74.5 - 76.5)
	Li_NPU_task1a_1	Ning FangLi	Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China	task-acoustic-scene-classification-results-a#FangLi2019	59.9 (58.8 - 61.0)
	Li_NPU_task1a_2	Ning FangLi	Mechanical Engineering, Northwestern Polytechnical University School, 127 West Youyi Road, Xi'an, 710072, China	task-acoustic-scene-classification-results-a#FangLi2019	61.8 (60.7 - 62.9)
	Liang_HUST_task1a_1	Han Liang	Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China	task-acoustic-scene-classification-results-a#Liang2019	68.2 (67.1 - 69.2)
	Liang_HUST_task1a_2	Han Liang	Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China	task-acoustic-scene-classification-results-a#Liang2019	66.4 (65.3 - 67.5)
	Liu_SCUT_task1a_1	Liu Mingle	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Mingle2019	78.3 (77.4 - 79.3)
	Liu_SCUT_task1a_2	Liu Mingle	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Mingle2019	79.9 (79.0 - 80.8)
	Liu_SCUT_task1a_3	Liu Mingle	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Mingle2019	78.3 (77.3 - 79.2)
	Liu_SCUT_task1a_4	Liu Mingle	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Mingle2019	78.4 (77.4 - 79.3)
	MaLiu_BIT_task1a_1	Sifan Ma	Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China	task-acoustic-scene-classification-results-a#Ma2019a	72.8 (71.8 - 73.8)
	MaLiu_BIT_task1a_2	Wei Liu	Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China	task-acoustic-scene-classification-results-a#Liu2019	76.0 (75.1 - 77.0)
	MaLiu_BIT_task1a_3	Sifan Ma	Laboratory of Modern Communication, Beijing Institute of Technology, Beijing, China	task-acoustic-scene-classification-results-a#Ma2019a	73.3 (72.3 - 74.3)
	Mars_PRDCSG_task1a_1	Rohith Mars	Core Technology Group, Panasonic R&D Center, Singapore, Singapore	task-acoustic-scene-classification-results-a#Mars2019	79.3 (78.3 - 80.2)
	McDonnell_USA_task1a_1	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-a#Gao2019	80.0 (79.0 - 80.9)
	McDonnell_USA_task1a_2	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-a#Gao2019	80.5 (79.6 - 81.4)
	McDonnell_USA_task1a_3	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-a#Gao2019	80.4 (79.5 - 81.3)
	McDonnell_USA_task1a_4	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-a#Gao2019	80.3 (79.4 - 81.2)
	Naranjo-Alcazar_VfyAI_task1a_1	Javier Naranjo-Alcazar	Visualfy AI, Visualfy, Benisano, Spain	task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019	74.1 (73.1 - 75.2)
	Naranjo-Alcazar_VfyAI_task1a_2	Javier Naranjo-Alcazar	Visualfy AI, Visualfy, Benisano, Spain	task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019	74.2 (73.2 - 75.2)
	Naranjo-Alcazar_VfyAI_task1a_3	Javier Naranjo-Alcazar	Visualfy AI, Visualfy, Benisano, Spain	task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019	74.0 (73.0 - 75.0)
	Naranjo-Alcazar_VfyAI_task1a_4	Javier Naranjo-Alcazar	Visualfy AI, Visualfy, Benisano, Spain	task-acoustic-scene-classification-results-a#Naranjo-Alcazar2019	74.1 (73.1 - 75.1)
	Plata_SRPOL_task1a_1	Marcin Plata	Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-a#Plata2019	78.8 (77.9 - 79.8)
	Plata_SRPOL_task1a_2	Marcin Plata	Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-a#Plata2019	79.2 (78.3 - 80.1)
	Plata_SRPOL_task1a_3	Marcin Plata	Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-a#Plata2019	77.2 (76.3 - 78.2)
	Plata_SRPOL_task1a_4	Marcin Plata	Data Intelligence Group, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-a#Plata2019	77.9 (77.0 - 78.9)
	SSW_ETRI_task1a_1	Suh Sangwon	Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea	task-acoustic-scene-classification-results-a#Sangwon2019	66.7 (65.6 - 67.8)
	SSW_ETRI_task1a_2	Suh Sangwon	Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea	task-acoustic-scene-classification-results-a#Sangwon2019	67.0 (65.9 - 68.1)
	SSW_ETRI_task1a_3	Suh Sangwon	Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea	task-acoustic-scene-classification-results-a#Sangwon2019	67.6 (66.5 - 68.7)
	SSW_ETRI_task1a_4	Suh Sangwon	Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea	task-acoustic-scene-classification-results-a#Sangwon2019	67.6 (66.5 - 68.7)
	Salvati_DMIF_task1a_1	Daniele Salvati	Mathematics, Computer Science and Physics, University of Udine, Udine, Italy	task-acoustic-scene-classification-results-a#Salvati2019	68.5 (67.5 - 69.6)
	Seo_LGE_task1a_1	Seo Hyeji	Advanced Robotics Lab, LG Electronics, Seoul, Korea	task-acoustic-scene-classification-results-a#Hyeji2019	81.6 (80.7 - 82.5)
	Seo_LGE_task1a_2	Seo Hyeji	Advanced Robotics Lab, LG Electronics, Seoul, Korea	task-acoustic-scene-classification-results-a#Hyeji2019	82.5 (81.6 - 83.4)
	Seo_LGE_task1a_3	Seo Hyeji	Advanced Robotics Lab, LG Electronics, Seoul, Korea	task-acoustic-scene-classification-results-a#Hyeji2019	81.1 (80.2 - 82.0)
	Seo_LGE_task1a_4	Seo Hyeji	Advanced Robotics Lab, LG Electronics, Seoul, Korea	task-acoustic-scene-classification-results-a#Hyeji2019	82.5 (81.7 - 83.4)
	Waldekar_IITKGP_task1a_1	Shefali Waldekar	Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India	task-acoustic-scene-classification-results-a#Waldekar2019	65.9 (64.8 - 67.0)
	Wang_BTBU_task1a_1	Zhuhe Wang	Noise and Vibration Laboratory, Beijing Technology and Business University, Beijing, China	task-acoustic-scene-classification-results-a#Wang2019	32.2 (31.1 - 33.3)
	Wang_NWPU_task1a_1	Mou Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-a#Wang2019a_t1	80.6 (79.7 - 81.5)
	Wang_NWPU_task1a_2	Mou Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-a#Wang2019a	80.1 (79.1 - 81.0)
	Wang_NWPU_task1a_3	Mou Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-a#Wang2019a	76.6 (75.6 - 77.6)
	Wang_NWPU_task1a_4	Mou Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-a#Wang2019a	76.8 (75.8 - 77.8)
	Wang_SCUT_task1a_1	Wucheng Wang	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Wang2019b	76.4 (75.4 - 77.4)
	Wang_SCUT_task1a_2	Wucheng Wang	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Wang2019b	76.6 (75.6 - 77.5)
	Wang_SCUT_task1a_3	Wucheng Wang	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Wang2019b	75.9 (74.9 - 76.9)
	Wang_SCUT_task1a_4	Wucheng Wang	School of Electronic and Information Engineering, South China University of Technology, GuangZhou, GuangDong Province	task-acoustic-scene-classification-results-a#Wang2019b	76.5 (75.5 - 77.5)
	Wilkinghoff_FKIE_task1a_1	Kevin Wilkinghoff	Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany	task-acoustic-scene-classification-results-a#Wilkinghoff2019	74.6 (73.6 - 75.6)
	Wilkinghoff_FKIE_task1a_2	Kevin Wilkinghoff	Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany	task-acoustic-scene-classification-results-a#Wilkinghoff2019	76.2 (75.2 - 77.2)
	Wu_CUHK_task1a_1	Yuzhong Wu	Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China	task-acoustic-scene-classification-results-a#Wu2019	80.1 (79.1 - 81.0)
	Yang_UESTC_task1a_1	Yang Haocong	Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-a#Haocong2019	79.9 (78.9 - 80.8)
	Yang_UESTC_task1a_2	Yang Haocong	Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-a#Haocong2019	81.6 (80.7 - 82.5)
	Yang_UESTC_task1a_3	Yang Haocong	Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-a#Haocong2019	81.2 (80.3 - 82.1)
	Zeinali_BUT_task1a_1	Hossein Zeinali	Information Technology, Brno University of Technology, Brno, Czech Republic	task-acoustic-scene-classification-results-a#Zeinali2019	78.9 (78.0 - 79.9)
	Zeinali_BUT_task1a_2	Hossein Zeinali	Information Technology, Brno University of Technology, Brno, Czech Republic	task-acoustic-scene-classification-results-a#Zeinali2019	78.9 (77.9 - 79.8)
	Zeinali_BUT_task1a_3	Hossein Zeinali	Information Technology, Brno University of Technology, Brno, Czech Republic	task-acoustic-scene-classification-results-a#Zeinali2019	79.1 (78.1 - 80.0)
	Zhang_IOA_task1a_1	Pengyuan Zhang	Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China	task-acoustic-scene-classification-results-a#Chen2019	84.9 (84.1 - 85.7)
	Zhang_IOA_task1a_2	Pengyuan Zhang	Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China	task-acoustic-scene-classification-results-a#Chen2019	84.9 (84.1 - 85.8)
	Zhang_IOA_task1a_3	Pengyuan Zhang	Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China	task-acoustic-scene-classification-results-a#Chen2019	85.2 (84.4 - 86.0)
	Zhang_IOA_task1a_4	Pengyuan Zhang	Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Beijing, China	task-acoustic-scene-classification-results-a#Chen2019	84.8 (83.9 - 85.6)
	Zheng_USTC_task1a_1	Xu Zheng	Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China	task-acoustic-scene-classification-results-a#Zheng2019	75.7 (74.7 - 76.7)
	Zheng_USTC_task1a_2	Xu Zheng	Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China	task-acoustic-scene-classification-results-a#Zheng2019	71.3 (70.3 - 72.4)
	Zheng_USTC_task1a_3	Xu Zheng	Computing Sciences, University of Science and Technology of China, Hefei, Anhui, China	task-acoustic-scene-classification-results-a#Zheng2019	78.9 (77.9 - 79.8)
	Zhou_Kuaiyu_task1a_1	Nai Zhou	Beijing Kuaiyu Electronics Co., Ltd., Beijing, China	task-acoustic-scene-classification-results-a#Zhou2019_t1	79.8 (78.8 - 80.7)
	Zhou_Kuaiyu_task1a_2	Nai Zhou	Beijing Kuaiyu Electronics Co., Ltd., Beijing, China	task-acoustic-scene-classification-results-a#Zhou2019_t1	79.4 (78.5 - 80.4)
	Zhou_Kuaiyu_task1a_3	Nai Zhou	Beijing Kuaiyu Electronics Co., Ltd., Beijing, China	task-acoustic-scene-classification-results-a#Zhou2019_t1	78.7 (77.7 - 79.6)
	Zhu_SSLabBUPT_task1a_1	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-a#Zhu2019	79.2 (78.3 - 80.1)
	Zhu_SSLabBUPT_task1a_2	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-a#Zhu2019	78.8 (77.9 - 79.7)
	Zhu_SSLabBUPT_task1a_3	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-a#Zhu2019	79.1 (78.2 - 80.1)
	Zhu_SSLabBUPT_task1a_4	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-a#Zhu2019	78.8 (77.8 - 79.7)

Complete results and technical reports can be found at subtask A results page
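
The results tables report each accuracy with a 95% confidence interval. This page does not state how the interval is computed. One common choice, shown here purely as an illustration, is the normal approximation to a binomial proportion. The function name `binomial_ci` and the evaluation-set size used in the example are assumptions, not values taken from the task description.

```python
import math


def binomial_ci(accuracy, n_segments, z=1.96):
    """Normal-approximation confidence interval for an accuracy in [0, 1].

    n_segments is the number of evaluated test segments (hypothetical here).
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_segments)
    return accuracy - half_width, accuracy + half_width


# Hypothetical example: 85.2 % accuracy measured on an assumed 7200 segments.
low, high = binomial_ci(0.852, 7200)
print(f"{100 * low:.1f} - {100 * high:.1f}")
```

The interval narrows with the square root of the number of segments, so larger evaluation sets give tighter intervals.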

Subtask B

Rank	Submission Information
Rank	Code	Author	Affiliation	Technical Report	Accuracy with 95% confidence interval
	Eghbal-zadeh_CPJKU_task1b_1	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Eghbal-zadeh2019	74.5 (73.5 - 75.5)
	Eghbal-zadeh_CPJKU_task1b_2	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Eghbal-zadeh2019	74.5 (73.5 - 75.5)
	Eghbal-zadeh_CPJKU_task1b_3	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Eghbal-zadeh2019	73.4 (72.4 - 74.5)
	Eghbal-zadeh_CPJKU_task1b_4	Khaled Koutini	Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Eghbal-zadeh2019	73.4 (72.3 - 74.4)
	DCASE2019 baseline	Toni Heittola	Computing Sciences, Tampere University, Tampere, Finland	task-acoustic-scene-classification-results-b#Heittola2019	47.7 (46.5 - 48.8)
	Jiang_UESTC_task1b_1	Shengwang Jiang	School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-b#Jiang2019	70.3 (69.2 - 71.3)
	Jiang_UESTC_task1b_2	Shengwang Jiang	School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-b#Jiang2019	69.9 (68.9 - 71.0)
	Jiang_UESTC_task1b_3	Shengwang Jiang	School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-b#Jiang2019	69.0 (68.0 - 70.1)
	Jiang_UESTC_task1b_4	Shengwang Jiang	School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China	task-acoustic-scene-classification-results-b#Jiang2019	69.6 (68.6 - 70.7)
	Kong_SURREY_task1b_1	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England	task-acoustic-scene-classification-results-b#Kong2019	61.6 (60.4 - 62.7)
	Kosmider_SRPOL_task1b_1	Michał Kośmider	Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-b#Komider2019	75.1 (74.1 - 76.1)
	Kosmider_SRPOL_task1b_2	Michał Kośmider	Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-b#Komider2019	75.3 (74.3 - 76.3)
	Kosmider_SRPOL_task1b_3	Michał Kośmider	Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-b#Komider2019	74.9 (73.9 - 75.9)
	Kosmider_SRPOL_task1b_4	Michał Kośmider	Artificial Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-b#Komider2019	75.2 (74.3 - 76.2)
	LamPham_KentGroup_task1b_1	Lam Pham	School of Computing, University of Kent, Chatham, United Kingdom	task-acoustic-scene-classification-results-b#Pham2019	72.8 (71.8 - 73.8)
	McDonnell_USA_task1b_1	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-b#Gao2019	74.2 (73.2 - 75.2)
	McDonnell_USA_task1b_2	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-b#Gao2019	74.1 (73.1 - 75.2)
	McDonnell_USA_task1b_3	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-b#Gao2019	74.9 (73.9 - 75.9)
	McDonnell_USA_task1b_4	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-b#Gao2019	74.4 (73.4 - 75.4)
	Primus_CPJKU_task1b_1	Paul Primus	Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Primus2019	71.3 (70.2 - 72.3)
	Primus_CPJKU_task1b_2	Paul Primus	Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Primus2019	73.4 (72.4 - 74.4)
	Primus_CPJKU_task1b_3	Paul Primus	Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Primus2019	71.6 (70.6 - 72.7)
	Primus_CPJKU_task1b_4	Paul Primus	Computational Perception, Johannes Kepler University Linz, Linz, Austria	task-acoustic-scene-classification-results-b#Primus2019	74.2 (73.2 - 75.2)
	Song_HIT_task1b_1	Hongwei Song	Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China	task-acoustic-scene-classification-results-b#Song2019	67.3 (66.2 - 68.3)
	Song_HIT_task1b_2	Hongwei Song	Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China	task-acoustic-scene-classification-results-b#Song2019	72.2 (71.2 - 73.3)
	Song_HIT_task1b_3	Hongwei Song	Computer Sciences and Technology, Harbin Institute of Technology, Harbin, China	task-acoustic-scene-classification-results-b#Song2019	72.1 (71.1 - 73.1)
	Waldekar_IITKGP_task1b_1	Shefali Waldekar	Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India	task-acoustic-scene-classification-results-b#Waldekar2019	62.1 (60.9 - 63.2)
	Wang_NWPU_task1b_1	Rui Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-b#Wang2019	65.7 (64.6 - 66.8)
	Wang_NWPU_task1b_2	Rui Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-b#Wang2019	68.5 (67.4 - 69.6)
	Wang_NWPU_task1b_3	Rui Wang	School of Marine Sciences and Technology, Northwestern Polytechnical University, Xi'an, China	task-acoustic-scene-classification-results-b#Wang2019	70.3 (69.3 - 71.4)

Complete results and technical reports can be found at subtask B results page

Subtask C

Rank	Submission Information
Rank	Code	Author	Affiliation	Technical Report	Accuracy with 95% confidence interval
	DCASE2019 baseline	Toni Heittola	Computing Sciences, Tampere University, Tampere, Finland	task-acoustic-scene-classification-results-c#Heittola2019	47.6 (47.1 - 48.0)
	Kong_SURREY_task1c_1	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, England	task-acoustic-scene-classification-results-c#Kong2019	50.7 (50.2 - 51.2)
	Lehner_SAL_task1c_1	Bernhard Lehner	Silicon Austria Labs, JKU, Linz, Austria	task-acoustic-scene-classification-results-c#Lehner2019	58.7 (58.1 - 59.2)
	Lehner_SAL_task1c_2	Bernhard Lehner	Silicon Austria Labs, JKU, Linz, Austria	task-acoustic-scene-classification-results-c#Lehner2019	61.3 (60.7 - 61.9)
	Lehner_SAL_task1c_3	Bernhard Lehner	Silicon Austria Labs, JKU, Linz, Austria	task-acoustic-scene-classification-results-c#Lehner2019	60.9 (60.3 - 61.5)
	Lehner_SAL_task1c_4	Bernhard Lehner	Silicon Austria Labs, JKU, Linz, Austria	task-acoustic-scene-classification-results-c#Lehner2019	60.5 (59.9 - 61.1)
	McDonnell_USA_task1c_1	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-c#Gao2019	58.2 (57.6 - 58.7)
	McDonnell_USA_task1c_2	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-c#Gao2019	58.0 (57.5 - 58.6)
	McDonnell_USA_task1c_3	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-c#Gao2019	58.8 (58.2 - 59.4)
	McDonnell_USA_task1c_4	Mark McDonnell	School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, Australia	task-acoustic-scene-classification-results-c#Gao2019	58.4 (57.9 - 59.0)
	Rakowski_SRPOL_task1c_1	Alexander Rakowski	Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-c#Rakowski2019_t1	57.2 (56.6 - 57.8)
	Rakowski_SRPOL_task1c_2	Alexander Rakowski	Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-c#Rakowski2019_t1	57.2 (56.6 - 57.8)
	Rakowski_SRPOL_task1c_3	Alexander Rakowski	Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-c#Rakowski2019_t1	61.6 (61.0 - 62.2)
	Rakowski_SRPOL_task1c_4	Michał Kośmider	Audio Intelligence, Samsung R&D Institute Poland, Warsaw, Poland	task-acoustic-scene-classification-results-c#Rakowski2019_t1	64.4 (63.8 - 65.1)
	Wilkinghoff_FKIE_task1c_1	Kevin Wilkinghoff	Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany	task-acoustic-scene-classification-results-c#Wilkinghoff2019	61.9 (61.3 - 62.5)
	Wilkinghoff_FKIE_task1c_2	Kevin Wilkinghoff	Communication Systems, Fraunhofer Institute for Communication, Information Processing and Ergonomics, Wachtberg, Germany	task-acoustic-scene-classification-results-c#Wilkinghoff2019	62.1 (61.5 - 62.7)
	Zhu_SRCBBUPT_task1c_1	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-c#Zhu2019	67.2 (66.6 - 67.9)
	Zhu_SRCBBUPT_task1c_2	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-c#Zhu2019	67.4 (66.8 - 68.1)
	Zhu_SRCBBUPT_task1c_3	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-c#Zhu2019	66.3 (65.7 - 67.0)
	Zhu_SRCBBUPT_task1c_4	Houwei Zhu	Speech Lab, Samsung Research China-Beijing, Beijing, China	task-acoustic-scene-classification-results-c#Zhu2019	67.1 (66.4 - 67.8)

Complete results and technical reports can be found at subtask C results page

Submissions

Subtask	Teams	Entries	Authors	Affiliations
Subtask A	38	98	111	40
Subtask B	10	29	25	10
Subtask C	6	19	19	8
Overall	46	146	120	44

Awards

This task will offer two awards, not necessarily based on the evaluation set performance ranking. These awards aim to encourage contestants to openly publish their code, and to use novel and problem-specific approaches which leverage knowledge of the audio domain. We also highly encourage student authorship.

Reproducible system award

A reproducible system award of 500 USD will be offered for the highest-scoring method that is open source and fully reproducible. For full reproducibility, the authors must provide all the information needed to run the system and achieve the reported performance. The choice of licence is left to the author, but should ideally be selected from those approved by the Open Source Initiative.

Judges’ award

A judges’ award of 500 USD will be offered for the method considered by the judges to be the most interesting or innovative. Criteria considered for this award include, but are not limited to, originality, complexity, student participation, and open-source availability. Single-model approaches are strongly preferred over ensembles; occasionally, small ensembles of different models can be considered if the approach is innovative.

More information can be found on the Award page.

Baseline system

The baseline system provides a simple entry-level state-of-the-art approach that gives reasonable results in the subtasks of Task 1. The baseline system is built on the dcase_util toolbox.

The system has all the functionality needed for dataset handling, storing and accessing acoustic features, training and storing acoustic models, and evaluation. The modular structure of the system enables participants to modify the system to their needs. The baseline system is a good starting point, especially for entry-level researchers who want to familiarize themselves with the acoustic scene classification problem.

Repository

DCASE2019 Task 1 baseline, repository

System description

The baseline system implements a convolutional neural network (CNN) based approach, where log mel-band energies are first extracted for each 10-second signal, and a network consisting of two CNN layers and one fully connected layer is trained to assign scene labels to the audio signals.

The machine learning part of the code is built on Keras (v2.2.2), using TensorFlow (v1.9.0) as the backend.

Parameters

Acoustic features

Analysis frame 40 ms (50% hop size)
Log mel-band energies (40 bands)
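
As a small worked example of these framing parameters: a 40 ms frame with a 50% hop advances by 20 ms, so a 10-second segment yields roughly 500 frames, which is where the 500 in the network input shape comes from. The sketch below only does this arithmetic. The function name is illustrative, and the exact frame count depends on how the signal edges are padded, which is not specified here.

```python
def frame_counts(duration_ms=10_000, frame_ms=40, hop_percent=50):
    """Return (hop in ms, nominal frame count, frames fully inside the signal)."""
    hop_ms = frame_ms * hop_percent // 100
    # Nominal count: one frame per hop over the whole segment.
    nominal = duration_ms // hop_ms
    # Count without padding: only frames that fit entirely in the segment.
    full = (duration_ms - frame_ms) // hop_ms + 1
    return hop_ms, nominal, full


hop_ms, nominal_frames, full_frames = frame_counts()
print(hop_ms, nominal_frames, full_frames)
```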

Neural network

Input shape: 40 * 500 (10 seconds)
Architecture:
- CNN layer #1
  - 2D Convolutional layer (filters: 32, kernel size: 7) + Batch normalization + ReLU activation
  - 2D max pooling (pool size: (5, 5)) + Dropout (rate: 30%)
- CNN layer #2
  - 2D Convolutional layer (filters: 64, kernel size: 7) + Batch normalization + ReLU activation
  - 2D max pooling (pool size: (4, 100)) + Dropout (rate: 30%)
- Flatten
- Dense layer #1
  - Dense layer (units: 100, activation: ReLU)
  - Dropout (rate: 30%)
- Output layer (activation: softmax/sigmoid)
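
To see how the pooling sizes reduce the 40 * 500 input, the following sketch traces the tensor shape through the layers listed above. It is not the baseline code. It assumes that the convolutions use stride 1 with "same" padding (so they keep the height and width) and that each max pooling uses a stride equal to its pool size. Neither assumption is stated in the list.

```python
def pool(shape, pool_size):
    """Shape after non-overlapping max pooling (stride equal to pool size)."""
    return tuple(dim // p for dim, p in zip(shape, pool_size))


def trace_shapes(input_shape=(40, 500)):
    """Return (layer name, output shape) pairs for the listed architecture."""
    h, w = input_shape
    shapes = [("input", (h, w, 1))]
    shapes.append(("conv1", (h, w, 32)))      # assumed 'same' padding
    h, w = pool((h, w), (5, 5))
    shapes.append(("pool1", (h, w, 32)))
    shapes.append(("conv2", (h, w, 64)))      # assumed 'same' padding
    h, w = pool((h, w), (4, 100))
    shapes.append(("pool2", (h, w, 64)))
    shapes.append(("flatten", (h * w * 64,)))
    shapes.append(("dense1", (100,)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:8s} {shape}")
```

Under these assumptions the second pooling collapses the time axis completely, so the dense layer sees one fixed-length vector per 10-second segment.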
Learning (epochs: 200, batch size: 16, data shuffling between epochs)
- Optimizer: Adam (learning rate: 0.001)
Model selection:
- Approximately 30% of the original training data is assigned to the validation set. The split is done so that the training and validation sets do not have segments from the same location and both sets have data from each city
- Model performance after each epoch is evaluated on the validation set, and the best performing model is selected
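
The validation split constraints can be sketched as follows. This is a simplified illustration, not the procedure used to produce the baseline split: it assigns whole recording locations to the validation set, city by city, until roughly 30% of that city's segments are covered. The function name, the tuple layout, and the toy city and location identifiers are all hypothetical, and every city is assumed to have at least two locations.

```python
import random
from collections import defaultdict


def location_disjoint_split(segments, validation_share=0.3, seed=0):
    """Split (segment_id, city, location_id) tuples into train/validation ids.

    Whole locations are moved to validation, so no location is shared.
    """
    rng = random.Random(seed)
    by_city = defaultdict(lambda: defaultdict(list))
    for segment_id, city, location in segments:
        by_city[city][location].append(segment_id)

    train, validation = [], []
    for city in sorted(by_city):
        locations = by_city[city]
        names = sorted(locations)
        rng.shuffle(names)
        target = validation_share * sum(len(locations[n]) for n in names)
        taken = 0
        for index, name in enumerate(names):
            # The last location always stays in training so that both
            # sets contain data from this city.
            if taken < target and index < len(names) - 1:
                validation.extend(locations[name])
                taken += len(locations[name])
            else:
                train.extend(locations[name])
    return train, validation


# Toy metadata: 2 cities, 10 locations per city, 6 segments per location.
segments = [
    (f"{city}-loc{loc}-seg{seg}", city, f"{city}-loc{loc}")
    for city in ("city_a", "city_b")
    for loc in range(10)
    for seg in range(6)
]
train_ids, validation_ids = location_disjoint_split(segments)
print(len(train_ids), len(validation_ids))
```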

For the Task 1A and 1B systems, the output layer uses a softmax activation and the decision is the class with the maximum output. For Task 1C, the output layer uses a sigmoid activation and the decision is based on a threshold value (0.5): if at least one class value is over the threshold, the most probable target scene class is chosen; if all values are under the threshold, the unknown scene class is chosen.
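
The two decision rules can be written out as a minimal sketch. The function names, the `"unknown"` label string, and the example output values are illustrative assumptions. Only the maximum-output rule and the 0.5 threshold come from the description above.

```python
UNKNOWN = "unknown"


def decide_closed_set(outputs):
    """Subtasks A and B: choose the class with the maximum softmax output."""
    return max(outputs, key=outputs.get)


def decide_open_set(outputs, threshold=0.5):
    """Subtask C: choose the most probable class if any sigmoid output is
    over the threshold, otherwise choose the unknown class."""
    best = max(outputs, key=outputs.get)
    return best if outputs[best] > threshold else UNKNOWN


# Hypothetical network outputs for three of the scene classes.
confident = {"airport": 0.10, "park": 0.82, "metro_station": 0.31}
uncertain = {"airport": 0.20, "park": 0.45, "metro_station": 0.31}

print(decide_closed_set(uncertain))  # maximum output, regardless of its value
print(decide_open_set(confident))
print(decide_open_set(uncertain))
```

Checking only the largest output is equivalent to checking whether at least one output exceeds the threshold.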

Results for the development dataset

Results are calculated using TensorFlow in GPU mode (using an Nvidia Titan XP GPU card). Because results produced with a GPU card are generally non-deterministic, the system was trained and tested 10 times; the mean and standard deviation of the performance from these 10 independent trials are shown in the results tables.
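
As an illustration of how such a summary is produced, the sketch below computes the mean and standard deviation over ten trial accuracies. The accuracy values are made up for the example, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which was used.

```python
import statistics


def summarize_trials(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # statistics.pstdev gives the population variant
    return mean, std


# Hypothetical accuracies (in %) from 10 independent training runs.
trial_accuracies = [62.1, 62.9, 61.8, 63.0, 62.4, 62.7, 61.9, 63.2, 62.5, 62.5]
mean_acc, std_acc = summarize_trials(trial_accuracies)
print(f"{mean_acc:.1f} % (± {std_acc:.1f})")
```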

Subtask A

Scene label	Accuracy
Airport	48.4 %
Bus	62.3 %
Metro	65.1 %
Metro station	54.5 %
Park	83.1 %
Public square	40.7 %
Shopping mall	59.4 %
Street, pedestrian	60.9 %
Street, traffic	86.7 %
Tram	64.0 %
Average	62.5 % (± 0.6)

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.

Subtask B

Material from all three devices (A, B and C) is used for training and testing. Results are calculated the same way as for subtask A, with the mean and standard deviation of the performance from 10 independent trials shown in the results table.

Remember that ranking in this subtask will be done by devices B and C (third column in this table).

Scene label	Device B	Device C	Average (B,C)	Device A
Airport	18.3 %	24.1 %	21.2 %	51.2 %
Bus	40.4 %	70.0 %	55.2 %	68.0 %
Metro	50.7 %	36.1 %	43.4 %	62.4 %
Metro station	28.7 %	36.1 %	30.0 %	54.4 %
Park	45.2 %	57.0 %	51.1 %	80.4 %
Public square	22.8 %	11.3 %	17.0 %	35.4 %
Shopping mall	63.5 %	64.8 %	64.2 %	64.4 %
Street, pedestrian	37.0 %	37.6 %	37.3 %	63.3 %
Street, traffic	77.0 %	86.5 %	81.8 %	85.8 %
Tram	12.0 %	12.6 %	12.3 %	52.2 %
Average	39.6 % (± 2.7)	43.1 % (± 2.2)	41.4 % (± 1.7)	61.9 % (± 0.8)
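
The "Average (B,C)" column used for ranking is the mean of the two per-device accuracies. A minimal sketch follows, using three rows copied from the table above. The function name is illustrative, and averaging the per-device accuracies (rather than pooling all device B and C segments) is inferred from the column values, not stated in the text.

```python
def device_average(accuracy_b, accuracy_c):
    """Mean of the device B and device C accuracies (both in %)."""
    return (accuracy_b + accuracy_c) / 2.0


# (Device B, Device C) accuracies for three scene classes from the table.
rows = {"Airport": (18.3, 24.1), "Bus": (40.4, 70.0), "Tram": (12.0, 12.6)}
averages = {
    label: round(device_average(b, c), 1) for label, (b, c) in rows.items()
}
print(averages)
```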

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.

Subtask C

Scene label	Accuracy
Airport	44.1 %
Bus	59.2 %
Metro	51.5 %
Metro station	41.3 %
Park	74.0 %
Public square	34.7 %
Shopping mall	50.9 %
Street, pedestrian	47.5 %
Street, traffic	78.4 %
Tram	60.7 %
Class Average	54.2 %
Unknown	43.1 %
Accuracy (Class Average | Unknown)	48.7 % (± 3.2)

Note: The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
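
The last rows of the Subtask C table can be reproduced from the per-class values: the class average is the mean over the ten known scene classes, and the final accuracy is the mean of that class average and the unknown-class accuracy. The sketch below uses the numbers from the table. The function name is illustrative, and the equal weighting of the two terms is inferred from the reported values.

```python
def open_set_accuracy(known_class_accuracies, unknown_accuracy):
    """Return (class average over known classes, combined accuracy), in %."""
    class_average = sum(known_class_accuracies.values()) / len(known_class_accuracies)
    combined = (class_average + unknown_accuracy) / 2.0
    return class_average, combined


# Per-class baseline accuracies (in %) from the Subtask C table.
known = {
    "Airport": 44.1, "Bus": 59.2, "Metro": 51.5, "Metro station": 41.3,
    "Park": 74.0, "Public square": 34.7, "Shopping mall": 50.9,
    "Street, pedestrian": 47.5, "Street, traffic": 78.4, "Tram": 60.7,
}
class_avg, combined_acc = open_set_accuracy(known, unknown_accuracy=43.1)
print(f"{class_avg:.1f} % | {combined_acc:.1f} %")
```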

Citation

If you are participating in this task or using the dataset or baseline code, please cite the following paper:

Publication

Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 9–13. November 2018. URL: https://arxiv.org/abs/1807.09840.

A multi-device dataset for urban acoustic scene classification

Abstract

This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined for classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities, therefore it has a higher acoustic variability than the previous datasets used for this task, and in addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system consisting of a convolutional neural network and its performance in the subtasks using the recommended cross-validation setup.

Keywords

Acoustic scene classification, DCASE challenge, public datasets, multi-device data

Coordinators

Annamaria Mesaros, Tampere University
Toni Heittola, Tampere University
Tuomas Virtanen, Tampere University
