DCASE2019 Challenge

IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events
4 March - 30 June 2019
Challenge has ended.

Results for some tasks are ready and are presented on the task-specific results pages:

Task 2, Task 3, Task 4, Task 5

Introduction

Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (car passing by, footsteps, etc.). Signal processing methods that extract this information automatically have great potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile devices, robots, cars, etc., and building intelligent monitoring systems that recognize activities in their environments using acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.

Challenge status

Task | Task description | Development dataset | Baseline system | Public leaderboard | Evaluation dataset | Results
Task 1, Acoustic Scene Classification | Released | Released | Released | Released | Released | Released
Task 2, Audio tagging with noisy labels and minimal supervision | Released | Released | Released | Released | Use test set | Released
Task 3, Sound Event Localization and Detection | Released | Released | Released | Not used | Released | Released
Task 4, Sound event detection in domestic environments | Released | Released | Released | Not used | Released | Released
Task 5, Urban Sound Tagging | Released | Released | Released | Not used | Released | Released

updated 28.06.2019

Tasks

Acoustic scene classification

Scenes Task 1

The goal of acoustic scene classification is to classify a test recording into one of the predefined acoustic scene classes. This task is a continuation of the Acoustic Scene Classification task from previous DCASE Challenge editions, with some changes that bring new research problems into focus.

We provide three different setups of the acoustic scene classification problem:

Subtask A: Acoustic Scene Classification

Basic closed-set classification using high-quality audio from a single device (similar to Task 1 / Subtask A in the DCASE2018 Challenge). Development and evaluation data from the same device are provided.

Subtask B: Acoustic Scene Classification with Mismatched Devices

Closed-set classification using data from multiple devices (similar to Task 1 / Subtask B in the DCASE2018 Challenge). The development data is mostly recorded with a different device than the evaluation data. The task encourages domain adaptation methods to cope with the mismatch.

Subtask C: Open set Acoustic Scene Classification

A new setup in which the evaluation data also contains recordings from acoustic scenes not encountered in the training data. To limit the number of research problems addressed at once, this subtask uses single-device data.
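One simple way to handle the open-set case is to reject a recording as "unknown" when the classifier is not confident in any known scene. The sketch below illustrates that idea only; it is not taken from the challenge baseline, and the label name, the threshold value, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical label for recordings that match no known scene class.
UNKNOWN_LABEL = "unknown"


def open_set_decision(probabilities: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the most probable known scene, or UNKNOWN_LABEL when the
    highest class probability falls below the (assumed) threshold."""
    best_scene = max(probabilities, key=probabilities.get)
    if probabilities[best_scene] < threshold:
        return UNKNOWN_LABEL
    return best_scene


# Illustrative classifier outputs, not real challenge data.
known = open_set_decision({"airport": 0.81, "park": 0.12, "tram": 0.07})
rejected = open_set_decision({"airport": 0.40, "park": 0.35, "tram": 0.25})
```

In practice the threshold would be tuned on development data, since it trades off missed unknown scenes against wrongly rejected known ones.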

The dataset for this task is an extension of TUT Urban Acoustic Scenes 2018, with recordings from more cities and acoustic scenes.

Organizers

Annamaria Mesaros

Toni Heittola

Tuomas Virtanen




Audio tagging with noisy labels and minimal supervision

Tags Task 2

Current machine learning techniques require large and varied datasets in order to provide good performance and generalization. However, manually labelling a dataset is time-consuming, which limits its size. Websites like Freesound or Flickr host large volumes of user-contributed audio and metadata, and labels can be inferred automatically from the metadata or using pre-trained models. Nevertheless, these automatically inferred labels might include a substantial level of noise. This task addresses how to exploit a small amount of manually-labeled data and a larger quantity of noisy web data in an audio tagging task with a large vocabulary setting. In addition, since the data comes from different sources, the task encourages domain adaptation approaches to deal with domain mismatch.

Organizers

Manoj Plakal

Frederic Font Corbera

Daniel P. W. Ellis




Sound Event Localization and Detection

Localization Task 3

Given a multichannel audio input, the goal of a sound event localization and detection (SELD) method is to output all instances of the sound labels in the recording, their respective onset-offset times, and their spatial locations as azimuth and elevation angles. Each individual sound event instance in the provided recordings is spatially stationary, with a fixed location during its entire duration. A successful SELD method will enable the automatic description of social and human activities and help machines interact with the world more seamlessly. Specifically, SELD can enable people with hearing impairment to visualize sounds, and robots and smart video conference equipment to recognize and track the sound source of interest. Further, smart homes, smart cities, and smart industries can use SELD for audio surveillance.
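The output described above can be pictured as a list of event records, each carrying a label, an onset-offset pair, and a direction. The following is a minimal sketch of such a record under assumed names and units; the field names, the example labels, and the helper `active_events` are hypothetical and do not reflect the official submission format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SoundEvent:
    """One detected event instance (field names and units are assumptions)."""
    label: str
    onset_s: float        # onset time in seconds
    offset_s: float       # offset time in seconds
    azimuth_deg: float    # fixed azimuth, since sources are stationary
    elevation_deg: float  # fixed elevation, since sources are stationary


def active_events(events: List[SoundEvent], time_s: float) -> List[SoundEvent]:
    """Return the events whose onset-offset interval contains time_s."""
    return [e for e in events if e.onset_s <= time_s < e.offset_s]


# Illustrative events, not real challenge data.
events = [
    SoundEvent("speech", 0.5, 3.0, -30.0, 10.0),
    SoundEvent("phone", 2.0, 4.5, 90.0, 0.0),
]
overlapping = active_events(events, 2.5)
```

Because each source is stationary, a single azimuth and elevation per event is enough here; moving sources would need a direction per time frame instead.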

Organizers

Sharath Adavanne

Archontis Politis

Tuomas Virtanen




Sound event detection in domestic environments

Domestic Task 4

This task is the follow-up to DCASE 2018 Task 4. It evaluates systems for the detection of sound events using real data that is either weakly labeled or unlabeled, together with synthetic data that is strongly labeled (with time stamps). Systems must provide not only the event class but also the event time boundaries. The main scientific question the task investigates is: do we really need real but partially and weakly annotated data, is synthetic data sufficient, or do we need both?
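The difference between the two annotation types can be made concrete with a small sketch: a strong label records which event occurs and when, while a weak label records only which events occur somewhere in the clip. The type alias, function name, and example annotations below are assumptions for illustration, not the dataset's actual file format.

```python
from typing import List, Set, Tuple

# Strong label: (event class, onset in seconds, offset in seconds).
StrongLabel = Tuple[str, float, float]


def to_weak_labels(strong: List[StrongLabel]) -> Set[str]:
    """Collapse strong labels into weak, clip-level labels by discarding
    the time stamps and keeping each event class once."""
    return {event_class for event_class, _onset, _offset in strong}


# Illustrative annotation of one clip, not real challenge data.
strong_annotation = [("Dog", 0.8, 2.1), ("Speech", 1.5, 6.0), ("Dog", 7.2, 8.0)]
weak_annotation = to_weak_labels(strong_annotation)
```

The conversion only works in this direction: time boundaries cannot be recovered from weak labels, which is why systems trained on weakly labeled data still have to learn to predict them.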

Organizers




Urban Sound Tagging

Urban Task 5

This task evaluates systems for tagging short audio recordings with urban sound tags related to urban noise pollution. All recordings come from an acoustic sensor network deployed in New York City. The set of tags was selected based on discussions with noise officials in New York City and inspection of the city's noise code. This task aims to investigate audio tagging system performance on a relevant, real-world task given limited, unbalanced data of varying reliability.

Organizers

Mark Cartwright

Ana Elisa Mendez Mendez

Vincent Lostanlen

Justin Salamon

Juan P. Bello




Awards

DCASE 2019 Challenge will offer awards for open-source and innovative methods. These awards are meant to encourage open science and reproducibility, and therefore the Reproducible system award is directly based on these criteria. In addition, through our Judges’ award we want to encourage novel and innovative approaches.

Award information

The awards are sponsored by

Gold sponsor: Sonos
Silver sponsor: Harman
Bronze sponsors: Cochlear.ai, Oticon, Sound Intelligence
Technical sponsor: Inria