DCASE2021 Challenge

IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events
1 March - 1 July 2021

Introduction

Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to extract this information automatically has huge potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile devices, robots, and cars, and building intelligent monitoring systems that recognize activities in their environments from acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.

Challenge status

Task | Task description | Development dataset | Baseline system | Evaluation dataset | Results
Task 1, Acoustic Scene Classification | Released | Released | Released | Released | Released
Task 2, Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions | Released | Released | Released | Released | Released
Task 3, Sound Event Localization and Detection with Directional Interference | Released | Released | Released | Released | Released
Task 4, Sound Event Detection and Separation in Domestic Environments | Released | Released | Released | Released | Released
Task 5, Few-shot Bioacoustic Event Detection | Released | Released | Released | Released | Released
Task 6, Automated Audio Captioning | Released | Released | Released | Released | Released

updated 2021/07/05

Tasks

Acoustic scene classification

Task 1

The goal of acoustic scene classification is to classify a test recording into one of ten predefined acoustic scene classes. This task continues the Acoustic Scene Classification task from previous DCASE Challenge editions and is a follow-up to DCASE 2020 Task 1, with some changes that bring new research problems into focus.

We provide two different setups of the acoustic scene classification problem:

Subtask A: Low-Complexity Acoustic Scene Classification with Multiple Devices

This subtask concerns the classification of data from multiple devices (real and simulated). It targets the generalization of systems across a number of different devices while focusing on low-complexity solutions.

Subtask B: Audio-Visual Scene Classification

This subtask provides scene data with audio and video material to allow learning complementary information from a different modality. There are no restrictions on the modality or combinations of modalities used for the systems. This subtask is for machine learning enthusiasts who are interested in the development of complex methods without the limitations or specific problems of Subtask A.

Organizers

Annamaria Mesaros

Irene Martin Morato

Shanshan Wang

Toni Heittola

Tuomas Virtanen

Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

Task 2

The scope of this task is to identify whether the sound emitted from a target machine is normal or anomalous, using an anomaly detector trained only on normal sound data. The main difference from DCASE 2020 Task 2 is that participants have to solve the domain shift problem, i.e., the condition where the acoustic characteristics of the training and test data differ.
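The idea of a detector trained only on normal data can be illustrated with a minimal sketch. This is not the challenge baseline: the two-dimensional feature vectors, the Gaussian-style per-dimension model, and the threshold rule are all assumptions chosen for brevity, and the sketch does nothing to address domain shift.

```python
from statistics import mean, pstdev

def fit_normal_model(normal_features):
    """Estimate per-dimension mean and standard deviation from normal clips only."""
    dims = list(zip(*normal_features))
    means = [mean(d) for d in dims]
    # A small floor avoids division by zero for constant dimensions.
    stds = [max(pstdev(d), 1e-6) for d in dims]
    return means, stds

def anomaly_score(model, feature):
    """Mean squared z-score: larger values mean the clip is less like the normal data."""
    means, stds = model
    return mean(((x - m) / s) ** 2 for x, m, s in zip(feature, means, stds))

# Hypothetical 2-D features (for example, two band energies) from normal machine sounds.
normal = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
model = fit_normal_model(normal)

# Assumed threshold rule: the largest score observed on the normal training data.
threshold = max(anomaly_score(model, f) for f in normal)

def is_anomalous(feature):
    """Flag a clip whose score exceeds anything seen during training."""
    return anomaly_score(model, feature) > threshold

normal_like = [1.1, 1.9]     # close to the training data
anomalous_like = [3.0, 0.5]  # far from the training data
print(is_anomalous(normal_like), is_anomalous(anomalous_like))
```

Under domain shift, the statistics estimated from the training data may no longer describe normal test clips, which is why a fixed model like this one would likely raise false alarms in the shifted condition.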

Organizers

Yohei Kawaguchi
Hitachi, Ltd.

Keisuke Imoto
Doshisha University

Yuma Koizumi
Google, Inc.

Noboru Harada

Daisuke Niizumi

Kota Dohi
Hitachi, Ltd.

Ryo Tanabe
Hitachi, Ltd.

Harsh Purohit
Hitachi, Ltd.

Takashi Endo
Hitachi, Ltd.

Sound Event Localization and Detection with Directional Interference

Task 3

The scope of this task is the temporal detection, classification, and simultaneous localization of sound activity of interest, emitted by sound sources under real reverberant conditions in both static and dynamic scenarios. The main difference from the previous year’s task is the introduction of directional (localised) interference from unknown sound types, in conjunction with realistic spatial ambient noise. This task is a follow-up to DCASE 2020 Task 3.

Organizers

Archontis Politis

Antoine Deleforge

Sharath Adavanne

Prerak Srivastava

Daniel Krause

Tuomas Virtanen

Sound Event Detection and Separation in Domestic Environments

Task 4

The task evaluates systems for the detection of sound events using weakly labeled data (without timestamps). The systems should provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. This year, we also encourage participants to propose systems that use source separation jointly with sound event detection. The task aims to investigate how synthetic data can be optimally exploited and to what extent source separation can improve sound event detection, and vice versa. This task is a follow-up to DCASE 2020 Task 4.
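Producing event time boundaries, and not only class labels, can be sketched as a post-processing step on per-frame class probabilities. The example below is only an illustration under stated assumptions: the probabilities, the class names, the 0.5 s hop, and the 0.5 threshold are hypothetical, and the function name `frames_to_events` is introduced here for the sketch.

```python
def frames_to_events(frame_probs, hop_seconds, threshold=0.5):
    """Convert per-frame probabilities for one class into (onset, offset) pairs in seconds."""
    events = []
    onset = None
    for i, p in enumerate(frame_probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # the event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None
    if onset is not None:  # the event is still active at the end of the clip
        events.append((onset * hop_seconds, len(frame_probs) * hop_seconds))
    return events

# Hypothetical per-frame outputs for two classes; events may overlap in time.
probs = {
    "Speech": [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.9],
    "Dog": [0.0, 0.1, 0.1, 0.9, 0.9, 0.1, 0.0, 0.0],
}
events = {label: frames_to_events(p, hop_seconds=0.5) for label, p in probs.items()}
print(events)
```

Because each class is processed independently, overlapping events from different classes are reported separately, which matches the setting where multiple events can be present in one recording.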

Organizers

Nicolas Turpault

Francesca Ronchini

Scott Wisdom

Hakan Erdogan

John Hershey

Justin Salamon

Prem Seetharaman

Daniel P. W. Ellis

Few-shot Bioacoustic Event Detection

Task 5

This challenge focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance, and noisy/busy environments.
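One way to picture detection from five exemplar vocalisations is a nearest-prototype rule: average the features of the five shots, average some background features, and label each segment by the closer of the two. The sketch below assumes this rule and uses hypothetical two-dimensional features; it is not presented as the method used by the organizers or by any participant.

```python
import math

def prototype(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def detect(segments, positive_shots, negative_examples):
    """Label each segment True if it is closer to the positive prototype."""
    pos = prototype(positive_shots)
    neg = prototype(negative_examples)
    return [math.dist(s, pos) < math.dist(s, neg) for s in segments]

# Hypothetical features for the five annotated shots of the target vocalisation.
shots = [[0.9, 0.1], [1.0, 0.2], [1.1, 0.0], [0.8, 0.1], [1.2, 0.1]]
# Hypothetical features for background segments without the target sound.
background = [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]]
# Hypothetical features for consecutive segments of a field recording.
recording = [[0.05, 0.95], [1.0, 0.15], [0.9, 0.0], [0.0, 1.2]]

labels = detect(recording, shots, background)
print(labels)
```

With so few examples, the prototype is sensitive to noisy shots, which reflects the data sparsity and noisy-environment issues that the task description highlights.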

Organizers

Automated Audio Captioning

Task 6

Automated audio captioning is the task of general audio content description using free text. It is an intermodal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs a textual description (i.e. the caption) of that signal. Audio captioning methods can model concepts (e.g. "muffled sound"), physical properties of objects and environment (e.g. "the sound of a big car", "people talking in a small and empty room"), and high-level knowledge (e.g. "a clock rings three times"). This modeling can be used in various applications, ranging from automatic content description to intelligent and content-oriented machine-to-machine interaction. This task is a follow-up to DCASE 2020 Task 6.

Organizers

Konstantinos Drossos

Samuel Lipping

Tuomas Virtanen
