DCASE2022 Challenge

Challenge on Detection and Classification of Acoustic Scenes and Events
15 March - 1 July 2022

Introduction

Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to extract this information automatically has great potential in several applications, for example searching for multimedia based on its audio content, building context-aware mobile devices, robots, and cars, and designing intelligent monitoring systems that recognize activities in their environments from acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.

Challenge status

Task | Task description | Development dataset | Baseline system | Evaluation dataset | Results
Task 1, Low-Complexity Acoustic Scene Classification | Released | Released | Released | Released | TBA
Task 2, Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques | Released | Released | Released | Released | TBA
Task 3, Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes | Released | Released | Released | Released | TBA
Task 4, Sound Event Detection in Domestic Environments | Released | Released | Released | Released | TBA
Task 5, Few-shot Bioacoustic Event Detection | Released | Released | Released | Released | TBA
Task 6, Automated Audio Captioning and Language-Based Audio Retrieval | Released | Released | Released | Released | TBA

updated 2022/06/01

Tasks

Low-Complexity Acoustic Scene Classification

Task 1

The task targets acoustic scene classification on devices with limited computational and memory budgets, which impose limits on model complexity. This task is a follow-up to the previous year's Low-Complexity Acoustic Scene Classification with Multiple Devices, with updated requirements and an updated method for measuring model complexity, namely a new way of taking the model parameters into account and the addition of the multiply-accumulate operation count (MACs).
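As a rough illustration of the two complexity measures mentioned above, the following sketch counts parameters and MACs for a tiny, made-up network. The layer sizes and the helper names (conv2d_cost, dense_cost, total_cost) are assumptions for this example only; the official counting rules are those given in the task description, not this sketch.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w, bias=True):
    """Return (parameters, MACs) for a standard 2D convolution layer."""
    kh, kw = kernel
    weights = out_ch * in_ch * kh * kw
    params = weights + (out_ch if bias else 0)
    # One MAC per weight for every output position.
    macs = weights * out_h * out_w
    return params, macs


def dense_cost(in_features, out_features, bias=True):
    """Return (parameters, MACs) for a fully connected layer."""
    weights = in_features * out_features
    params = weights + (out_features if bias else 0)
    macs = weights
    return params, macs


def total_cost(layers):
    """Sum (parameters, MACs) over a list of per-layer costs."""
    params = sum(p for p, _ in layers)
    macs = sum(m for _, m in layers)
    return params, macs


# Hypothetical model: one 3x3 convolution on a 32x32 output map,
# followed by a 10-class linear classifier on 8 pooled features.
layers = [
    conv2d_cost(in_ch=1, out_ch=8, kernel=(3, 3), out_h=32, out_w=32),
    dense_cost(in_features=8, out_features=10),
]
total_params, total_macs = total_cost(layers)
print(total_params, total_macs)
```

Note that the convolution dominates the MAC count even though it holds fewer parameters than the classifier, which is why both measures are reported separately.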

Organizers

Annamaria Mesaros

Irene Martin Morato

Francesco Paissan

Alberto Ancilotto

Elisabetta Farella

Toni Heittola

Tuomas Virtanen

Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

Task 2

The goal of this task is to identify whether a machine is normal or anomalous using only normal sound data under domain-shifted conditions. The main difference from DCASE 2021 Task 2 is that the domain (source domain / target domain) of the data is not available during evaluation. Therefore, participants are expected to develop domain generalization techniques in which the output anomaly scores are not affected by domain shifts.
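To make the idea of an anomaly score learned from normal data only more concrete, here is a minimal sketch. It is not the challenge baseline and does not address domain shift: it assumes each clip has already been reduced to a fixed-length feature vector (the synthetic Gaussian features below are placeholders), fits per-dimension statistics on normal clips, and scores a test clip by its mean squared z-score.

```python
import random
import statistics


def fit_normal_model(normal_features):
    """Estimate per-dimension mean and standard deviation from normal clips only."""
    columns = list(zip(*normal_features))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) + 1e-8 for col in columns]  # avoid division by zero
    return means, stds


def anomaly_score(feature_vector, means, stds):
    """Mean squared z-score: larger values mean the clip looks less like normal data."""
    squared = [((x - m) / s) ** 2 for x, m, s in zip(feature_vector, means, stds)]
    return statistics.fmean(squared)


# Synthetic stand-in features (assumption): normal clips centred at 0,
# anomalous clips shifted to 3 in every dimension.
rng = random.Random(0)
dim = 8
normal_train = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
normal_test = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
anomalous_test = [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(50)]

means, stds = fit_normal_model(normal_train)
normal_scores = [anomaly_score(v, means, stds) for v in normal_test]
anomalous_scores = [anomaly_score(v, means, stds) for v in anomalous_test]
print(statistics.fmean(normal_scores), statistics.fmean(anomalous_scores))
```

A detector of this kind would be sensitive to any change in the feature statistics, including a benign domain shift, which is the problem the task asks participants to solve.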

Organizers

Kota Dohi

Hitachi, Ltd.

Keisuke Imoto

Doshisha University

Yuma Koizumi

Google, Inc.

Noboru Harada

Daisuke Niizumi

Tomoya Nishida

Hitachi, Ltd.

Harsh Purohit

Hitachi, Ltd.

Takashi Endo

Hitachi, Ltd.

Masaaki Yamamoto

Hitachi, Ltd.

Yohei Kawaguchi

Hitachi, Ltd.

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Task 3

Given multichannel audio input, a sound event localization and detection (SELD) system aims to provide the spatiotemporal trajectories of detected sound events together with the class of each event, drawn from a set of predefined target classes. This year the task will be evaluated using real annotated scene recordings, while development will be based on a similar, small development set and the potential use of external data.

Organizers

Archontis Politis

Yuki Mitsufuji

Kazuki Shimada

Tuomas Virtanen

Sharath Adavanne

Parthasaarathy Sudarsanam

Daniel Krause

Naoya Takahashi

Shusuke Takahashi

Yuichiro Koyama

Sound Event Detection in Domestic Environments

Task 4

This task evaluates systems for the detection of sound events using real data that is either weakly labeled or unlabeled, and simulated data that is strongly labeled (with time stamps). The systems must provide not only the event class but also the event time boundaries. The aim is to investigate the most efficient way to exploit different sources of data to train a sound event detection system.
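One common way to obtain event time boundaries is to threshold per-frame class probabilities and merge consecutive active frames. The sketch below shows that post-processing step for a single class; the function name, the threshold, and the hop size are assumptions chosen for illustration, not values prescribed by the task.

```python
def frames_to_events(probabilities, threshold=0.5, hop_seconds=0.1):
    """Convert per-frame probabilities for one class into (onset, offset) pairs in seconds."""
    events = []
    onset = None  # index of the first frame of the currently open event, if any
    for i, p in enumerate(probabilities):
        active = p >= threshold
        if active and onset is None:
            onset = i
        elif not active and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None
    if onset is not None:
        # The event is still active at the end of the clip.
        events.append((onset * hop_seconds, len(probabilities) * hop_seconds))
    return events


# Hypothetical model output for one class, one value per 0.5 s frame.
probs = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6, 0.6]
events = frames_to_events(probs, threshold=0.5, hop_seconds=0.5)
print(events)
```

In practice the frame probabilities are often smoothed (for example with a median filter) before thresholding, so that short gaps do not split one event into several.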

Organizers

Francesca Ronchini

Nicolas Turpault

Eduardo Fonseca

Daniel P. W. Ellis

Few-shot Bioacoustic Event Detection

Task 5

This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. Each recording can contain multiple types of calls or species, as well as background noise; however, only the label of interest needs to be detected.
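The five-shot setting can be illustrated with a prototype-based sketch: the five exemplar embeddings are averaged into a positive prototype, a few background segments are averaged into a negative prototype, and each query segment is assigned to the nearer one. This is only one possible approach, and the two-dimensional embeddings below are invented for the example; a real system would compute embeddings from audio with a learned model.

```python
import math


def prototype(embeddings):
    """Average a list of equal-length embedding vectors into one prototype."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect(query_embeddings, positive_proto, negative_proto):
    """Mark a query segment as an event if it is closer to the positive prototype."""
    return [
        euclidean(q, positive_proto) < euclidean(q, negative_proto)
        for q in query_embeddings
    ]


# Hypothetical embeddings: five shots of the call of interest and five background segments.
shots = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.0], [1.0, 1.0]]
background = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [0.0, -0.1], [0.0, 0.0]]
queries = [[0.95, 1.05], [0.05, 0.0], [0.8, 0.9]]

positive_proto = prototype(shots)
negative_proto = prototype(background)
decisions = detect(queries, positive_proto, negative_proto)
print(decisions)
```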

Organizers

Ester Vidana Vila

La Salle, Universitat Ramon Llull

Helen Whitehead

University of Salford

Frants Jensen

Syracuse University

Joe Morford

University of Oxford

Michael Emmerson

Queen Mary University of London

Dan Stowell

Automated Audio Captioning and Language-Based Audio Retrieval

Task 6

This task approaches the problem of analysis of audio signals by using natural language to represent rich characteristics of audio signals.

Subtask A: Automated Audio Captioning

This subtask is a continuation of Task 6 at the DCASE 2020 and 2021 Challenges and focuses on the research question “How can we make machines understand higher level and human-perceived information from general sounds?”

Subtask B: Language-Based Audio Retrieval

The goal of this subtask is to evaluate methods in which a retrieval system takes a free-form textual description as input and ranks the audio signals in a fixed dataset by how well they match the description.
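The ranking step described above can be sketched as follows, assuming that the text query and every audio file have already been mapped into a shared embedding space. The file names and the three-dimensional vectors are invented for illustration, and cosine similarity is just one plausible matching score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_audio(query_embedding, audio_embeddings):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine(query_embedding, emb))
        for name, emb in audio_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fixed dataset of audio embeddings and one text-query embedding.
audio_embeddings = {
    "rain.wav": [0.9, 0.1, 0.0],
    "dog.wav": [0.0, 1.0, 0.2],
    "traffic.wav": [0.5, 0.5, 0.5],
}
query_embedding = [1.0, 0.0, 0.0]

ranking = rank_audio(query_embedding, audio_embeddings)
print([name for name, _ in ranking])
```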

Organizers

Felix Gontier

INRIA

Samuel Lipping

Konstantinos Drossos

Tuomas Virtanen