Introduction

Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to extract this information automatically has huge potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile devices, robots, and cars, and building intelligent monitoring systems that recognize activities in their environments using acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.

Challenge status

Task	Task description	Development dataset	Baseline system	Evaluation dataset	Results
Task 1, Low-Complexity Acoustic Scene Classification	Released	Released	Released	Released	Released
Task 2, Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques	Released	Released	Released	Released	Released
Task 3, Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes	Released	Released	Released	Released	Released
Task 4, Sound Event Detection in Domestic Environments	Released	Released	Released	Released	Released
Task 5, Few-shot Bioacoustic Event Detection	Released	Released	Released	Released	Released
Task 6, Automated Audio Captioning and Language-Based Audio Retrieval	Released	Released	Released	Released	Released

updated 2022/07/04

Tasks

Low-Complexity Acoustic Scene Classification

Scenes Task 1

The task targets acoustic scene classification on devices with low computational and memory allowance, which imposes certain limits on model complexity. This task is a follow-up to previous years' Low Complexity Acoustic Scene Classification with Multiple Devices, with updated requirements and an updated method for measuring model complexity, namely a new way of taking the model parameters into account and the addition of a multiply-accumulate operation (MAC) count.
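As a rough illustration of the two complexity measures named above, the following sketch counts parameters and multiply-accumulate operations for standard convolution and dense layers. It is not the challenge's official counting procedure: the helper names, the layer shapes, and the convention of one MAC per weight per output position are assumptions made for this example.

```python
def conv2d_cost(in_ch, out_ch, kernel_h, kernel_w, out_h, out_w, bias=True):
    """Return (parameters, MACs) for one standard 2D convolution layer."""
    weights = in_ch * out_ch * kernel_h * kernel_w
    params = weights + (out_ch if bias else 0)
    # Assumed convention: every weight is used once per output position.
    macs = weights * out_h * out_w
    return params, macs


def dense_cost(in_features, out_features, bias=True):
    """Return (parameters, MACs) for one fully connected layer."""
    weights = in_features * out_features
    params = weights + (out_features if bias else 0)
    return params, weights


# Hypothetical three-layer model; the shapes are illustrative only.
layers = [
    conv2d_cost(1, 16, 3, 3, 40, 50),
    conv2d_cost(16, 32, 3, 3, 20, 25),
    dense_cost(32, 10),
]
total_params = sum(p for p, _ in layers)
total_macs = sum(m for _, m in layers)
print(total_params, total_macs)
```

Summing per-layer costs like this gives a quick estimate to compare against a complexity budget before training a model.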

Organizers

Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

Monitoring Task 2

The goal of this task is to identify whether a machine is normal or anomalous using only normal sound data under domain-shifted conditions. The main difference from DCASE 2021 Task 2 is that the domain of the data (source domain or target domain) is not available during evaluation. Therefore, participants are expected to develop domain generalization techniques in which the output anomaly scores are not affected by domain shifts.

Organizers

Kota Dohi

Hitachi, Ltd.

Keisuke Imoto

Doshisha University

Yuma Koizumi

Google, Inc.

Noboru Harada

NTT Corporation

Daisuke Niizumi

NTT Corporation

Tomoya Nishida

Hitachi, Ltd.

Harsh Purohit

Hitachi, Ltd.

Takashi Endo

Hitachi, Ltd.

Masaaki Yamamoto

Hitachi, Ltd.

Yohei Kawaguchi

Hitachi, Ltd.


Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Localization Task 3

Given multichannel audio input, a sound event localization and detection (SELD) system aims to provide spatiotemporal trajectories of detected sound events, along with the class/type of those events, from a set of predefined target classes. This year the task will be evaluated using real annotated scene recordings, while development will be based on a similar small development set and potential use of external data.

Organizers

Archontis Politis

Tampere University

Yuki Mitsufuji

SONY

Kazuki Shimada

SONY

Tuomas Virtanen

Tampere University

Sharath Adavanne

Tampere University

Parthasaarathy Sudarsanam

Tampere University

Daniel Krause

Tampere University

Naoya Takahashi

SONY

Shusuke Takahashi

SONY

Yuichiro Koyama

SONY


Sound Event Detection in Domestic Environments

Domestic Task 4

This task evaluates systems for the detection of sound events using real data that is either weakly labeled or unlabeled, and simulated data that is strongly labeled (with time stamps). The target of the systems is to provide not only the event class but also the event time boundaries. The aim is to investigate the most efficient way to exploit different sources of data to train a sound event detection system.
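To make "event time boundaries" concrete, here is a minimal sketch that turns frame-level activity probabilities for one event class into onset and offset times. The function name, the fixed threshold, and the hop duration are assumptions for illustration; real systems typically add smoothing or class-specific thresholds.

```python
def frames_to_events(probs, threshold=0.5, hop_seconds=0.02):
    """Convert per-frame probabilities into a list of (onset, offset) in seconds."""
    events = []
    onset = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None  # event ended just before this frame
    if onset is not None:
        # Event is still active at the end of the clip.
        events.append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


# Hypothetical frame probabilities for one class, 0.5 s per frame.
print(frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7], threshold=0.5, hop_seconds=0.5))
# -> [(0.5, 1.5), (2.0, 2.5)]
```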

Organizers

Romain Serizel

University of Lorraine

Francesca Ronchini

University of Lorraine

Nicolas Turpault

Inria Nancy Grand-Est

Samuele Cornell

Università Politecnica delle Marche

Eduardo Fonseca

Google, Inc.

Daniel P. W. Ellis

Google, Inc.


Few-shot Bioacoustic Event Detection

Bio Task 5

This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. Each recording can have multiple types of calls or species present in it, as well as background noise; however, only the label of interest needs to be detected.
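One common way to use a handful of exemplars is prototype matching: average the embeddings of the five shots into a single prototype and label each later segment by whichever prototype it is closer to. The sketch below illustrates that idea only. It is not presented as the challenge's method, and the function names, the two-dimensional embeddings, and the use of a background prototype are assumptions.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def detect(segments, positive_shots, negative_examples):
    """Flag each segment as the target call if it is closer to the shot prototype."""
    pos_proto = mean_vector(positive_shots)
    neg_proto = mean_vector(negative_examples)
    return [sq_dist(s, pos_proto) < sq_dist(s, neg_proto) for s in segments]


# Hypothetical 2-D embeddings: five shots of the call of interest, plus background.
shots = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.0, 0.8], [1.0, 1.2]]
background = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
print(detect([[0.9, 1.0], [0.1, 0.0], [1.2, 0.8]], shots, background))
# -> [True, False, True]
```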

Organizers

Ines Nolasco

Queen Mary University of London

Shubhr Singh

Queen Mary University of London

Vincent Lostanlen

Centre National de la Recherche Scientifique (CNRS)
Laboratoire des Sciences du Numérique de Nantes (LS2N)

Ariana Strandburg-Peshkin

University of Konstanz
Max Planck Institute of Animal Behavior

Lisa Gill

BIOTOPIA Naturkundemuseum Bayern

Hanna Pamula

AGH University of Science and Technology

Ester Vidana Vila

La Salle, Universitat Ramon Llull

Helen Whitehead

University of Salford

Ivan Kiskin

University of Surrey

Frants Jensen

Syracuse University

Joe Morford

University of Oxford

Michael Emmerson

Queen Mary University of London

Veronica Morfi

Queen Mary University of London

Dan Stowell

Tilburg University


Automated Audio Captioning and Language-Based Audio Retrieval

Caption Task 6

This task approaches the problem of analysis of audio signals by using natural language to represent rich characteristics of audio signals.

A Captioning

Subtask A: Automated Audio Captioning

This task is a continuation of Task 6 at DCASE 2020 and 2021 Challenges and focuses on the research question “How can we make machines understand higher level and human-perceived information from general sounds?”.

B Retrieval

Subtask B: Language-Based Audio Retrieval

The goal of this task is to evaluate methods where a retrieval system takes a free-form textual description as an input and is supposed to rank audio signals in a fixed dataset based on their match to the given description.
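The ranking step described above can be sketched as follows, assuming the textual description and every audio clip have already been mapped into a shared embedding space by some model. The embedding values, file names, and function names here are hypothetical, and cosine similarity is just one possible matching score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_audio(query_vec, audio_vecs):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in audio_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings for a text query and a fixed set of three clips.
query = [1.0, 0.0]
clips = {"a.wav": [0.0, 1.0], "b.wav": [1.0, 0.0], "c.wav": [1.0, 1.0]}
for name, score in rank_audio(query, clips):
    print(name, round(score, 3))
```

With these made-up vectors, `b.wav` ranks first, `c.wav` second, and `a.wav` last.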

Organizers

Huang Xie

Tampere University

Felix Gontier

INRIA

Samuel Lipping

Tampere University

Konstantinos Drossos

Tampere University

Tuomas Virtanen

Tampere University

Romain Serizel

University of Lorraine


Schedule

15 Mar 2022

Challenge launch

01 Jun 2022

Release of evaluation datasets

15 Jun 2022

Challenge deadline

01 Jul 2022

Challenge results

Contact

Recent news

DCASE2022 Challenge results published

DCASE2022 Challenge received 410 submission entries

DCASE2022 Challenge evaluation datasets available

Low-Complexity Acoustic Scene Classification

Organizers

Annamaria Mesaros

Irene Martin Morato

Francesco Paissan

Alberto Ancilotto

Elisabetta Farella

Alessio Brutti

Toni Heittola

Tuomas Virtanen
