Introduction
Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to extract this information automatically has great potential in several applications, for example searching for multimedia based on its audio content; making mobile devices, robots, cars, and similar systems context-aware; and building intelligent monitoring systems that recognize activities in their environments from acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.
Challenge status
Task | Task description | Development dataset | Baseline system | Evaluation dataset | Results |
---|---|---|---|---|---|
Task 1, Low-Complexity Acoustic Scene Classification | Released | Released | Released | Released | Released |
Task 2, Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques | Released | Released | Released | Released | Released |
Task 3, Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes | Released | Released | Released | Released | Released |
Task 4, Sound Event Detection in Domestic Environments | Released | Released | Released | Released | Released |
Task 5, Few-shot Bioacoustic Event Detection | Released | Released | Released | Released | Released |
Task 6, Automated Audio Captioning and Language-Based Audio Retrieval | Released | Released | Released | Released | Released |
updated 2022/07/04
Tasks
Low-Complexity Acoustic Scene Classification
The task targets acoustic scene classification on devices with low computational and memory allowances, which impose limits on model complexity. This task is a follow-up to the previous year's Low-Complexity Acoustic Scene Classification with Multiple Devices, with updated requirements and an updated method for measuring model complexity, namely a new way of taking the model parameters into account and the addition of a count of multiply-accumulate operations (MACs).
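As a rough illustration of the two complexity measures mentioned above, the sketch below counts parameters and MACs for a toy network. The layer classes, the example layer sizes, and the counting conventions (one MAC per weight per output position, biases counted as parameters but not as MACs) are assumptions made for this example; the task's official rules and limits are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Conv2d:
    """Hypothetical description of a 2D convolution layer (not a real framework class)."""
    in_channels: int
    out_channels: int
    kernel_size: int
    out_height: int
    out_width: int
    bias: bool = True

    def params(self) -> int:
        weights = self.in_channels * self.out_channels * self.kernel_size ** 2
        return weights + (self.out_channels if self.bias else 0)

    def macs(self) -> int:
        # One multiply-accumulate per kernel weight, per output channel, per output position.
        per_output_value = self.in_channels * self.kernel_size ** 2
        return per_output_value * self.out_channels * self.out_height * self.out_width


@dataclass
class Dense:
    """Hypothetical description of a fully connected layer."""
    in_features: int
    out_features: int
    bias: bool = True

    def params(self) -> int:
        return self.in_features * self.out_features + (self.out_features if self.bias else 0)

    def macs(self) -> int:
        return self.in_features * self.out_features


def complexity(layers):
    """Return (total parameter count, total MAC count) for a list of layers."""
    total_params = sum(layer.params() for layer in layers)
    total_macs = sum(layer.macs() for layer in layers)
    return total_params, total_macs


# Toy model: one 3x3 convolution on a 64x64 output map, then a small classifier.
toy_model = [Conv2d(1, 16, 3, 64, 64), Dense(16, 10)]
print(complexity(toy_model))  # (330, 589984)
```

Counting from a layer description like this keeps the estimate independent of any particular deep learning framework, at the cost of having to describe each layer by hand.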
Organizers
Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques
The goal of this task is to identify whether a machine is normal or anomalous using only normal sound data under domain-shifted conditions. The main difference from DCASE 2021 Task 2 is that the domain of the data (source domain or target domain) is not available during evaluation. Therefore, participants are expected to develop domain generalization techniques in which the output anomaly scores are not affected by domain shifts.
Organizers
Kota Dohi
Hitachi, Ltd.
Tomoya Nishida
Hitachi, Ltd.
Harsh Purohit
Hitachi, Ltd.
Takashi Endo
Hitachi, Ltd.
Masaaki Yamamoto
Hitachi, Ltd.
Yohei Kawaguchi
Hitachi, Ltd.
Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes
Given multichannel audio input, a sound event localization and detection (SELD) system aims to provide spatiotemporal trajectories of detected sound events, along with the class/type of each event, from a set of predefined target classes. This year the task will be evaluated using real annotated scene recordings, while development will be based on a similar small development set and potential use of external data.
Organizers
Sound Event Detection in Domestic Environments
This task evaluates systems for the detection of sound events using real data that is either weakly labeled or unlabeled, and simulated data that is strongly labeled (with time stamps). The systems must provide not only the event class but also the event time boundaries. The aim is to investigate the most efficient way to exploit different sources of data to train a sound event detection system.
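To make "event time boundaries" concrete, the sketch below turns frame-level activity probabilities for one event class into (onset, offset) pairs in seconds. The function name, the fixed threshold, and the hop size are assumptions for this example, not part of the task definition or its baseline system.

```python
def frames_to_events(probs, threshold=0.5, hop_seconds=0.02):
    """Convert per-frame probabilities for one class into (onset, offset) pairs.

    A frame is treated as active when its probability reaches the threshold;
    each run of consecutive active frames becomes one event.
    """
    events = []
    onset_frame = None
    for index, prob in enumerate(probs):
        active = prob >= threshold
        if active and onset_frame is None:
            onset_frame = index  # an event starts at this frame
        elif not active and onset_frame is not None:
            events.append((onset_frame * hop_seconds, index * hop_seconds))
            onset_frame = None
    if onset_frame is not None:
        # The last event runs to the end of the clip.
        events.append((onset_frame * hop_seconds, len(probs) * hop_seconds))
    return events


# Five one-second frames: active in frames 1-2 and again in frame 4.
print(frames_to_events([0.1, 0.9, 0.8, 0.2, 0.7], threshold=0.5, hop_seconds=1.0))
# [(1.0, 3.0), (4.0, 5.0)]
```

Practical systems usually smooth the frame-level output (for example with a median filter) before thresholding; that step is left out here to keep the example short.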
Organizers
Few-shot Bioacoustic Event Detection
This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and use it to detect and classify sounds in field recordings. Each recording can contain multiple types of calls or multiple species, as well as background noise; however, only the label of interest needs to be detected.
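One common way to use a handful of exemplars is a prototype-based classifier, sketched below. This is an illustrative assumption, not the method required by the task: it presumes that each audio segment has already been mapped to a fixed-length feature vector, and the function names and toy vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_with_prototypes(shots, negatives, segments):
    """Flag each segment as the call of interest (True) or not (False).

    shots:     feature vectors of the five exemplar vocalisations
    negatives: feature vectors assumed to contain background or other sounds
    segments:  feature vectors of the segments to classify
    """
    positive_prototype = mean_vector(shots)
    negative_prototype = mean_vector(negatives)
    return [
        euclidean(segment, positive_prototype) < euclidean(segment, negative_prototype)
        for segment in segments
    ]


# Toy 2-D features: the five shots cluster near (1, 1), the background near (0, 0).
shots = [[1.0, 1.0]] * 5
negatives = [[0.0, 0.0], [0.2, 0.0]]
print(detect_with_prototypes(shots, negatives, [[0.9, 1.1], [0.1, 0.0]]))
# [True, False]
```

Consecutive segments flagged as True could then be merged into events with start and end times.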
Organizers
Helen Whitehead
University of Salford
Joe Morford
University of Oxford
Michael Emmerson
Queen Mary University of London
Automated Audio Captioning and Language-Based Audio Retrieval
This task approaches the problem of analysis of audio signals by using natural language to represent rich characteristics of audio signals.
This task is a continuation of Task 6 of the DCASE 2020 and 2021 Challenges and focuses on the research question “How can we make machines understand higher-level and human-perceived information from general sounds?”
The goal of this task is to evaluate methods in which a retrieval system takes a free-form textual description as input and ranks the audio signals in a fixed dataset by how well they match that description.
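The ranking step can be sketched as follows, assuming that the textual description and every audio signal have already been mapped into a shared embedding space. That assumption, the use of cosine similarity, and all names and vectors below are illustrative; the task description does not prescribe a particular approach.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_audio(query_embedding, audio_embeddings):
    """Return (file name, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in audio_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings for a fixed dataset of three clips.
dataset = {"a.wav": [1.0, 0.0], "b.wav": [0.0, 1.0], "c.wav": [1.0, 1.0]}
ranking = rank_audio([1.0, 0.1], dataset)
print([name for name, _ in ranking])  # ['a.wav', 'c.wav', 'b.wav']
```

Because the dataset is fixed, the audio embeddings can be computed once and reused for every query; only the text embedding changes per query.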