Results for each task are presented on the task-specific results pages: Task 2, Task 3, Task 4, Task 5.
Sounds carry a large amount of information about our everyday environment and physical events that take place in it. We can perceive the sound scene we are within (busy street, office, etc.), and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to automatically extract this information has huge potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile devices, robots, cars etc., and intelligent monitoring systems to recognize activities in their environments using acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.
Acoustic scene classification
The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterize the environment in which it was recorded. Audio data recorded in different large European cities will provide a new challenging problem by introducing more acoustic variability for each class than in previous editions.
General-purpose audio tagging of Freesound content with AudioSet labels
The task evaluates systems for general-purpose audio tagging with an increased number of categories and using data with annotations of varying reliability. This poses the challenges of classifying sound events of a very diverse nature (including musical instruments, human sounds, domestic sounds, animals, etc.) and leveraging subsets of training data with annotations of different quality levels. The data used are audio samples from Freesound organized by some categories of the AudioSet Ontology. This task will provide insight towards the development of broadly applicable sound event classifiers that consider a larger and more diverse set of categories. These models can be used, for example, in automatic description of multimedia or in acoustic monitoring applications.
Bird audio detection
Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management. Bird sound detection is a commonly required first step before further analysis (e.g. classification, counting), and makes it possible to work with large datasets (e.g. continuous 24h monitoring) by filtering data down to regions of interest.
In order to be relevant to a wide variety of sound monitoring applications, and accessible to a wide range of methods, the bird detection task is deliberately simplified to a binary classification paradigm: within each ten-second time region, are there any birds present?
The major challenge in this task is generalisation. In real applications, the deployment conditions do not match the "training" conditions, and sound analysis algorithms should be able to handle this scenario in order to be practically useful. Hence, we provide development datasets recorded in different parts of the world, and we will use testing data from outdoor monitoring scenarios which do not match the development data. The challenge is to develop an algorithm which inherently generalises well, or which can self-adapt to the new conditions.
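The binary, per-region framing described above can be sketched as follows. This is a minimal illustration only, not the challenge's reference code: the function names, the decision threshold of 0.5, and the `toy_detector` scoring function are all hypothetical, and the toy detector merely measures loudness as a stand-in for a real bird detector.

```python
from typing import Callable, List, Sequence

CLIP_SECONDS = 10  # region length stated in the task description


def split_into_clips(samples: Sequence[float], sample_rate: int,
                     clip_seconds: int = CLIP_SECONDS) -> List[Sequence[float]]:
    """Cut a recording into consecutive, non-overlapping regions.

    A final region shorter than clip_seconds is kept as-is.
    """
    clip_len = sample_rate * clip_seconds
    return [samples[i:i + clip_len] for i in range(0, len(samples), clip_len)]


def label_clips(clips: List[Sequence[float]],
                detector: Callable[[Sequence[float]], float],
                threshold: float = 0.5) -> List[int]:
    """Return one binary decision per region: 1 = birds present, 0 = absent.

    The threshold value is an assumption chosen for illustration.
    """
    return [1 if detector(clip) >= threshold else 0 for clip in clips]


def toy_detector(clip: Sequence[float]) -> float:
    """Hypothetical placeholder score in [0, 1]: mean absolute amplitude.

    This is NOT a bird detector; a real system would replace it with a
    trained model that generalises to unseen recording conditions.
    """
    if not clip:
        return 0.0
    return min(1.0, sum(abs(x) for x in clip) / len(clip))


if __name__ == "__main__":
    # Toy signal at an artificially low sample rate of 4 Hz:
    # a silent region, a loud region, then a quiet half-length region.
    signal = [0.0] * 40 + [0.9] * 40 + [0.1] * 20
    clips = split_into_clips(signal, sample_rate=4)
    print(label_clips(clips, toy_detector))
```

Keeping the detector behind a plain callable means the same region-splitting and thresholding logic can be reused when the scoring model is swapped or adapted to new recording conditions.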
Large-scale weakly labeled semi-supervised sound event detection in domestic environments
The task evaluates systems for the large-scale detection of sound events using weakly labeled data. The challenge is to explore whether a large amount of unbalanced and unlabelled training data can be exploited together with a small weakly annotated training set to improve system performance. The data are YouTube video excerpts focusing on domestic contexts, which could be used, for example, in ambient assisted living applications. The domain was chosen due to the scientific challenges (wide variety of sounds, time-localized events, etc.) and potential industrial applications.
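To make the distinction between weak and time-localized annotations concrete, here is a small sketch. It assumes the usual meaning of "weak" labels (clip-level tags with no timestamps); the `StrongEvent` type, the `to_weak_labels` helper, and the example class names are hypothetical and are not taken from the task's data format.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class StrongEvent:
    """A time-localized annotation: which sound, and when it occurs."""
    label: str
    onset: float   # seconds from the start of the clip
    offset: float  # seconds from the start of the clip


def to_weak_labels(events: List[StrongEvent]) -> Set[str]:
    """Collapse time-localized events into clip-level tags.

    The weak form records only which classes occur somewhere in the
    clip; onsets, offsets, and repetition counts are discarded.
    """
    return {event.label for event in events}


if __name__ == "__main__":
    # Hypothetical annotations for one clip (class names are illustrative).
    events = [
        StrongEvent("speech", 0.5, 3.2),
        StrongEvent("dishes", 2.0, 4.1),
        StrongEvent("speech", 6.0, 8.7),
    ]
    print(sorted(to_weak_labels(events)))
```

A detection system trained on the weak form has to recover the timing information that this conversion throws away, which is part of what makes the setting difficult.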
Monitoring of domestic activities based on multi-channel acoustics
There is a rising interest in smart environments that enhance the quality of life for humans in terms of e.g. safety, security, comfort, and home care. In order to have smart functionality, situational awareness is required, which might be obtained by interpreting a multitude of sensing modalities including acoustics. The latter is already used in voice assistants such as Google Home, Apple HomePod, and Amazon Echo. While these devices focus on speech, they could be extended to identify domestic activities carried out by humans. The recognition of activities based on acoustics has already been touched upon in the literature. Yet, the acoustic models are typically based on single-channel and single-location recordings. This task investigates to what extent multi-channel acoustic recordings are beneficial for detecting domestic activities.