Challenge results and analysis of submitted systems have been published in:
A. Mesaros, A. Diment, B. Elizalde, T. Heittola, E. Vincent, B. Raj, and T. Virtanen. Sound event detection in the DCASE 2017 challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019. In press. doi:10.1109/TASLP.2019.2907016.
Sound event detection in the DCASE 2017 Challenge
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data, Task 3 having overlapping events annotated in real-life audio, and Task 4, in which only weakly-labeled data was available for training. In this paper, we present the three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks, and the still widely used mel frequency based representations, with only few approaches standing out as radically different. Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure, to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals for many systems overlap, there are significant difference in performance between the top systems and the baseline for all tasks.
Event detection;Task analysis;Training;Acoustics;Speech processing;Glass;Hidden Markov models;Sound event detection;weak labels;pattern recognition;jackknife estimates;confidence intervals
Results for each tasks are presented in task specific results pages:
Sounds carry a large amount of information about our everyday environment and physical events that take place in it. We can perceive the sound scene we are within (busy street, office, etc.), and recognize individual sound sources (car passing by, footsteps, etc.). Developing signal processing methods to automatically extract this information has huge potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile devices, robots, cars etc., and intelligent monitoring systems to recognize activities in their environments using acoustic information. However, a significant amount of research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and distorted by the environment.
Acoustic scene classification
The goal of acoustic scene classification is to classify a test recording into one of predefined classes that characterizes the environment in which it was recorded -- for example "park", "street", "office". The acoustic data will include recordings from 15 contexts, approximately one hour of data from each context. The setup is similar to the previous DCASE challenge, but with a higher number of classes and diversity of data.Task description Results
Detection of rare sound events
This task will focus on detection of rare sound events in artificially created mixtures. This specific use of data will allow creating mixtures of everyday audio and sound events of interest at different event-to-background ratio, providing a larger amount of training conditions than would be available in real recordings.Task description Results
Sound event detection in real life audio
The third task will use training and testing material recorded in real life environments. This task evaluates performance of the sound event detection systems in multisource conditions similar to our everyday life, where the sound sources are rarely heard in isolation. In this case, there is no control over the number of overlapping sound events at each time, not in the training nor the testing audio data. The annotations of event activities are done manually, and can therefore be somewhat subjective.Task description Results
Large-scale weakly supervised sound event detection for smart cars
The task evaluates systems for the large-scale detection of sound events using weakly labeled training data. The data are web video excerpts focusing on transportation and warnings due to their industry relevance and to the underuse of audio in this context.Task description Results
For each challenge task, a development dataset and baseline system will be provided. Challenge evaluation will be done using an evaluation dataset that will be published shortly before the deadline. Task-specific rules are available on the task pages.