The goal of this task is to evaluate systems for the detection of sound events using real data, where different types of annotation, and the corresponding labels, are available for training.
Systems are expected to provide both the event class and the temporal localization of each event, given that multiple events can be present in an audio recording (see Fig. 1). The types of data available for training are weakly labeled or unlabeled short real audio recordings, strongly labeled synthetic audio, and softly labeled long real recordings.
We provide two subtasks:
Subtask A: Sound Event Detection with Weak Labels and Synthetic Soundscapes
The goal of this subtask is to evaluate systems for the detection of sound events using real data that is either weakly labeled or unlabeled, together with simulated data that is strongly labeled (with timestamps).
Subtask B: Sound Event Detection with Soft Labels
The goal of this subtask is to evaluate systems for the detection of sound events that use softly labeled data for training, in addition to other types of data such as weakly labeled, unlabeled, or strongly labeled data. The main focus of this subtask is to investigate whether using soft labels improves performance.
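To make the difference between the annotation types concrete, the following is a minimal Python sketch, not the challenge's official data format. The class names `StrongEvent` and `SoftSegment`, the helper functions, the 1-second segment length, the event labels, and the 0.5 threshold are all assumptions chosen for illustration. A strong label carries a class plus onset and offset times, a weak label keeps only the classes present in the clip, and a soft label attaches a value between 0 and 1 to each segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongEvent:
    """Strong label: event class with onset/offset in seconds."""
    label: str
    onset: float
    offset: float


@dataclass(frozen=True)
class SoftSegment:
    """Soft label: a value in [0, 1] for one class over one segment."""
    label: str
    onset: float
    offset: float
    value: float


def to_weak(events):
    """Collapse strong events to clip-level tags (timing is discarded)."""
    return sorted({e.label for e in events})


def binarize(segments, threshold=0.5):
    """Threshold soft segments and merge contiguous segments of one class.

    The 0.5 threshold is an illustrative assumption, not a prescribed value.
    """
    events = []
    for seg in sorted(segments, key=lambda s: (s.label, s.onset)):
        if seg.value < threshold:
            continue
        prev = events[-1] if events else None
        if prev and prev.label == seg.label and abs(prev.offset - seg.onset) < 1e-9:
            # Extend the previous event instead of starting a new one.
            events[-1] = StrongEvent(prev.label, prev.onset, seg.offset)
        else:
            events.append(StrongEvent(seg.label, seg.onset, seg.offset))
    return events


# Hypothetical soft annotation of a 3-second excerpt, in 1-second segments.
soft = [
    SoftSegment("footsteps", 0.0, 1.0, 0.8),
    SoftSegment("footsteps", 1.0, 2.0, 0.6),
    SoftSegment("footsteps", 2.0, 3.0, 0.2),
    SoftSegment("birds_singing", 1.0, 2.0, 0.4),
    SoftSegment("birds_singing", 2.0, 3.0, 0.9),
]

strong = binarize(soft)   # strong labels derived from the soft ones
weak = to_weak(strong)    # clip-level tags only
print(strong)
print(weak)
```

Thresholding discards the graded information in the soft labels; using those values directly during training is one way a system might try to exploit them.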