It has been brought to our attention that the Task 3 reference annotations do not always correspond to audible sound events. To address this issue, we implemented a verification of the data as follows.
The selected sound classes for the task are: brakes squeaking, car, children, large vehicle, people speaking, and people walking. Each class groups sound events by sound source. For example, "car" contains sounds annotated as "car passing by", "car engine running", "car idling", etc.; "large vehicle" contains sounds produced by buses and trucks; and "children" contains sounds annotated as "children yelling", "children talking", etc.
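As a rough illustration of this mapping, the sketch below collapses annotated labels into task classes. It is not the organizers' actual mapping table: it lists only the labels quoted above, and the names `LABEL_TO_CLASS` and `map_label` are hypothetical.

```python
from typing import Optional

# Partial, illustrative mapping: only the labels quoted in the text are listed.
# The full mapping used for the dataset is not reproduced here.
LABEL_TO_CLASS = {
    "car passing by": "car",
    "car engine running": "car",
    "car idling": "car",
    "children yelling": "children",
    "children talking": "children",
}


def map_label(label: str) -> Optional[str]:
    """Return the task class for an annotated label, or None if it is unmapped."""
    # Normalize whitespace and case so minor annotation differences still match.
    return LABEL_TO_CLASS.get(label.strip().lower())
```

Labels that return `None` in this sketch are simply not covered by the partial table; that says nothing about whether the real mapping includes them.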
Three people (other than the annotator) listened to each audio segment annotated as belonging to one of the six mapped classes and marked whether they agreed that the indicated sound was present in the segment. Agreement or disagreement concerned only the presence of the sound event within the annotated segment, not its onset and offset. Event instances confirmed by at least one person were kept, which eliminated about 10% of the original event instances in the development set. The upcoming evaluation set was verified in the same way, which eliminated 10% of its original event instances.
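The keep rule described above can be sketched as a simple filter over event instances. This is a minimal illustration under assumed data structures, not the script used for the dataset: the `Event` type, its field names, and the `keep_verified` function are hypothetical, and the example events are made up.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical record for one annotated event instance."""
    onset: float               # seconds; not used in the verification decision
    offset: float              # seconds; not used in the verification decision
    label: str                 # one of the six mapped classes
    votes: Tuple[bool, ...]    # one presence judgment per listener


def keep_verified(events: List[Event], min_confirmations: int = 1) -> List[Event]:
    """Keep events confirmed as audible by at least `min_confirmations` listeners."""
    return [e for e in events if sum(e.votes) >= min_confirmations]


# Made-up example: three listeners per event, as in the procedure described.
events = [
    Event(0.0, 2.5, "car", (True, False, False)),               # one confirmation
    Event(3.0, 4.0, "children", (False, False, False)),         # no confirmations
    Event(5.0, 9.0, "people walking", (True, True, True)),      # unanimous
]
kept = keep_verified(events)
removed_fraction = 1 - len(kept) / len(events)
```

Only the votes enter the decision, which matches the statement that onset and offset were not part of the agreement check.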
A new version of TUT Sound Events 2017 (version 2) has been released, containing the new metadata (the reduced set of annotated sound events). This version will be used officially for the challenge. The original metadata file is still available in the same package, in a separate folder.
Please make sure you are using version 2 of the dataset as this contains the verified annotations.
Baseline system performance has been updated with the new version.