DCASE Workshop 2021 Proceedings

The proceedings of the DCASE2021 Workshop have been published as an electronic publication:

Font, Frederic; Mesaros, Annamaria; Ellis, Daniel P. W.; Fonseca, Eduardo; Fuentes, Magdalena; Elizalde, Benjamin (eds.), Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021), Nov. 2021.

ISBN (Electronic): 978-84-09-36072-7
DOI: 10.5281/zenodo.5770113


Link PDF
Total cites: 779 (updated 30.11.2023)
Abstract

Automated audio captioning (AAC) is the task of automatically creating textual descriptions (i.e. captions) for the contents of a general audio signal. Most AAC methods rely on existing datasets for optimization and/or evaluation. Given the limited information held by AAC datasets, it is very likely that AAC methods learn only the information contained in the utilized datasets. In this paper we present a first approach for continuously adapting an AAC method to new information, using a continual learning method. In our scenario, a pre-optimized AAC method is applied to unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption. We evaluate our method using a freely available, pre-optimized AAC method and two freely available AAC datasets. We compare our proposed method against three scenarios: two in which the method is trained on one of the datasets and evaluated on the other, and a third in which it is trained on one dataset and fine-tuned on the other. The obtained results show that our method achieves a good balance between distilling new knowledge and not forgetting the previous one.

Cites: 9 ( see at Google Scholar )

PDF
Abstract

Anomalous Sound Detection (ASD) is a popular topic in deep learning and has attracted the attention of numerous researchers due to its practical applications within industry. Under unsupervised conditions, how to better discover the inherent consistency of normal sound clips becomes a key issue in ASD. In this paper, we propose a novel training framework that jointly trains two different feature extractors using a contrastive loss to obtain a better representation of normal sounds in the latent space. We evaluate our framework on the development dataset of DCASE 2021 Challenge Task 2. Our framework is a combination of two baseline systems from the challenge: 1) an AutoEncoder-based model and 2) a MobileNetV2-based model. Our approach trains two models, whereas during inference only model 2) is used. Experimental results indicate that the MobileNetV2-based model trained under our proposed training framework exceeds the baseline model in terms of the official score metric. Since we participated in the challenge and submitted a system trained with the proposed framework and some data augmentation methods, we also analyze the results of DCASE 2021 Challenge Task 2 and discuss the effect of the median filter as a data augmentation technique. Notably, our proposed approach achieves first place for anomaly detection on the machine type "Fan", with an AUC of 90.68 and a pAUC of 79.99.
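
A minimal sketch of the kind of contrastive objective described above, assuming two encoders that each map the same batch of normal clips to embeddings; the exact loss formulation and encoder pairing used by the authors may differ.

import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, temperature=0.1):
    # z_a, z_b: (batch, dim) embeddings of the same clips from the two extractors.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature            # pairwise cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Matching rows/columns are positives; other clips in the batch act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))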

Cites: 1 ( see at Google Scholar )

PDF
Abstract

This paper describes our submission to the DCASE 2021 Challenge. Unlike most other approaches, our work focuses on training a lightweight, well-performing model that can be used in real-world applications. Compared to the baseline, our model contains only 600k parameters, resulting in a size of 2.7 MB on disk, making it viable for applications on low-resource devices such as mobile phones. As a novelty, our approach uses unsupervised data augmentation (UDA) as the primary consistency criterion, which we show can achieve performance competitive with the more common mean-teacher paradigm. On the validation set, our single model peaks at a PSDS-1 of 36.91 and a PSDS-2 of 57.17, outperforming the baseline by 2.7 and 5.0 absolute points, respectively. The best submitted ensemble system, using a 5-way fusion, achieves a PSDS-1 of 38.23 and a PSDS-2 of 62.29 on the validation dataset. Our system ranks 7th in the official DCASE2021 Task 4 challenge ranking and is the best-performing model without post-processing, while also having the fewest parameters (3.4 M) by a large margin. Post-challenge evaluation reveals that, by applying simple median post-processing, our approach achieves performance comparable to the 5th place. The source code is available at https://github.com/bibiaaaa/SmallRice_DCASE2021Challenge.
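
A simplified sketch of a UDA-style consistency term for frame-level SED, assuming the model returns frame-wise logits; the augmentations, sharpening, and thresholds of the actual system may differ.

import torch
import torch.nn.functional as F

def uda_consistency(model, x_weak, x_strong, threshold=0.7):
    # Pseudo targets from the weakly augmented view, with gradients blocked.
    with torch.no_grad():
        p_weak = torch.sigmoid(model(x_weak))
        mask = (p_weak > threshold) | (p_weak < 1.0 - threshold)  # confident frames only
    p_strong = torch.sigmoid(model(x_strong))
    if mask.sum() == 0:
        return p_strong.new_zeros(())
    # Push predictions on the strongly augmented view towards the pseudo targets.
    return F.binary_cross_entropy(p_strong[mask], p_weak[mask])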

Cites: 5 ( see at Google Scholar )

PDF
Abstract

We introduce several novel knowledge distillation techniques for training a single shallow model of three recurrent layers for acoustic event detection (AED). These techniques allow us to train a generic shallow student model without many convolutional layers, ensembling, or custom modules. Gradual incorporation of pseudolabeled data, using strong and weak pseudolabels to train our student model, event masking in the loss function, and a custom SpecAugment procedure with event-dependent time masking all contribute to a strong event-based F1-score of 42.7%, which matches the top submission score, compared to 34.7% when training with a generic knowledge distillation method. For comparison to state-of-the-art performance, we use the ensemble model of the top submission in the challenge as a fixed teacher model.

Cites: 2 ( see at Google Scholar )

PDF
Abstract

In this paper, we describe our multi-resolution mean teacher systems for DCASE 2021 Task 4: Sound event detection and separation in domestic environments. Aiming to take advantage of the different lengths and spectral characteristics of each target category, we follow the multi-resolution feature extraction approach that we introduced for last year's edition. It is found that each one of the proposed Polyphonic Sound Detection Score (PSDS) scenarios benefits from either a higher temporal resolution or a higher frequency resolution. Additionally, the combination of several time-frequency resolutions through model fusion is able to improve the PSDS results in both scenarios. Furthermore, a class-wise analysis of the PSDS metric is provided, indicating that the detection of each event category is optimized with different resolution points or model combinations.
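
A small sketch of how such multi-resolution features can be computed, assuming librosa and illustrative (n_fft, hop_length) pairs rather than the exact settings of the paper.

import librosa

def multi_resolution_mels(path, resolutions=((2048, 1024), (1024, 512), (512, 256))):
    # Each (n_fft, hop_length) pair trades frequency resolution for temporal resolution.
    y, sr = librosa.load(path, sr=16000)
    feats = []
    for n_fft, hop in resolutions:
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                             hop_length=hop, n_mels=128)
        feats.append(librosa.power_to_db(mel))     # log-mel spectrogram
    return feats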

Cites: 4 ( see at Google Scholar )

PDF
Abstract

In this paper we present our system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments, where it achieved the fourth rank. Our presented solution is an advancement of our system used in the previous edition of the task. We use a forward-backward convolutional recurrent neural network (FBCRNN) for tagging and pseudo labeling, followed by tag-conditioned sound event detection (SED) models which are trained using strong pseudo labels provided by the FBCRNN. Our advancement over our earlier model is threefold. First, we introduce a strong label loss in the objective of the FBCRNN to take advantage of the strongly labeled synthetic data during training. Second, we perform multiple iterations of self-training for both the FBCRNN and the tag-conditioned SED models. Third, while we used only tag-conditioned CNNs as our SED model in the previous edition, here we explore sophisticated tag-conditioned SED model architectures, namely bidirectional CRNNs and bidirectional convolutional transformer neural networks (CTNNs), and combine them. With metric- and class-specific tuning of median filter lengths for post-processing, our final SED model, consisting of 6 submodels (2 of each architecture), achieves polyphonic sound event detection scores (PSDS) of 0.455 for scenario 1 and 0.684 for scenario 2 on the public evaluation set, as well as a collar-based F1-score of 0.596, outperforming the baselines and our model from the previous edition by far. Source code is publicly available at https://github.com/fgnt/pb_sed.
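
A minimal sketch of class-wise median filtering of frame-level probabilities, with hypothetical filter lengths; in the actual system the lengths are tuned per metric and per class.

import numpy as np
from scipy.ndimage import median_filter

def classwise_median_filter(probs, filter_lengths):
    # probs: (n_frames, n_classes) event probabilities; filter_lengths: one odd window per class.
    smoothed = np.empty_like(probs)
    for c, length in enumerate(filter_lengths):
        smoothed[:, c] = median_filter(probs[:, c], size=length, mode='nearest')
    return smoothed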

Cites: 14 ( see at Google Scholar )

PDF
Abstract

Acoustic scene classification (ASC) has seen tremendous progress from the combined use of convolutional neural networks (CNNs) and signal processing strategies. In this paper, we investigate the use of two common feature representations within the audio understanding domain, the raw waveform and the Mel-spectrogram, and measure their degree of complementarity when using both representations for feature fusion. We introduce a new model paradigm for acoustic scene classification that fuses features learned from Mel-spectrograms and raw waveforms in separate feature extraction branches. Our experimental results show that the proposed fusion model significantly outperforms the baseline audio-only sub-network on the DCASE 2021 Challenge Task 1B (a 5.7% increase in accuracy and a 12.7% reduction in loss). We further show that the learned features of raw waveforms and Mel-spectrograms are indeed complementary and that there is a consistent improvement in classification performance over models trained on Mel-spectrograms or waveforms alone.

Cites: 5 ( see at Google Scholar )

PDF
Abstract

The goal of Unsupervised Anomaly Detection (UAD) is to detect anomalous signals under the condition that only non-anomalous (normal) data is available beforehand. In UAD under Domain-Shift Conditions (UAD-S), the data is further exposed to contextual changes that are usually unknown beforehand. Motivated by the difficulties encountered in the UAD-S task presented at the 2021 edition of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we visually inspect Uniform Manifold Approximation and Projections (UMAPs) for log-STFT, log-Mel and pretrained Look, Listen and Learn (L3) representations of the DCASE UAD-S dataset. In our investigation, we look for two beneficial qualities, Separability (SEP) and Discriminative Support (DSUP), and formulate several hypotheses that could facilitate the diagnosis and development of further representation and detection approaches. In particular, we hypothesize that input length and pretraining may regulate a relevant tradeoff between SEP and DSUP. Our code as well as the resulting UMAPs and plots are publicly available.
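
A rough sketch of the kind of UMAP inspection described above, with random stand-in data in place of the actual log-STFT, log-Mel, or L3 clip representations.

import numpy as np
import umap
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))         # stand-in for clip-level representations
labels = rng.integers(0, 6, size=500)   # stand-in for section/domain metadata

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric='euclidean', random_state=0)
emb_2d = reducer.fit_transform(X)
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels, s=3, cmap='tab10')
plt.title('UMAP of clip-level representations')
plt.show()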

Cites: 2 ( see at Google Scholar )

PDF
Abstract

Music plays an important role in human cultures and constitutes an integral part of urban soundscapes. In order to make sense of these soundscapes, machine listening models should be able to detect and classify street music. Yet, the lack of well-curated resources for training and evaluating these models currently hinders their development. We present MONYC, an open dataset of 1.5k music clips as recorded by the sensors of the Sounds of New York City (SONYC) project. MONYC contains audio data and spatiotemporal metadata, i.e., coarse sensor location and timestamps. In addition, we provide multilabel genre tags from four annotators as well as four binary tags: whether the music is live or recorded; loud or quiet; single-instrument or multi-instrument; and whether non-musical sources are also present. The originality of MONYC is that it reveals how music manifests itself in a real-world setting among social interactions in an urban context. We perform a detailed qualitative analysis of MONYC, show its spatiotemporal trends, and discuss the scope of research questions that it can answer in the future.

PDF
Abstract

Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language. Most current methods utilize pre-trained analysis models to extract relevant semantic content from the audio input. However, prior information on language modeling is rarely introduced, and the corresponding architectures are limited in capacity due to data scarcity. In this paper, we present a method leveraging the linguistic information contained in BART, a large-scale conditional language model with general-purpose pre-training. Caption generation is conditioned on sequences of textual AudioSet tags. This input is enriched with temporally aligned audio embeddings, which allows the model to improve sound event recognition. The full BART architecture is fine-tuned with few additional parameters. Experimental results demonstrate that, beyond the scaling properties of the architecture, language-only pre-training improves text quality in the multimodal setting of audio captioning. The best model achieves state-of-the-art performance on AudioCaps with 46.5 SPIDEr.

Cites: 29 ( see at Google Scholar )

PDF
Abstract

Audio captioning is a multi-modal task focusing on generating a natural sentence to describe the content of an audio clip. This paper proposes a solution for automated audio captioning based on weakly supervised pre-training and word selection methods. Our solution addresses two problems in automated audio captioning: data insufficiency and word selection indeterminacy. As the amount of training data is limited, we collect a large-scale weakly labeled dataset from the Web using heuristic methods. We then pre-train encoder-decoder models on this dataset, followed by fine-tuning on the Clotho dataset. To address the word selection indeterminacy problem, we use keywords extracted from the captions of similar audio clips and audio tags produced by pre-trained audio tagging models to guide caption generation. The proposed system achieves the best SPIDEr score of 0.310 in the DCASE 2021 Challenge Task 6.

Cites: 13 ( see at Google Scholar )

PDF
Abstract

This paper proposes a new large-scale dataset called “ToyADMOS2” for anomaly detection in machine operating sounds (ADMOS). As with our previous ToyADMOS dataset, we collected a large number of operating sounds of miniature machines (toys) under normal and anomalous conditions by deliberately damaging them, but extended the collection in this case by providing a controlled depth of damage in the anomaly samples. Since typical application scenarios of ADMOS require robust performance under domain-shift conditions, the ToyADMOS2 dataset is designed for evaluating systems under such conditions. The released dataset consists of two sub-datasets for machine-condition inspection: fault diagnosis of machines with geometrically fixed tasks and fault diagnosis of machines with moving tasks. Domain shifts are represented by introducing several differences in operating conditions, such as the use of the same machine type but with different models and parts configurations, operating speeds, microphone arrangements, etc. Each sub-dataset contains over 27k samples of normal machine-operating sounds and over 8k samples of anomalous sounds recorded with five to eight microphones. The dataset is freely available for download at https://github.com/nttcslab/ToyADMOS2-dataset and https://doi.org/10.5281/zenodo.4580270.

Cites: 118 ( see at Google Scholar )

PDF
Abstract

The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the problem of label noise that is often prevalent in such datasets requires further investigation. This paper introduces ARCA23K, an Automatically Retrieved and Curated Audio dataset comprising over 23,000 labelled Freesound clips. Unlike past datasets such as FSDKaggle2018 and FSDnoisy18K, ARCA23K facilitates the study of label noise in a more controlled manner. We describe the entire process of creating the dataset such that it is fully reproducible, meaning researchers can extend our work with little effort. We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise. Experiments are carried out in which we study the impact of label noise in terms of classification performance and representation learning.

Cites: 4 ( see at Google Scholar )

PDF
Abstract

Sound recorded from beehives is important for understanding a colony's state. This fact is used in the we4bee project, where beehives are equipped with sensors (among them microphones), distributed to educational institutions, and set up to record colony characteristics at the communication level. Due to data protection laws, we have to ensure that no human speech is recorded alongside the bees' sounds. However, detecting the presence of speech is challenging, since the frequencies of human speech and the humming of bees largely overlap. Despite having access to only a limited amount of labeled data, in this initial study we show how to solve this problem using Siamese networks. We find that using common convolutional neural networks in a Siamese setting can strongly improve the ability to detect human speech in recordings obtained from beehives. By adding train-time augmentation techniques, we are able to reach a recall of up to 100%, resulting in a reliable technique that adheres to privacy regulations. Our results are useful for research projects that require written permits for acquiring data, which impedes the collection of samples. Further, all steps, including pre-processing, are computed on the GPU and can be used in an end-to-end pipeline, which allows for quick prototyping.

Cites: 1 ( see at Google Scholar )

PDF
Abstract

We present the task description and discuss the results of DCASE 2021 Challenge Task 2. In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data. In 2021, we organized an advanced unsupervised ASD task under domain-shift conditions, which focuses on an inevitable problem in the practical use of ASD systems. The main challenge of this task is to detect unknown anomalous sounds where the acoustic characteristics of the training and testing samples are different, i.e., domain-shifted. This problem frequently occurs due to changes in seasons, manufactured products, and/or environmental noise. We received 75 submissions from 26 teams, and several novel approaches were developed in this challenge. Based on the analysis of the evaluation results, we found two types of remarkable approaches adopted by the top-5 winning teams: 1) ensembles of "outlier exposure" (OE)-based detectors and "inlier modeling" (IM)-based detectors, and 2) approaches based on IM-based detection using features learned in a machine-identification task.
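
For reference, the AUC/pAUC scoring used for a single machine section can be sketched as follows, with synthetic scores as stand-ins; the official ranking score harmonically averages these values over all machines, sections, and domains.

import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import hmean

rng = np.random.default_rng(0)
y_true = np.r_[np.zeros(100), np.ones(100)]                   # 0 = normal, 1 = anomalous
scores = np.r_[rng.normal(0, 1, 100), rng.normal(1, 1, 100)]  # anomaly scores

auc = roc_auc_score(y_true, scores)
pauc = roc_auc_score(y_true, scores, max_fpr=0.1)   # partial AUC, p = 0.1
print(hmean([auc, pauc]))                           # harmonic mean of the two metrics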

Cites: 79 ( see at Google Scholar )

PDF
Abstract

How to handle multi-device audio inputs with a single, efficiently designed acoustic scene classification system is a practical research topic. In this work, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing information useful for classification. Moreover, we introduce an efficient architecture, BC-ResNet-ASC, a modified version of the baseline architecture with a limited receptive field. BC-ResNet-ASC outperforms the baseline architecture even though it contains a small number of parameters. Through three model compression schemes, pruning, quantization, and knowledge distillation, we further reduce model complexity while mitigating the performance degradation. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0 KB of non-zero parameters. The proposed method won first place in DCASE 2021 Challenge Task 1A.
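
A rough sketch of the residual-normalization idea, i.e. an identity shortcut around an instance-normalization branch; the exact normalization axes and shortcut weighting follow the paper and may differ from this stand-in.

import torch
import torch.nn as nn

class ResidualNorm(nn.Module):
    # The normalized branch suppresses device-specific statistics, while the
    # shortcut (scaled by lam) retains part of the original information.
    def __init__(self, num_channels, lam=0.1):
        super().__init__()
        self.lam = lam
        self.inorm = nn.InstanceNorm2d(num_channels, affine=False)

    def forward(self, x):              # x: (batch, channels, freq, time)
        return self.lam * x + self.inorm(x)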

Cites: 21 ( see at Google Scholar )

PDF
Abstract

Automated Audio Captioning (AAC) is the task of automatically creating captions that explain given audio data using machine learning techniques. Our solutions to this problem were tested in the DCASE 2021 audio captioning challenge, in which a model is required to generate natural language descriptions of a given audio signal. We use pre-trained models trained on AudioSet, a large-scale dataset of manually annotated audio events. The large amount of audio event data helps capture important audio feature representations. To make use of the features learned from AudioSet, we utilize a CNN14 or ResNet54 network pre-trained on AudioSet, which achieved state-of-the-art performance in audio pattern recognition. Our proposed sequence-to-sequence model consists of a CNN14 or ResNet54 encoder and a Transformer decoder. Experiments show that the proposed model can achieve SPIDEr scores of 0.246 and 0.285 on audio captioning. We further experiment with three different audio features: the log-mel spectrogram, the constant-Q transform spectrogram, and the gammatone filter spectrogram.

Cites: 2 ( see at Google Scholar )

PDF
Abstract

This paper presents an ensemble approach based on two unsupervised anomalous sound detection (ASD) methods for machine condition monitoring under domain-shifted conditions in DCASE 2021 Challenge Task 2. The first ASD method is based on a conformer-based sequence-level autoencoder with section ID regression and a self-attention architecture. We utilize data augmentation techniques such as SpecAugment to boost performance and combine a simple scorer module for each section and each domain to address the domain-shift problem. The second ASD method is based on a binary classification model using metric learning, which uses task-irrelevant outliers as pseudo-anomalous data while controlling the centroids of normal and outlier data in a feature space. As a countermeasure against the domain-shift problem, we perform data augmentation based on Mixup with data from the target domain, resulting in stable performance for each section. An ensemble approach is applied to each method, and the two resulting ensembled methods are further ensembled to maximize the ASD performance. The results of DCASE 2021 Challenge Task 2 demonstrate that our proposed method achieves a score of 63.745%, the harmonic mean of the area under the curve (AUC) and partial AUC (p = 0.1) over all machines, sections, and domains.
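
A minimal sketch of Mixup between source- and target-domain training examples, as a stand-in for the augmentation mentioned above; the pairing strategy and alpha value are illustrative.

import numpy as np

def mixup(x_source, x_target, alpha=0.2):
    # Blend two inputs with a Beta-distributed coefficient; the caller mixes
    # the corresponding labels (if any) with the same lam.
    lam = np.random.beta(alpha, alpha)
    return lam * x_source + (1.0 - lam) * x_target, lam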

Cites: 7 ( see at Google Scholar )

PDF
Abstract

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip. As shown by the submissions received for Task 6 of the DCASE 2021 Challenge, this problem has received increasing interest in the community. Existing AAC systems are usually based on an encoder-decoder architecture, where the audio signal is encoded into a latent representation and aligned with its corresponding text description, and a decoder is then used to generate the caption. However, training an AAC system often encounters the problem of data scarcity, which may lead to inaccurate representations and poor audio-text alignment. To address this problem, we propose a novel encoder-decoder framework called Contrastive Loss for Audio Captioning (CL4AC). In CL4AC, self-supervision signals derived from the original audio-text paired data are used to exploit the correspondences between audio and text by contrasting samples, which can improve the quality of the latent representation and the alignment between audio and text, even when trained with limited data. Experiments are performed on the Clotho dataset to show the effectiveness of our proposed approach.

Cites: 24 ( see at Google Scholar )

PDF
Abstract

We present our submission to the DCASE2021 Challenge Task 2, which aims to promote research in anomalous sound detection. We found that blending the predictions of various anomaly detectors, rather than relying on well-known domain adaptation techniques alone, gave us the best performance under domain shifted conditions. Our submission is composed of two self-supervised classifier models, a probabilistic model we call NF-CDEE, and an ensemble of the three -- the latter obtained the top rank in the DCASE2021 Challenge Task 2.

Cites: 35 ( see at Google Scholar )

PDF
Abstract

This paper presents the details of Task 1A, Low-Complexity Acoustic Scene Classification with Multiple Devices, in the DCASE 2021 Challenge. The task targeted the development of low-complexity solutions with good generalization properties. The provided baseline system is based on a CNN architecture and post-training quantization of parameters. The system is trained using all the available training data, without any specific technique for handling device mismatch, and obtains an overall accuracy of 47.7%, with a log loss of 1.473. The task received 99 submissions from 30 teams, and most of the submitted systems outperformed the baseline. The most used techniques among the submissions were residual networks and weight quantization, with the top systems reaching over 70% accuracy and a log loss under 0.8. The acoustic scene classification task remained a popular task in the challenge, despite the increasing difficulty of the setup.

Cites: 56 ( see at Google Scholar )

PDF
Abstract

Describing soundscapes in sentences allows better understanding of the acoustic scene than a single label indicating the acoustic scene class or a set of audio tags indicating the sound events active in the audio clip. In addition, the richness of natural language allows a range of possible descriptions for the same acoustic scene. In this work, we address the diversity obtained when collecting descriptions of soundscapes using crowdsourcing. We study how much the collection of audio captions can be guided by the instructions given in the annotation task, by analysing the possible bias introduced by auxiliary information provided in the annotation process. Our study shows that even when given hints on the audio content, different annotators describe the same soundscape using different vocabulary. In automatic captioning, hints provided as audio tags represent grounding textual information that facilitates guiding the captioning output towards specific concepts. We also release a new dataset of audio captions and audio tags produced by multiple annotators for a subset of the TAU Urban Acoustic Scenes 2018 dataset, suitable for studying guided captioning.

Cites: 21 ( see at Google Scholar )

PDF
Abstract

Automated audio captioning aims to use natural language to describe the content of audio data. This paper presents an audio captioning system with an encoder-decoder architecture, where the decoder predicts words based on audio features extracted by the encoder. To improve the proposed system, transfer learning from either an upstream audio-related task or a large in-domain dataset is introduced to mitigate the problem induced by data scarcity. Moreover, evaluation metrics are incorporated into the optimization of the model with reinforcement learning, which helps address the "exposure bias" problem induced by the "teacher forcing" training strategy and the mismatch between the evaluation metrics and the loss function. The resulting system was ranked 3rd in DCASE 2021 Task 6. Ablation studies are carried out to investigate how much each component of the proposed system contributes to the final performance. The results show that the proposed techniques significantly improve the scores of the evaluation metrics; however, reinforcement learning may adversely impact the quality of the generated captions.

Cites: 40 ( see at Google Scholar )

PDF
Abstract

Audio captioning aims to automatically generate a natural language description of an audio clip. Most captioning models follow an encoder-decoder architecture, where the decoder predicts words based on the audio features extracted by the encoder. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are often used as the audio encoder. However, CNNs can be limited in modelling temporal relationships among the time frames in an audio signal, while RNNs can be limited in modelling the long-range dependencies among the time frames. In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free. The proposed method has a better ability to model the global information within an audio signal as well as capture temporal relationships between audio events. We evaluate our model on AudioCaps, which is the largest audio captioning dataset publicly available. Our model shows competitive performance compared to other state-of-the-art approaches.

Cites: 53 ( see at Google Scholar )

PDF
Abstract

Few-shot bioacoustic event detection is a novel area of research that emerged from a need in monitoring biodiversity and animal behaviour: annotating long recordings for which experts can usually provide only very few annotations, because the task is specialized and labour-intensive. This paper presents an overview of the first evaluation of few-shot bioacoustic sound event detection, organised as a task of the DCASE 2021 Challenge. A set of datasets consisting of mammal and bird multi-species recordings in the wild, along with class-specific temporal annotations, was compiled for the challenge, for the purpose of training learning-based approaches and for evaluating the submissions on a few-shot labelled dataset. This paper describes the task in detail, the datasets that were used for both development and evaluation of the submitted systems, how system performance was ranked, and the characteristics of the best-performing submissions. Some common strategies used by the participating teams are discussed, including input features, model architectures, transfer of prior knowledge, use of public datasets, and data augmentation. Ranking for the challenge was based on overall performance on the evaluation set; however, in this paper we also present results on each of the subsets of the evaluation set. This new analysis reveals submissions that performed better on specific subsets and gives insight into the characteristics of the subsets that can influence performance.

Cites: 27 ( see at Google Scholar )

PDF
Abstract

The use of multiple and semantically correlated sources can provide complementary information that may not be evident when working with individual modalities on their own. In this context, multi-modal models can help produce more accurate and robust predictions in machine learning tasks where audio-visual data is available. This paper presents a multi-modal model for automatic scene classification that simultaneously exploits auditory and visual information. The proposed approach makes use of two separate networks that are trained in isolation on audio and visual data, respectively, so that each network specializes in a given modality. The visual subnetwork is a pre-trained VGG16 model followed by a bidirectional recurrent layer, while the residual audio subnetwork is based on stacked squeeze-excitation convolutional blocks trained from scratch. After training each subnetwork, the fusion of information from the audio and visual streams is performed at two different stages. The early fusion stage combines features resulting from the last convolutional block of the respective subnetworks at different time steps to feed a bidirectional recurrent structure. The late fusion stage combines the output of the early fusion stage with the independent predictions provided by the two subnetworks, resulting in the final prediction. We evaluate the method using the recently published TAU Audio-Visual Urban Scenes 2021 dataset, which contains synchronized audio and video recordings from 12 European cities in 10 different scene classes. The proposed model has been shown to provide an excellent trade-off between prediction performance (86.5%) and system complexity (15M parameters) in the evaluation results of the DCASE 2021 Challenge.

Cites: 4 ( see at Google Scholar )

PDF
Abstract

This paper details our work towards leveraging state-of-the-art ASR techniques for the task of automated audio captioning. Our model architecture comprises a convolution-augmented Transformer (Conformer) encoder and a Transformer decoder to generate natural language descriptions of acoustic signals in an end-to-end manner. To overcome the limited availability of captioned audio samples for model training, we incorporate AudioSet tags and audio embeddings obtained from pretrained audio neural networks (PANNs) as auxiliary inputs to our model. We train our model on audio samples from the Clotho and AudioCaps datasets, and test on the Clotho dataset's validation and evaluation splits. Experimental results indicate that our trained models significantly outperform the baseline system from the DCASE 2021 Challenge Task 6.

Cites: 14 ( see at Google Scholar )

PDF
Abstract

Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation. As a result, SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources. Furthermore, SELD often faces the additional challenge of assigning correct correspondences between the detected sound classes and directions of arrival for multiple overlapping sound events. Previous studies have shown that unknown interferences in reverberant environments often cause major degradation in the performance of SELD systems. To further understand the challenges of the SELD task, we performed a detailed error analysis on two of our SELD systems, which both ranked second in the team category of the DCASE SELD Challenge, one in 2020 and one in 2021. Experimental results indicate polyphony as the main challenge in SELD, due to the difficulty in detecting all sound events of interest. In addition, the SELD systems tend to make fewer errors for the polyphonic scenario that is dominant in the training set.

Cites: 6 ( see at Google Scholar )

PDF
Abstract

We adapted methods from the speaker recognition literature to acoustic event detection (or audio tagging) and applied representational similarity analysis, a cognitive neuroscience technique, to gain a deeper understanding of model performance. Experiments with a feed-forward time-delay neural network (TDNN) architecture were carried out using the FSDKaggle2018 dataset. We compared different system optimizations such as speed and reverb augmentation, different input features (spectrograms, mel-filterbanks, MFCCs and cochleagrams), as well as updates to the network architecture (increased or decreased temporal context and model capacity, as well as dropout and batch normalization). Most system configurations were able to outperform the original published baseline, and through a combination of optimizations (data augmentation in particular) our system was able to outperform a harder baseline derived from a pre-trained model trained on many times more data. Additional experiments applying representational similarity analysis to the network embeddings allowed us to understand which acoustic features the different systems used to perform the task.
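
A compact sketch of representational similarity analysis between two systems, assuming clip-level embeddings are available for the same set of stimuli; the dissimilarity and correlation measures here are common choices, not necessarily the ones used in the paper.

from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(embeddings):
    # Representational dissimilarity matrix in condensed form:
    # pairwise correlation distance between stimulus embeddings.
    return pdist(embeddings, metric='correlation')

def rsa_similarity(emb_a, emb_b):
    # Correlate the two systems' RDMs over the same stimuli.
    rho, _ = spearmanr(rdm(emb_a), rdm(emb_b))
    return rho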

PDF
Abstract

In this paper, we propose a system for audio-visual scene classification based on a multi-modal ensemble of three feature types: (1) log-mel spectrogram audio features extracted by CNN variants from the audio modality; (2) frame-wise image features extracted by CNN variants from the video modality; and (3) additional frame-wise image features extracted by OpenAI CLIP models, which are trained on a large-scale dataset of web-crawled text and paired images under a contrastive learning framework. We trained the above three models separately and formed an ensemble weighted by the class-wise confidences of each model's outputs. As a result, our ensemble system reached a log loss of 0.149 (official baseline: 0.658) and an accuracy of 96.1% (official baseline: 77.0%) on the TAU Audio-Visual Urban Scenes 2021 dataset used in DCASE2021 Challenge Task 1B.

Cites: 4 ( see at Google Scholar )

PDF
Abstract

In this paper we provide two methods that improve the detection of sound events in domestic environments. First, motivated by the broad categorization of domestic sounds as foreground or background events according to their spectro-temporal structure, we propose to learn a foreground-background classifier jointly with the sound event classifier in a multi-task fashion to improve the generalization of the latter. Second, while the semi-supervised learning setup adopted for training sound event detection systems with synthetic labeled data and unlabeled or partially labeled real data aims to learn representations that are invariant across both domains, there is still a gap in performance when testing such systems in real environments. To further reduce this data mismatch, we propose a domain adaptation strategy that aligns the empirical distributions of the feature representations of active and inactive frames of synthetic and real recordings via optimal transport. We show that these two approaches lead to enhanced detection performance in terms of the event-based macro F1-score on the DESED dataset.

Cites: 2 ( see at Google Scholar )

PDF
Abstract

Over the past few years, convolutional neural networks (CNNs) have been established as the core architecture for audio classification and detection. In particular, a hybrid model that combines a recurrent neural network or a self-attention mechanism with CNNs to deal with longer-range contexts has been widely used. Recently, Transformers, which are pure attention-based architectures, have achieved excellent performance in various fields, showing that CNNs are not essential. In this paper, we investigate the reliance on CNNs for sound event localization and detection by introducing the Many-to-Many Audio Spectrogram Transformer (M2M-AST), a pure attention-based architecture. We adopt multiple classification tokens in the Transformer architecture to easily handle various output resolutions. We empirically show that the proposed M2M-AST outperforms the conventional hybrid model on TAU-NIGENS Spatial Sound Events 2021 dataset.

Cites: 11 ( see at Google Scholar )

PDF
Abstract

This report presents the dataset and baseline of Task 3 of the DCASE 2021 Challenge on Sound Event Localization and Detection (SELD). The acoustical synthesis remains the same as in the previous iteration of the challenge; however, the new dataset brings more challenging conditions of polyphony and overlapping instances of the same class. The most important difference is the introduction of directional interferers, meaning sound events that are localized in space but do not belong to the target classes to be detected and are not annotated. Since such interfering events are expected in every real-world scenario of SELD, the new dataset aims to promote systems that deal with this condition effectively. A modified SELDnet baseline employing the recent ACCDOA representation of SELD problems accompanies the dataset and is shown to outperform the previous one. The new dataset is shown to be significantly more challenging for both baselines according to all considered metrics. To investigate the individual and combined effects of ambient noise, interferers, and reverberation, we study the performance of the baseline on different versions of the dataset excluding or including combinations of these factors. The results indicate that by far the most detrimental effects are caused by directional interferers.

Cites: 64 ( see at Google Scholar )

PDF
Abstract

Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory, which is challenging for resource-constrained mobile or IoT edge devices. Thus, these models are more likely to be deployed on more powerful hardware to classify audio recordings previously uploaded (or streamed) from low-power edge devices. In such a scenario, the edge device may apply lossy audio compression to reduce the transmission data rate. This paper explores the effect of lossy audio compression on classification performance using a DCASE 2020 challenge contribution [1]. We found that classification accuracy can decrease by up to 57% compared to classifying original (uncompressed) audio. We further demonstrate how applying lossy audio compression during model training can improve the classification accuracy of compressed audio signals, even for audio codecs and codec bitrates not included in the training process. [1] H. Hu, C.-H. H. Yang, X. Xia, X. Bai, X. Tang, Y. Wang, S. Niu, L. Chai, J. Li, H. Zhu, et al., "Device-robust acoustic scene classification based on two-stage categorization and data augmentation," arXiv preprint arXiv:2007.08389, DCASE 2020, 2020.

Cites: 3 ( see at Google Scholar )

PDF
Abstract

micarraylib is a Python library to load, standardize, and aggregate datasets collected with different microphone array hardware. The goal is to create larger datasets by aggregating existing and mostly incompatible microphone array data and encoding it into standard B-format ambisonics. These larger datasets can be used to develop novel sound event localization and detection (SELD) algorithms. micarraylib streamlines the download, loading, resampling, aggregation, and signal processing of datasets collected with commonly used and custom microphone array hardware. We provide an API to standardize the 3D coordinates of each microphone array capsule, visualize the placement of microphone arrays in specific spatial configurations, and encode time-series data collected with different microphone arrays into B-format ambisonics. Finally, we also show that the data aggregates can be used to reconstruct a microphone capsule's time-series data using the information from other capsules in the data aggregate. micarraylib will allow for the easy addition of more datasets and microphone array hardware as they become available in the future. All original software written for this paper is released under an open-source license.

Cites: 2 ( see at Google Scholar )

PDF
Abstract

The Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes. Until recently, only target sound events were considered when synthesizing the soundscapes. However, recorded soundscapes often contain a substantial amount of non-target events that may affect performance. In this paper, we focus on the impact of these non-target events in the synthetic soundscapes. First, we investigate to what extent using non-target events during the training phase, the validation phase, or neither helps the system correctly detect target events. Second, we analyze to what extent adjusting the signal-to-noise ratio between target and non-target events at training time improves sound event detection performance. The results show that using both target and non-target events for only one of the phases (validation or training) helps the system to properly detect sound events, outperforming the baseline (which uses non-target events in both phases). The paper also reports the results of a preliminary study on evaluating the system on clips that contain only non-target events. This opens questions for future work on the non-target subset and on the acoustic similarity between target and non-target events, which might confuse the system.
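
A small sketch of scaling a non-target event to a desired signal-to-noise ratio relative to a target event before mixing; the actual soundscape synthesis pipeline of the task is more involved.

import numpy as np

def scale_to_snr(target, non_target, snr_db):
    # Scale non_target so that 10*log10(P_target / P_non_target) equals snr_db.
    p_target = np.mean(target ** 2) + 1e-12
    p_non = np.mean(non_target ** 2) + 1e-12
    gain = np.sqrt(p_target / (p_non * 10.0 ** (snr_db / 10.0)))
    return non_target * gain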

Cites: 19 ( see at Google Scholar )

PDF
Abstract

Labeling audio material to train classifiers comes with a large amount of human labor. In this paper, we propose an active learning method for sound event classification, where a human annotator is asked to manually label sound segments up to a certain labeling budget. The sound event classifier is incrementally re-trained on pseudo-labeled sound segments and manually labeled segments. The segments to be labeled during the active learning process are selected based on the model uncertainty of the classifier, which we propose to estimate using Monte Carlo dropout, a technique for Bayesian inference in neural networks. Evaluation results on the UrbanSound8K dataset show that the proposed active learning method, which uses pre-trained audio neural network (PANN) embeddings as input features, outperforms two baseline methods based on medoid clustering, especially for low labeling budgets.
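
A minimal sketch of the Monte Carlo dropout uncertainty estimate described above, assuming a PyTorch classifier with dropout layers; in a full implementation only the dropout modules would be switched to train mode.

import torch

def mc_dropout_uncertainty(model, x, n_passes=20):
    model.train()                      # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_passes)])
    mean_p = probs.mean(dim=0)
    # Predictive entropy of the averaged prediction; the segments with the
    # highest entropy are the candidates sent to the annotator.
    entropy = -(mean_p * torch.log(mean_p + 1e-12)).sum(dim=-1)
    return mean_p, entropy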

Cites: 1 ( see at Google Scholar )

PDF
Abstract

Joint sound event localization and detection (SELD) is an emerging audio signal processing task adding spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is using convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input and the RNNs learn temporal relationships from these high-level features. However, RNNs have some drawbacks, such as a limited capability to model long temporal dependencies and slow training and inference times due to their sequential processing nature. Recently, a few SELD studies used multi-head self-attention (MHSA), among other innovations in their models. MHSA and the related transformer networks have shown state-of-the-art performance in various domains. While they can model long temporal dependencies, they can also be parallelized efficiently. In this paper, we study in detail the effect of MHSA on the SELD task. Specifically, we examined the effects of replacing the RNN blocks with self-attention layers. We studied the influence of stacking multiple self-attention blocks, using multiple attention heads in each self-attention block, and the effect of position embeddings and layer normalization. Evaluation on the DCASE 2021 SELD (task 3) development data set shows a significant improvement in all employed metrics compared to the baseline CRNN accompanying the task.
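
A bare-bones sketch of the kind of self-attention block that can replace a recurrent block in a CRNN; the number of blocks, heads, positional embeddings, and normalization placement studied in the paper are not reproduced here.

import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    def __init__(self, dim=128, num_heads=8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):               # x: (batch, time, dim) CNN features
        attn_out, _ = self.mha(x, x, x)
        return self.norm(x + attn_out)  # residual connection + layer normalization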

Cites: 12 ( see at Google Scholar )

PDF
Abstract

Underspecification and fairness in machine learning (ML) applications have recently become prominent issues in the ML community. Acoustic scene classification (ASC) applications have so far remained unaffected by this discussion, but are now becoming increasingly used in real-world systems where fairness and reliability are critical aspects. In this work, we argue for the need for a more holistic evaluation process for ASC models through disaggregated evaluations. This entails taking into account performance differences across several factors, such as city, location, and recording device. Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics computed over all the different strata of a particular dataset. We argue that metrics computed on specific sub-populations of the underlying data contain valuable information about the expected real-world behavior of proposed systems, and their reporting could improve the transparency and trustworthiness of such systems. We demonstrate the effectiveness of the proposed evaluation process in uncovering underspecification and fairness problems exhibited by several standard ML architectures when trained on two widely used ASC datasets. Our evaluation shows that all examined architectures exhibit large biases across all factors taken into consideration, in particular with respect to the recording location. Additionally, different architectures exhibit different biases even though they are trained with the same experimental configurations.
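
A tiny illustration of disaggregated evaluation with pandas, using made-up predictions and metadata columns as stand-ins for the actual evaluation data.

import pandas as pd

results = pd.DataFrame({
    "y_true": ["park", "metro", "park", "bus"],
    "y_pred": ["park", "tram", "street_pedestrian", "bus"],
    "city":   ["barcelona", "helsinki", "helsinki", "lyon"],
    "device": ["a", "b", "a", "s1"],
})
results["correct"] = results["y_true"] == results["y_pred"]
# Accuracy per sub-population instead of a single overall number.
print(results.groupby("city")["correct"].mean())
print(results.groupby("device")["correct"].mean())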

Cites: 4 ( see at Google Scholar )

PDF
Abstract

This paper presents the details of the Audio-Visual Scene Classification task in the DCASE 2021 Challenge (Task 1 Subtask B). The task is concerned with classification using audio and video modalities, using a dataset of synchronized recordings. The task attracted 43 submissions from 13 different teams around the world. Among all submissions, more than half of the submitted systems performed better than the baseline. The common techniques among the top systems are the use of large pretrained models such as ResNet or EfficientNet which are trained for the task-specific problem. Fine-tuning, transfer learning, and data augmentation techniques are also employed to boost performance. More importantly, multi-modal methods using both audio and video are employed by all of the top 5 teams. The best system achieved a log loss of 0.195 and an accuracy of 93.8%, compared to the baseline system with a log loss of 0.662 and an accuracy of 77.1%.

Cites: 18 ( see at Google Scholar )

PDF
Abstract

We present a neural network-based sound event detection system that outputs sound events and their time boundaries in audio signals. The network can be trained efficiently with a small amount of strongly labeled synthetic data and a large amount of weakly labeled or unlabeled real data. Based on the mean-teacher framework for semi-supervised learning with RNNs and Transformers, the proposed system employs multi-scale CNNs with efficient channel attention, which can capture diverse features and pay more attention to the important regions of the feature maps. The model parameters are learned with multiple consistency criteria, including interpolation consistency, shift consistency, and clip-level consistency, to improve generalization and representation power. For different evaluation scenarios, we explore different pooling functions and search for the best layer. To further improve the performance, we use data augmentation and posterior-level score fusion. We demonstrate the performance of our proposed method through experimental evaluation using the DCASE2021 Task 4 dataset. On the validation set, our ensemble system achieves a PSDS-scenario-1 of 40.72% and a PSDS-scenario-2 of 80.80%, significantly outperforming the baseline scores of 34.2% and 52.7%, respectively. On the DCASE2021 challenge's evaluation set, our ensemble system ranks 7th among the 28 teams and 14th among the 80 submissions.
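
For reference, the teacher update at the core of the mean-teacher framework can be sketched as an exponential moving average of the student's weights; the decay value here is illustrative.

import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.999):
    # teacher and student share the same architecture; the teacher tracks an
    # exponential moving average of the student's parameters.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)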

Cites: 1 ( see at Google Scholar )

PDF
Abstract

Automated audio captioning (AAC) is the task of automatically generating textual descriptions for general audio signals. A captioning system has to identify various information from the input signal and express it with natural language. Existing works mainly focus on investigating new methods and try to improve their performance measured on existing datasets. Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources. In this paper, we evaluate the performance of off-the-shelf models with a Transformer-based captioning approach. We utilize the freely available Clotho dataset to compare four different pre-trained machine listening models, four word embedding models, and their combinations in many different settings. Our evaluation suggests that YAMNet combined with BERT embeddings produces the best captions. Moreover, in general, fine-tuning pre-trained word embeddings can lead to better performance. Finally, we show that sequences of audio embeddings can be processed using a Transformer encoder to produce higher-quality captions.

Cites: 9 ( see at Google Scholar )

PDF
Abstract

Systems based on sub-cluster AdaCos yield state-of-the-art performance on the DCASE 2020 dataset for anomalous sound detection. In contrast to the previous year, the dataset belonging to task 2 'Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions' of the DCASE challenge 2021 contains not only source domains with 1000 normal training samples for each machine but also so-called target domains with different acoustic conditions for which only 3 normal training samples are available. To address this additional problem, a novel anomalous sound detection system based on sub-cluster AdaCos for the DCASE challenge 2021 is presented. The system is trained to extract embeddings whose distributions are estimated in different ways for source and target domains, and utilize the resulting negative log-likelihoods as anomaly scores. In experimental evaluations, it is shown that the presented system significantly outperforms both baseline systems on the source and target domains of the development set. On the evaluation set of the challenge, the proposed system is ranked third among all 27 teams' submissions.

Cites: 8 ( see at Google Scholar )

PDF
Abstract

Sound event localization and detection (SELD), which jointly performs sound event detection (SED) and sound source localization (SSL), simultaneously detects the type and occurrence time of sound events as well as their corresponding direction-of-arrival (DoA) angles. In this paper, we propose a method based on Adaptive Hybrid Convolution (AHConv) and a multi-scale feature extractor. Square convolution shares weights across square regions of the feature maps, which limits its feature extraction ability. To address this problem, we propose an AHConv mechanism, instead of square convolution, to capture dependencies along the time and frequency dimensions, respectively. We also explore a multi-scale feature extractor that can integrate information from a very local to an exponentially enlarged receptive field within the block. To adaptively recalibrate the feature maps after the convolution operation, we designed an adaptive attention block that is embedded in both the AHConv and the multi-scale feature extractor. On the TAU-NIGENS Spatial Sound Events 2021 development dataset, our systems demonstrate a significant improvement over the baseline system. Only the first-order Ambisonics (FOA) dataset was considered in this experiment.

PDF
Abstract

Automated audio captioning (AAC) has developed rapidly in recent years, involving acoustic signal processing and natural language processing to generate human-readable sentences for audio clips. Current models are generally based on the neural encoder-decoder architecture, and their decoders mainly use acoustic information extracted by a CNN-based encoder. However, they have ignored semantic information that could help the AAC model to generate meaningful descriptions. This paper proposes a novel approach for automated audio captioning based on incorporating semantic and acoustic information. Specifically, our audio captioning model consists of two sub-modules. (1) The pre-trained keyword encoder utilizes a pre-trained ResNet38 to initialize its parameters, and is then trained using extracted keywords as labels. (2) The multi-modal attention decoder adopts an LSTM-based decoder that contains semantic and acoustic attention modules. Experiments demonstrate that our proposed model achieves state-of-the-art performance on the Clotho dataset. Our code can be found at https://github.com/WangHelin1997/DCASE2021_Task6_PKU.

Cites: 23 ( see at Google Scholar )

PDF
Abstract

A sound scene in a real environment is generally composed of different types of sound events, and the time-frequency scales of these events are diverse. Thus, it is important to design a proper mechanism to extract multi-scale features for sound event detection (SED). In order to improve the discriminative ability between different types of sound events, we propose a multi-scale SED network based on split attention. We design a Multi-Scale (MS) module to extract fine-grained and coarse-level features in parallel. A Channel Shuffle (CS) operation is introduced to enhance cross-channel information communication among the features with different scales. Also, a Split Attention (SA) module is designed to learn several sub-features separately, followed by an attention mechanism that generates the corresponding importance coefficients for each sub-feature. Experiments on the DCASE2021 Task 4 dataset demonstrate the effectiveness of our proposed multi-scale network.
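
A short sketch of a ShuffleNet-style channel shuffle operation, as a stand-in for the CS operation mentioned above.

import torch

def channel_shuffle(x, groups):
    # Permute channels across groups so information can flow between per-group branches.
    b, c, t, f = x.shape
    x = x.view(b, groups, c // groups, t, f)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, t, f)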

PDF
Abstract

Understanding the reasons behind the predictions of deep neural networks is a pressing concern, as it can be critical in several application scenarios. In this work, we present a novel interpretable model for polyphonic sound event detection. It tackles one of the limitations of our previous work, i.e., the difficulty of properly dealing with a multi-label setting. The proposed architecture incorporates a prototype layer and an attention mechanism. The network learns a set of local prototypes in the latent space, each representing a patch of the input representation. Besides, it learns attention maps for positioning the local prototypes and reconstructing the latent space. The predictions are then based solely on the attention maps. Thus, the explanations provided are the attention maps and the corresponding local prototypes. Moreover, the prototypes can be reconstructed back to the audio domain for inspection. The results obtained in urban sound event detection are comparable to those of two opaque baselines, but with fewer parameters, while offering interpretability.

Cites: 2 ( see at Google Scholar )

PDF