Abstract
Computational paralinguistics, and speech analysis in particular, has featured in competitive data challenges at international conferences such as ACII, ACM Multimedia, ICMI, ICML, Interspeech, NeurIPS, and beyond. These include foundation-laying series such as AVEC, ComParE, MuSe, and HEAR, which the presenter has (co-)organised over the last decade and a half. This perspective talk presents the outcomes of these events to the DCASE community. Both fields have grown considerably in recent years, and in both, open data, public benchmarks, and data challenges have played an important role in their development. A key aim is to identify significant differences in the approaches taken, in order to spark ideas across the tasks involved. To this end, the challenges will be presented in brief, including the field’s move from expert-crafted to deep representations and, ultimately, to foundation models. In particular, insights into the most competitive approaches will be distilled from the participants’ results. Finally, the talk will offer a possible future perspective on acoustic scene and event analysis in a “paralinguistic” style.