DCASE2016 Workshop

Workshop on Detection and Classification of Acoustic Scenes and Events
3rd of September 2016, Budapest, Hungary


The workshop aims to provide a venue for researchers working on computational analysis of sound events and scene analysis to present and discuss their results. We aim to bring together researchers from many different universities and companies with interest in the topic, and provide the opportunity for scientific exchange of ideas and opinions.

The technical program will include invited speakers on the topic of computational everyday sound analysis and recognition, and oral and poster presentations of accepted papers. The workshop is organized as a satellite event to the 2016 European Signal Processing Conference (EUSIPCO), taking place in Budapest, Hungary. The workshop is held on Saturday, 3rd of September, the day after EUSIPCO ends on Friday, so that EUSIPCO participants can easily take part in the workshop.


We invite submissions on the topics of computational analysis of acoustic scenes and sound events. Topics of interest include:

  • Acoustic scene classification
  • Sound event detection
  • Environmental sound recognition
  • Signal processing methods for environmental audio scene analysis
  • Machine learning methods for environmental sound analysis
  • Computational auditory scene analysis

The results of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge 2016 will also be announced at the workshop.


Full technical program

9:00 Registration
9:30 Welcome
9:40 Keynote

Acoustic Scene and Events Recognition: How Similar is it to Speech Recognition and Music Genre Recognition?

Gaël Richard

Télécom ParisTech

10:30 DCASE2016

DCASE challenge: Philosophy, tasks and results

Mark Plumbley (1), Tuomas Virtanen (2)

(1) University of Surrey, United Kingdom; (2) Tampere University of Technology, Finland

11:10 Coffee
11:30 Presentations

Oral presentations of workshop papers (5 x 15 min)

13:00 Lunch
14:00 Keynote

Audio Event Recognition: Pathways to Impact

Sacha Krstulović

Audio Analytic

14:40 Posters

Posters of workshop papers and DCASE2016 challenge results

15:30 Coffee
16:10 Panel discussion

Gaël Richard (1), Sacha Krstulović (2), Jürgen Geiger (3), and Stefan Goetze (4)

(1) Télécom ParisTech, France; (2) Audio Analytic; (3) Huawei Technologies; (4) Fraunhofer IDMT

Moderator: Mark Plumbley

16:50 Closing remarks
Social program


Acoustic Scene and Events Recognition: How Similar is it to Speech Recognition and Music Genre Recognition?

Gaël Richard
9:40 - 10:30

Acoustic scene classification (ASC) and sound event recognition (SER) are receiving growing interest, fueled by the number of potential applications: smart hearing aids, indexing, sound retrieval, predictive maintenance, bioacoustics, environment-robust speech recognition, elderly assistance, and security. The emergence of this new domain is, however, rather recent, especially compared to other domains of “audio signal analysis” such as speech/speaker recognition or even Music Information Retrieval. As a consequence, a number of approaches for ASC and SER were directly derived from established speech recognition techniques or from specific music genre or music instrument recognition techniques. But is this really justified? This talk will discuss similarities and specificities of the three problems, discuss human performance in the different tasks, and propose some perspectives for the domain of acoustic scene classification and sound event recognition.


Gaël Richard received the State Engineering degree from Telecom ParisTech, France (formerly ENST) in 1990, the Ph.D. degree from LIMSI-CNRS, University of Paris-XI, in 1994 in speech synthesis, and the Habilitation à Diriger des Recherches degree from the University of Paris XI in September 2001. After the Ph.D. degree, he spent two years at the CAIP Center, Rutgers University, Piscataway, NJ, in the Speech Processing Group of Prof. J. Flanagan, where he explored innovative approaches for speech production. From 1997 to 2001, he successively worked for Matra, Bois d’Arcy, France, and for Philips, Montrouge, France. In particular, he was the Project Manager of several large-scale European projects in the field of audio and multimodal signal processing. In September 2001, he joined Telecom ParisTech, where he is now a Full Professor in audio signal processing and Head of the Signal and Image Processing department. He is a coauthor of over 200 papers and an inventor on a number of patents. He was an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing (ASLP) between 1997 and 2011, one of the guest editors of the special issue on “Music Signal Processing” of the IEEE Journal on Selected Topics in Signal Processing (2011), and is currently the lead guest editor of a special issue on Sound Scene and Event Analysis for the IEEE Transactions on ASLP. He is currently a member of the IEEE Audio and Acoustic Signal Processing Technical Committee, a member of the EURASIP SAT on Acoustics, Sound and Music Signal Processing, a member of the AES, and a senior member of the IEEE.


Gaël Richard

Head of Signal and Image Processing

Télécom ParisTech

Audio Event Recognition: Pathways to Impact

Sacha Krstulović
14:00 - 14:40

This talk explores the relationship between academic research and industrial applications in the field of Audio Event Recognition (AER), and in particular how they can inform each other in order to achieve social and economic impact. The talk starts with a panorama of the Smart Home market, a branch of the Internet of Things, where AER enables a variety of innovative applications. It then reviews the practical attributes of these applications, which must deal with indoor sounds, imperfect consumer electronic microphones, limited computational power, and 24/7 exposure of the system to a quasi-random variety of sounds. These attributes, in turn, may help researchers identify impactful topics such as data collection, robustness, computational cost versus sound recognition performance, or system evaluation. The talk concludes by illustrating how fruitful collaborations can be built between industry and academia to help solve such technological challenges in the field of AER.


Dr Sacha Krstulović was born in France and received his engineering degree from the ESTACA (Levallois-Perret, France) in 1996. He received his PhD from the EPFL (Lausanne, Switzerland) in 2001, for research focusing on introducing articulatory constraints in acoustic speech modelling. He then worked at the IRISA in France until 2006 as a Research Engineer in the field of Automatic Speaker Recognition and Sparse Signal Modelling, and then at the DFKI in Germany until 2007, where he contributed to building one of the first HMM-based speech synthesizers for the German language. He then joined Toshiba Research Europe Ltd in Cambridge as a Research Engineer to continue working on multilingual HMM-based speech synthesis, and then Nuance’s Advanced Speech Group in 2011 to return to the field of Automatic Speech Recognition. Since 2012, he has been the lead technologist at Audio Analytic Ltd, first as the Lead Research Engineer and later as the Vice President of Technology, to push the limits of the pioneering domain of Audio Event Detection, a domain in which Audio Analytic has now reached a world-leading position.


Sacha Krstulović

Vice President of Technology

Audio Analytic


The workshop will be held at the Hilton Budapest Hotel, Budapest, Hungary. This is the same venue as EUSIPCO 2016.


The registration fee is 50 Euros. The registration includes lunch and coffee.