Catherine Guastavino

McGill University, School of Information Studies

Making sense of the sounds around us: Auditory scene analysis in everyday listening



Categorizing sounds is of vital importance for handling the variety and complexity of everyday environments and subsequently guiding action (e.g., avoiding an approaching car, attending to a crying baby, or answering a ringing phone). But how do people spontaneously and effortlessly sort sounds into meaningful categories to make sense of their environment?

We begin by reviewing prominent theories and empirical evidence from research on isolated sound events as well as complex auditory scenes. We highlight the relevance of different types of similarities (acoustic, causal, and semantic) as well as person-related factors (e.g., expertise, developmental stage) and situational factors (e.g., activity, context) in everyday listening.

We then focus on the critical and complex role that sound plays in the way we manage and experience urban spaces. Systematic reviews of noise regulations from around the world have revealed inconsistencies and limitations in the classification of “problematic” urban sounds. Can research on everyday sound categorization and/or computational auditory scene analysis inform the next generation of sound management strategies that shape our cities?

Finally, while most cities treat urban sound as “noise”, an isolated nuisance to be mitigated when problems arise, sound can also support our well-being, orientation, focus, and our lasting memories of urban spaces, and even of the city as a whole (e.g., music, conversation, birdsong, water sounds). We seek to understand which types of sounds and contexts can support these positive outcomes. We illustrate our approach through soundscape interventions from our Sounds in the City partnership, which brings together researchers, built environment professionals, and citizens to look at urban sound from a novel, resource-oriented perspective and nourish creative solutions to make cities sound better. We conclude with perspectives on how research on everyday sounds can inform computational auditory scene analysis, with the aim of making our cities better places to use and inhabit.



Catherine Guastavino heads the Sounds in the City research partnership in Montreal, Canada as well as research projects on spatial hearing, spatial audio, music psychology and multisensory perception. She is an Associate Professor at McGill University where she holds a William Dawson Research Chair.

Initially trained in mathematics, she then studied Music Technology at IRCAM (Institut de Recherche et Coordination Acoustique/Musique), received a Ph.D. in Psychoacoustics from the Université Pierre et Marie Curie (Paris, France), and completed post-doctoral training in cognitive psychology at McGill before joining the McGill School of Information Studies in 2005. She is a member of the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), where she served as Associate Director for Scientific and Technological Research from 2007 to 2009, and an associate member of the McGill Schulich School of Music.

She has co-authored over 170 peer-reviewed articles in fields ranging from psychology to acoustics, audio engineering, information science and urban design. She is also a member of the ISO working group on Soundscape (CAC/ISO/TC43/SC1). Her research is funded by the National Science Foundation, the Natural Sciences and Engineering Research Council, the Social Sciences and Humanities Research Council, the Canadian Foundation for Innovation, the Fonds de Recherche du Québec, as well as research and development grants with industry partners.


Jessie Barry

Cornell University, The Cornell Lab of Ornithology

Deep learning meets public tools to inspire the protection of nature



At the Cornell Lab of Ornithology, we believe that birds offer a unique window into natural systems and serve as a barometer to gauge our relationship with the natural world. Birds inspire and engage millions of people around the world, and in most cases, putting a name on the species of bird someone is seeing or hearing is the first and most critical step toward deeper insights. At the Macaulay Library, we have developed the world's largest curated collection of sounds, photos, and videos of birds and other animals. We have used these data resources to help the machine learning and computer vision community develop and improve models that accurately detect and classify species, rapidly advancing the field of fine-grained image recognition. In this presentation, we will explore how deep learning has been used to build models and engage more than 3 million users in accurately identifying birds. We will examine the lessons learned, how they may apply to sound recognition, and what the major remaining frontiers are for the DCASE community in building applications focused on acoustics.



Jessie Barry is the Program Manager of the Macaulay Library at the Cornell Lab of Ornithology. She contributed to the creation of the Merlin Bird ID app, a free app that helps anyone identify birds. Jessie helped catalyze a collaboration with eBird to enable a global community to gather sounds and images, which are archived in the Macaulay Library. As a lifelong birder, she is grateful to be able to share her passion for birds with others and inspire conservation action.