Keynotes

Dr Simone Graetzer

K1

Enhancing and Predicting Speech Intelligibility in Noise

 

Abstract

The Clarity Project is a UK Research and Innovation (UKRI)-funded research project on speech intelligibility enhancement and prediction for hearing aid (HA) signal processing, involving four UK universities and several project partners. Since 2019, we have been running machine learning (ML) challenges to facilitate the development of novel ML-based approaches. The challenges focus on speech-in-noise listening because this is the situation in which HA users report the most dissatisfaction, and we run both enhancement and prediction challenges because both types of algorithm are fundamental to the development of HA technology. We evaluate speech enhancement challenge submissions by running listening tests with HA wearers. The enhanced HA signals and listening test scores can then be fed into the prediction challenges. For each challenge, we provide tools, datasets and baseline systems. Our data and code are made open source to lower the barriers that might prevent researchers from considering hearing loss and hearing aids. Recently, we released the results of our third prediction challenge, and we are now gathering feedback on how to ensure sustainability beyond our current funding. You can find out more at https://claritychallenge.org/.

 

Biography

Simone Graetzer, Ph.D., MIOA, is a Senior Research Fellow currently working in speech acoustics, psychoacoustics (e.g., perception of sound emitted by low- and zero-carbon technologies), speech communication in noise, and hearing aid signal processing. This research is funded by the UK Research and Innovation (UKRI) Engineering and Physical Sciences Research Council (EPSRC) and Innovate UK. She is a Co-Investigator in the Centre for Doctoral Training in Sustainable Sound Futures and a Co-Lead in the EPSRC Noise Network Plus.

For more info: https://www.linkedin.com/in/simone-graetzer-b7b47892/




Gaëtan Hadjeres

K2

The Sound Effect Foundation Model: Beyond Text-to-Audio Generation

 

Abstract

We introduce the Sound Effect Foundation Model, Sony AI's generative approach to enhancing sound effect creation and manipulation. By leveraging professional, high-quality datasets focused exclusively on sound effects, our model generates high-fidelity audio with precise controls, extending beyond traditional text-to-audio capabilities. The model is easily extensible to meet professional creators' needs and fit their workflows. Key features range from sound variation and infilling for seamless audio repairs to the creation of personalized audio characters, and much more. Through bespoke user interfaces and integration with professional software, we show that our approach suggests novel workflows while enhancing existing ones, and, we hope, adds generative AI models to the toolbox of professional creators.

 

Biography

Gaëtan Hadjeres is a Staff Research Scientist at Sony AI, where he develops generative AI tools for audio creation. His work bridges generative modeling, self-supervised learning, and human-computer interaction, with a focus on tools that expand artistic expression, from Bach-inspired chorales to expressive piano performances and novel sonic textures. His approach prioritizes the artist's role in the creative process, offering new ways to explore AI-assisted composition. He is also a trained pianist and double bassist and studied music composition at the Conservatoire de Paris, bringing together his technical and artistic backgrounds in his work.

For more info: https://www.linkedin.com/in/gaëtan-hadjeres-a01a67a7/