8:30 Auditorium hall |
Registration |
9:00 Auditorium |
Welcome session |
9:30 Auditorium |
Keynote 1 |
Gaëtan Hadjeres
Staff Research Scientist at Sony AI
|
|
The Sound Effect Foundation Model: Beyond Text-to-Audio Generation
Abstract
We introduce the Sound Effect Foundation Model, Sony AI's generative approach to enhancing sound effect creation and manipulation. By leveraging professional, high-quality datasets focused exclusively on sound effects, our model generates high-fidelity audio with precise controls, extending beyond traditional text-to-audio capabilities. The model is easily extensible to meet professional creators' needs and workflows. Key features range from sound variation and infilling for seamless audio repairs to the creation of personalized audio characters, and much more. Through bespoke user interfaces and integration with professional software, we show that our approach suggests novel workflows while enhancing existing ones, and we hope it adds generative AI models to the toolbox of professional creators.
|
|
10:45 Auditorium hall |
Coffee break |
11:10 Auditorium |
Poster session 1 spotlights (includes Challenge Task spotlights) |
|
Task 1 Low-Complexity Acoustic Scene Classification with Device Information
Schmid, Florian and Primus, Paul and Heittola, Toni and Mesaros, Annamaria and Martin-Morato, Irene and Widmer, Gerhard
|
|
|
Task 2 First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
Nishida, Tomoya and Harada, Noboru and Niizumi, Daisuke and Albertini, Davide and Sannino, Roberto and Pradolini, Simone and Augusti, Filippo and Imoto, Keisuke and Dohi, Kota and Purohit, Harsh and Endo, Takashi and Kawaguchi, Yohei
|
|
|
Task 3 Stereo Sound Event Localization and Detection in Regular Video Content
Shimada, Kazuki and Roman, Iran and Mitsufuji, Yuki and Diaz-Guerra, David and Uchida, Kengo and Takahashi, Naoya and Takahashi, Shusuke and Politis, Archontis and Virtanen, Tuomas and Sudarsanam, Parthasaarathy and Pandey, Ruchi and Koyama, Yuichiro and Shibuya, Takashi
|
|
Task 4 Spatial Semantic Segmentation of Sound Scenes
Yasuda, Masahiro and Nguyen, Binh Thien and Harada, Noboru and Serizel, Romain and Mishra, Mayank and Delcroix, Marc and Araki, Shoko and Takeuchi, Daiki and Niizumi, Daisuke and Ohishi, Yasunori and Nakatani, Tomohiro and Kawamura, Takao and Ono, Nobutaka
|
|
|
Task 5 Audio Question Answering
Yang, Huck and Ghosh, Sreyan and Wang, Qing and Kim, Jaeyeon and Hong, Hengyi and Kumar, Sonal and Zhong, Guirui and Kong, Zhifeng and Sakshi, FNU and Lokegaonkar, Vaibhavi and Duraiswami, Ramani and Manocha, Dinesh and Kim, Gunhee and Du, Jun and Valle, Rafael
|
|
Task 6 Language-Based Audio Retrieval
Xie, Huang and Primus, Paul and Weck, Benno and Virtanen, Tuomas
|
|
Towards Spatial Audio Understanding Via Question Answering
Sudarsanam, Parthasaarathy and Politis, Archontis
|
|
|
Bioacoustics on Tiny Hardware at the BioDCASE 2025 Challenge
Carmantini, Giovanni and Benhamadi, Yasmine and Carreau, Matthieu and Kwak, Minkyung and Morandi, Ilaria and Förstner, Friedrich and Hladik, Pierre-Emmanuel and Lagrange, Mathieu and Linhart, Pavel and Petrusková, Tereza and Lostanlen, Vincent and Kahl, Stefan
|
|
|
Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos
Berghi, Davide and Jackson, Philip
|
|
|
Stereo Sound Event Localization and Detection with Onscreen/Offscreen Classification
Shimada, Kazuki and Politis, Archontis and Roman, Iran and Sudarsanam, Parthasaarathy and Diaz-Guerra, David and Pandey, Ruchi and Uchida, Kengo and Koyama, Yuichiro and Takahashi, Naoya and Shibuya, Takashi and Takahashi, Shusuke and Virtanen, Tuomas and Mitsufuji, Yuki
|
|
|
Sound Event Detection using Time-frequency Bounding Boxes with a Self-Supervised Audio Spectrogram Transformer
Zhu, Zhi and Sato, Yoshinao
|
|
|
Exploiting Stereo Spatial Properties with ReCoOP Framework for Joint Sound Event Detection and Localization
Banerjee, Mohor and Nagisetty, Srikanth and Teo, Han Boon
|
|
|
12:00 Sala Aranyó |
Poster session 1 (includes Challenge Task posters) |
13:00 Plaça Gutenberg |
Lunch |
14:30 Auditorium |
Poster session 2 spotlights |
|
Listening or Reading? An Empirical Study of Modality Importance Analysis Across AQA Question Types
Yin, Zeyu and Cai, Yiqiang and Lyu, Xinyang and Deng, Pingsong and Li, Shengchen
|
|
|
Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds
Cauzinille, Jules and Miron, Marius and Pietquin, Olivier and Hagiwara, Masato and Marxer, Ricard and Rey, Arnaud and Favre, Benoit
|
|
|
Comparison of Foundation Model Pre-Training Strategies and Architectures for Urban Garden Recordings
Koutsogeorgos, Parmenion and Härmä, Aki
|
|
|
Universal Incremental Learning for Few-Shot Bird Sound Classification
Mulimani, Manjunath and Mesaros, Annamaria
|
|
|
Hierarchical and Multimodal Learning for Heterogeneous Sound Classification
Anastasopoulou, Panagiota and Dal Rí, Francesco and Serra, Xavier and Font, Frederic
|
|
|
Cross-Modal Attention Architectures for Language-Based Audio Retrieval
Calvet, Oscar and Torre Toledano, Doroteo
|
|
|
Latent Multi-view Learning for Robust Environmental Sound Representations
Ding, Sivan and Wilkins, Julia and Fuentes, Magdalena and Bello, Juan Pablo
|
|
|
Robust Detection of Overlapping Bioacoustic Sound Events
Mahon, Louis and Hoffman, Benjamin and Cusimano, Maddie and Hagiwara, Masato and James, Logan and Woolley, Sarah and Effenberger, Felix and Keen, Sara and Liu, Jen-yu and Pietquin, Olivier
|
|
|
A Lightweight Temporal Attention Module for Frequency Dynamic Sound Event Detection
Zhang, Yuliang
|
|
|
Whale-VAD: Whale Vocalisation Activity Detection
Geldenhuys, Christiaan and Tonitz, Günther and Niesler, Thomas
|
|
|
Importance-Weighted Domain Adaptation for Sound Source Tracking
Zhong, Bingxiang and Dietzen, Thomas
|
|
|
A Three-Level Evaluation Protocol for Acoustic Scene Understanding of Large Language Audio Models
Harish, Dilip and Abeßer, Jakob
|
|
|
Supervised Detection of Baleen Whale Calls on Edge-Compute
van Toor, Astrid
|
|
|
15:20 Sala Aranyó |
Poster session 2 |
16:20 Auditorium hall |
Coffee break |
16:50-18:30 Auditorium |
Townhall discussion
An open discussion session about the future of DCASE conducted by Dan Stowell and Romain Serizel. Please contribute to the discussion by filling out this form and participating in the session. |
20:00 Vraba restaurant |
Gala dinner
Dinner at Vraba restaurant, in Port Vell |