General notes
- Below is the program for DCASE Workshop 2021, including a timetable of the different sessions, a description of each session, and the list of papers presented at each poster session.
- DCASE Workshop 2021 uses Gather Town as a virtual workshop space where attendees can meet and interact informally.
- In addition, to facilitate asynchronous interaction, a Slack workspace for the workshop is also available, with channels dedicated to the different papers, challenge tasks, workshop sessions and sponsors (see below for links to the channels). Note that this workspace is specific to the workshop and is not the same as the general DCASE community Slack workspace.
- Some sessions are held live in a Zoom meeting room, some are pre-recorded and also streamed in Zoom, and some will be available on YouTube. Please check each session's description for specific details.
- All papers are presented with a video presentation available on YouTube (see links below). Discussion with authors is possible by attending the corresponding virtual poster session in Gather Town, or by using the dedicated paper channels in the Slack workspace.
- Access links to Gather Town, Slack and Zoom are provided below, but the spaces are restricted to registered participants. Registered participants should have received email instructions for accessing the virtual workshop spaces.
Timetable
PST (San Francisco, CET-9) | EST (New York, CET-6) | CET (Barcelona) | JST (Tokyo, CET+8) | Monday 15th | Tuesday 16th | Wednesday 17th | Thursday 18th | Friday 19th
---|---|---|---|---|---|---|---|---
00:00 | 03:00 | 09:00 | 17:00 | | Posters A1 | Posters B1 | Posters C1 |
01:00 | 04:00 | 10:00 | 18:00 | | | | |
02:00 | 05:00 | 11:00 | 19:00 | | | | |
03:00 | 06:00 | 12:00 | 20:00 | | | | |
04:00 | 07:00 | 13:00 | 21:00 | | | | |
05:00 | 08:00 | 14:00 | 22:00 | | | | |
06:00 | 09:00 | 15:00 | 23:00 | Welcome | | Challenge spotlights | Town Hall discussion |
07:00 | 10:00 | 16:00 | 00:00 | Keynote A | Keynote B | Challenge posters | Industry panel | Closing
08:00 | 11:00 | 17:00 | 01:00 | ATMUS | Posters A2 | Posters B2 | Posters C2 |
Welcome session
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
- Chat about this session in this Slack channel
Welcome to DCASE Workshop 2021! This session will present the program of the workshop and introduce the online tools that will be used for the different sessions.
Keynote A
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
- Chat about this session in this Slack channel
Perception and Acoustics of Everyday Sound Events by Laurie Heller
The human ability to recognize sounds is crucial for being aware of events in one’s surroundings, in addition to being useful for communication and enjoyment (e.g. speech and music). There are some ways in which our auditory recognition system is robust and powerful, but there are other ways in which it can be malleable and easily fooled. I will review evidence for each of these facets of sound recognition, with an eye (or ear) towards how this could be relevant to machine learning of sounds.
We begin by establishing some fundamental psychological principles that guide how we collect and interpret data from humans. Not all such experimental data are equal in quality or validity for every application. This is important because human labels are often used as features in machine Sound Event Classification (SEC) systems, and human recognition data are often used as a benchmark for SEC system performance.
We then explore the robustness of human sound recognition. There is surprisingly minimal information required to recognize many common sounds: in the right circumstances, recognition can be robust to profound distortions of the content, such as spectral changes, obfuscation by noise, very short samples, asynchrony with video, etc. There is evidence that our perceptual systems can merge both bottom-up information (given by the acoustic stimulus) and top-down information (given by our knowledge of the world) to come up with accurate interpretations of our sound environment. Yet, there are some sounds that can be made by entirely different processes (e.g. Foley sound effects) that fool people regularly. It is informative to consider what properties allow these sound substitutions. As with SEC, there can be confusions between sounds from different causes and different categories.
The causal information provided by sound comes in the form of acoustics. While there is no guarantee that the acoustics will be unique for every causal event, there is a causal connection between the acoustics and the event that produced it. I will describe some classes of spectro-temporal acoustic features that have been suggested by human data and acoustic analysis. Human accuracy for identifying materials, objects and actions from sounds will be interpreted in this framework. Some implications for machine learning systems will be discussed.
About Laurie Heller
Laurie Heller directs the Auditory Lab at Carnegie Mellon University in Pittsburgh, PA, USA, where she is a Professor of Psychology (teaching). Her research examines the human ability to use sound to understand events happening in the environment. This basic research relates psychological performance to acoustic properties and high-level auditory information.
Initially trained in Brain and Cognitive Sciences at MIT, she received a Ph.D. in Psychology from the University of Pennsylvania and postdoctoral training in binaural hearing in the Neuroscience program at the University of Connecticut Health Center. She conducted auditory research at the Naval Submarine Medical Research Laboratory before taking a faculty position at Brown University. Since joining the Psychology Department at CMU in 2009 she has also become affiliated with CMU’s Neuroscience Institute and Music and Technology Program.
Prof. Heller has coauthored papers on the perception and cognitive neuroscience of sound recognition, auditory-visual interactions, auditory-gesture interactions, sound event descriptions, auditory displays, binaural hearing, echolocation training, visual imagery, sound detection in noise, signal processing, otoacoustic emissions, noise-induced hearing loss, and sound event classification via machine learning. She created the Sound Events Database at auditorylab.org.
ATMUS
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching. The video will only be available until the 26th of November.
- Chat about this session in this Slack channel
ATMUS (Musical Atlas) is a musical stage performance in an immersive 360º sound and audiovisual environment, which will have its digital premiere in the context of DCASE 2021. The performance takes as its starting point the community digital archive of sound, visual, 360º and photographic content generated with the participation of citizens during the sound heritage projects of the BitLab Cooperative. The sound maps and field recordings come from the areas of Sant Andreu, Barcelona (Kaleidoscope 2019), Raval and Sagrada Familia (Noise Maps 2020), Ciutat Meridiana (2021), Llobregat and Cardener (2021) and Besòs (2021).
The stage performance will consist of a musical show with 360º visuals, in which two bands of musicians, in trio format and placed in the shape of a crescent or accordion, will play a bespoke soundtrack composed to accompany the sound and audiovisual recordings. The show will be recorded in an immersive cellar with a system of 360º visual projections and multi-channel surround sound, in collaboration with the musicians Za!, Sara Fontan and Gambardella. ATMUS proposes an artistic take on the use of audio materials that are very similar to those found in many of the datasets used by the DCASE research community.
ATMUS is an event organized by Bitlab Cooperative in collaboration with the DCASE Workshop 2021 organization committee, the Music Technology Group of Universitat Pompeu Fabra, Sónar +D and IDEAL, Centre d'Arts Digitals. More information can be found here.
Keynote B
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
- Chat about this session in this Slack channel
Look and listen: audio-visual learning in video by Kristen Grauman
Perception systems that can both see and hear have great potential to unlock real-world video understanding. I will present our recent work exploring audio-visual video analysis in terms of both semantic and spatial perception. First, we consider visually-guided audio source separation: given video with multiple sounding objects, which sounds come from which visual objects? The proposed methods can focus on a human speaker’s voice amidst busy ambient sounds, split the sounds of multiple instruments playing simultaneously, or simply provide a semantic prior for the category of a visible object. Then, moving from those semantic tasks to spatial understanding, we introduce ideas for learning about a 3D environment from audio-visual sensing, including self-supervised feature learning from echoes, audio-visual floorplan reconstruction, and active embodied source separation, where an agent intelligently moves to hear things better.
About Kristen Grauman
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director in Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She served as an Associate Editor-in-Chief for PAMI and as a Program Chair of CVPR 2015 and NeurIPS 2018.
Poster session A
- Live session in Gather Town
- This session has two assigned time slots (A1 and A2). Authors of each poster might only be present during one of the time slots; please check the table below for each poster's preferred time slot.
- This session will not be recorded, but individual paper presentation videos are available for watching at any time. Here is a video playlist with all the presentations of this session. Individual video links are in the table below.
Poster session featuring the following papers:
ID | Paper | Time slot preference
---|---|---
10 | Ensemble Of Complementary Anomaly Detectors Under Domain Shifted Conditions. Jose A Lopez (Intel Labs); Georg Stemmer (Intel Labs); Paulo Lopez Meyer (Intel Labs); Pradyumna Singh (Intel Labs); Juan Del Hoyo Ontiveros (Intel Labs); Hector Cordourier (Intel Labs). Paper Poster Video Slack channel | 2
17 | A Lightweight Approach for Semi-Supervised Sound Event Detection with Unsupervised Data Augmentation. Xinyu Cai (Tsinghua University); Heinrich Dinkel (Xiaomi Technology). Paper Poster Video Slack channel | 1
19 | Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information. Zhongjie Ye (Peking University); Helin Wang (Peking University); Dongchao Yang (Peking University); Yuexian Zou (Peking University). Paper Poster Video Slack channel | 1
23 | Combining Multiple Distributions based on Sub-Cluster AdaCos for Anomalous Sound Detection under Domain Shifted Conditions. Kevin Wilkinghoff (Fraunhofer Institute for Communication). Best paper award. Paper Poster Video Slack channel | 1&2
26 | Multiple Feature Resolutions for Different Polyphonic Sound Detection Score Scenarios in DCASE 2021 Task 4. Diego de Benito-Gorron (Universidad Autónoma de Madrid); Sergio Segovia (Universidad Autónoma de Madrid); Daniel Ramos (Universidad Autónoma de Madrid); Doroteo T. Toledano (Universidad Autónoma de Madrid). Paper Poster Video Slack channel | 1&2
31 | Acoustic Event Detection Using Speaker Recognition Techniques: Model Optimization and Explainable Features. Mattson Ogg (Johns Hopkins University Applied Physics Laboratory); Benjamin Skerritt-Davis (Johns Hopkins University Applied Physics Laboratory). Paper Poster Video Slack channel | 2
39 | Many-to-Many Audio Spectrogram Transformer: Transformer for Sound Event Localization and Detection. Sooyoung Park (Electronics and Telecommunications Research Institute); Youngho Jeong (Electronics and Telecommunications Research Institute); Taejin Lee (Electronics and Telecommunications Research Institute). Paper Poster Video Slack channel | 1
40 | An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning. Ibuki Kuroyanagi (Nagoya University); Tomoki Hayashi (Human Dataware Lab. Co., Ltd.); Yusuke Adachi (Human Dataware Lab. Co., Ltd.); Takenori Yoshimura (Human Dataware Lab. Co., Ltd.); Kazuya Takeda (Nagoya University); Tomoki Toda (Nagoya University). Paper Poster Video Slack channel | 1
43 | A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. Archontis Politis (Tampere University); Sharath Adavanne (Tampere University); Daniel Krause (Tampere University); Antoine Deleforge (Institut National de Recherche en Informatique et en Automatique); Prerak Srivastava (Institut National de Recherche en Informatique et en Automatique); Tuomas Virtanen (Tampere University). Paper Poster Video Slack channel | 1&2
54 | Multi-Scale Network based on Split Attention for Semi-supervised Sound Event Detection. Xiujuan Zhu (Xinjiang University); Sun Xinghao (Xinjiang University). Paper Poster Video Slack channel | 1
55 | Leveraging State-of-the-art ASR Techniques to Audio Captioning. Chaitanya Prasad Narisetty (Carnegie Mellon University); Tomoki Hayashi (Nagoya University); Ryunosuke Ishizaki (Nagoya University); Shinji Watanabe (Johns Hopkins University); Kazuya Takeda (Nagoya University). Paper Poster Video Slack channel | 2
59 | micarraylib: Software for Reproducible Aggregation, Standardization, and Signal Processing of Microphone Array Datasets. Iran R Roman (New York University); Juan P Bello (New York University). Paper Poster Video Slack channel | 2
60 | Improved Student Model Training for Acoustic Event Detection Models. Anthea H Cheung (Amazon); Qingming Tang (Amazon); Chieh-Chi Kao (Amazon); Ming Sun (Amazon); Chao Wang (Amazon). Paper Poster Video Slack channel | 2
70 | Transfer Learning followed by Transformer for Automated Audio Captioning. Baekseung Kim (Chung-Ang University); Hyejin Won (Chung-Ang University); Il-Youp Kwak (Chung-Ang University); Changwon Lim (Chung-Ang University). Paper Poster Video Slack channel | 1
Poster session B
- Live session in Gather Town
- This session has two assigned time slots (B1 and B2). Authors of each poster might only be present during one of the time slots; please check the table below for an indication of when authors will be present.
- This session will not be recorded, but individual paper presentation videos are available for watching at any time. Here is a video playlist with all the presentations of this session. Individual video links are in the table below.
Poster session featuring the following papers:
ID | Paper | Time slot preference
---|---|---
9 | Automated Audio Captioning with Weakly Supervised Pre-Training and Word Selection Methods. Qichen Han (NetEase); Weiqiang Yuan (NetEase); Dong Liu (NetEase); Xiang Li (NetEase); Zhen Yang (NetEase). Paper Poster Video Slack channel | 1
11 | Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification. Javier Naranjo-Alcazar (Universitat de València); Sergi Perez-Castanos (Universitat de València); Maximo Cobos (Universitat de València); Francesc J. Ferri (Universitat de València); Pedro Zuccarello (Instituto Tecnológico de Informática). Paper Poster Video Slack channel | 1
12 | Domain Generalization on Efficient Acoustic Scene Classification Using Residual Normalization. Byeonggeun Kim (Qualcomm AI Research); Seunghan Yang (Qualcomm AI Research); Jangho Kim (Seoul National University); Simyung Chang (Qualcomm AI Research). Paper Poster Video Slack channel | 1
16 | A Contrastive Semi-Supervised Learning Framework For Anomaly Sound Detection. Xinyu Cai (Tsinghua University); Heinrich Dinkel (Xiaomi Technology). Paper Poster Video Slack channel | -
22 | Toward Interpretable Polyphonic Sound Event Detection with Attention Maps Based on Local Prototypes. Pablo Zinemanas (Universitat Pompeu Fabra); Martín Rocamora (Universidad de la República); Eduardo Fonseca (Universitat Pompeu Fabra); Frederic Font (Universitat Pompeu Fabra); Xavier Serra (Universitat Pompeu Fabra). Paper Poster Video Slack channel | 1&2
29 | Fairness and Underspecification in Acoustic Scene Classification: The Case for Disaggregated Evaluations. Andreas Triantafyllopoulos (audEERING GmbH / University of Augsburg); Manuel Milling (University of Augsburg); Konstantinos Drossos (Tampere University); Björn Schuller (University of Augsburg). Paper Poster Video Slack channel | 2
34 | Diversity and Bias in Audio Captioning Datasets. Irene Martin (Tampere University); Annamaria Mesaros (Tampere University). Paper Poster Video Slack channel | 1
38 | Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection. Parthasaarathy Ariyakulam Sudarsanam (Tampere University); Archontis Politis (Tampere University); Konstantinos Drossos (Tampere University). Paper Poster Video Slack channel | 1&2
47 | Sound Event Localization and Detection Based on Adaptive Hybrid Convolution and Multi-scale Feature Extractor. Sun Xinghao (Xinjiang University). Paper Poster Video Slack channel | 2
51 | Continual Learning for Automated Audio Captioning Using the Learning without Forgetting Approach. Jan Berg (Tampere University); Konstantinos Drossos (Tampere University). Paper Poster Video Slack channel | 1
56 | Using UMAP to Inspect Audio Data for Unsupervised Anomaly Detection Under Domain-Shift Conditions. Andres Fernandez (University of Surrey); Mark D. Plumbley (University of Surrey). Paper Poster Video Slack channel | -
57 | Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags. Felix Gontier (Institut National de Recherche en Informatique et en Automatique); Romain Serizel (Université de Lorraine); Christophe Cerisara (Centre National de la Recherche Scientifique). Paper Poster Video Slack channel | -
65 | CL4AC: A Contrastive Loss for Audio Captioning. Xubo Liu (University of Surrey); Qiushi Huang (University of Surrey); Xinhao Mei (University of Surrey); Tom Ko (South University of Science and Technology); H. Tang (University of Surrey); Mark D. Plumbley (University of Surrey); Wenwu Wang (University of Surrey). Paper Poster Video Slack channel | 2
67 | An Encoder-Decoder Based Audio Captioning System with Transfer and Reinforcement Learning. Xinhao Mei (University of Surrey); Qiushi Huang (University of Surrey); Xubo Liu (University of Surrey); Gengyun Chen (Nanjing University of Posts and Telecommunications); Jingqian Wu (Wake Forest University); Yusong Wu (University of Montreal); Jinzheng Zhao (University of Surrey); Shengchen Li (Xi'an Jiaotong-Liverpool University); Tom Ko (South University of Science and Technology); H. Tang (University of Surrey); Xi Shao (Nanjing University of Posts and Telecommunications); Mark D. Plumbley (University of Surrey); Wenwu Wang (University of Surrey). Paper Poster Video Slack channel | 2
71 | Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments. Janek Ebbers (Paderborn University); Reinhold Haeb-Umbach (Paderborn University). Paper Poster Video Slack channel | -
Poster session C
- Live session in Gather Town
- This session has two assigned time slots (C1 and C2). Authors of each poster might only be present during one of the time slots; please check the table below for an indication of when authors will be present.
- This session will not be recorded, but individual paper presentation videos are available for watching at any time. Here is a video playlist with all the presentations of this session. Individual video links are in the table below.
Poster session featuring the following papers:
ID | Paper | Time slot preference
---|---|---
6 | ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions. Noboru Harada (NTT Corporation); Daisuke Niizumi (NTT Corporation); Daiki Takeuchi (NTT Corporation); Yasunori Ohishi (NTT Corporation); Masahiro Yasuda (NTT Corporation); Shoichiro Saito (NTT Corporation). Paper Poster Video Slack channel | 1&2
13 | Detecting Presence Of Speech In Acoustic Data Obtained From Beehives. Pascal Janetzky (University of Würzburg); Padraig Davidson (University of Würzburg); Michael Steininger (University of Würzburg); Anna Krause (University of Würzburg); Andreas Hotho (University of Würzburg). Paper Poster Video Slack channel | 1&2
25 | Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning. Benno Weck (Huawei); Xavier Favory (Universitat Pompeu Fabra); Konstantinos Drossos (Tampere University); Xavier Serra (Universitat Pompeu Fabra). Paper Poster Video Slack channel | 1&2
30 | Semi-supervised Sound Event Detection Using Multiscale Channel Attention and Multiple Consistency Training. Yih Wen Wang (National Sun Yat-sen University); Chia-Ping Chen (National Sun Yat-sen University); Chung-Li Lu (Chunghwa Telecom Laboratories); Bo-Cheng Chan (Chunghwa Telecom Laboratories). Paper Poster Video Slack channel | 1
35 | A Multi-Modal Fusion Approach for Audio-Visual Scene Classification Enhanced by CLIP Variants. Soichiro Okazaki (Hitachi, Ltd.); Quan Kong (Hitachi, Ltd.); Tomoaki Yoshinaga (Hitachi, Ltd.). Paper Poster Video Slack channel | 1&2
41 | The Impact of Non-Target Events in Synthetic Soundscapes for Sound Event Detection. Francesca Ronchini (Institut National de Recherche en Informatique et en Automatique); Romain Serizel (Université de Lorraine); Nicolas Turpault (Institut National de Recherche en Informatique et en Automatique); Samuele Cornell (Università Politecnica delle Marche). Paper Poster Video Slack channel | 1&2
42 | What Makes Sound Event Localization and Detection Difficult? Insights from Error Analysis. Thi Ngoc Tho Nguyen (Nanyang Technological University); Karn N. Watcharasupat (Nanyang Technological University); Zhen Jian Lee; Ngoc Khanh Nguyen; Douglas L. Jones (University of Illinois Urbana-Champaign); Woon Seng Gan (Nanyang Technological University). Paper Poster Video Slack channel | 1&2
48 | On the Effect of Coding Artifacts on Acoustic Scene Classification. Nagashree Rao (University of Erlangen-Nuremberg); Nils G Peters (University of Erlangen-Nuremberg). Paper Poster Video Slack channel | 1
53 | Active Learning for Sound Event Classification using Monte-Carlo Dropout and PANN Embeddings. Stepan Shishkin (Fraunhofer Institute for Digital Media Technology); Danilo Hollosi (Fraunhofer Institute for Digital Media Technology); Simon Doclo (University of Oldenburg); Stefan Goetze (University of Sheffield). Best student paper award. Paper Poster Video Slack channel | 1&2
62 | MONYC: Music of New York City Dataset. Magdalena Fuentes (New York University); Danielle Zhao (New York University); Vincent Lostanlen (Cornell Lab of Ornithology); Mark Cartwright (New Jersey Institute of Technology); Charlie Mydlarz (New York University); Juan P Bello (New York University). Paper Poster Video Slack channel | 2
66 | ARCA23K: An Audio Dataset for Investigating Open-Set Label Noise. Turab Iqbal (University of Surrey); Yin Cao (University of Surrey); Andrew Bailey (University of Surrey); Mark D. Plumbley (University of Surrey); Wenwu Wang (University of Surrey). Paper Poster Video Slack channel | 2
68 | Audio Captioning Transformer. Xinhao Mei (University of Surrey); Xubo Liu (University of Surrey); Qiushi Huang (University of Surrey); Mark D. Plumbley (University of Surrey); Wenwu Wang (University of Surrey). Paper Poster Video Slack channel | 1
69 | Waveforms and Spectrograms: Enhancing Acoustic Scene Classification Using Multimodal Feature Fusion. Dennis Fedorishin (University at Buffalo); Nishant Sankaran (University at Buffalo); Deen D Mohan (University at Buffalo); Justas Birgiolas (ACV Auctions); Philip Schneider (ACV Auctions); Srirangaraj Setlur (University at Buffalo, SUNY); Venu Govindaraju (University at Buffalo). Paper Poster Video Slack channel | 1&2
72 | Improving Sound Event Detection with Foreground-Background Classification and Domain Adaptation. Michel Olvera (Institut National de Recherche en Informatique et en Automatique). Paper Poster Video Slack channel | 1&2
Challenge spotlights
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
This session will include short presentations from DCASE Challenge 2021 task organizers to summarise the objectives and results of each task. Q&A will be possible in the Challenge posters session scheduled right after this one.
Challenge posters
- Live session in Gather Town
- Unlike the standard poster sessions, this one has only one assigned time slot
- This session will not be recorded, but some task posters were submitted as papers and feature video presentations (linked below).
Session with posters summarising the results of each DCASE Challenge 2021 task. The task organizers will be present at the session and available for Q&A. Note that for tasks 1A, 1B, 2 and 5, full papers were submitted and accepted to the workshop, so the papers and video presentations are also available and linked below.
Task | Paper ID | Title
---|---|---
1A | 33 | Low-Complexity Acoustic Scene Classification for Multi-Device Audio: Analysis of DCASE 2021 Challenge Systems. Irene Martin (Tampere University); Toni Heittola (Tampere University); Annamaria Mesaros (Tampere University); Tuomas Virtanen (Tampere University). Paper Poster Video Slack channel
1B | 20 | Audio-Visual Scene Classification: Analysis of DCASE 2021 Challenge Submissions. Shanshan Wang (Tampere University); Annamaria Mesaros (Tampere University); Toni Heittola (Tampere University); Tuomas Virtanen (Tampere University). Paper Poster Video Slack channel
2 | 61 | Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Under Domain Shifted Conditions. Yohei Kawaguchi (Hitachi, Ltd.); Keisuke Imoto (Doshisha University); Yuma Koizumi (Google, Inc.); Noboru Harada (NTT Corporation); Daisuke Niizumi (NTT Corporation); Kota Dohi (Hitachi, Ltd.); Ryo Tanabe (Hitachi, Ltd.); Harsh Purohit (Hitachi, Ltd.); Takashi Endo (Hitachi, Ltd.). Paper Poster Video Slack channel
3 | - | Sound Event Localization and Detection with Directional Interference. Archontis Politis (Tampere University); Antoine Deleforge (Institut National de Recherche en Informatique et en Automatique); Sharath Adavanne (Tampere University); Prerak Srivastava (Institut National de Recherche en Informatique et en Automatique); Daniel Krause (Tampere University); Tuomas Virtanen (Tampere University). Poster Slack channel
4 | - | Sound Event Detection and Separation in Domestic Environments. Romain Serizel (Université de Lorraine); Nicolas Turpault (Institut National de Recherche en Informatique et en Automatique); Francesca Ronchini (Institut National de Recherche en Informatique et en Automatique); Scott Wisdom (Google, Inc.); Hakan Erdogan (Google, Inc.); John Hershey (Google, Inc.); Justin Salamon (Adobe Research); Prem Seetharaman (Northwestern University); Eduardo Fonseca (Universitat Pompeu Fabra); Samuele Cornell (Università Politecnica delle Marche); Daniel P. W. Ellis (Google, Inc.). Poster Slack channel
5 | 52 | Few-Shot Bioacoustic Event Detection: A New Task at the DCASE 2021 Challenge. Veronica Morfi (Queen Mary University of London); Ines Nolasco (Queen Mary University of London); Vincent Lostanlen (Centre National de la Recherche Scientifique); Shubhr Singh (Queen Mary University of London); Ariana Strandburg-Peshkin (University of Konstanz); Lisa Gill (BIOTOPIA); Hanna Pamuła (AGH University of Science and Technology); David Benvent (Cornell University); Dan Stowell (Tilburg University). Paper Poster Video Slack channel
6 | - | Automated Audio Captioning. Konstantinos Drossos (Tampere University); Samuel Lipping (Tampere University); Tuomas Virtanen (Tampere University). Poster Slack channel
Industry panel
- Live session in Gather Town / Zoom
- This session will not be recorded
- Propose and vote topics for discussion using this Sli.do board
- Chat about this session in this Slack channel
A discussion around DCASE topics particularly relevant to industry. The panelists will be Fatemeh Saki (Qualcomm), Mingqing Yun (Dolby), Yohei Kawaguchi (Hitachi), Sacha Krstulovic (Audio Analytic) and Justin Salamon (Adobe Research) as moderator. The topics for the discussion can be collaboratively decided using the Sli.do board (see link above).
Town Hall discussion
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
- Propose and vote topics for discussion using this Sli.do board
- Chat about this session in this Slack channel
A discussion about the past, present and future of DCASE research. The panelists will be Hanna Lukashevich (Fraunhofer IDMT), Romain Serizel (University of Lorraine), Magdalena Fuentes (New York University), Dan Stowell (Tilburg University) and Mark Plumbley (University of Surrey) as moderator. The topics for the discussion can be collaboratively decided using the Sli.do board (see link above).
Closing session
- Live session in Gather Town / Zoom
- A video recording of the session is available for re-watching
- Chat about this session in this Slack channel
Closing session for the DCASE Workshop 2021. We will show some statistics of this year's edition, announce the winners of the Best Paper Award and get a glimpse of DCASE Workshop 2022.