Audio-Dependent Question Answering


Challenge results

Task description

The Audio-Dependent Question Answering (ADQA) task focuses on addressing a critical bottleneck in current Large Audio-Language Models (LALMs): "Textual Hallucination." Many state-of-the-art models currently pass audio understanding benchmarks by relying on text prompts and internal linguistic priors rather than actual audio perception. This task evaluates whether LALMs truly "listen" to audio or rely on textual shortcuts, using Audio-Dependency Filtering to ensure genuine audio perception.

A more detailed task description can be found on the task description page.
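This page does not spell out how Audio-Dependency Filtering works. One plausible reading, offered only as a sketch, is that a question is kept when a text-only answerer (one that never hears the audio) fails to answer it correctly. The item layout and the `text_only_answer` callable below are hypothetical illustrations, not the organizers' procedure.

```python
from typing import Callable, Iterable

def filter_audio_dependent(
    items: Iterable[dict],
    text_only_answer: Callable[[str, list[str]], str],
) -> list[dict]:
    """Keep only items that a text-only answerer gets wrong (assumed criterion)."""
    kept = []
    for item in items:
        guess = text_only_answer(item["question"], item["choices"])
        if guess != item["answer"]:
            kept.append(item)
    return kept

def first_choice(question: str, choices: list[str]) -> str:
    """Stand-in for a text-only model: always picks the first choice."""
    return choices[0]

# Toy items, invented for illustration only.
items = [
    {"question": "Which animal is heard?", "choices": ["dog", "cat"], "answer": "dog"},
    {"question": "How many knocks occur?", "choices": ["two", "three"], "answer": "three"},
]

kept = filter_audio_dependent(items, first_choice)
print([item["question"] for item in kept])  # ['How many knocks occur?']
```

In practice the stand-in would be replaced by a real model queried without audio, possibly over several runs; that choice is outside what this page specifies.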

Submission statistics

  • Number of teams: 14
  • Number of submissions: 36
  • Lightweight submissions (< 30B parameters): 29
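The lightweight criterion above can be written as a strict threshold check. This is a minimal sketch; the names `LIGHTWEIGHT_LIMIT` and `is_lightweight` are hypothetical and not part of any official tooling.

```python
LIGHTWEIGHT_LIMIT = 30_000_000_000  # "< 30B parameters", as stated in the list above

def is_lightweight(num_parameters: int) -> bool:
    """Return True when a system falls strictly below the 30B limit."""
    return num_parameters < LIGHTWEIGHT_LIMIT

print(is_lightweight(8_000_000_000))   # True
print(is_lightweight(30_000_000_000))  # False: exactly 30B is not "< 30B"
```

The strict inequality matters: systems reported at exactly 30B parameters do not appear in the lightweight ranking on this page.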

Teams ranking

The best system from each team is listed below. The ranking is based on evaluation accuracy.

Rank | Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters
Lim_CAU_task5_4 Lim Chung-Ang University Kim_CAU_t5_2026 58.33 70.50 -12.17 96000000000
Nam_IND_task5_2 Nam Independent researcher Nam_IND_t5_2026 57.17 60000000000
Hu_IOA_task5_4 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 57.03 67.70 -10.67 8600000000
Yin_XJTLU_task5_1 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 56.00 68.33 -12.33 30000000000
Cheng_Surrey_task5_1 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.93 65.07 -11.13 8000000000
Zhang_WHU_task5_1 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.57 62.79 -11.22 8000000000
Tathe_UIUC_task5_1 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 50.60 58.43 -7.83 8900000000
Huang_JAIST_task5_1 Huang Japan Advanced Institute of Science and Technology Huang_JAIST_t5_2026 49.60 64.84 -15.24 8000000000
Kim_SGU_task5_4 Kim Sogang University Kim_SOGANG_t5_2026 49.13 7000000000
Wu_XMU_task5_1 Wu Xiamen University + Tsinghua University Wu_XMU_t5_2026 47.20 53.90 -6.70 7000000000
ZC_Inst_task5_1 Zheng Hangzhou Dianzi University Zheng_HDU_t5_2026 46.03 64.45 -18.42 8000000000
Guan_HEU_task5_1 Guan Harbin Engineering University + University of Technology Sydney Xiao_HEU_t5_2026 43.37 53.08 -9.71 8000000000
Song_BIT_task5_1 Song Beijing Institute of Technology Song_BIT_t5_2026 43.33 50.53 -7.20 7000000000
Xu_HUST_task5_1 Xu Huazhong University of Science and Technology Xu_HUST_t5_2026 31.87 56.69 -24.82 5400000000
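The per-team selection behind the table above can be sketched as follows, using a few rows from the overall systems ranking. Deriving the team key by stripping the `_task5_N` suffix from the submission code is an assumption made for illustration, and `team_of` and `best_per_team` are hypothetical helper names.

```python
import re

# (submission code, evaluation accuracy) for two teams, copied from this page.
submissions = [
    ("Hu_IOA_task5_1", 49.63),
    ("Hu_IOA_task5_2", 49.87),
    ("Hu_IOA_task5_3", 56.70),
    ("Hu_IOA_task5_4", 57.03),
    ("Song_BIT_task5_1", 43.33),
    ("Song_BIT_task5_2", 41.53),
    ("Song_BIT_task5_3", 40.97),
]

def team_of(code: str) -> str:
    """Assumed team key: the submission code without its '_task5_N' suffix."""
    return re.sub(r"_task5_\d+$", "", code)

def best_per_team(rows):
    """Keep each team's highest-accuracy system, then sort by accuracy (descending)."""
    best = {}
    for code, acc in rows:
        team = team_of(code)
        if team not in best or acc > best[team][1]:
            best[team] = (code, acc)
    return sorted(best.values(), key=lambda row: row[1], reverse=True)

team_ranking = best_per_team(submissions)
for code, acc in team_ranking:
    print(f"{code}  {acc:.2f}")
```

For these rows the sketch selects Hu_IOA_task5_4 and Song_BIT_task5_1, matching the entries shown for those teams.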

Overall systems ranking

All submitted systems are listed below, ranked by evaluation accuracy.

Rank | Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters | Lightweight
Lim_CAU_task5_4 Lim Chung-Ang University Kim_CAU_t5_2026 58.33 70.50 -12.17 96000000000
Lim_CAU_task5_3 Lim Chung-Ang University Kim_CAU_t5_2026 58.10 70.01 -11.91 80000000000
Lim_CAU_task5_1 Lim Chung-Ang University Kim_CAU_t5_2026 57.30 69.63 -12.33 8000000000
Nam_IND_task5_2 Nam Independent researcher Nam_IND_t5_2026 57.17 60000000000
Nam_IND_task5_4 Nam Independent researcher Nam_IND_t5_2026 57.13 94000000000
Lim_CAU_task5_2 Lim Chung-Ang University Kim_CAU_t5_2026 57.07 69.69 -12.63 8000000000
Hu_IOA_task5_4 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 57.03 67.70 -10.67 8600000000
Nam_IND_task5_3 Nam Independent researcher Nam_IND_t5_2026 56.73 60000000000
Hu_IOA_task5_3 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 56.70 66.02 -9.32 8600000000
Yin_XJTLU_task5_1 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 56.00 68.33 -12.33 30000000000
Nam_IND_task5_1 Nam Independent researcher Nam_IND_t5_2026 55.90 67.27 -11.37 60000000000
Yin_XJTLU_task5_4 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.80 66.40 -10.60 8000000000
Yin_XJTLU_task5_2 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.60 64.97 -9.37 8000000000
Yin_XJTLU_task5_3 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.27 66.46 -11.19 8000000000
Cheng_Surrey_task5_1 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.93 65.07 -11.13 8000000000
Cheng_Surrey_task5_2 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.50 65.01 -11.51 8000000000
Baseline_Qwen3-Omni-30B Baseline DCASE 2026 Task 5 Organizers 53.17 62.48 -9.31 30000000000
Zhang_WHU_task5_1 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.57 62.79 -11.22 8000000000
Zhang_WHU_task5_2 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.13 62.79 -11.66 8000000000
Tathe_UIUC_task5_1 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 50.60 58.43 -7.83 8900000000
Hu_IOA_task5_2 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.87 58.93 -9.06 7600000000
Tathe_UIUC_task5_2 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 49.83 57.31 -7.48 8900000000
Hu_IOA_task5_1 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.63 58.93 -9.30 7600000000
Huang_JAIST_task5_1 Huang Japan Advanced Institute of Science and Technology Huang_JAIST_t5_2026 49.60 64.84 -15.24 8000000000
Kim_SGU_task5_4 Kim Sogang University Kim_SOGANG_t5_2026 49.13 7000000000
Kim_SGU_task5_3 Kim Sogang University Kim_SOGANG_t5_2026 48.87 59.43 -10.56 7000000000
Tathe_UIUC_task5_3 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.67 56.32 -7.65 8900000000
Tathe_UIUC_task5_4 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.33 55.51 -7.18 8900000000
Kim_SGU_task5_2 Kim Sogang University Kim_SOGANG_t5_2026 47.97 73.43 -25.46 7000000000
Wu_XMU_task5_1 Wu Xiamen University + Tsinghua University Wu_XMU_t5_2026 47.20 53.90 -6.70 7000000000
Kim_SGU_task5_1 Kim Sogang University Kim_SOGANG_t5_2026 46.77 72.76 -25.99 7000000000
ZC_Inst_task5_1 Zheng Hangzhou Dianzi University Zheng_HDU_t5_2026 46.03 64.45 -18.42 8000000000
Guan_HEU_task5_1 Guan Harbin Engineering University + University of Technology Sydney Xiao_HEU_t5_2026 43.37 53.08 -9.71 8000000000
Song_BIT_task5_1 Song Beijing Institute of Technology Song_BIT_t5_2026 43.33 50.53 -7.20 7000000000
Song_BIT_task5_2 Song Beijing Institute of Technology Song_BIT_t5_2026 41.53 48.10 -6.57 7000000000
Song_BIT_task5_3 Song Beijing Institute of Technology Song_BIT_t5_2026 40.97 54.20 -13.23 7000000000
Xu_HUST_task5_1 Xu Huazhong University of Science and Technology Xu_HUST_t5_2026 31.87 56.69 -24.82 5400000000
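The Diff column appears to be evaluation accuracy minus development accuracy, so a negative value means the system scored lower on the evaluation set. A minimal check on two rows from the table above follows; `accuracy_diff` is a hypothetical helper name.

```python
def accuracy_diff(eval_acc: float, dev_acc: float) -> float:
    """Eval accuracy minus dev accuracy, rounded to two decimals."""
    return round(eval_acc - dev_acc, 2)

print(accuracy_diff(58.33, 70.50))  # -12.17 (Lim_CAU_task5_4, matches the table)
print(accuracy_diff(53.93, 65.07))  # -11.14 (Cheng_Surrey_task5_1, table lists -11.13)
```

A few rows differ from this recomputation by 0.01. A likely explanation, stated here as an assumption, is that the published Diff was computed from unrounded accuracies.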

Lightweight system ranking

All lightweight submissions (systems with fewer than 30B parameters) are listed below.

Rank | Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters | Pretrained Model
Lim_CAU_task5_1 Lim Chung-Ang University Kim_CAU_t5_2026 57.30 69.63 -12.33 8000000000 MOSS-Audio-8B-Thinking
Lim_CAU_task5_2 Lim Chung-Ang University Kim_CAU_t5_2026 57.07 69.69 -12.63 8000000000 MOSS-Audio-8B-Thinking
Hu_IOA_task5_4 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 57.03 67.70 -10.67 8600000000 MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B
Hu_IOA_task5_3 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 56.70 66.02 -9.32 8600000000 MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B
Yin_XJTLU_task5_4 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.80 66.40 -10.60 8000000000 MOSS-Audio-8B-Thinking
Yin_XJTLU_task5_2 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.60 64.97 -9.37 8000000000 MOSS-Audio-8B-Thinking
Yin_XJTLU_task5_3 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.27 66.46 -11.19 8000000000 MOSS-Audio-8B-Thinking
Cheng_Surrey_task5_1 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.93 65.07 -11.13 8000000000 MOSS-Audio-8B-Thinking
Cheng_Surrey_task5_2 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.50 65.01 -11.51 8000000000 MOSS-Audio-8B-Thinking
Zhang_WHU_task5_1 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.57 62.79 -11.22 8000000000 MOSS-Audio-8B-Thinking
Zhang_WHU_task5_2 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.13 62.79 -11.66 8000000000 MOSS-Audio-8B-Thinking
Tathe_UIUC_task5_1 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 50.60 58.43 -7.83 8900000000 Qwen2.5-Omni-7B
Hu_IOA_task5_2 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.87 58.93 -9.06 7600000000 Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B
Tathe_UIUC_task5_2 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 49.83 57.31 -7.48 8900000000 Qwen2.5-Omni-7B
Hu_IOA_task5_1 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.63 58.93 -9.30 7600000000 Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B
Huang_JAIST_task5_1 Huang Japan Advanced Institute of Science and Technology Huang_JAIST_t5_2026 49.60 64.84 -15.24 8000000000 Fun-Audio-Chat-8B
Kim_SGU_task5_4 Kim Sogang University Kim_SOGANG_t5_2026 49.13 7000000000 MiMo-Audio-7B-Instruct
Kim_SGU_task5_3 Kim Sogang University Kim_SOGANG_t5_2026 48.87 59.43 -10.56 7000000000 MiMo-Audio-7B-Instruct
Tathe_UIUC_task5_3 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.67 56.32 -7.65 8900000000 Qwen2.5-Omni-7B
Tathe_UIUC_task5_4 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.33 55.51 -7.18 8900000000 Qwen2.5-Omni-7B
Kim_SGU_task5_2 Kim Sogang University Kim_SOGANG_t5_2026 47.97 73.43 -25.46 7000000000 MiMo-Audio-7B-Instruct
Wu_XMU_task5_1 Wu Xiamen University + Tsinghua University Wu_XMU_t5_2026 47.20 53.90 -6.70 7000000000 Qwen2.5-Omni-7B
Kim_SGU_task5_1 Kim Sogang University Kim_SOGANG_t5_2026 46.77 72.76 -25.99 7000000000 MiMo-Audio-7B-Instruct
ZC_Inst_task5_1 Zheng Hangzhou Dianzi University Zheng_HDU_t5_2026 46.03 64.45 -18.42 8000000000 Fun-Audio-Chat-8B
Guan_HEU_task5_1 Guan Harbin Engineering University + University of Technology Sydney Xiao_HEU_t5_2026 43.37 53.08 -9.71 8000000000 Fun-Audio-Chat-8B
Song_BIT_task5_1 Song Beijing Institute of Technology Song_BIT_t5_2026 43.33 50.53 -7.20 7000000000 MiMo-Audio-7B-Instruct
Song_BIT_task5_2 Song Beijing Institute of Technology Song_BIT_t5_2026 41.53 48.10 -6.57 7000000000 MiMo-Audio-7B-Instruct
Song_BIT_task5_3 Song Beijing Institute of Technology Song_BIT_t5_2026 40.97 54.20 -13.23 7000000000 MiMo-Audio-7B-Instruct
Xu_HUST_task5_1 Xu Huazhong University of Science and Technology Xu_HUST_t5_2026 31.87 56.69 -24.82 5400000000 MOSS-Audio-4B-Instruct, DeBERTa-v3

Ranking by base model

All systems are listed below, grouped by the base pretrained model used.

Rank | Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Base Model | System Size
Huang_JAIST_task5_1 Huang Japan Advanced Institute of Science and Technology Huang_JAIST_t5_2026 49.60 64.84 -15.24 Fun-Audio-Chat-8B 8.0B
ZC_Inst_task5_1 Zheng Hangzhou Dianzi University Zheng_HDU_t5_2026 46.03 64.45 -18.42 Fun-Audio-Chat-8B 8.0B
Guan_HEU_task5_1 Guan Harbin Engineering University + University of Technology Sydney Xiao_HEU_t5_2026 43.37 53.08 -9.71 Fun-Audio-Chat-8B 8.0B
Xu_HUST_task5_1 Xu Huazhong University of Science and Technology Xu_HUST_t5_2026 31.87 56.69 -24.82 MOSS-Audio-4B-Instruct 5.4B
Lim_CAU_task5_4 Lim Chung-Ang University Kim_CAU_t5_2026 58.33 70.50 -12.17 MOSS-Audio-8B-Thinking 96.0B
Lim_CAU_task5_3 Lim Chung-Ang University Kim_CAU_t5_2026 58.10 70.01 -11.91 MOSS-Audio-8B-Thinking 80.0B
Lim_CAU_task5_1 Lim Chung-Ang University Kim_CAU_t5_2026 57.30 69.63 -12.33 MOSS-Audio-8B-Thinking 8.0B
Lim_CAU_task5_2 Lim Chung-Ang University Kim_CAU_t5_2026 57.07 69.69 -12.63 MOSS-Audio-8B-Thinking 8.0B
Hu_IOA_task5_4 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 57.03 67.70 -10.67 MOSS-Audio-8B-Thinking 8.6B
Hu_IOA_task5_3 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 56.70 66.02 -9.32 MOSS-Audio-8B-Thinking 8.6B
Yin_XJTLU_task5_4 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.80 66.40 -10.60 MOSS-Audio-8B-Thinking 8.0B
Yin_XJTLU_task5_2 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.60 64.97 -9.37 MOSS-Audio-8B-Thinking 8.0B
Yin_XJTLU_task5_3 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 55.27 66.46 -11.19 MOSS-Audio-8B-Thinking 8.0B
Cheng_Surrey_task5_1 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.93 65.07 -11.13 MOSS-Audio-8B-Thinking 8.0B
Cheng_Surrey_task5_2 Cheng University of Surrey + Tencent Holdings Limited Cheng_SURREY_t5_2026 53.50 65.01 -11.51 MOSS-Audio-8B-Thinking 8.0B
Zhang_WHU_task5_1 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.57 62.79 -11.22 MOSS-Audio-8B-Thinking 8.0B
Zhang_WHU_task5_2 Zhang Wuhan University + The Chinese University of Hong Kong, Shenzhen Zhang_WHU_t5_2026 51.13 62.79 -11.66 MOSS-Audio-8B-Thinking 8.0B
Kim_SGU_task5_4 Kim Sogang University Kim_SOGANG_t5_2026 49.13 MiMo-Audio-7B-Instruct 7.0B
Kim_SGU_task5_3 Kim Sogang University Kim_SOGANG_t5_2026 48.87 59.43 -10.56 MiMo-Audio-7B-Instruct 7.0B
Kim_SGU_task5_2 Kim Sogang University Kim_SOGANG_t5_2026 47.97 73.43 -25.46 MiMo-Audio-7B-Instruct 7.0B
Kim_SGU_task5_1 Kim Sogang University Kim_SOGANG_t5_2026 46.77 72.76 -25.99 MiMo-Audio-7B-Instruct 7.0B
Song_BIT_task5_1 Song Beijing Institute of Technology Song_BIT_t5_2026 43.33 50.53 -7.20 MiMo-Audio-7B-Instruct 7.0B
Song_BIT_task5_2 Song Beijing Institute of Technology Song_BIT_t5_2026 41.53 48.10 -6.57 MiMo-Audio-7B-Instruct 7.0B
Song_BIT_task5_3 Song Beijing Institute of Technology Song_BIT_t5_2026 40.97 54.20 -13.23 MiMo-Audio-7B-Instruct 7.0B
Tathe_UIUC_task5_1 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 50.60 58.43 -7.83 Qwen2.5-Omni-7B 8.9B
Hu_IOA_task5_2 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.87 58.93 -9.06 Qwen2.5-Omni-7B 7.6B
Tathe_UIUC_task5_2 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 49.83 57.31 -7.48 Qwen2.5-Omni-7B 8.9B
Hu_IOA_task5_1 Hu Institute of Acoustics, Chinese Academy of Sciences Hu_IOA_t5_2026 49.63 58.93 -9.30 Qwen2.5-Omni-7B 7.6B
Tathe_UIUC_task5_3 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.67 56.32 -7.65 Qwen2.5-Omni-7B 8.9B
Tathe_UIUC_task5_4 Tathe University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab Tathe_UIUC_t5_2026 48.33 55.51 -7.18 Qwen2.5-Omni-7B 8.9B
Wu_XMU_task5_1 Wu Xiamen University + Tsinghua University Wu_XMU_t5_2026 47.20 53.90 -6.70 Qwen2.5-Omni-7B 7.0B
Nam_IND_task5_2 Nam Independent researcher Nam_IND_t5_2026 57.17 Qwen3-Omni-30B-A3B-Instruct 60.0B
Nam_IND_task5_4 Nam Independent researcher Nam_IND_t5_2026 57.13 Qwen3-Omni-30B-A3B-Instruct 94.0B
Nam_IND_task5_3 Nam Independent researcher Nam_IND_t5_2026 56.73 Qwen3-Omni-30B-A3B-Instruct 60.0B
Yin_XJTLU_task5_1 Yin Xi'an Jiaotong-Liverpool University Yin_XJTLU_t5_2026 56.00 68.33 -12.33 Qwen3-Omni-30B-A3B-Instruct 30.0B
Nam_IND_task5_1 Nam Independent researcher Nam_IND_t5_2026 55.90 67.27 -11.37 Qwen3-Omni-30B-A3B-Instruct 60.0B
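The grouping used in the table above can be sketched with a dictionary keyed by base model. The rows below are a small subset copied from this page, and the variable names are illustrative only.

```python
from collections import defaultdict

# (submission code, base model, evaluation accuracy): a subset of the table above.
rows = [
    ("Huang_JAIST_task5_1", "Fun-Audio-Chat-8B", 49.60),
    ("Xu_HUST_task5_1", "MOSS-Audio-4B-Instruct", 31.87),
    ("Lim_CAU_task5_4", "MOSS-Audio-8B-Thinking", 58.33),
    ("Hu_IOA_task5_4", "MOSS-Audio-8B-Thinking", 57.03),
    ("Kim_SGU_task5_4", "MiMo-Audio-7B-Instruct", 49.13),
    ("Tathe_UIUC_task5_1", "Qwen2.5-Omni-7B", 50.60),
    ("Nam_IND_task5_2", "Qwen3-Omni-30B-A3B-Instruct", 57.17),
]

# Collect (accuracy, code) pairs under each base model.
groups = defaultdict(list)
for code, base, acc in rows:
    groups[base].append((acc, code))

# Best system per base model; tuples compare by accuracy first.
best_by_base = {base: max(members) for base, members in groups.items()}

# Plain string sorting reproduces the group order seen in the table.
for base in sorted(best_by_base):
    acc, code = best_by_base[base]
    print(f"{base}: {code} ({acc:.2f})")
```

Sorting the base model names as plain strings places MOSS-Audio before MiMo-Audio (uppercase letters sort before lowercase), which matches the order of the groups in the table.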

System characteristics

Rank | Submission Code | Technical Report | Eval Accuracy | Parameters | Pretrained Model | System Size | Lightweight
1 Lim_CAU_task5_4 Kim_CAU_t5_2026 58.33 96000000000 MOSS-Audio-8B-Thinking, Qwen3-Omni-30B-A3B-Instruct 96.0B
2 Lim_CAU_task5_3 Kim_CAU_t5_2026 58.10 80000000000 MOSS-Audio-8B-Thinking 80.0B
3 Lim_CAU_task5_1 Kim_CAU_t5_2026 57.30 8000000000 MOSS-Audio-8B-Thinking 8.0B
4 Nam_IND_task5_2 Nam_IND_t5_2026 57.17 60000000000 Qwen3-Omni-30B-A3B-Instruct 60.0B
5 Nam_IND_task5_4 Nam_IND_t5_2026 57.13 94000000000 Qwen3-Omni-30B-A3B-Instruct, Gemma-4-E4B-it 94.0B
6 Lim_CAU_task5_2 Kim_CAU_t5_2026 57.07 8000000000 MOSS-Audio-8B-Thinking 8.0B
7 Hu_IOA_task5_4 Hu_IOA_t5_2026 57.03 8600000000 MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B 8.6B
8 Nam_IND_task5_3 Nam_IND_t5_2026 56.73 60000000000 Qwen3-Omni-30B-A3B-Instruct 60.0B
9 Hu_IOA_task5_3 Hu_IOA_t5_2026 56.70 8600000000 MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B 8.6B
10 Yin_XJTLU_task5_1 Yin_XJTLU_t5_2026 56.00 30000000000 Qwen3-Omni-30B-A3B-Instruct 30.0B
11 Nam_IND_task5_1 Nam_IND_t5_2026 55.90 60000000000 Qwen3-Omni-30B-A3B-Instruct 60.0B
12 Yin_XJTLU_task5_4 Yin_XJTLU_t5_2026 55.80 8000000000 MOSS-Audio-8B-Thinking 8.0B
13 Yin_XJTLU_task5_2 Yin_XJTLU_t5_2026 55.60 8000000000 MOSS-Audio-8B-Thinking 8.0B
14 Yin_XJTLU_task5_3 Yin_XJTLU_t5_2026 55.27 8000000000 MOSS-Audio-8B-Thinking 8.0B
15 Cheng_Surrey_task5_1 Cheng_SURREY_t5_2026 53.93 8000000000 MOSS-Audio-8B-Thinking 8.0B
16 Cheng_Surrey_task5_2 Cheng_SURREY_t5_2026 53.50 8000000000 MOSS-Audio-8B-Thinking 8.0B
Baseline_Qwen3-Omni-30B 53.17 30000000000 Qwen3-Omni-30B-A3B-Instruct 30.0B
17 Zhang_WHU_task5_1 Zhang_WHU_t5_2026 51.57 8000000000 MOSS-Audio-8B-Thinking 8.0B
18 Zhang_WHU_task5_2 Zhang_WHU_t5_2026 51.13 8000000000 MOSS-Audio-8B-Thinking 8.0B
19 Tathe_UIUC_task5_1 Tathe_UIUC_t5_2026 50.60 8900000000 Qwen2.5-Omni-7B 8.9B
20 Hu_IOA_task5_2 Hu_IOA_t5_2026 49.87 7600000000 Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B 7.6B
21 Tathe_UIUC_task5_2 Tathe_UIUC_t5_2026 49.83 8900000000 Qwen2.5-Omni-7B 8.9B
22 Hu_IOA_task5_1 Hu_IOA_t5_2026 49.63 7600000000 Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B 7.6B
23 Huang_JAIST_task5_1 Huang_JAIST_t5_2026 49.60 8000000000 Fun-Audio-Chat-8B 8.0B
24 Kim_SGU_task5_4 Kim_SOGANG_t5_2026 49.13 7000000000 MiMo-Audio-7B-Instruct 7.0B
25 Kim_SGU_task5_3 Kim_SOGANG_t5_2026 48.87 7000000000 MiMo-Audio-7B-Instruct 7.0B
26 Tathe_UIUC_task5_3 Tathe_UIUC_t5_2026 48.67 8900000000 Qwen2.5-Omni-7B 8.9B
27 Tathe_UIUC_task5_4 Tathe_UIUC_t5_2026 48.33 8900000000 Qwen2.5-Omni-7B 8.9B
28 Kim_SGU_task5_2 Kim_SOGANG_t5_2026 47.97 7000000000 MiMo-Audio-7B-Instruct 7.0B
29 Wu_XMU_task5_1 Wu_XMU_t5_2026 47.20 7000000000 Qwen2.5-Omni-7B 7.0B
30 Kim_SGU_task5_1 Kim_SOGANG_t5_2026 46.77 7000000000 MiMo-Audio-7B-Instruct 7.0B
31 ZC_Inst_task5_1 Zheng_HDU_t5_2026 46.03 8000000000 Fun-Audio-Chat-8B 8.0B
32 Guan_HEU_task5_1 Xiao_HEU_t5_2026 43.37 8000000000 Fun-Audio-Chat-8B 8.0B
33 Song_BIT_task5_1 Song_BIT_t5_2026 43.33 7000000000 MiMo-Audio-7B-Instruct 7.0B
34 Song_BIT_task5_2 Song_BIT_t5_2026 41.53 7000000000 MiMo-Audio-7B-Instruct 7.0B
35 Song_BIT_task5_3 Song_BIT_t5_2026 40.97 7000000000 MiMo-Audio-7B-Instruct 7.0B
36 Xu_HUST_task5_1 Xu_HUST_t5_2026 31.87 5400000000 MOSS-Audio-4B-Instruct, DeBERTa-v3 5.4B
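The System Size column appears to be the Parameters column expressed in billions with one decimal place. A minimal sketch of that conversion follows; `format_system_size` is a hypothetical helper name.

```python
def format_system_size(num_parameters: int) -> str:
    """Render a raw parameter count in billions, e.g. 8600000000 -> '8.6B'."""
    return f"{num_parameters / 1e9:.1f}B"

print(format_system_size(96_000_000_000))  # 96.0B
print(format_system_size(8_600_000_000))   # 8.6B
print(format_system_size(5_400_000_000))   # 5.4B
```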

Technical reports

Selective Multi-Modal RAG for DCASE 2026 Task 5: Audio-Dependent Question Answering

Yuelan Cheng, Jinzheng Zhao, Rong Wan, Peiwei Chang, Yongqiang Chen, Wenwu Wang
University of Surrey, Guildford, UK; Tencent Holdings Limited, Beijing, China

Audio-Dependent Question Answering at the DCASE 2026 Challenge

Weiteng Hu, Yin Cao, Jun Yang
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

Curriculum Learning for Audio-Dependent Question Answering: Technical Report for DCASE 2026 Task 5

Qixuan Huang, Yizhi Pan, Xiajie Zhou, Rui Li, Masashi Unoki
Japan Advanced Institute of Science and Technology, Nomi, Japan

Audio-Grounded Hard-Example Training with Acoustic Tagging for Audio-Dependent Question Answering

Hyun Jun Kim, Byeongchan Kim, Jung Chan Ryu, Yu Ra Kim, Yuri Oh, Bo Eun Choi, Changwon Lim, Il-Youp Kwak
Chung-Ang University, Seoul, Korea

Task-Leaf Routed MiMo-Audio for DCASE 2026 Task 5

Jongha Kim, Leehyeon Song, Hyung-Min Park
Sogang University, Seoul, Korea

Learning from Audio-Dependency Errors: Data Curation Strategies Based on Model Confusion Patterns in Audio Question Answering

Hyeonuk Nam
Independent researcher, Seoul, South Korea

Audio-Dependent Question Answering with Attention-Anchored Reinforcement Learning on MiMo-Audio

Hongjin Song
Beijing Institute of Technology, Beijing, China

DCASE 2026 Audio-Dependent Question Answering Task

Aniket Tathe
University of Illinois Urbana-Champaign, Urbana, Illinois, USA; Carnegie Mellon University, WavLab, Pittsburgh, Pennsylvania, USA

Qwen2.5-Omni with All-Audio Audio-TaH for Audio-Dependent Question Answering

Chenglin Wu, Daiqing Wu, Yuxi Huang
Xiamen University, Xiamen, China; Tsinghua University, Beijing, China

GISP@HEU's Submission for DCASE 2026 Task 5: A LoRA-SFT Fine-Tuned Audio-Dependent Question Answering System

Feiyang Xiao, Qiaoxi Zhu, Jian Guan
Harbin Engineering University, Harbin, China; University of Technology Sydney, Ultimo, Australia

Audio Question Answering at the DCASE 2026 Challenge

Haoran Xu, Rui Zhang
Huazhong University of Science and Technology, Wuhan, China

Training-Free Inference-Time Exploration for Audio-Dependent Question Answering

Zeyu Yin, Qi Cao, Pingsong Deng, Yizhou Tan, Shengchen Li
Xi'an Jiaotong-Liverpool University, Suzhou, China

Structured Audio Reasoning and Robust Multi-Sample Inference for DCASE 2026 Audio-Dependent Question Answering Challenge

Yucong Zhang, Juan Liu, Ming Li
Wuhan University, Hubei, China; The Chinese University of Hong Kong, Shenzhen, Guangdong, China

Fun-Audio-Chat-8B with LoRA Fine-Tuning for Audio-Dependent Question Answering

Chengqi Zheng
Hangzhou Dianzi University, Hangzhou, China
