Task description
The Audio-Dependent Question Answering (ADQA) task focuses on addressing a critical bottleneck in current Large Audio-Language Models (LALMs): "Textual Hallucination." Many state-of-the-art models currently pass audio understanding benchmarks by relying on text prompts and internal linguistic priors rather than actual audio perception. This task evaluates whether LALMs truly "listen" to audio or rely on textual shortcuts, using Audio-Dependency Filtering to ensure genuine audio perception.
A more detailed task description can be found on the task description page.
Submission statistics
- Number of teams: 14
- Number of submissions: 36
- Lightweight submissions (< 30B parameters): 29
Teams ranking
The best system from each team is listed below. Teams are ranked by evaluation accuracy.
| Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters | |
|---|---|---|---|---|---|---|---|---|
| Lim_CAU_task5_4 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 58.33 | 70.50 | -12.17 | 96000000000 | |
| Nam_IND_task5_2 | Nam | Independent researcher | Nam_IND_t5_2026 | 57.17 | | | 60000000000 | |
| Hu_IOA_task5_4 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 57.03 | 67.70 | -10.67 | 8600000000 | |
| Yin_XJTLU_task5_1 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 56.00 | 68.33 | -12.33 | 30000000000 | |
| Cheng_Surrey_task5_1 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.93 | 65.07 | -11.13 | 8000000000 | |
| Zhang_WHU_task5_1 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.57 | 62.79 | -11.22 | 8000000000 | |
| Tathe_UIUC_task5_1 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 50.60 | 58.43 | -7.83 | 8900000000 | |
| Huang_JAIST_task5_1 | Huang | Japan Advanced Institute of Science and Technology | Huang_JAIST_t5_2026 | 49.60 | 64.84 | -15.24 | 8000000000 | |
| Kim_SGU_task5_4 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 49.13 | | | 7000000000 | |
| Wu_XMU_task5_1 | Wu | Xiamen University + Tsinghua University | Wu_XMU_t5_2026 | 47.20 | 53.90 | -6.70 | 7000000000 | |
| ZC_Inst_task5_1 | Zheng | Hangzhou Dianzi University | Zheng_HDU_t5_2026 | 46.03 | 64.45 | -18.42 | 8000000000 | |
| Guan_HEU_task5_1 | Guan | Harbin Engineering University + University of Technology Sydney | Xiao_HEU_t5_2026 | 43.37 | 53.08 | -9.71 | 8000000000 | |
| Song_BIT_task5_1 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 43.33 | 50.53 | -7.20 | 7000000000 | |
| Xu_HUST_task5_1 | Xu | Huazhong University of Science and Technology | Xu_HUST_t5_2026 | 31.87 | 56.69 | -24.82 | 5400000000 | |
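
The Diff column is the evaluation accuracy minus the development accuracy, and the team ranking keeps only each team's highest-scoring submission. The following is a minimal Python sketch of both steps, using a few rows copied from the tables on this page. The `Submission` class and `best_per_team` helper are illustrative names, not official challenge tooling, and deriving the team from the submission-code prefix is an assumption made for this sketch. Because the published accuracies are rounded to two decimals, a Diff recomputed this way can differ from the listed value in the last digit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    code: str
    eval_acc: float
    dev_acc: Optional[float] = None  # some rows list no dev accuracy

    @property
    def team(self) -> str:
        # Assumption: the team is the part of the code before "_task5_".
        return self.code.split("_task5_")[0]

    @property
    def diff(self) -> Optional[float]:
        # Diff = eval accuracy - dev accuracy; empty when dev is missing.
        if self.dev_acc is None:
            return None
        return round(self.eval_acc - self.dev_acc, 2)


def best_per_team(subs):
    """Keep the highest eval-accuracy submission of each team, best first."""
    best = {}
    for s in subs:
        if s.team not in best or s.eval_acc > best[s.team].eval_acc:
            best[s.team] = s
    return sorted(best.values(), key=lambda s: s.eval_acc, reverse=True)


rows = [
    Submission("Lim_CAU_task5_4", 58.33, 70.50),
    Submission("Lim_CAU_task5_3", 58.10, 70.01),
    Submission("Nam_IND_task5_2", 57.17),
    Submission("Hu_IOA_task5_4", 57.03, 67.70),
    Submission("Hu_IOA_task5_3", 56.70, 66.02),
]

for s in best_per_team(rows):
    print(s.code, s.eval_acc, s.diff)
```
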
Overall systems ranking
All submitted systems are listed below with their evaluation and development accuracy, ordered by evaluation accuracy.
| Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters | Lightweight | |
|---|---|---|---|---|---|---|---|---|---|
| Lim_CAU_task5_4 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 58.33 | 70.50 | -12.17 | 96000000000 | ||
| Lim_CAU_task5_3 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 58.10 | 70.01 | -11.91 | 80000000000 | ||
| Lim_CAU_task5_1 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.30 | 69.63 | -12.33 | 8000000000 | ||
| Nam_IND_task5_2 | Nam | Independent researcher | Nam_IND_t5_2026 | 57.17 | | | 60000000000 | | |
| Nam_IND_task5_4 | Nam | Independent researcher | Nam_IND_t5_2026 | 57.13 | | | 94000000000 | | |
| Lim_CAU_task5_2 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.07 | 69.69 | -12.63 | 8000000000 | ||
| Hu_IOA_task5_4 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 57.03 | 67.70 | -10.67 | 8600000000 | ||
| Nam_IND_task5_3 | Nam | Independent researcher | Nam_IND_t5_2026 | 56.73 | | | 60000000000 | | |
| Hu_IOA_task5_3 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 56.70 | 66.02 | -9.32 | 8600000000 | ||
| Yin_XJTLU_task5_1 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 56.00 | 68.33 | -12.33 | 30000000000 | ||
| Nam_IND_task5_1 | Nam | Independent researcher | Nam_IND_t5_2026 | 55.90 | 67.27 | -11.37 | 60000000000 | ||
| Yin_XJTLU_task5_4 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.80 | 66.40 | -10.60 | 8000000000 | ||
| Yin_XJTLU_task5_2 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.60 | 64.97 | -9.37 | 8000000000 | ||
| Yin_XJTLU_task5_3 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.27 | 66.46 | -11.19 | 8000000000 | ||
| Cheng_Surrey_task5_1 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.93 | 65.07 | -11.13 | 8000000000 | ||
| Cheng_Surrey_task5_2 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.50 | 65.01 | -11.51 | 8000000000 | ||
| Baseline_Qwen3-Omni-30B | Baseline | DCASE 2026 Task 5 Organizers | | 53.17 | 62.48 | -9.31 | 30000000000 | | |
| Zhang_WHU_task5_1 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.57 | 62.79 | -11.22 | 8000000000 | ||
| Zhang_WHU_task5_2 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.13 | 62.79 | -11.66 | 8000000000 | ||
| Tathe_UIUC_task5_1 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 50.60 | 58.43 | -7.83 | 8900000000 | ||
| Hu_IOA_task5_2 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.87 | 58.93 | -9.06 | 7600000000 | ||
| Tathe_UIUC_task5_2 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 49.83 | 57.31 | -7.48 | 8900000000 | ||
| Hu_IOA_task5_1 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.63 | 58.93 | -9.30 | 7600000000 | ||
| Huang_JAIST_task5_1 | Huang | Japan Advanced Institute of Science and Technology | Huang_JAIST_t5_2026 | 49.60 | 64.84 | -15.24 | 8000000000 | ||
| Kim_SGU_task5_4 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 49.13 | | | 7000000000 | | |
| Kim_SGU_task5_3 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 48.87 | 59.43 | -10.56 | 7000000000 | ||
| Tathe_UIUC_task5_3 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.67 | 56.32 | -7.65 | 8900000000 | ||
| Tathe_UIUC_task5_4 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.33 | 55.51 | -7.18 | 8900000000 | ||
| Kim_SGU_task5_2 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 47.97 | 73.43 | -25.46 | 7000000000 | ||
| Wu_XMU_task5_1 | Wu | Xiamen University + Tsinghua University | Wu_XMU_t5_2026 | 47.20 | 53.90 | -6.70 | 7000000000 | ||
| Kim_SGU_task5_1 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 46.77 | 72.76 | -25.99 | 7000000000 | ||
| ZC_Inst_task5_1 | Zheng | Hangzhou Dianzi University | Zheng_HDU_t5_2026 | 46.03 | 64.45 | -18.42 | 8000000000 | ||
| Guan_HEU_task5_1 | Guan | Harbin Engineering University + University of Technology Sydney | Xiao_HEU_t5_2026 | 43.37 | 53.08 | -9.71 | 8000000000 | ||
| Song_BIT_task5_1 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 43.33 | 50.53 | -7.20 | 7000000000 | ||
| Song_BIT_task5_2 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 41.53 | 48.10 | -6.57 | 7000000000 | ||
| Song_BIT_task5_3 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 40.97 | 54.20 | -13.23 | 7000000000 | ||
| Xu_HUST_task5_1 | Xu | Huazhong University of Science and Technology | Xu_HUST_t5_2026 | 31.87 | 56.69 | -24.82 | 5400000000 | ||
Lightweight system ranking
All lightweight submissions (systems with fewer than 30B parameters) are listed below.
| Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Parameters | Pretrained Model | |
|---|---|---|---|---|---|---|---|---|---|
| Lim_CAU_task5_1 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.30 | 69.63 | -12.33 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Lim_CAU_task5_2 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.07 | 69.69 | -12.63 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Hu_IOA_task5_4 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 57.03 | 67.70 | -10.67 | 8600000000 | MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B | |
| Hu_IOA_task5_3 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 56.70 | 66.02 | -9.32 | 8600000000 | MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B | |
| Yin_XJTLU_task5_4 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.80 | 66.40 | -10.60 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Yin_XJTLU_task5_2 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.60 | 64.97 | -9.37 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Yin_XJTLU_task5_3 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.27 | 66.46 | -11.19 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Cheng_Surrey_task5_1 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.93 | 65.07 | -11.13 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Cheng_Surrey_task5_2 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.50 | 65.01 | -11.51 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Zhang_WHU_task5_1 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.57 | 62.79 | -11.22 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Zhang_WHU_task5_2 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.13 | 62.79 | -11.66 | 8000000000 | MOSS-Audio-8B-Thinking | |
| Tathe_UIUC_task5_1 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 50.60 | 58.43 | -7.83 | 8900000000 | Qwen2.5-Omni-7B | |
| Hu_IOA_task5_2 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.87 | 58.93 | -9.06 | 7600000000 | Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B | |
| Tathe_UIUC_task5_2 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 49.83 | 57.31 | -7.48 | 8900000000 | Qwen2.5-Omni-7B | |
| Hu_IOA_task5_1 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.63 | 58.93 | -9.30 | 7600000000 | Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B | |
| Huang_JAIST_task5_1 | Huang | Japan Advanced Institute of Science and Technology | Huang_JAIST_t5_2026 | 49.60 | 64.84 | -15.24 | 8000000000 | Fun-Audio-Chat-8B | |
| Kim_SGU_task5_4 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 49.13 | | | 7000000000 | MiMo-Audio-7B-Instruct | |
| Kim_SGU_task5_3 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 48.87 | 59.43 | -10.56 | 7000000000 | MiMo-Audio-7B-Instruct | |
| Tathe_UIUC_task5_3 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.67 | 56.32 | -7.65 | 8900000000 | Qwen2.5-Omni-7B | |
| Tathe_UIUC_task5_4 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.33 | 55.51 | -7.18 | 8900000000 | Qwen2.5-Omni-7B | |
| Kim_SGU_task5_2 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 47.97 | 73.43 | -25.46 | 7000000000 | MiMo-Audio-7B-Instruct | |
| Wu_XMU_task5_1 | Wu | Xiamen University + Tsinghua University | Wu_XMU_t5_2026 | 47.20 | 53.90 | -6.70 | 7000000000 | Qwen2.5-Omni-7B | |
| Kim_SGU_task5_1 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 46.77 | 72.76 | -25.99 | 7000000000 | MiMo-Audio-7B-Instruct | |
| ZC_Inst_task5_1 | Zheng | Hangzhou Dianzi University | Zheng_HDU_t5_2026 | 46.03 | 64.45 | -18.42 | 8000000000 | Fun-Audio-Chat-8B | |
| Guan_HEU_task5_1 | Guan | Harbin Engineering University + University of Technology Sydney | Xiao_HEU_t5_2026 | 43.37 | 53.08 | -9.71 | 8000000000 | Fun-Audio-Chat-8B | |
| Song_BIT_task5_1 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 43.33 | 50.53 | -7.20 | 7000000000 | MiMo-Audio-7B-Instruct | |
| Song_BIT_task5_2 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 41.53 | 48.10 | -6.57 | 7000000000 | MiMo-Audio-7B-Instruct | |
| Song_BIT_task5_3 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 40.97 | 54.20 | -13.23 | 7000000000 | MiMo-Audio-7B-Instruct | |
| Xu_HUST_task5_1 | Xu | Huazhong University of Science and Technology | Xu_HUST_t5_2026 | 31.87 | 56.69 | -24.82 | 5400000000 | MOSS-Audio-4B-Instruct, DeBERTa-v3 | |
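
The lightweight track is defined by a parameter-count threshold: systems with fewer than 30B parameters. A short Python sketch of that check follows, together with a conversion from the raw counts in the Parameters column to the abbreviated form used in the System Size column. The function names are hypothetical, and the strict "less than" comparison is read from the definition above (the single 30B submission does not appear in the lightweight table).

```python
LIGHTWEIGHT_LIMIT = 30_000_000_000  # "less than 30B parameters"


def is_lightweight(num_params: int) -> bool:
    # Strictly below the limit: a system with exactly 30B is not lightweight.
    return num_params < LIGHTWEIGHT_LIMIT


def format_size(num_params: int) -> str:
    # Express a raw parameter count in billions with one decimal place.
    return f"{num_params / 1e9:.1f}B"


# Parameter counts copied from the tables on this page.
params = {
    "Lim_CAU_task5_4": 96_000_000_000,
    "Yin_XJTLU_task5_1": 30_000_000_000,
    "Hu_IOA_task5_4": 8_600_000_000,
    "Xu_HUST_task5_1": 5_400_000_000,
}

for code, n in params.items():
    print(code, format_size(n), is_lightweight(n))
```
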
Ranking by base model
All systems are listed below, grouped by the base pretrained model used.
| Submission Code | Corresponding Author | Affiliation | Technical Report | Eval Accuracy | Dev Accuracy | Diff | Base Model | System Size | |
|---|---|---|---|---|---|---|---|---|---|
| Huang_JAIST_task5_1 | Huang | Japan Advanced Institute of Science and Technology | Huang_JAIST_t5_2026 | 49.60 | 64.84 | -15.24 | Fun-Audio-Chat-8B | 8.0B | |
| ZC_Inst_task5_1 | Zheng | Hangzhou Dianzi University | Zheng_HDU_t5_2026 | 46.03 | 64.45 | -18.42 | Fun-Audio-Chat-8B | 8.0B | |
| Guan_HEU_task5_1 | Guan | Harbin Engineering University + University of Technology Sydney | Xiao_HEU_t5_2026 | 43.37 | 53.08 | -9.71 | Fun-Audio-Chat-8B | 8.0B | |
| Xu_HUST_task5_1 | Xu | Huazhong University of Science and Technology | Xu_HUST_t5_2026 | 31.87 | 56.69 | -24.82 | MOSS-Audio-4B-Instruct | 5.4B | |
| Lim_CAU_task5_4 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 58.33 | 70.50 | -12.17 | MOSS-Audio-8B-Thinking | 96.0B | |
| Lim_CAU_task5_3 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 58.10 | 70.01 | -11.91 | MOSS-Audio-8B-Thinking | 80.0B | |
| Lim_CAU_task5_1 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.30 | 69.63 | -12.33 | MOSS-Audio-8B-Thinking | 8.0B | |
| Lim_CAU_task5_2 | Lim | Chung-Ang University | Kim_CAU_t5_2026 | 57.07 | 69.69 | -12.63 | MOSS-Audio-8B-Thinking | 8.0B | |
| Hu_IOA_task5_4 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 57.03 | 67.70 | -10.67 | MOSS-Audio-8B-Thinking | 8.6B | |
| Hu_IOA_task5_3 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 56.70 | 66.02 | -9.32 | MOSS-Audio-8B-Thinking | 8.6B | |
| Yin_XJTLU_task5_4 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.80 | 66.40 | -10.60 | MOSS-Audio-8B-Thinking | 8.0B | |
| Yin_XJTLU_task5_2 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.60 | 64.97 | -9.37 | MOSS-Audio-8B-Thinking | 8.0B | |
| Yin_XJTLU_task5_3 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 55.27 | 66.46 | -11.19 | MOSS-Audio-8B-Thinking | 8.0B | |
| Cheng_Surrey_task5_1 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.93 | 65.07 | -11.13 | MOSS-Audio-8B-Thinking | 8.0B | |
| Cheng_Surrey_task5_2 | Cheng | University of Surrey + Tencent Holdings Limited | Cheng_SURREY_t5_2026 | 53.50 | 65.01 | -11.51 | MOSS-Audio-8B-Thinking | 8.0B | |
| Zhang_WHU_task5_1 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.57 | 62.79 | -11.22 | MOSS-Audio-8B-Thinking | 8.0B | |
| Zhang_WHU_task5_2 | Zhang | Wuhan University + The Chinese University of Hong Kong, Shenzhen | Zhang_WHU_t5_2026 | 51.13 | 62.79 | -11.66 | MOSS-Audio-8B-Thinking | 8.0B | |
| Kim_SGU_task5_4 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 49.13 | | | MiMo-Audio-7B-Instruct | 7.0B | |
| Kim_SGU_task5_3 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 48.87 | 59.43 | -10.56 | MiMo-Audio-7B-Instruct | 7.0B | |
| Kim_SGU_task5_2 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 47.97 | 73.43 | -25.46 | MiMo-Audio-7B-Instruct | 7.0B | |
| Kim_SGU_task5_1 | Kim | Sogang University | Kim_SOGANG_t5_2026 | 46.77 | 72.76 | -25.99 | MiMo-Audio-7B-Instruct | 7.0B | |
| Song_BIT_task5_1 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 43.33 | 50.53 | -7.20 | MiMo-Audio-7B-Instruct | 7.0B | |
| Song_BIT_task5_2 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 41.53 | 48.10 | -6.57 | MiMo-Audio-7B-Instruct | 7.0B | |
| Song_BIT_task5_3 | Song | Beijing Institute of Technology | Song_BIT_t5_2026 | 40.97 | 54.20 | -13.23 | MiMo-Audio-7B-Instruct | 7.0B | |
| Tathe_UIUC_task5_1 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 50.60 | 58.43 | -7.83 | Qwen2.5-Omni-7B | 8.9B | |
| Hu_IOA_task5_2 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.87 | 58.93 | -9.06 | Qwen2.5-Omni-7B | 7.6B | |
| Tathe_UIUC_task5_2 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 49.83 | 57.31 | -7.48 | Qwen2.5-Omni-7B | 8.9B | |
| Hu_IOA_task5_1 | Hu | Institute of Acoustics, Chinese Academy of Sciences | Hu_IOA_t5_2026 | 49.63 | 58.93 | -9.30 | Qwen2.5-Omni-7B | 7.6B | |
| Tathe_UIUC_task5_3 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.67 | 56.32 | -7.65 | Qwen2.5-Omni-7B | 8.9B | |
| Tathe_UIUC_task5_4 | Tathe | University of Illinois Urbana-Champaign + Carnegie Mellon University, WavLab | Tathe_UIUC_t5_2026 | 48.33 | 55.51 | -7.18 | Qwen2.5-Omni-7B | 8.9B | |
| Wu_XMU_task5_1 | Wu | Xiamen University + Tsinghua University | Wu_XMU_t5_2026 | 47.20 | 53.90 | -6.70 | Qwen2.5-Omni-7B | 7.0B | |
| Nam_IND_task5_2 | Nam | Independent researcher | Nam_IND_t5_2026 | 57.17 | | | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
| Nam_IND_task5_4 | Nam | Independent researcher | Nam_IND_t5_2026 | 57.13 | | | Qwen3-Omni-30B-A3B-Instruct | 94.0B | |
| Nam_IND_task5_3 | Nam | Independent researcher | Nam_IND_t5_2026 | 56.73 | | | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
| Yin_XJTLU_task5_1 | Yin | Xi'an Jiaotong-Liverpool University | Yin_XJTLU_t5_2026 | 56.00 | 68.33 | -12.33 | Qwen3-Omni-30B-A3B-Instruct | 30.0B | |
| Nam_IND_task5_1 | Nam | Independent researcher | Nam_IND_t5_2026 | 55.90 | 67.27 | -11.37 | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
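
The grouping used in this table can be reproduced with a few lines of Python. The sketch below uses a handful of rows copied from the table; the `group_by_base_model` helper is a hypothetical name, and ordering the groups alphabetically by model name with each group sorted by evaluation accuracy is inferred from the table layout rather than stated by the organizers.

```python
from collections import defaultdict

# (submission code, base model, eval accuracy), copied from the table above.
rows = [
    ("Tathe_UIUC_task5_1", "Qwen2.5-Omni-7B", 50.60),
    ("Huang_JAIST_task5_1", "Fun-Audio-Chat-8B", 49.60),
    ("Hu_IOA_task5_2", "Qwen2.5-Omni-7B", 49.87),
    ("ZC_Inst_task5_1", "Fun-Audio-Chat-8B", 46.03),
    ("Guan_HEU_task5_1", "Fun-Audio-Chat-8B", 43.37),
    ("Xu_HUST_task5_1", "MOSS-Audio-4B-Instruct", 31.87),
]


def group_by_base_model(rows):
    """Group rows by base model; sort each group by eval accuracy, best first."""
    groups = defaultdict(list)
    for code, model, acc in rows:
        groups[model].append((code, acc))
    # Groups are ordered by model name, members by descending accuracy.
    return {
        model: sorted(members, key=lambda m: m[1], reverse=True)
        for model, members in sorted(groups.items())
    }


for model, members in group_by_base_model(rows).items():
    print(model, members)
```
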
System characteristics
| Rank | Submission code | Technical Report | Eval Accuracy | Parameters | Pretrained Model | System Size | Lightweight |
|---|---|---|---|---|---|---|---|
| 1 | Lim_CAU_task5_4 | Kim_CAU_t5_2026 | 58.33 | 96000000000 | MOSS-Audio-8B-Thinking, Qwen3-Omni-30B-A3B-Instruct | 96.0B | |
| 2 | Lim_CAU_task5_3 | Kim_CAU_t5_2026 | 58.10 | 80000000000 | MOSS-Audio-8B-Thinking | 80.0B | |
| 3 | Lim_CAU_task5_1 | Kim_CAU_t5_2026 | 57.30 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 4 | Nam_IND_task5_2 | Nam_IND_t5_2026 | 57.17 | 60000000000 | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
| 5 | Nam_IND_task5_4 | Nam_IND_t5_2026 | 57.13 | 94000000000 | Qwen3-Omni-30B-A3B-Instruct, Gemma-4-E4B-it | 94.0B | |
| 6 | Lim_CAU_task5_2 | Kim_CAU_t5_2026 | 57.07 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 7 | Hu_IOA_task5_4 | Hu_IOA_t5_2026 | 57.03 | 8600000000 | MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B | 8.6B | |
| 8 | Nam_IND_task5_3 | Nam_IND_t5_2026 | 56.73 | 60000000000 | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
| 9 | Hu_IOA_task5_3 | Hu_IOA_t5_2026 | 56.70 | 8600000000 | MOSS-Audio-8B-Thinking, Qwen3-Embedding-0.6B | 8.6B | |
| 10 | Yin_XJTLU_task5_1 | Yin_XJTLU_t5_2026 | 56.00 | 30000000000 | Qwen3-Omni-30B-A3B-Instruct | 30.0B | |
| 11 | Nam_IND_task5_1 | Nam_IND_t5_2026 | 55.90 | 60000000000 | Qwen3-Omni-30B-A3B-Instruct | 60.0B | |
| 12 | Yin_XJTLU_task5_4 | Yin_XJTLU_t5_2026 | 55.80 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 13 | Yin_XJTLU_task5_2 | Yin_XJTLU_t5_2026 | 55.60 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 14 | Yin_XJTLU_task5_3 | Yin_XJTLU_t5_2026 | 55.27 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 15 | Cheng_Surrey_task5_1 | Cheng_SURREY_t5_2026 | 53.93 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 16 | Cheng_Surrey_task5_2 | Cheng_SURREY_t5_2026 | 53.50 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| | Baseline_Qwen3-Omni-30B | | 53.17 | 30000000000 | Qwen3-Omni-30B-A3B-Instruct | 30.0B | |
| 17 | Zhang_WHU_task5_1 | Zhang_WHU_t5_2026 | 51.57 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 18 | Zhang_WHU_task5_2 | Zhang_WHU_t5_2026 | 51.13 | 8000000000 | MOSS-Audio-8B-Thinking | 8.0B | |
| 19 | Tathe_UIUC_task5_1 | Tathe_UIUC_t5_2026 | 50.60 | 8900000000 | Qwen2.5-Omni-7B | 8.9B | |
| 20 | Hu_IOA_task5_2 | Hu_IOA_t5_2026 | 49.87 | 7600000000 | Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B | 7.6B | |
| 21 | Tathe_UIUC_task5_2 | Tathe_UIUC_t5_2026 | 49.83 | 8900000000 | Qwen2.5-Omni-7B | 8.9B | |
| 22 | Hu_IOA_task5_1 | Hu_IOA_t5_2026 | 49.63 | 7600000000 | Qwen2.5-Omni-7B, Qwen3-Embedding-0.6B | 7.6B | |
| 23 | Huang_JAIST_task5_1 | Huang_JAIST_t5_2026 | 49.60 | 8000000000 | Fun-Audio-Chat-8B | 8.0B | |
| 24 | Kim_SGU_task5_4 | Kim_SOGANG_t5_2026 | 49.13 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 25 | Kim_SGU_task5_3 | Kim_SOGANG_t5_2026 | 48.87 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 26 | Tathe_UIUC_task5_3 | Tathe_UIUC_t5_2026 | 48.67 | 8900000000 | Qwen2.5-Omni-7B | 8.9B | |
| 27 | Tathe_UIUC_task5_4 | Tathe_UIUC_t5_2026 | 48.33 | 8900000000 | Qwen2.5-Omni-7B | 8.9B | |
| 28 | Kim_SGU_task5_2 | Kim_SOGANG_t5_2026 | 47.97 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 29 | Wu_XMU_task5_1 | Wu_XMU_t5_2026 | 47.20 | 7000000000 | Qwen2.5-Omni-7B | 7.0B | |
| 30 | Kim_SGU_task5_1 | Kim_SOGANG_t5_2026 | 46.77 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 31 | ZC_Inst_task5_1 | Zheng_HDU_t5_2026 | 46.03 | 8000000000 | Fun-Audio-Chat-8B | 8.0B | |
| 32 | Guan_HEU_task5_1 | Xiao_HEU_t5_2026 | 43.37 | 8000000000 | Fun-Audio-Chat-8B | 8.0B | |
| 33 | Song_BIT_task5_1 | Song_BIT_t5_2026 | 43.33 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 34 | Song_BIT_task5_2 | Song_BIT_t5_2026 | 41.53 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 35 | Song_BIT_task5_3 | Song_BIT_t5_2026 | 40.97 | 7000000000 | MiMo-Audio-7B-Instruct | 7.0B | |
| 36 | Xu_HUST_task5_1 | Xu_HUST_t5_2026 | 31.87 | 5400000000 | MOSS-Audio-4B-Instruct, DeBERTa-v3 | 5.4B |
Technical reports
Selective Multi-Modal RAG for DCASE 2026 Task 5: Audio-Dependent Question Answering
Yuelan Cheng, Jinzheng Zhao, Rong Wan, Peiwei Chang, Yongqiang Chen, Wenwu Wang
University of Surrey, Guildford, UK; Tencent Holdings Limited, Beijing, China
Cheng_Surrey_task5_1 Cheng_Surrey_task5_2
Audio-Dependent Question Answering at the DCASE 2026 Challenge
Weiteng Hu, Yin Cao, Jun Yang
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Hu_IOA_task5_1 Hu_IOA_task5_2 Hu_IOA_task5_3 Hu_IOA_task5_4
Curriculum Learning for Audio-Dependent Question Answering: Technical Report for DCASE 2026 Task 5
Qixuan Huang, Yizhi Pan, Xiajie Zhou, Rui Li, Masashi Unoki
Japan Advanced Institute of Science and Technology, Nomi, Japan
Huang_JAIST_task5_1
Audio-Grounded Hard-Example Training with Acoustic Tagging for Audio-Dependent Question Answering
Hyun Jun Kim, Byeongchan Kim, Jung Chan Ryu, Yu Ra Kim, Yuri Oh, Bo Eun Choi, Changwon Lim, Il-Youp Kwak
Chung-Ang University, Seoul, Korea
Lim_CAU_task5_1 Lim_CAU_task5_2 Lim_CAU_task5_3 Lim_CAU_task5_4
Task-Leaf Routed MiMo-Audio for DCASE 2026 Task 5
Jongha Kim, Leehyeon Song, Hyung-Min Park
Sogang University, Seoul, Korea
Kim_SGU_task5_1 Kim_SGU_task5_2 Kim_SGU_task5_3 Kim_SGU_task5_4
Learning from Audio-Dependency Errors: Data Curation Strategies Based on Model Confusion Patterns in Audio Question Answering
Hyeonuk Nam
Independent researcher, Seoul, South Korea
Nam_IND_task5_1 Nam_IND_task5_2 Nam_IND_task5_3 Nam_IND_task5_4
Audio-Dependent Question Answering with Attention-Anchored Reinforcement Learning on MiMo-Audio
Hongjin Song
Beijing Institute of Technology, Beijing, China
Song_BIT_task5_1 Song_BIT_task5_2 Song_BIT_task5_3
DCASE 2026 Audio-Dependent Question Answering Task
Aniket Tathe
University of Illinois Urbana-Champaign, Urbana, Illinois, USA; Carnegie Mellon University, WavLab, Pittsburgh, Pennsylvania, USA
Tathe_UIUC_task5_1 Tathe_UIUC_task5_2 Tathe_UIUC_task5_3 Tathe_UIUC_task5_4
Qwen2.5-Omni with All-Audio Audio-TaH for Audio-Dependent Question Answering
Chenglin Wu, Daiqing Wu, Yuxi Huang
Xiamen University, Xiamen, China; Tsinghua University, Beijing, China
Wu_XMU_task5_1
GISP@HEU's Submission for DCASE 2026 Task 5: A LoRA-SFT Fine-Tuned Audio-Dependent Question Answering System
Feiyang Xiao, Qiaoxi Zhu, Jian Guan
Harbin Engineering University, Harbin, China; University of Technology Sydney, Ultimo, Australia
Guan_HEU_task5_1
Audio Question Answering at the DCASE 2026 Challenge
Haoran Xu, Rui Zhang
Huazhong University of Science and Technology, Wuhan, China
Xu_HUST_task5_1
Training-Free Inference-Time Exploration for Audio-Dependent Question Answering
Zeyu Yin, Qi Cao, Pingsong Deng, Yizhou Tan, Shengchen Li
Xi'an Jiaotong-Liverpool University, Suzhou, China
Yin_XJTLU_task5_1 Yin_XJTLU_task5_2 Yin_XJTLU_task5_3 Yin_XJTLU_task5_4
Structured Audio Reasoning and Robust Multi-Sample Inference for DCASE 2026 Audio-Dependent Question Answering Challenge
Yucong Zhang, Juan Liu, Ming Li
Wuhan University, Hubei, China; The Chinese University of Hong Kong, Shenzhen, Guangdong, China
Zhang_WHU_task5_1 Zhang_WHU_task5_2