Low-Complexity Acoustic Scene Classification with Device Information


Challenge results

Task description

The goal of the acoustic scene classification task is to classify recordings into one of the ten predefined acoustic scene classes. This task continues the Acoustic Scene Classification tasks from previous editions of the DCASE Challenge, with a slight shift of focus. This year, the task concentrates on three challenging aspects: (1) a recording device mismatch, (2) low-complexity constraints, and (3) the limited availability of labeled data for training.

A more detailed task description can be found on the task description page.

Teams ranking

Submission label | Name | Technical report | Official rank | Accuracy (maximum among entries)
Chang_HYU mel Han2025 5 58.98
Chen_GXU CD Chen2025 9 56.63
Han_CSU KDTF-SepN Han2025a 13 32.58
Jeong_SEOULTECH DAFA-TE Jeong2025 8 57.86
Karasin_JKU MALACH25_4 Karasin2025 1 61.47
Krishna_SRIB SRIB-Team Gurugubelli2025 10 56.06
Li_NTU S2 Li2025 6 58.85
Luo_CQUPT DynaCP Luo2025 3 59.58
Ramezanee_SUT Sharif Ramezanee2025 7 57.92
DCASE2025 baseline Baseline 12 53.24
Tan_SNTLNTU SNTLNTU_T1_1 Tan2025 2 59.94
Zhang_AITHU-SJTU Agp_c96_s1 Zhang2025 4 59.28
Zhou_XJTLU Baseline Ziyang2025 11 55.52

Systems ranking

Submission label | Name | Technical report | Official system rank | Memory rank | MACs rank | Evaluation set accuracy: overall (95% confidence interval) | known devices | unknown devices | logloss | Development set accuracy
Chang_HYU_task1_1 base Han2025 17 7 4 58.1 (57.9 - 58.4) 60.6 53.3 3.182 57.4
Chang_HYU_task1_2 mel Han2025 10 7 10 59.0 (58.7 - 59.2) 61.4 54.1 3.095 58.2
Chang_HYU_task1_3 hop Han2025 15 7 13 58.7 (58.4 - 58.9) 60.7 54.5 3.016 58.3
Chang_HYU_task1_4 hop_mel Han2025 16 7 13 58.6 (58.4 - 58.8) 60.8 54.2 3.210 57.6
Chen_GXU_task1_1 CD Chen2025 21 3 11 56.6 (56.4 - 56.9) 58.5 52.8 1.566 56.5
Chen_GXU_task1_2 CD Chen2025 22 3 11 56.5 (56.3 - 56.8) 58.4 52.9 1.411 57.7
Chen_GXU_task1_3 CD Chen2025 26 3 11 55.3 (55.0 - 55.5) 57.0 51.7 1.994 56.1
Han_CSU_task1_1 TF-SepN Han2025a 31 3 1 26.4 (26.2 - 26.7) 26.6 26.2 2.796 55.0
Han_CSU_task1_2 KDTF-SepN Han2025a 32 3 1 25.1 (24.9 - 25.3) 25.3 24.7 2.289 51.3
Han_CSU_task1_3 KDTF-SepN Han2025a 29 3 1 32.6 (32.3 - 32.8) 33.2 31.4 1.879 54.1
Han_CSU_task1_4 KDTF-SepN Han2025a 30 3 1 30.9 (30.7 - 31.1) 31.5 29.6 1.923 51.2
Jeong_SEOULTECH_task1_1 DAFA-TE Jeong2025 20 3 5 56.9 (56.6 - 57.1) 60.1 50.3 1.379 54.6
Jeong_SEOULTECH_task1_2 DAFA-TE Jeong2025 19 3 5 57.9 (57.6 - 58.1) 60.3 53.0 1.362 56.0
Karasin_JKU_task1_1 MALACH25_1 Karasin2025 2 3 11 61.4 (61.1 - 61.6) 64.0 56.2 1.105 60.5
Karasin_JKU_task1_2 MALACH25_2 Karasin2025 4 3 11 60.1 (59.9 - 60.4) 62.1 56.1 1.177 57.5
Karasin_JKU_task1_3 MALACH25_3 Karasin2025 3 3 11 60.3 (60.0 - 60.5) 62.8 55.2 1.155 59.0
Karasin_JKU_task1_4 MALACH25_4 Karasin2025 1 3 11 61.5 (61.2 - 61.7) 64.1 56.2 1.102 60.5
Krishna_SRIB_task1_1 SRIB-Team Gurugubelli2025 23 4 6 56.1 (55.8 - 56.3) 58.2 51.8 2.515 56.4
Li_NTU_task1_1 S1 Li2025 14 3 3 58.7 (58.5 - 59.0) 60.6 55.1 1.133 58.8
Li_NTU_task1_2 S2 Li2025 12 3 3 58.8 (58.6 - 59.1) 60.5 55.5 1.128 59.3
Luo_CQUPT_task1_1 DynaCP Luo2025 6 5 8 59.6 (59.3 - 59.8) 61.9 55.0 1.616 59.0
Ramezanee_SUT_task1_1 Sharif Ramezanee2025 27 6 7 54.6 (54.3 - 54.8) 54.8 54.1 1.318 58.2
Ramezanee_SUT_task1_2 SUT Ramezanee2025 25 6 7 55.5 (55.3 - 55.7) 56.2 54.1 1.262 58.2
Ramezanee_SUT_task1_3 Sharif Ramezanee2025 18 6 7 57.9 (57.7 - 58.2) 59.8 54.1 1.176 58.2
DCASE2025 baseline Baseline 28 3 11 53.2 (53.0 - 53.5) 55.4 49.0 1.686 50.7
Tan_SNTLNTU_task1_1 SNTLNTU_T1_1 Tan2025 5 1 2 59.9 (59.7 - 60.2) 62.2 55.4 1.136 60.4
Tan_SNTLNTU_task1_2 SNTLNTU_T1_2 Tan2025 9 2 2 59.0 (58.8 - 59.3) 61.6 53.9 1.179 60.2
Zhang_AITHU-SJTU_task1_1 Agp_c64_s1 Zhang2025 13 10 14 58.8 (58.5 - 59.0) 60.7 54.8 1.118 59.0
Zhang_AITHU-SJTU_task1_2 Agp_c64_s2 Zhang2025 11 10 14 58.9 (58.7 - 59.2) 60.9 55.0 1.110 58.5
Zhang_AITHU-SJTU_task1_3 Agp_c96_s1 Zhang2025 7 8 9 59.3 (59.0 - 59.5) 60.8 56.2 1.108 59.0
Zhang_AITHU-SJTU_task1_4 Agp_c96_s2 Zhang2025 8 8 9 59.3 (59.0 - 59.5) 60.8 56.1 1.105 58.7
Zhou_XJTLU_task1_1 Baseline Ziyang2025 24 9 12 55.5 (55.3 - 55.8) 60.0 46.6 1.232 58.5

System complexity

Submission label | Technical report | System rank | Evaluation set accuracy: overall | known devices | unknown devices | Model size (bytes) | MACs | Parameters | Complexity management
Chang_HYU_task1_1 Han2025 17 58.1 60.6 53.3 125156 18758084 62578 precision_16, network design, knowledge distillation
Chang_HYU_task1_2 Han2025 10 59.0 61.4 54.1 125156 29302844 62578 precision_16, network design, knowledge distillation
Chang_HYU_task1_3 Han2025 15 58.7 60.7 54.5 125156 29512940 62578 precision_16, network design, knowledge distillation
Chang_HYU_task1_4 Han2025 16 58.6 60.8 54.2 125156 29512940 62578 precision_16, network design, knowledge distillation
Chen_GXU_task1_1 Chen2025 21 56.6 58.5 52.8 122296 29419156 61148 knowledge distillation, precision_16
Chen_GXU_task1_2 Chen2025 22 56.5 58.4 52.9 122296 29419156 61148 knowledge distillation, precision_16
Chen_GXU_task1_3 Chen2025 26 55.3 57.0 51.7 122296 29419156 61148 knowledge distillation, precision_16
Han_CSU_task1_1 Han2025a 31 26.4 26.6 26.2 122296 298637 61148 network design
Han_CSU_task1_2 Han2025a 32 25.1 25.3 24.7 122296 298637 61148 knowledge distillation
Han_CSU_task1_3 Han2025a 29 32.6 33.2 31.4 122296 298637 61148 knowledge distillation
Han_CSU_task1_4 Han2025a 30 30.9 31.5 29.6 122296 298637 61148 knowledge distillation
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 60.1 50.3 122296 26059412 61148 knowledge distillation
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 60.3 53.0 122296 26059412 61148 knowledge distillation
Karasin_JKU_task1_1 Karasin2025 2 61.4 64.0 56.2 122296 29419156 61148 precision_16, network design, knowledge distillation
Karasin_JKU_task1_2 Karasin2025 4 60.1 62.1 56.1 122296 29419156 61148 precision_16, network design, knowledge distillation
Karasin_JKU_task1_3 Karasin2025 3 60.3 62.8 55.2 122296 29419156 61148 precision_16, network design, knowledge distillation
Karasin_JKU_task1_4 Karasin2025 1 61.5 64.1 56.2 122296 29419156 61148 precision_16, network design, knowledge distillation
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 58.2 51.8 122320 27862676 61160 precision_16, network design
Li_NTU_task1_1 Li2025 14 58.7 60.6 55.1 122296 17050260 61160 knowledge distillation, network design, precision_16
Li_NTU_task1_2 Li2025 12 58.8 60.5 55.5 122296 17050260 61160 knowledge distillation, network design
Luo_CQUPT_task1_1 Luo2025 6 59.6 61.9 55.0 123300 28938900 61650 knowledge distillation, precision_16, network design
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 54.8 54.1 125040 28642220 31260 network design, knowledge distillation, pruning, reparametrization
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 56.2 54.1 125040 28642220 31260 network design, knowledge distillation, pruning, reparametrization
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 59.8 54.1 125040 28642220 31260 network design, knowledge distillation, pruning, reparametrization
DCASE2025 baseline 28 53.2 55.4 49.0 122296 29419156 61148 precision_16, network design
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 62.2 55.4 116342 10902300 116342 precision_16, network design
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 61.6 53.9 117210 10902300 117210 precision_16, network design
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 60.7 54.8 127496 29982132 63748 precision_16, network design, knowledge distillation, pruning
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 60.9 55.0 127496 29982132 63748 precision_16, network design, knowledge distillation, pruning
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 60.8 56.2 126430 29221122 63215 precision_16, network design, knowledge distillation, pruning
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 60.8 56.1 126430 29221122 63215 precision_16, network design, knowledge distillation, pruning
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 60.0 46.6 126858 29419648 126858 network design, weight quantization, knowledge distillation


Generalization performance

All results are computed on the evaluation dataset.

Class-wise performance

Submission label | Technical report | System rank | Overall accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
Chang_HYU_task1_1 Han2025 17 58.1 42.6 76.8 59.3 52.1 80.4 34.2 59.3 38.2 74.5 64.1
Chang_HYU_task1_2 Han2025 10 59.0 43.7 78.4 57.6 53.8 81.0 37.3 61.9 36.2 75.7 64.2
Chang_HYU_task1_3 Han2025 15 58.7 43.9 74.4 59.1 54.5 80.6 35.6 61.4 35.5 75.6 65.9
Chang_HYU_task1_4 Han2025 16 58.6 44.9 77.5 59.3 50.6 80.3 35.6 62.9 38.8 74.5 61.5
Chen_GXU_task1_1 Chen2025 21 56.6 36.8 77.9 54.9 50.4 80.1 36.0 63.1 33.5 74.3 59.3
Chen_GXU_task1_2 Chen2025 22 56.5 39.9 72.4 54.7 50.9 87.5 33.4 58.7 30.6 72.8 64.5
Chen_GXU_task1_3 Chen2025 26 55.3 37.3 72.0 54.1 44.9 82.5 28.8 59.4 38.8 71.6 63.4
Han_CSU_task1_1 Han2025a 31 26.4 4.6 38.8 3.2 25.8 16.7 10.7 17.8 28.2 78.2 40.4
Han_CSU_task1_2 Han2025a 32 25.1 19.2 61.2 52.3 9.9 14.6 8.0 9.5 28.6 26.3 21.4
Han_CSU_task1_3 Han2025a 29 32.6 12.0 26.9 18.0 32.6 30.5 25.5 49.8 35.9 66.6 28.0
Han_CSU_task1_4 Han2025a 30 30.9 37.4 24.6 45.9 34.7 25.8 11.3 50.7 41.5 11.1 25.9
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 44.8 68.8 53.3 48.8 85.9 39.0 67.5 24.7 76.6 59.0
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 48.1 68.2 50.6 49.4 84.0 41.5 66.8 30.3 77.3 62.3
Karasin_JKU_task1_1 Karasin2025 2 61.4 51.9 83.5 61.5 50.7 87.6 35.7 68.0 33.5 77.4 64.2
Karasin_JKU_task1_2 Karasin2025 4 60.1 50.0 81.5 58.3 48.0 85.4 37.8 63.8 33.2 77.6 65.8
Karasin_JKU_task1_3 Karasin2025 3 60.3 45.3 76.7 59.9 49.0 85.7 33.6 70.9 33.2 79.6 68.7
Karasin_JKU_task1_4 Karasin2025 1 61.5 52.6 83.2 61.8 50.4 87.6 35.9 67.6 33.5 77.5 64.4
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 40.0 76.4 55.1 46.7 81.5 32.4 57.2 35.6 75.3 60.3
Li_NTU_task1_1 Li2025 14 58.7 44.6 70.2 59.8 52.2 85.7 36.6 66.6 31.4 75.1 65.2
Li_NTU_task1_2 Li2025 12 58.8 39.8 73.7 58.0 53.4 84.0 39.9 69.6 31.3 74.5 64.3
Luo_CQUPT_task1_1 Luo2025 6 59.6 44.8 77.3 58.8 54.5 84.9 37.7 61.1 37.6 76.5 62.6
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 46.4 82.3 48.6 45.9 85.4 31.8 54.9 28.4 69.2 52.7
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 44.6 83.0 47.8 51.6 84.3 34.1 57.2 26.3 70.6 55.5
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 43.0 82.8 49.7 51.9 83.7 38.6 61.1 34.6 68.3 65.4
DCASE2025 baseline 28 53.2 40.5 69.7 47.2 42.1 79.8 36.1 53.5 34.8 74.8 53.9
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 50.6 83.4 54.8 47.0 85.4 37.7 63.9 35.9 70.6 70.0
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 49.0 78.9 58.7 50.6 82.6 36.1 59.4 38.7 70.0 66.6
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 46.5 70.3 52.5 53.3 84.2 34.9 65.1 36.1 75.7 68.9
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 43.6 72.1 53.3 49.8 83.2 36.4 65.5 40.0 78.9 66.7
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 50.7 66.4 54.0 51.1 85.2 35.3 66.3 37.7 77.2 68.9
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 47.8 68.3 53.2 52.0 87.0 32.8 70.6 34.1 78.3 68.4
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 41.1 68.8 60.1 48.4 65.6 34.9 67.1 33.9 74.9 60.5

Device-wise performance

Submission label | Technical report | System rank | Overall accuracy | Accuracy, unseen devices | Accuracy, seen devices | Unseen devices: D, S7, S8, S9, S10 | Seen devices: A, B, C, S1, S2, S3
Chang_HYU_task1_1 Han2025 17 58.1 53.3 60.6 44.3 57.9 57.3 52.4 54.6 67.3 60.9 61.7 57.1 56.8 59.5
Chang_HYU_task1_2 Han2025 10 59.0 54.1 61.4 46.8 58.5 56.1 53.9 55.3 67.0 61.9 62.2 57.3 59.1 61.1
Chang_HYU_task1_3 Han2025 15 58.7 54.5 60.7 48.1 58.8 57.7 54.0 53.8 67.3 60.4 61.8 57.2 57.8 59.9
Chang_HYU_task1_4 Han2025 16 58.6 54.2 60.8 46.3 58.2 57.6 53.2 55.5 67.2 60.0 61.8 58.0 57.8 60.0
Chen_GXU_task1_1 Chen2025 21 56.6 52.8 58.5 48.8 57.7 56.9 46.6 54.2 67.9 60.6 61.5 53.5 50.9 56.8
Chen_GXU_task1_2 Chen2025 22 56.5 52.9 58.4 46.4 57.0 55.6 49.3 56.1 68.2 60.6 61.1 53.2 50.1 57.1
Chen_GXU_task1_3 Chen2025 26 55.3 51.7 57.0 48.4 54.6 56.6 46.5 52.6 67.8 59.3 59.1 52.4 49.8 53.9
Han_CSU_task1_1 Han2025a 31 26.4 26.2 26.6 27.0 27.7 25.5 24.1 26.5 29.5 29.4 27.1 25.6 22.4 25.5
Han_CSU_task1_2 Han2025a 32 25.1 24.7 25.3 26.6 24.6 24.6 23.4 24.5 28.2 27.2 26.7 22.3 22.8 24.7
Han_CSU_task1_3 Han2025a 29 32.6 31.4 33.2 36.5 31.6 29.3 29.4 30.3 41.1 34.1 38.9 30.4 26.1 28.3
Han_CSU_task1_4 Han2025a 30 30.9 29.6 31.5 34.2 28.9 27.2 29.3 28.5 39.7 33.8 38.1 27.5 23.1 26.8
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 50.3 60.1 39.0 54.2 53.6 53.1 51.3 68.8 60.0 63.0 54.5 55.5 59.2
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 53.0 60.3 44.7 57.2 54.2 54.8 54.2 68.7 60.5 63.7 55.4 54.6 58.8
Karasin_JKU_task1_1 Karasin2025 2 61.4 56.2 64.0 47.5 60.9 58.0 58.3 56.1 70.6 64.6 65.4 59.7 59.8 63.9
Karasin_JKU_task1_2 Karasin2025 4 60.1 56.1 62.1 49.6 59.3 56.9 58.0 56.9 70.0 62.0 63.1 58.1 57.7 61.7
Karasin_JKU_task1_3 Karasin2025 3 60.3 55.2 62.8 45.3 59.6 55.5 58.2 57.1 69.1 62.3 64.8 59.6 58.2 63.0
Karasin_JKU_task1_4 Karasin2025 1 61.5 56.2 64.1 47.5 60.9 58.0 58.3 56.1 70.6 64.6 65.7 60.2 59.8 63.9
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 51.8 58.2 45.0 56.0 54.9 48.6 54.4 66.8 59.4 61.8 52.4 52.0 56.9
Li_NTU_task1_1 Li2025 14 58.7 55.1 60.6 51.3 58.6 58.6 52.2 54.7 67.0 59.7 62.2 57.5 56.8 60.3
Li_NTU_task1_2 Li2025 12 58.8 55.5 60.5 53.0 58.6 58.2 52.7 54.7 66.4 59.9 62.1 57.9 56.7 60.2
Luo_CQUPT_task1_1 Luo2025 6 59.6 55.0 61.9 42.1 59.7 60.1 56.4 56.7 70.7 62.1 64.7 57.3 56.7 59.6
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 54.1 54.8 41.4 59.6 58.0 56.1 55.3 64.2 53.9 53.0 52.5 50.4 54.9
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 54.1 56.2 41.4 59.6 58.0 56.1 55.3 63.7 54.8 55.4 54.4 52.3 56.8
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 54.1 59.8 41.4 59.6 58.0 56.1 55.3 65.6 59.2 58.8 57.3 58.4 59.8
DCASE2025 baseline 28 53.2 49.0 55.4 47.5 51.6 48.8 45.3 51.7 64.8 57.2 59.9 48.9 48.7 52.7
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 55.4 62.2 49.3 58.6 59.2 52.2 57.5 67.8 61.3 64.5 59.9 59.2 60.7
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 53.9 61.6 44.6 58.7 59.2 50.0 56.9 67.7 60.6 63.4 59.5 57.4 61.1
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 54.8 60.7 46.7 58.6 57.4 55.2 56.4 69.2 60.0 62.7 56.6 55.0 61.0
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 55.0 60.9 48.7 58.5 57.5 53.5 56.8 69.6 60.8 62.8 55.9 55.7 60.8
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 56.2 60.8 51.7 58.9 57.4 55.6 57.2 69.3 59.8 62.4 56.5 55.9 61.0
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 56.1 60.8 52.2 59.0 58.4 54.8 56.0 69.4 60.8 62.6 55.9 56.0 60.4
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 46.6 60.0 49.3 51.5 50.0 42.7 39.7 67.5 60.7 60.5 55.3 57.7 58.1

Cities

Submission label | Technical report | System rank | Overall accuracy | Accuracy, unseen cities | Accuracy, seen cities
Chang_HYU_task1_1 Han2025 17 58.1 56.06 58.59
Chang_HYU_task1_2 Han2025 10 59.0 58.13 59.18
Chang_HYU_task1_3 Han2025 15 58.7 57.80 58.86
Chang_HYU_task1_4 Han2025 16 58.6 56.79 58.98
Chen_GXU_task1_1 Chen2025 21 56.6 54.71 57.05
Chen_GXU_task1_2 Chen2025 22 56.5 54.38 57.01
Chen_GXU_task1_3 Chen2025 26 55.3 53.89 55.59
Han_CSU_task1_1 Han2025a 31 26.4 28.12 26.11
Han_CSU_task1_2 Han2025a 32 25.1 25.57 25.02
Han_CSU_task1_3 Han2025a 29 32.6 33.91 32.31
Han_CSU_task1_4 Han2025a 30 30.9 32.42 30.57
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 56.03 57.06
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 57.03 58.07
Karasin_JKU_task1_1 Karasin2025 2 61.4 61.95 61.30
Karasin_JKU_task1_2 Karasin2025 4 60.1 60.10 60.17
Karasin_JKU_task1_3 Karasin2025 3 60.3 60.95 60.16
Karasin_JKU_task1_4 Karasin2025 1 61.5 62.06 61.38
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 54.69 56.37
Li_NTU_task1_1 Li2025 14 58.7 57.92 58.94
Li_NTU_task1_2 Li2025 12 58.8 58.52 58.94
Luo_CQUPT_task1_1 Luo2025 6 59.6 57.91 59.96
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 53.90 54.74
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 54.72 55.69
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 56.96 58.15
DCASE2025 baseline 28 53.2 52.95 53.33
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 58.45 60.28
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 58.85 59.11
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 57.87 58.98
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 58.39 59.09
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 58.32 59.51
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 58.18 59.51
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 55.60 55.53

System characteristics

General characteristics

Submission label | Technical report | Rank | Accuracy | Sampling rate | Data augmentation | Features
Chang_HYU_task1_1 Han2025 17 58.1 32kHz freq-mixstyle, frequency masking, time rolling, DIR log-mel energies
Chang_HYU_task1_2 Han2025 10 59.0 32kHz freq-mixstyle, frequency masking, time rolling, DIR log-mel energies
Chang_HYU_task1_3 Han2025 15 58.7 32kHz freq-mixstyle, frequency masking, time rolling, DIR log-mel energies
Chang_HYU_task1_4 Han2025 16 58.6 32kHz freq-mixstyle, frequency masking, time rolling, DIR log-mel energies
Chen_GXU_task1_1 Chen2025 21 56.6 32kHz freq-mixstyle, time rolling log-mel energies
Chen_GXU_task1_2 Chen2025 22 56.5 32kHz freq-mixstyle, time rolling log-mel energies
Chen_GXU_task1_3 Chen2025 26 55.3 32kHz freq-mixstyle, time rolling log-mel energies
Han_CSU_task1_1 Han2025a 31 26.4 44.1kHz MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR log-mel spectrogram
Han_CSU_task1_2 Han2025a 32 25.1 44.1kHz MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR log-mel spectrogram
Han_CSU_task1_3 Han2025a 29 32.6 44.1kHz MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR log-mel spectrogram
Han_CSU_task1_4 Han2025a 30 30.9 44.1kHz MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR log-mel spectrogram
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 44.1kHz freq-mixstyle, mixup log-mel energies
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 44.1kHz freq-mixstyle, mixup log-mel energies
Karasin_JKU_task1_1 Karasin2025 2 61.4 32kHz freq-mixstyle, DIR, time masking, frequency masking, time rolling log-mel energies
Karasin_JKU_task1_2 Karasin2025 4 60.1 32kHz freq-mixstyle, DIR, time masking, frequency masking, time rolling log-mel energies
Karasin_JKU_task1_3 Karasin2025 3 60.3 32kHz freq-mixstyle, DIR, time masking, frequency masking, time rolling log-mel energies
Karasin_JKU_task1_4 Karasin2025 1 61.5 32kHz freq-mixstyle, DIR, time masking, frequency masking, time rolling log-mel energies
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 32kHz freq-mixstyle, frequency masking, time rolling log-mel energies
Li_NTU_task1_1 Li2025 14 58.7 32kHz freq-mixstyle, time rolling, DIR log-mel energies
Li_NTU_task1_2 Li2025 12 58.8 32kHz freq-mixstyle, time rolling, DIR log-mel energies
Luo_CQUPT_task1_1 Luo2025 6 59.6 44.1kHz freq-mixstyle, pitch shifting, time rolling log-mel energies
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 32kHz freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR log-mel energies
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 32kHz freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR log-mel energies
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 32kHz freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR log-mel energies
DCASE2025 baseline 28 53.2 32kHz freq-mixstyle, pitch shifting, time rolling log-mel energies
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 44.1kHz freq-mixstyle, DIR, SpecAug log-mel energies
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 44.1kHz freq-mixstyle, DIR, SpecAug log-mel energies
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 32kHz freq-mixstyle, frequency masking, time masking, time rolling log-mel energies
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 32kHz freq-mixstyle, frequency masking, time masking, time rolling log-mel energies
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 32kHz freq-mixstyle, frequency masking, time masking, time rolling log-mel energies
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 32kHz freq-mixstyle, frequency masking, time masking, time rolling log-mel energies
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 32kHz mixup, freq-mixstyle, DIR log-mel spectrogram



Machine learning characteristics

Submission label | Technical report | Rank | Accuracy | External data usage | External data sources | Model complexity (parameters) | Model MACs | Classifier | Framework | Pipeline | Device information | Number of models | Model weight sharing
Chang_HYU_task1_1 Han2025 17 58.1 pre-trained model MicIRP, PaSST 62578 18758084 RF-regularized CNN, CTFAttention pytorch train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Chang_HYU_task1_2 Han2025 10 59.0 pre-trained model MicIRP, PaSST 62578 29302844 RF-regularized CNN, CTFAttention pytorch train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Chang_HYU_task1_3 Han2025 15 58.7 pre-trained model MicIRP, PaSST 62578 29512940 RF-regularized CNN, CTFAttention pytorch train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Chang_HYU_task1_4 Han2025 16 58.6 pre-trained model MicIRP, PaSST 62578 29512940 RF-regularized CNN, CTFAttention pytorch train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Chen_GXU_task1_1 Chen2025 21 56.6 pre-trained model AudioSet 61148 29419156 CP-Mobile pytorch train teachers, ensemble teachers, train general student model with knowledge distillation 1 fully device-specific
Chen_GXU_task1_2 Chen2025 22 56.5 pre-trained model 61148 29419156 CP-Mobile pytorch train teachers, ensemble teachers, train general student model with knowledge distillation 1 fully device-specific
Chen_GXU_task1_3 Chen2025 26 55.3 pre-trained model AudioSet 61148 29419156 CP-Mobile pytorch train teachers, ensemble teachers, train general student model with knowledge distillation 1 fully device-specific
Han_CSU_task1_1 Han2025a 31 26.4 None 61148 298637 CNN (SepNet) pytorch data augmentation, train baseline model per-device end-to-end fine-tuning 7 fully device-specific
Han_CSU_task1_2 Han2025a 32 25.1 pre-trained model, BEATs 61148 298637 CNN pytorch train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Han_CSU_task1_3 Han2025a 29 32.6 pre-trained model, BEATs, EfficientAT 61148 298637 CNN pytorch train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Han_CSU_task1_4 Han2025a 30 30.9 pre-trained model, BEATs, EfficientAT 61148 298637 CNN pytorch train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Jeong_SEOULTECH_task1_1 Jeong2025 20 56.9 pre-trained model 61148 26059412 CNN, Transformer pytorch train general teacher model, ensemble teachers, device-specific fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Jeong_SEOULTECH_task1_2 Jeong2025 19 57.9 pre-trained model 61148 26059412 CNN, Transformer pytorch train general teacher model, ensemble teachers, device-specific fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Karasin_JKU_task1_1 Karasin2025 2 61.4 dataset, pre-trained model PretrainedSED, MicIRP, CochlScene 61148 29419156 RF-regularized CNN pytorch pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet;BEATs) on TAU22, device-specific end-to-end fine-tuning the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning with knowledge distillation per-device end-to-end fine-tuning 7 fully device-specific
Karasin_JKU_task1_2 Karasin2025 4 60.1 dataset, pre-trained model PretrainedSED, MicIRP, CochlScene 61148 29419156 RF-regularized CNN pytorch pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet;BEATs) on TAU22, device-specific end-to-end fine-tuning the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning with knowledge distillation per-device end-to-end fine-tuning 7 fully device-specific
Karasin_JKU_task1_3 Karasin2025 3 60.3 dataset, pre-trained model MicIRP, CochlScene, PaSST 61148 29419156 RF-regularized CNN pytorch pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet;PaSST) on TAU22, device-specific end-to-end fine-tuning the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Karasin_JKU_task1_4 Karasin2025 1 61.5 dataset, pre-trained model PretrainedSED, MicIRP, CochlScene 61148 29419156 RF-regularized CNN pytorch pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet;BEATs) on TAU22, device-specific end-to-end fine-tuning the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning for device s1, device-specific end-to-end fine-tuning with knowledge distillation for the rest of the devices per-device end-to-end fine-tuning 7 fully device-specific
Krishna_SRIB_task1_1 Gurugubelli2025 23 56.1 61160 27862676 RF-regularized CNN pytorch train general model, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Li_NTU_task1_1 Li2025 14 58.7 dataset, micIRP, pre-trained model, PaSST 61160 17050260 RF-regularized CNN pytorch train teachers, ensemble teachers, train general student model with knowledge distillation (both stage-wise and output-wise), model soup, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning, device-IR augmentation 1 fully shared
Li_NTU_task1_2 Li2025 12 58.8 dataset, micIRP, pre-trained model, PaSST MicIRP 61160 17050260 RF-regularized CNN pytorch train teachers, ensemble teachers, train general student model with knowledge distillation (both stage-wise and output-wise), model soup, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning, DIR 1 fully shared
Luo_CQUPT_task1_1 Luo2025 6 59.6 pre-trained model 61650 28938900 RF-regularized CNN pytorch train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Ramezanee_SUT_task1_1 Ramezanee2025 27 54.6 dataset MicIRP 31260 28642220 CNN pytorch train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation per-device end-to-end fine-tuning 7 fully device-specific
Ramezanee_SUT_task1_2 Ramezanee2025 25 55.5 dataset MicIRP 31260 28642220 CNN pytorch train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation per-device end-to-end fine-tuning 7 fully device-specific
Ramezanee_SUT_task1_3 Ramezanee2025 18 57.9 dataset MicIRP 31260 28642220 CNN pytorch train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation per-device end-to-end fine-tuning 7 fully device-specific
DCASE2025 baseline 28 53.2 61148 29419156 RF-regularized CNN pytorch training per-device end-to-end fine-tuning 7 fully device-specific
Tan_SNTLNTU_task1_1 Tan2025 5 59.9 116342 10902300 GRU-CNN pytorch train general model, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Tan_SNTLNTU_task1_2 Tan2025 9 59.0 117210 10902300 GRU-CNN pytorch train general model, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific
Zhang_AITHU-SJTU_task1_1 Zhang2025 13 58.8 pre-trained model EfficientAT 63748 29982132 CNN pytorch train teachers, ensemble teachers, train student using knowledge distillation, pruning 1 fully shared
Zhang_AITHU-SJTU_task1_2 Zhang2025 11 58.9 pre-trained model EfficientAT 63748 29982132 CNN pytorch train teachers, ensemble teachers, train student using knowledge distillation, pruning 1 fully shared
Zhang_AITHU-SJTU_task1_3 Zhang2025 7 59.3 pre-trained model EfficientAT 63215 29221122 CNN pytorch train teachers, ensemble teachers, train student using knowledge distillation, pruning 1 fully shared
Zhang_AITHU-SJTU_task1_4 Zhang2025 8 59.3 pre-trained model EfficientAT 63215 29221122 CNN pytorch train teachers, ensemble teachers, train student using knowledge distillation, pruning 1 fully shared
Zhou_XJTLU_task1_1 Ziyang2025 24 55.5 dataset, embeddings, pre-trained model AudioSet_balanced 126858 29419648 CNN (TF-SepNet) pytorch_lightning train general model, device-specific end-to-end fine-tuning per-device end-to-end fine-tuning 7 fully device-specific

Technical reports

McCi Submission to DCASE 2025: Training Low-Complexity Acoustic Scene Classification System with Knowledge Distillation and Curriculum

Xuanyan Chen and Wei Xie
School of Computer, Electronics and Information, Guangxi University, Guangxi, China

Abstract

Task 1 of DCASE 2025 focuses on different aspects of Acoustic Scene Classification (ASC), including recording device mismatch, low-complexity constraints, data efficiency, and the development of recording-device-specific models. This technical report describes the system we submitted. We first trained several teacher models on the ASC dataset using self-distillation and curriculum learning techniques; these teacher models included a model pre-trained on AudioSet. We then distilled the knowledge from the teacher models into the student model via curriculum learning. We used the same inference model (i.e., student model) and data augmentation settings as provided in the baseline system. In experiments, our best system achieved an accuracy of 57.66%.
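
Schematically, the output-level distillation such systems build on minimizes a temperature-softened KL term alongside the usual cross-entropy. A minimal PyTorch sketch, with temperature T and mixing weight alpha as assumptions (the report's curriculum schedule is not modeled):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets from the (ensembled) teacher, softened by temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)  # hard-label term
    return alpha * kd + (1.0 - alpha) * ce
```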

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, time rolling
Features log-mel energies
Classifier CP-Mobile
Complexity management knowledge distillation, precision_16
Number of models at inference 1
Model weight sharing fully device-specific

SRIB Submission for DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information

Krishna Gurugubelli, Ravi Solanki, Sujith Viswanathan, Madhu Rayappa Kamble, Aditi Deo, Abhinandan Udupa, Ramya Viswanathan and Rajesh Krishna K S
Audio AI Team, Samsung R&D Institute India-Bangalore, Bangalore, India

Abstract

This report details our submission for Task 1: Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 challenge [1]. Our method builds upon the leading system from the DCASE 2023 competition. Specifically, we have explored the CP-Mobile architecture in this work. To improve generalization across devices, we incorporate several data augmentation strategies, including Freq-Mix-Style, frequency masking, and time rolling. To meet the model complexity requirements of the competition, we evaluated the model with 16-bit precision and incorporated mixed-precision training to achieve better performance during inference with the 16-bit model. Our results show significant improvements in test accuracy over the baseline, confirming the effectiveness of our approach across all subsets.
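
A minimal sketch of the 16-bit inference path described above, with a stand-in network (the submitted system uses an RF-regularized CNN trained with mixed precision):

```python
import torch
import torch.nn as nn

# Stand-in for the RF-regularized CNN; .half() stores all weights in 16-bit
# precision, halving the memory footprint counted by the challenge.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model = model.half().eval()

with torch.no_grad():
    x = torch.randn(1, 1, 256, 65).half()  # a log-mel spectrogram batch
    logits = model(x)  # fp16 inference (may require GPU on older PyTorch builds)
```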

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, frequency masking, time rolling
Features log-mel energies
Classifier RF-regularized CNN
Complexity management precision_16, network design
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

HYU Submission for DCASE 2025 Task 1: Low-Complexity Acoustic Scene Classification Using Reparameterizable CNN with Channel-Time-Frequency Attention

Seung-Gyu Han1, Pil Moo Byun2 and Joon-Hyuk Chang1,2
1Artificial Intelligence Semiconductor Engineering, Hanyang University, Seoul, Republic of Korea, 2Artificial Intelligence, Hanyang University, Seoul, Republic of Korea

Abstract

This paper presents the Hanyang University team’s submission for the DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. The task focuses on developing compact and efficient models that generalize well across both seen and unseen recording devices, under strict constraints on model size and computational cost. To address these challenges, we propose Rep-CTFA, a lightweight convolutional neural network that integrates two key design elements: (1) reparameterizable convolutional blocks with learnable branch scaling coefficients, and (2) a Channel-Time-Frequency Attention (CTFA) module. In addition, we explore input resolution variation by adjusting the hop length and number of mel bins to control time-frequency granularity. Knowledge distillation from a PaSST-based teacher ensemble is used to guide the training of the student model, improving generalization. Finally, we adopt a device-aware fine-tuning scheme that updates lightweight classification heads per device while keeping the shared backbone intact.
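
As background for the reparameterizable blocks: convolution is linear in its kernel, so parallel branches with learnable scaling coefficients can be folded into a single convolution at inference. A minimal sketch under assumed shapes (two 3x3 branches, no bias; the actual Rep-CTFA block is defined in the report):

```python
import torch
import torch.nn as nn

class RepBranches(nn.Module):
    """Two parallel 3x3 conv branches with learnable scaling coefficients,
    foldable into a single conv for inference (illustrative sketch only)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.a = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.b = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.alpha = nn.Parameter(torch.ones(2))  # branch scaling

    def forward(self, x):  # training-time: two branches
        return self.alpha[0] * self.a(x) + self.alpha[1] * self.b(x)

    def reparameterize(self):  # inference-time: one merged conv
        merged = nn.Conv2d(self.a.in_channels, self.a.out_channels, 3,
                           padding=1, bias=False)
        merged.weight.data = (self.alpha[0] * self.a.weight.data
                              + self.alpha[1] * self.b.weight.data)
        return merged  # identical output, single conv
```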

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, frequency masking, time rolling, DIR
Features log-mel energies
Classifier RF-regularized CNN, CTFAttention
Complexity management precision_16, network design, knowledge distillation
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Confidence-Aware Ensemble Knowledge Distillation for Low-Complexity Acoustic Scene Classification

Sarang Han1, Dong Ho Lee2, Min Sik Jo1, Eun Seo Ha1, Min Ju Chae1 and Geon Woo Lee1
1Intelligence Speech and Processing Language, ChoSun University (CSU) Gwangju, Gwangju, South Korea, 2Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria

Abstract

We propose a confidence-aware ensemble knowledge distillation method for acoustic scene classification under low-complexity and limited-data settings. Our approach utilizes heterogeneous teacher models (BEATs and EfficientAT), fine-tuned on the DCASE 2025 Task 1 dataset, to guide the training of a lightweight student model, TF-SepNet. To improve over naive ensemble distillation, we introduce a confidence-weighted strategy that emphasizes reliable teacher outputs. Experimental results show improved generalization on unseen devices and domains, outperforming single-teacher and uniform ensemble baselines.
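
One plausible form of the confidence weighting, sketched with the max-softmax probability as the per-example confidence measure (an assumption; the report defines the actual strategy):

```python
import torch
import torch.nn.functional as F

def confidence_weighted_targets(teacher_logits_list, T=2.0):
    """Weight each teacher's soft targets by a per-example confidence score
    (here: max softmax probability), then average. Illustrative sketch."""
    probs = [F.softmax(l / T, dim=-1) for l in teacher_logits_list]   # K x (B, C)
    conf = torch.stack([p.max(dim=-1).values for p in probs], dim=0)  # (K, B)
    w = conf / conf.sum(dim=0, keepdim=True)                          # normalize over teachers
    stacked = torch.stack(probs, dim=0)                               # (K, B, C)
    return (w.unsqueeze(-1) * stacked).sum(dim=0)                     # (B, C) soft targets
```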

System characteristics
Sampling rate 44.1kHz
Data augmentation MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR
Features log-mel spectrogram
Classifier CNN (SepNet); CNN
Complexity management network design; knowledge distillation
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Adaptive Knowledge Distillation Using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification

Seunggyu Jeong and Seongeon Kim
Department of Artificial Intelligence, Seoul National University of Science and Technology, Seoul, South Korea

Abstract

In this technical report, we describe our submission for Task 1, Low-Complexity Device-Robust Acoustic Scene Classification, of the DCASE 2025 Challenge. Our work tackles the dual challenges of strict complexity constraints and robust generalization to both seen and unseen devices, while also leveraging the new rule allowing the use of device labels at test time. Our proposed system is based on a knowledge distillation framework where an efficient CP-MobileNet student learns from a compact, specialized two-teacher ensemble. This ensemble combines a baseline PaSST teacher, trained with standard cross-entropy, and a 'generalization expert' teacher. This expert is trained using our novel Device-Aware Feature Alignment (DAFA) loss, adapted from prior work, which explicitly structures the feature space for device robustness. To capitalize on the availability of test-time device labels, the distilled student model then undergoes a final device-specific fine-tuning stage. Our proposed system achieves a final accuracy of 57.93% on the development set, demonstrating a significant improvement over the official baseline, particularly on unseen devices.
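
The DAFA loss itself is specified in the report; as an illustrative stand-in only, an alignment term of this flavor pulls same-scene embeddings (recorded by different devices) toward a shared centroid:

```python
import torch

def scene_alignment_loss(feats, scene_labels):
    """Illustrative stand-in for a device-aware alignment objective: embeddings
    of the same scene are pulled toward a shared per-scene centroid, reducing
    device-induced spread. Not the actual DAFA formulation."""
    loss = feats.new_zeros(())
    scenes = scene_labels.unique()
    for s in scenes:
        group = feats[scene_labels == s]
        if group.size(0) < 2:
            continue
        centroid = group.mean(dim=0, keepdim=True).detach()
        loss = loss + ((group - centroid) ** 2).mean()
    return loss / scenes.numel()
```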

System characteristics
Sampling rate 44.1kHz
Data augmentation freq-mixstyle, mixup
Features log-mel energies
Classifier CNN, Transformer
Complexity management knowledge distillation
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Domain-Specific External Data Pre-Training and Device-Aware Distillation for Data-Efficient Acoustic Scene Classification

Dominik Karasin, Ioan-Cristian Olariu, Michael Schöpf and Anna Szymańska
Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria

Abstract

In this technical report, we present our submission to the DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. Our approach centers on a compact CP-Mobile student model distilled via Bayesian ensemble averaging from different combinations of three teacher architectures (CP-ResNet, BEATs, and PaSST), using AudioSet-pretrained checkpoints for the last two. We then fine-tune the student on each recording device to improve per-device classification accuracy. To compensate for the limited 25% train split, we pre-train both teacher and student on CochlScene and apply data augmentation, of which Device Impulse Response augmentation was particularly effective.
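
A schematic of the per-device fine-tuning stage, with placeholder model, data, and device IDs; six seen-device copies plus the general student would be consistent with the 7 models at inference reported in the tables:

```python
import copy
import torch
import torch.nn as nn

SEEN_DEVICES = ["a", "b", "c", "s1", "s2", "s3"]  # seen-device IDs per the tables
general_student = nn.Sequential(nn.Flatten(), nn.Linear(64, 10))  # placeholder net

def finetune(model, loader, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

device_models = {}
for dev in SEEN_DEVICES:
    m = copy.deepcopy(general_student)
    loader = [(torch.randn(4, 64), torch.randint(0, 10, (4,)))]  # that device's clips
    finetune(m, loader)
    device_models[dev] = m  # 6 specialized copies + the general model = 7 at inference
```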

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, DIR, time masking, frequency masking, time rolling
Features log-mel energies
Classifier RF-regularized CNN
Complexity management precision_16, network design, knowledge distillation
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Joint Feature and Output Distillation for Low-Complexity Acoustic Scene Classification

Haowen Li1, Ziyi Yang1, Mou Wang2, Ee-Leng Tan1, Junwei Yeow1, Santi Peksi1 and Woon-Seng Gan1
1Smart Nation TRANS Lab, Nanyang Technological University, Singapore, 2Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

Abstract

This report presents a dual-level knowledge distillation framework with multi-teacher guidance for low-complexity acoustic scene classification (ASC) in DCASE2025 Task 1. We propose a distillation strategy that jointly transfers both soft logits and intermediate feature representations. Specifically, we pre-trained PaSST and CP-ResNet models as teacher models. Logits from teachers are averaged to generate soft targets, while one CP-ResNet is selected for feature-level distillation. This enables the compact student model (CP-Mobile) to capture both semantic distribution and structural information from teacher guidance. Experiments on the TAU Urban Acoustic Scenes 2022 Mobile dataset (development set) demonstrate that our submitted systems achieve up to 59.30% accuracy.
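
A compact sketch of such a dual-level objective; the loss weights, temperature, and the assumption that student and teacher feature maps already share a shape (e.g., after a 1x1 projection) are illustrative:

```python
import torch
import torch.nn.functional as F

def joint_kd_loss(s_logits, t_logits_list, s_feat, t_feat, labels,
                  T=2.0, w_kd=0.5, w_feat=0.5):
    """Sketch of dual-level distillation: averaged teacher logits give the soft
    targets; one teacher's intermediate features give a feature-level target."""
    soft = torch.stack([F.softmax(t / T, dim=-1) for t in t_logits_list]).mean(0)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1), soft,
                  reduction="batchmean") * T * T
    feat = F.mse_loss(s_feat, t_feat)  # assumes shapes already matched
    ce = F.cross_entropy(s_logits, labels)
    return ce + w_kd * kd + w_feat * feat
```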

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, time rolling, DIR
Features log-mel energies
Classifier RF-regularized CNN
Complexity management knowledge distillation, network design, precision_16; knowledge distillation, network design
Device information per-device end-to-end fine-tuning, device-IR augmentation; per-device end-to-end fine-tuning, DIR
Number of models at inference 1
Model weight sharing fully shared

DynaCP: Dynamic Parallel Selective Convolution in CP-Mobile Under Multi-Teacher Distillation for Acoustic Scene Classification

Yuandong Luo1, Hongqing Liu1, Liming Shi2 and Lu Gan3
1Chongqing Key Lab of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing, China, 2School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China, 3College of Engineering, Design and Physical Science, Brunel University, London, U.K.

Abstract

This report introduces the acoustic scene classification (ASC) architecture submitted by the Chongqing University of Posts and Telecommunications Audio Lab (CQUPT-AUL) for DCASE 2025 Task 1. The architecture is a lightweight and efficient network, termed DynaCP. Built upon CP-Mobile, DynaCP dynamically selects between dilated convolutions with pooling or depth-wise convolutions with pooling at different network layers, thereby enhancing multi-scale feature representation with minimal computational overhead while alleviating the information sparsity caused by dilated convolutions. To improve classification accuracy, a multi-teacher knowledge distillation approach is employed using pre-trained DYMN and MN models. Experimental results demonstrate that DynaCP achieves competitive performance while maintaining low computational complexity.
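
One way a dynamic parallel selective block could look, with an input-dependent gate blending a dilated path and a depth-wise path (the gating mechanism and dimensions are assumptions; the report defines the actual DynaCP block):

```python
import torch
import torch.nn as nn

class DynaSelect(nn.Module):
    """Hedged sketch: two parallel paths (dilated conv vs. depth-wise conv)
    blended by a learned, input-dependent gate."""
    def __init__(self, c):
        super().__init__()
        self.dilated = nn.Conv2d(c, c, 3, padding=2, dilation=2)
        self.depthwise = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(c, 2), nn.Softmax(dim=-1))

    def forward(self, x):
        g = self.gate(x)  # (B, 2) per-example path weights
        a, b = self.dilated(x), self.depthwise(x)
        return g[:, 0, None, None, None] * a + g[:, 1, None, None, None] * b
```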

System characteristics
Sampling rate 44.1kHz
Data augmentation freq-mixstyle, pitch shifting, time rolling
Features log-mel energies
Classifier RF-regularized CNN
Complexity management knowledge distillation, precision_16, network design
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Acoustic Scene Classification with Knowledge Distillation and Device-Specific Fine-Tuning for DCASE 2025

Mohamad Mahdee Ramezanee, Hossein Sharify, Amir Mohamad Mehrani Kia and Behnam Raoufi
Electrical Engineering, Sharif University of Technology, Tehran, Iran

Abstract

The objective of the acoustic scene classification task is to categorize audio recordings into one of ten predetermined environmental sound categories, such as urban parks or metro stations. This report describes our submission to Task 1 of the DCASE 2025 Challenge, which emphasizes developing data-efficient, low-complexity systems for acoustic scene classification, addressing real-world constraints like limited training data and device mismatches [1]. Our model is designed with a reparameterizable convolutional structure that unifies multiple asymmetric kernels into a single efficient layer during inference, enabling both rich spatial representation and computational efficiency. It further integrates a novel attention-guided pooling strategy and a hybrid normalization scheme to enhance feature discrimination and stability throughout the network. Finally, we utilized ensemble learning over the newly defined teacher models and minimized the KL divergence between the student and teacher models to improve the results.
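
The kernel unification rests on the linearity of convolution: asymmetric branches can be zero-padded and summed into one kernel with identical output. A verifiable sketch under assumed shapes:

```python
import torch
import torch.nn.functional as F

# 1x3, 3x1 and 3x3 branches (same stride, no bias) are equivalent to a single
# 3x3 conv whose kernel is the sum of the branch kernels zero-padded to 3x3.
w3x3 = torch.randn(16, 8, 3, 3)
w1x3 = torch.randn(16, 8, 1, 3)
w3x1 = torch.randn(16, 8, 3, 1)

merged = w3x3 + F.pad(w1x3, (0, 0, 1, 1)) + F.pad(w3x1, (1, 1, 0, 0))

x = torch.randn(2, 8, 32, 32)
out_branches = (F.conv2d(x, w3x3, padding=1)
                + F.conv2d(x, w1x3, padding=(0, 1))
                + F.conv2d(x, w3x1, padding=(1, 0)))
out_merged = F.conv2d(x, merged, padding=1)
assert torch.allclose(out_branches, out_merged, atol=1e-4)  # identical outputs
```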

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR
Features log-mel energies
Classifier CNN
Complexity management network design, knowledge distillation, pruning, reparametrization
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

SNTL-NTU DCASE25 Submission: Acoustic Scene Classification Using CNN-GRU Model Without Knowledge Distillation

Ee-Leng Tan1, Jun Wei Yeow2, Santi Peksi2, Haowen Li2, Ziyi Yang2 and Woon-Seng Gan2
1Smart Nation TRANS Lab, Nanyang Technological University, Singapore, 2Smart Nation TRANS Lab, Nanyang Technological University, Singapore, Singapore

Abstract

In this technical report, we present the SNTL-NTU team’s Task 1 submission for the Low-Complexity Acoustic Scene Classification of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 challenge [1]. This submission departs from the typical application of knowledge distillation from a teacher to a student model, aiming to achieve high performance with limited complexity. The proposed model is based on a CNN-GRU model and is trained solely using the TAU Urban Acoustic Scene 2022 Mobile development dataset [2], without utilizing any external datasets, except for MicIRP [3], which is used for device impulse response (DIR) augmentation. Two models have been submitted to this challenge with memory usage not more than 117 KB and requiring 10.9M multiply-and-accumulate (MAC) operations. Using the development dataset, the proposed model achieved an accuracy of 60.25%.
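
A minimal stand-in for the CNN-GRU pattern (all layer sizes are assumptions, not the submitted architecture): a small convolutional front-end reduces the frequency axis, and a GRU summarizes the time axis:

```python
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    """Minimal CNN-GRU sketch: conv front-end over log-mel features,
    GRU over the time axis. Dimensions are illustrative assumptions."""
    def __init__(self, n_mels=64, n_classes=10, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency, keep time resolution
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 1)))
        self.gru = nn.GRU(32 * (n_mels // 4), hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (B, 1, n_mels, T)
        z = self.cnn(x)                        # (B, 32, n_mels/4, T)
        z = z.permute(0, 3, 1, 2).flatten(2)   # (B, T, 32 * n_mels/4)
        _, h = self.gru(z)                     # h: (1, B, hidden)
        return self.head(h[-1])                # (B, n_classes)

logits = CNNGRU()(torch.randn(2, 1, 64, 100))  # -> shape (2, 10)
```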

System characteristics
Sampling rate 44.1kHz
Data augmentation freq-mixstyle, DIR, SpecAug
Features log-mel energies
Classifier GRU-CNN
Complexity management precision_16, network design
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific

Data-Efficient Acoustic Scene Classification via Ensemble Teachers Distillation and Pruning

Shuwei Zhang1, Bing Han2, Anbai Jiang3, Xinhu Zheng2, Wei-Qiang Zhang3, Xie Chen2, Pingyi Fan3, Cheng Lu4, Jia Liu1,3 and Yanmin Qian2
1Huakong AI, Beijing, China, 2Shanghai Jiao Tong University, Shanghai, China, 3Tsinghua University, Beijing, China, 4North China Electric Power University, Beijing, China

Abstract

The goal of the acoustic scene classification task is to classify recordings into one of the ten predefined acoustic scene classes. In this report, we describe the submission of the THU-SJTU team for Task 1 Data-Efficient Low-Complexity Acoustic Scene Classification of the DCASE 2025 challenge. Our methods are consistent with those of last year. First, we use an architecture named SSCP-Mobile, which enhances CP-Mobile with a spatially separable convolution structure, achieving lower computational cost and better performance. We then adopt several pre-trained PaSST models as ensemble teachers to teach CP-Mobile with knowledge distillation. Next, we use model pruning techniques to trim the model to meet the computational and parameter requirements of the competition. Finally, we apply knowledge distillation again to fine-tune the pruned model and further improve its performance. Our submissions included four systems containing only general models, but we also attempted to use device type information to improve the performance of system S1.
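
The pruning step can be sketched with PyTorch's pruning utilities; note the report presumably prunes structurally to actually reduce MACs, whereas this brief sketch shows the unstructured variant with an assumed sparsity:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hedged sketch: L1 pruning of conv weights toward the parameter/MAC budget,
# with the mask baked in before the KD fine-tuning pass described above.
model = nn.Sequential(nn.Conv2d(1, 16, 3), nn.ReLU(), nn.Conv2d(16, 16, 3))
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)  # 30% is an assumption
        prune.remove(m, "weight")  # make the zeros permanent in the weight tensor
```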

System characteristics
Sampling rate 32kHz
Data augmentation freq-mixstyle, frequency masking, time masking, time rolling
Features log-mel energies
Classifier CNN
Complexity management precision_16, network design, knowledge distillation, pruning
Number of models at inference 1
Model weight sharing fully shared

AdapTF-SepNet: AudioSet-Driven Adaptive Pre-Training of TF-SepNet for Multi-Device Acoustic Scene Classification

Zhou Ziyang1, Yin Zeyu1, Cai Yiqiang1, Li Shengchen1 and Shao Xi2
1School of Advanced Technology, Xi'an Jiaotong Liverpool University, Suzhou, China, 2Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China

Abstract

This technical report presents our submission to DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. We propose a multi-device framework that leverages device-specific models trained with knowledge distillation techniques and enhanced through AudioSet pre-training. Our approach utilizes TF-SepNet as the backbone architecture, pre-trained on the large-scale AudioSet dataset to learn robust acoustic representations. For each of the known devices, a dedicated model is trained. At inference time, the system identifies the device source of the audio clip and selects the corresponding pre-trained model for classification. Evaluated on the test set, our device-specific system achieves an overall accuracy of 59.5%.
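
At inference, the described routing reduces to a lookup on the device label, with a fallback for devices that have no dedicated model; a schematic with assumed names:

```python
# Schematic of device-aware inference routing (names are assumptions): the
# announced device ID selects its dedicated model; a general model covers
# devices without a specialized copy.
def classify(features, device_id, device_models, general_model):
    model = device_models.get(device_id, general_model)
    return model(features).argmax(dim=-1)
```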

System characteristics
Sampling rate 32kHz
Data augmentation mixup, freq-mixstyle, DIR
Features log-mel spectrogram
Classifier CNN (TF-SepNet)
Complexity management network design, weight quantization, knowledge distillation
Device information per-device end-to-end fine-tuning
Number of models at inference 7
Model weight sharing fully device-specific