Low-Complexity Acoustic Scene Classification


Challenge results

Task description

The goal of acoustic scene classification is to classify a test recording into one of ten predefined acoustic scene classes. This task targets acoustic scene classification on devices with a low computational and memory allowance, which imposes limits on model complexity, such as the number of parameters and the multiply-accumulate (MAC) operation count. In addition to low complexity, the aim is generalization across a number of different devices; for this purpose, the task uses audio data recorded and simulated with a variety of devices.

The development dataset consists of recordings from 10 European cities made with 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). The data from devices B, C, and S1-S6 consists of randomly selected segments of the simultaneous recordings; it therefore always overlaps with the data from device A, but not necessarily with the data from the other devices. The total amount of audio in the development set is 64 hours.

The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. Five of the devices are new, i.e. not present in the development set: real device D and simulated devices S7-S10. The evaluation data contains 22 hours of audio.

Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.

A more detailed task description can be found on the task description page.
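The complexity figures in the tables below suggest the limits used in this edition: reported parameter counts stay just under 128 k and MAC counts just under 30 million. A minimal sketch of such a compliance check follows; the limit constants and the INT8 quantization assumption are inferred from the tables, not quoted from the challenge rules.

# Hypothetical complexity check; limits inferred from the result tables
# (parameter memory <= 128 kB, MACs <= 30 M per inference), not quoted
# from the official rules.
MAX_MEMORY_BYTES = 128 * 1024
MAX_MACS = 30_000_000

def complies(param_count: int, bytes_per_param: float, macs: int) -> bool:
    """True if the model fits both assumed complexity limits."""
    return (param_count * bytes_per_param <= MAX_MEMORY_BYTES
            and macs <= MAX_MACS)

# AI4EDGE_IPL_task1_1 from the complexity table: 52,852 parameters
# (assuming INT8 quantization, 1 byte each) and 25,475,456 MACs.
print(complies(52_852, 1.0, 25_475_456))  # True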

Systems ranking

Table columns: Submission label | Name | Technical report | Official system rank | Rank value | Performance rank | Memory rank | MACs rank | Evaluation accuracy with 95% confidence interval | Evaluation log loss with 95% confidence interval | Development accuracy | Development log loss
AI4EDGE_IPL_task1_1 AI4EDGE_1 Almeida2023 28 30.25 34 14 39 51.9 (51.7 - 52.2) 1.920 (1.786 - 2.054) 67.5 0.932
AI4EDGE_IPL_task1_2 AI4EDGE_2 Almeida2023 47 43.75 51 25 48 48.8 (48.5 - 49.1) 1.996 (1.859 - 2.133) 69.9 0.872
AI4EDGE_IPL_task1_3 AI4EDGE_3 Almeida2023 38 37.25 43 23 40 50.8 (50.5 - 51.1) 1.364 (1.308 - 1.420) 66.0 0.969
Bai_JLESS_task1_1 JLESS Du2023 54 50.25 56 48 41 47.9 (47.6 - 48.2) 1.825 (1.727 - 1.923) 50.8 1.436
Bai_JLESS_task1_2 JLESS Du2023 47 43.75 52 47 24 48.8 (48.5 - 49.1) 1.791 (1.691 - 1.891) 50.5 1.467
Cai_TENCENT_task1_1 Cai_1 Cai2023 17 25.75 7 45 44 56.6 (56.3 - 56.9) 1.174 (1.122 - 1.225) 57.5 1.147
Cai_TENCENT_task1_2 Cai_2 Cai2023 11 21.25 10 31 34 56.2 (55.9 - 56.5) 1.246 (1.196 - 1.295) 58.1 1.178
Cai_TENCENT_task1_3 Cai_3 Cai2023 14 22.25 12 31 34 55.8 (55.5 - 56.1) 1.241 (1.194 - 1.288) 57.4 1.190
Cai_TENCENT_task1_4 Cai_4 Cai2023 11 21.25 17 21 30 55.4 (55.1 - 55.7) 1.252 (1.207 - 1.297) 57.0 1.198
Cai_XJTLU_task1_1 TFSepNet1 Cai2023a 9 19.75 36 2 5 51.9 (51.6 - 52.2) 1.307 (1.261 - 1.352) 53.9 1.257
Cai_XJTLU_task1_2 TFSepNet2 Cai2023a 8 18.25 33 2 5 52.5 (52.3 - 52.8) 1.292 (1.246 - 1.337) 51.9 1.307
Cai_XJTLU_task1_3 TFSepNet3 Cai2023a 6 14.00 22 5 7 55.1 (54.8 - 55.4) 1.223 (1.173 - 1.273) 57.5 1.160
Cai_XJTLU_task1_4 TFSepNet4 Cai2023a 3 11.50 5 18 18 57.0 (56.7 - 57.3) 1.241 (1.184 - 1.299) 64.3 0.989
Fei_vv_task1_1 vv_1 Fei2023 15 24.75 19 38 23 55.2 (55.0 - 55.5) 1.282 (1.223 - 1.341) 59.3 1.145
Fei_vv_task1_2 vv_2 Fei2023 13 21.75 24 26 13 54.5 (54.2 - 54.8) 1.290 (1.232 - 1.348) 56.7 1.204
Fei_vv_task1_3 vv_3 Fei2023 26 29.25 28 38 23 53.2 (53.0 - 53.5) 1.349 (1.286 - 1.412) 58.0 1.231
Fei_vv_task1_4 vv_4 Fei2023 23 28.25 37 26 13 51.8 (51.5 - 52.0) 1.370 (1.306 - 1.434) 55.4 1.284
Han_SZU_task1_1 Han_SZU_1 Han2023 39 37.75 44 13 50 50.5 (50.3 - 50.8) 2.011 (1.886 - 2.137) 51.4 1.378
LAM_AEV_task1_1 AEV_sys_1 Pham2023 25 29.00 18 33 47 55.3 (55.1 - 55.6) 1.847 (1.726 - 1.968) 56.8 1.349
LAM_AEV_task1_2 AEV_sys_2 Pham2023 31 31.50 23 33 47 55.0 (54.7 - 55.3) 2.083 (1.941 - 2.224) 57.4 1.333
LAM_AEV_task1_3 AEV_sys_3 Pham2023 20 27.00 14 33 47 55.6 (55.3 - 55.9) 1.933 (1.810 - 2.055) 57.4 1.333
Liang_NTES_task1_1 NTES_1 Liang2023 40 38.50 32 41 49 52.6 (52.3 - 52.8) 1.402 (1.339 - 1.465) 54.9 1.293
MALACH23_JKU_task1_1 RFR-CNN-1 Pichler2023 8 18.25 6 36 25 57.0 (56.7 - 57.3) 1.230 (1.180 - 1.279) 55.2 1.280
MALACH23_JKU_task1_2 RFR-CNN-2 Pichler2023 7 16.75 8 32 19 56.6 (56.3 - 56.8) 1.242 (1.196 - 1.288) 53.5 1.323
MALACH23_JKU_task1_3 S4-1 Pichler2023 42 39.50 67 22 2 9.9 (9.7 - 10.1) 4.354 (4.289 - 4.418) 46.7 1.496
MALACH23_JKU_task1_4 S4-2 Pichler2023 43 39.75 68 22 1 9.8 (9.7 - 10.0) 3.224 (3.184 - 3.265) 45.1 1.509
DCASE2023 baseline Baseline 52 46.75 64 13 46 44.8 (44.5 - 45.1) 1.523 (1.478 - 1.568) 42.9 1.575
Park_KT_task1_1 KT_1 Kim2023 10 20.75 9 34 31 56.3 (56.0 - 56.6) 1.495 (1.410 - 1.580) 72.5 0.824
Park_KT_task1_2 KT_2 Kim2023 29 30.75 29 34 31 53.0 (52.7 - 53.3) 1.660 (1.569 - 1.751) 56.1 1.446
Park_KT_task1_3 KT_3 Kim2023 26 29.25 26 34 31 54.2 (54.0 - 54.5) 2.230 (2.107 - 2.353) 70.5 1.167
Park_KT_task1_4 KT_MF Kim2023 20 27.00 49 7 3 49.2 (48.9 - 49.5) 1.510 (1.469 - 1.550) 54.2 1.427
Schmid_CPJKU_task1_1 CPM bc=8 Schmid2023 5 13.75 25 1 4 54.4 (54.1 - 54.6) 1.313 (1.245 - 1.380) 52.6 1.370
Schmid_CPJKU_task1_2 CPM bc=16 Schmid2023 1 5.25 4 4 9 58.7 (58.4 - 59.0) 1.256 (1.181 - 1.332) 58.4 1.200
Schmid_CPJKU_task1_3 CPM bc=24 Schmid2023 2 7.00 2 8 16 61.4 (61.2 - 61.7) 1.153 (1.085 - 1.221) 61.8 1.090
Schmid_CPJKU_task1_4 CPM bc=32 Schmid2023 3 11.50 1 16 28 62.7 (62.4 - 63.0) 1.117 (1.047 - 1.187) 64.1 1.007
Schmidt_FAU_task1_1 30mmacs Schmidt2023 26 29.25 13 46 45 55.7 (55.4 - 56.0) 1.322 (1.271 - 1.373) 57.5 1.217
Schmidt_FAU_task1_2 20mmacs Schmidt2023 12 21.50 15 24 32 55.6 (55.3 - 55.9) 1.337 (1.281 - 1.392) 57.2 1.211
Schmidt_FAU_task1_3 10mmacs Schmidt2023 19 26.75 31 28 17 52.7 (52.4 - 53.0) 1.398 (1.346 - 1.450) 54.5 1.298
Schmidt_FAU_task1_4 5mmacs Schmidt2023 24 28.75 48 9 10 49.7 (49.4 - 50.0) 1.482 (1.427 - 1.536) 50.6 1.417
Tan_NTU_task1_1 TYPG_T1_1 Tan2023 33 33.50 59 10 6 47.1 (46.8 - 47.4) 1.508 (1.461 - 1.554) 50.3 1.397
Tan_NTU_task1_2 TYPG_T1_2 Tan2023 35 34.00 54 17 11 48.5 (48.2 - 48.8) 1.461 (1.417 - 1.505) 52.1 1.372
Tan_NTU_task1_3 TYPG_T1_3 Tan2023 37 37.00 60 17 11 46.3 (46.1 - 46.6) 1.492 (1.449 - 1.534) 50.0 1.381
Tan_SCUT_task1_1 BSConv1_1 Tan2023a 4 13.50 3 27 21 60.8 (60.6 - 61.1) 1.192 (1.119 - 1.265) 55.7 1.318
Tan_SCUT_task1_2 BSConv1_2 Tan2023a 30 31.00 38 27 21 51.7 (51.4 - 52.0) 1.444 (1.378 - 1.509) 54.3 1.243
Tan_SCUT_task1_3 BSConv1_3 Tan2023a 16 25.50 27 27 21 53.5 (53.2 - 53.8) 1.441 (1.370 - 1.513) 55.5 1.320
Tan_SCUT_task1_4 BSConv1_4 Tan2023a 32 33.00 42 27 21 50.9 (50.6 - 51.2) 1.525 (1.455 - 1.595) 54.5 1.305
Vo_DU_task1_1 HKD-MLA-1 Vo2023 52 46.75 63 35 26 45.0 (44.7 - 45.2) 2.157 (2.035 - 2.279) 46.0 1.591
Vo_DU_task1_2 HKD-MLA-2 Vo2023 53 47.75 65 35 26 44.8 (44.5 - 45.1) 2.116 (2.003 - 2.229) 45.4 1.640
Vo_DU_task1_3 HKD-MLA-3 Vo2023 50 46.25 62 35 26 45.2 (44.9 - 45.5) 2.092 (1.973 - 2.211) 45.8 1.624
Vo_DU_task1_4 HKD-MLA-4 Vo2023 49 45.75 61 35 26 45.5 (45.2 - 45.8) 1.793 (1.717 - 1.869) 46.7 1.617
Wang_SCUT_task1_1 DSSDM1 Wang2023 31 31.50 50 11 15 49.1 (48.9 - 49.4) 1.493 (1.434 - 1.553) 53.3 1.287
Wang_SCUT_task1_2 DSSDM2 Wang2023 18 26.50 30 19 27 52.9 (52.6 - 53.2) 1.348 (1.300 - 1.397) 56.4 1.191
Wang_SCUT_task1_3 DSSDM3 Wang2023 46 43.50 58 20 38 47.1 (46.8 - 47.4) 1.702 (1.621 - 1.782) 50.8 1.477
Wang_SCUT_task1_4 DSSDM4 Wang2023 48 45.00 55 37 33 48.5 (48.2 - 48.8) 1.472 (1.416 - 1.529) 52.4 1.368
XuQianHu_BIT&NUDT_task1_1 DYXS_t1_1 Yu2023 45 40.50 41 44 36 51.0 (50.7 - 51.3) 1.364 (1.319 - 1.409) 59.0 1.164
XuQianHu_BIT&NUDT_task1_2 DYXS_t1_2 Yu2023 44 40.00 39 40 42 51.6 (51.3 - 51.9) 1.355 (1.308 - 1.401) 60.6 1.141
XuQianHu_BIT&NUDT_task1_3 DYXS_t1_3 Yu2023 41 39.25 46 43 22 50.0 (49.8 - 50.3) 1.395 (1.346 - 1.445) 59.6 1.168
XuQianHu_BIT&NUDT_task1_4 DYXS_t1_4 Yu2023 36 35.50 40 42 20 51.1 (50.9 - 51.4) 1.367 (1.324 - 1.411) 61.3 1.139
Yang_GZHU_task1_1 dml_kd Weng2023 15 24.75 16 30 37 55.5 (55.3 - 55.8) 1.280 (1.220 - 1.339) 59.7 1.151
Yang_GZHU_task1_2 dml_kd_tta Weng2023 14 22.25 11 30 37 55.9 (55.6 - 56.2) 1.241 (1.184 - 1.298) 59.9 1.115
Yang_GZHU_task1_3 dml Weng2023 21 27.25 21 30 37 55.1 (54.8 - 55.4) 1.279 (1.217 - 1.340) 57.9 1.170
Yang_GZHU_task1_4 dml_tta Weng2023 19 26.75 20 30 37 55.2 (54.9 - 55.5) 1.259 (1.201 - 1.318) 58.0 1.163
Zhang_NCUT_task1_1 Zhang1_NCUT Zhang2023 49 45.75 66 39 12 43.3 (43.0 - 43.6) 1.757 (1.692 - 1.821) 47.0 1.671
Zhang_NCUT_task1_2 Zhang2_NCUT Zhang2023 51 46.50 57 29 43 47.9 (47.6 - 48.2) 1.533 (1.476 - 1.590) 52.8 1.347
Zhang_SATLab_task1_1 SATLab_1 Bing2023 26 29.25 53 3 8 48.8 (48.5 - 49.1) 3.248 (3.031 - 3.465) 65.4 0.941
Zhang_SATLab_task1_2 SATLab_2 Bing2023 34 33.75 47 12 29 50.0 (49.7 - 50.2) 4.213 (3.928 - 4.497) 54.5 1.437
Zhang_SATLab_task1_3 SATLab_3 Bing2023 27 30.00 35 15 35 51.9 (51.6 - 52.2) 1.704 (1.620 - 1.789) 57.0 1.342
Zhang_SATLab_task1_4 SATLab_4 Bing2023 22 27.50 45 6 14 50.3 (50.0 - 50.6) 1.542 (1.483 - 1.601) 53.2 1.439
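The rank value column is consistent with an average of four sub-ranks in which the performance rank is counted twice. This formula is inferred from the published numbers rather than quoted from the challenge rules; a minimal sketch:

def rank_value(performance_rank: int, memory_rank: int, macs_rank: int) -> float:
    # Performance counts twice; memory and MACs ranks count once each.
    return (2 * performance_rank + memory_rank + macs_rank) / 4

# Schmid_CPJKU_task1_2: performance 4, memory 4, MACs 9 -> 5.25 (as in the table)
print(rank_value(4, 4, 9))    # 5.25
# Bai_JLESS_task1_1: performance 56, memory 48, MACs 41 -> 50.25
print(rank_value(56, 48, 41)) # 50.25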

Teams ranking

Each team is represented in this ranking by its highest-ranked system.

Table columns: Submission label | Name | Technical report | Official team rank | Rank value | Performance rank | Memory rank | MACs rank | Evaluation accuracy with 95% confidence interval | Evaluation log loss with 95% confidence interval | Development accuracy | Development log loss
AI4EDGE_IPL_task1_1 AI4EDGE_1 Almeida2023 13 30.25 34 14 39 51.9 (51.7 - 52.2) 1.920 (1.786 - 2.054) 67.5 0.932
Bai_JLESS_task1_2 JLESS Du2023 18 43.75 52 47 24 48.8 (48.5 - 49.1) 1.791 (1.691 - 1.891) 50.5 1.467
Cai_TENCENT_task1_2 Cai_2 Cai2023 6 21.25 10 31 34 56.2 (55.9 - 56.5) 1.246 (1.196 - 1.295) 58.1 1.178
Cai_XJTLU_task1_4 TFSepNet4 Cai2023a 2 11.50 5 18 18 57.0 (56.7 - 57.3) 1.241 (1.184 - 1.299) 64.3 0.989
Fei_vv_task1_2 vv_2 Fei2023 8 21.75 24 26 13 54.5 (54.2 - 54.8) 1.290 (1.232 - 1.348) 56.7 1.204
Han_SZU_task1_1 Han_SZU_1 Han2023 16 37.75 44 13 50 50.5 (50.3 - 50.8) 2.011 (1.886 - 2.137) 51.4 1.378
LAM_AEV_task1_3 AEV_sys_3 Pham2023 11 27.00 14 33 47 55.6 (55.3 - 55.9) 1.933 (1.810 - 2.055) 57.4 1.333
Liang_NTES_task1_1 NTES_1 Liang2023 17 38.50 32 41 49 52.6 (52.3 - 52.8) 1.402 (1.339 - 1.465) 54.9 1.293
MALACH23_JKU_task1_2 RFR-CNN-2 Pichler2023 4 16.75 8 32 19 56.6 (56.3 - 56.8) 1.242 (1.196 - 1.288) 53.5 1.323
DCASE2023 baseline Baseline 21 46.75 64 13 46 44.8 (44.5 - 45.1) 1.523 (1.478 - 1.568) 42.9 1.575
Park_KT_task1_1 KT_1 Kim2023 5 20.75 9 34 31 56.3 (56.0 - 56.6) 1.495 (1.410 - 1.580) 72.5 0.824
Schmid_CPJKU_task1_2 CPM bc=16 Schmid2023 1 5.25 4 4 9 58.7 (58.4 - 59.0) 1.256 (1.181 - 1.332) 58.4 1.200
Schmidt_FAU_task1_2 20mmacs Schmidt2023 7 21.50 15 24 32 55.6 (55.3 - 55.9) 1.337 (1.281 - 1.392) 57.2 1.211
Tan_NTU_task1_1 TYPG_T1_1 Tan2023 14 33.50 59 10 6 47.1 (46.8 - 47.4) 1.508 (1.461 - 1.554) 50.3 1.397
Tan_SCUT_task1_1 BSConv1_1 Tan2023a 3 13.50 3 27 21 60.8 (60.6 - 61.1) 1.192 (1.119 - 1.265) 55.7 1.318
Vo_DU_task1_4 HKD-MLA-4 Vo2023 19 45.75 61 35 26 45.5 (45.2 - 45.8) 1.793 (1.717 - 1.869) 46.7 1.617
Wang_SCUT_task1_2 DSSDM2 Wang2023 10 26.50 30 19 27 52.9 (52.6 - 53.2) 1.348 (1.300 - 1.397) 56.4 1.191
XuQianHu_BIT&NUDT_task1_4 DYXS_t1_4 Yu2023 15 35.50 40 42 20 51.1 (50.9 - 51.4) 1.367 (1.324 - 1.411) 61.3 1.139
Yang_GZHU_task1_2 dml_kd_tta Weng2023 9 22.25 11 30 37 55.9 (55.6 - 56.2) 1.241 (1.184 - 1.298) 59.9 1.115
Zhang_NCUT_task1_1 Zhang1_NCUT Zhang2023 20 45.75 66 39 12 43.3 (43.0 - 43.6) 1.757 (1.692 - 1.821) 47.0 1.671
Zhang_SATLab_task1_4 SATLab_4 Bing2023 12 27.50 45 6 14 50.3 (50.0 - 50.6) 1.542 (1.483 - 1.601) 53.2 1.439

System complexity

Table columns: Submission label | Technical report | Official system rank | Rank value | Evaluation accuracy | Evaluation log loss | MACs | Memory use | Parameters | Non-zero parameters | Sparsity | Complexity management
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 1.920 25475456 62720 52852 51986 0.016385378036781972 weight quantization
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 1.996 29304736 53760 68996 67826 0.016957504782885935 weight quantization
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 1.364 26711936 49280 65192 64034 0.017762915695177295 weight quantization, pruning
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 1.825 27931612 78252 78252 78252 0.0 model compression
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 1.791 14130372 60458 60458 60458 0.0 model compression
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 1.174 28840396 127684 127684 127684 0.0 weight quantization
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 1.246 21990724 79942 79942 79942 0.0 weight quantization
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 1.241 21990724 79942 79942 79942 0.0 weight quantization
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 1.252 19533124 63558 63558 63558 0.0 weight quantization
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 1.307 1649349 6828 6828 6828 0.0 knowledge distillation, weight quantization
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 1.292 1649349 6828 6828 6828 0.0 weight quantization
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 1.223 3424245 15890 15890 15890 0.0 weight quantization
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 1.241 10219540 54260 54260 54260 0.0 weight quantization
Fei_vv_task1_1 Fei2023 15 24.75 55.2 1.282 13402932 123636 123636 123636 0.0 weight quantization
Fei_vv_task1_2 Fei2023 13 21.75 54.5 1.290 7802348 70588 70588 70588 0.0 weight quantization
Fei_vv_task1_3 Fei2023 26 29.25 53.2 1.349 13402932 123636 123636 123636 0.0 weight quantization
Fei_vv_task1_4 Fei2023 23 28.25 51.8 1.370 7802348 70588 70588 70588 0.0 weight quantization
Han_SZU_task1_1 Han2023 39 37.75 50.5 2.012 29.349M 80845 80845 80845 0.0 knowledge distillation
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 1.847 29267550 88704 22962 22176 0.034230467729291836
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 2.083 29267550 88704 22962 22176 0.034230467729291836
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 1.933 29267550 88704 22962 22176 0.034230467729291836
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 1.402 29591778 143345 31260 31260 0.0
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 1.230 14686940 119608 59804 59804 0.0
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 1.242 10819292 87160 43580 43580 0.0 efficient models
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 4.354 572340 116008 116648 29162 0.75 efficient models
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 3.224 214420 63592 15994 15994 0.0 efficient models
DCASE2023 baseline 52 46.75 44.8 1.523 29234920 65280 46512 46512 0.0 weight quantization
Park_KT_task1_1 Kim2023 10 20.75 56.3 1.495 19556096 250000 92070 92070 0.0 weight quantization
Park_KT_task1_2 Kim2023 29 30.75 53.0 1.660 19556096 250000 92070 92070 0.0 weight quantization
Park_KT_task1_3 Kim2023 26 29.25 54.2 2.230 19556096 250000 92070 92070 0.0 weight quantization
Park_KT_task1_4 Kim2023 20 27.00 49.2 1.510 617000 30000 20516 20516 0.0 weight quantization
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 1.313 1582336 5722 5722 5722 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 1.256 4354304 12310 12310 12310 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 1.153 9638144 30106 30106 30106 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 1.117 16803072 54182 54182 54182 0.0 knowledge distillation, weight quantization, structured pruning
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 1.322 28931380 127988 127988 127988 0.0 weight quantization, pruning
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 1.337 19910080 68456 68456 68456 0.0 weight quantization, pruning
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 1.398 9996775 74700 74700 74700 0.0 weight quantization, pruning
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 1.482 4938255 34616 34616 34616 0.0 weight quantization, pruning
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 1.508 2960384 40960 37434 37306 0.0034193513917828433 weight quantization
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 1.461 6462656 54304 54242 54098 0.002654769366911225 weight quantization
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 1.492 6462656 54304 54242 54098 0.002654769366911225 weight quantization
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 1.192 13180000 73386 73386 73386 0.0 weight quantization
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 1.444 13180000 73386 73386 73386 0.0 weight quantization
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 1.441 13180000 73386 73386 73386 0.0 weight quantization
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 1.525 13180000 73386 73386 73386 0.0 weight quantization
Vo_DU_task1_1 Vo2023 52 46.75 45.0 2.157 15600000 503316 119526 119526 0.0 weight quantization
Vo_DU_task1_2 Vo2023 53 47.75 44.8 2.116 15600000 503316 119526 119526 0.0 weight quantization
Vo_DU_task1_3 Vo2023 50 46.25 45.2 2.092 15600000 503316 119526 119526 0.0 weight quantization
Vo_DU_task1_4 Vo2023 49 45.75 45.5 1.793 15600000 503316 119526 119526 0.0 weight quantization
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 1.493 8646000 62080 45164 45164 0.0 weight quantization
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 1.348 16746000 81150 56172 56172 0.0 weight quantization
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 1.702 25442000 82280 56556 56556 0.0 weight quantization
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 1.472 20902000 148618 121812 121812 0.0 weight quantization
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 1.364 23803968 125885 52288 52288 0.0 weight quantization
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 1.355 28400320 123654 51648 51648 0.0 weight quantization
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 1.395 13402688 125650 57392 57392 0.0 weight quantization
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 1.367 11878580 125057 66114 66114 0.0 weight quantization
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 1.280 23970000 76906 76906 76906 0.0 weight quantization
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 1.241 23970000 76906 76906 76906 0.0 weight quantization
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 1.279 23970000 76906 76906 76906 0.0 weight quantization
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 1.259 23970000 76906 76906 76906 0.0 weight quantization
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 1.757 7375000 574464 123648 123648 0.0 weight quantization
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 1.533 28461000 1622016 76224 76224 0.0 weight quantization
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 3.248 3972096 7946 7946 7434 0.06443493581676318 weight quantization
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 4.213 19466240 46232 46232 45996 0.005104689392628536 pruning, weight quantization
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 1.704 23438336 54178 54178 53430 0.013806342057661736 pruning, weight quantization
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 1.542 7944192 15892 15892 14868 0.06443493581676318 weight quantization
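The sparsity column is simply the fraction of parameters pruned to zero, sparsity = 1 - non-zero parameters / parameters, which reproduces the reported values exactly:

def sparsity(total_params: int, nonzero_params: int) -> float:
    # Fraction of parameters that are exactly zero.
    return 1.0 - nonzero_params / total_params

# AI4EDGE_IPL_task1_1: 52,852 parameters, 51,986 of them non-zero.
print(sparsity(52_852, 51_986))  # 0.016385378036781972, as in the table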


Energy consumption

Table columns: Submission label | Technical report | Official system rank | Rank value | Evaluation accuracy | MACs | Memory use | Normalized training energy | Normalized inference energy | Training energy (kWh) | Inference energy (kWh)
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 25475456 62720 0.7283 0.7906 0.2540 0.2340
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 29304736 53760 0.6146 0.6401 0.3010 0.2890
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 26711936 49280 0.6336 0.6655 0.2920 0.2780
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 27931612 78252 2.5391 2.3548 0.1150 0.1240
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 14130372 60458 2.5614 2.4132 0.1140 0.1210
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 28840396 127684 0.0962 0.2854 0.8510 0.2870
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 21990724 79942 0.1091 0.2925 0.7510 0.2800
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 21990724 79942 0.1091 0.2925 0.7510 0.2800
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 19533124 63558 0.1185 0.3302 0.6910 0.2480
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 1649349 6828 0.0010 0.2500 2.0300 0.0080
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 1649349 6828 0.0013 0.2500 1.5910 0.0080
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 3424245 15890 0.0012 0.1250 1.7280 0.0160
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 10219540 54260 0.0011 0.0952 1.7550 0.0210
Fei_vv_task1_1 Fei2023 15 24.75 55.2 13402932 123636 0.6742 8.0000 1.7800 0.1500
Fei_vv_task1_2 Fei2023 13 21.75 54.5 7802348 70588 0.7595 8.0000 1.5800 0.1500
Fei_vv_task1_3 Fei2023 26 29.25 53.2 13402932 123636 0.6742 8.0000 1.7800 0.1500
Fei_vv_task1_4 Fei2023 23 28.25 51.8 7802348 70588 0.7595 8.0000 1.5800 0.1500
Han_SZU_task1_1 Han2023 39 37.75 50.5 29.349M 80845 1.0845 308.0000 0.2840 0.0010
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 29267550 88704 0.0276 0.5152 82.7400 4.4270
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 29267550 88704 0.0138 0.5152 164.7400 4.4270
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 29267550 88704 0.0138 0.5152 164.7400 4.4270
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 29591778 143345 5.3234 0.1670
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 14686940 119608 0.0152 0.9375 2.9640 0.0480
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 10819292 87160 0.0185 0.9574 2.4310 0.0470
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 572340 116008 0.0507 5.7273 1.2420 0.0110
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 214420 63592 0.0098 1.1538 1.5300 0.0130
DCASE2023 baseline 52 46.75 44.8 29234920 65280 0.9669 1.0000 0.3020 0.2920
Park_KT_task1_1 Kim2023 10 20.75 56.3 19556096 250000 4.4282 56.5613 0.0530 0.0042
Park_KT_task1_2 Kim2023 29 30.75 53.0 19556096 250000 4.4282 56.5613 0.0530 0.0042
Park_KT_task1_3 Kim2023 26 29.25 54.2 19556096 250000 4.4282 56.5613 0.0530 0.0042
Park_KT_task1_4 Kim2023 20 27.00 49.2 617000 30000 248.0005 1287.1071 0.0009 0.0002
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 1582336 5722 0.1249 7.1515 1.8890 0.0330
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 4354304 12310 0.1204 6.7429 1.9600 0.0350
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 9638144 30106 0.1111 6.3784 2.1250 0.0370
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 16803072 54182 0.0663 6.5556 3.5600 0.0360
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 28931380 127988 0.0678 50.1029 5.2587 0.0071
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 19910080 68456 0.0747 49.8043 4.7693 0.0072
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 9996775 74700 0.0745 44.6759 4.7830 0.0080
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 4938255 34616 0.0726 53.7101 4.9099 0.0066
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 2960384 40960 1.7114 8.8426 0.2692 0.0521
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 6462656 54304 1.0784 4.3258 0.4272 0.1065
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 6462656 54304 0.8352 4.3340 0.5516 0.1063
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 13180000 73386 0.0543 7.6842 5.3790 0.0380
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 13180000 73386 0.1614 9.6689 1.8090 0.0302
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 13180000 73386 0.0553 8.5882 5.2810 0.0340
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 13180000 73386 0.1614 9.1250 1.8090 0.0320
Vo_DU_task1_1 Vo2023 52 46.75 45.0 15600000 503316 0.0064 0.6503 3.3220 0.0326
Vo_DU_task1_2 Vo2023 53 47.75 44.8 15600000 503316 0.0068 0.9021 3.1220 0.0235
Vo_DU_task1_3 Vo2023 50 46.25 45.2 15600000 503316 0.0071 1.8120 2.9780 0.0117
Vo_DU_task1_4 Vo2023 49 45.75 45.5 15600000 503316 0.0121 17.6667 1.7560 0.0012
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 8646000 62080 1.0735 1.1542 0.2720 0.2530
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 16746000 81150 0.7449 0.7871 0.3920 0.3710
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 25442000 82280 0.8044 0.8439 0.3630 0.3460
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 20902000 148618 0.6173 0.8957 0.4730 0.3260
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 23803968 125885 0.0622 2.0603 1.3576 0.0410
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 28400320 123654 0.0504 2.0093 1.6751 0.0420
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 13402688 125650 0.0439 2.2005 1.9233 0.0384
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 11878580 125057 0.0439 2.2438 1.9208 0.0376
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 23970000 76906 0.0615 31.4878 21.0000 0.0410
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 23970000 76906 0.0615 5.3792 21.0000 0.2400
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 23970000 76906 0.0759 31.4878 17.0000 0.0410
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 23970000 76906 0.0759 5.3792 17.0000 0.2400
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 7375000 574464 0.8202 18.8125 0.3670 0.0160
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 28461000 1622016 0.7679 6.5435 0.3920 0.0460
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 3972096 7946 1.8354 5.8000 0.0790 0.0250
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 19466240 46232 0.2757 0.5142 0.5260 0.2820
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 23438336 54178 0.9477 3.1522 0.1530 0.0460
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 7944192 15892 0.2397 0.4723 0.6050 0.3070
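The normalized figures are consistent with each measured energy being divided into a reference energy measured on the same hardware (for the baseline row the reference equals its own inference energy, hence 1.0000), i.e. normalized = E_reference / E_measured. A sketch under that assumption; the 0.185 kWh reference below is back-computed from the AI4EDGE_IPL_task1_1 row, not reported on this page.

def normalized_energy(reference_kwh: float, measured_kwh: float) -> float:
    # Ratio of a per-hardware reference energy to the measured energy;
    # larger values mean the run used less energy than the reference.
    return reference_kwh / measured_kwh

# AI4EDGE_IPL_task1_1 (assumed reference ~0.185 kWh on the submitter's hardware):
print(round(normalized_energy(0.185, 0.2540), 4))  # ~0.7283 (training, as reported)
print(round(normalized_energy(0.185, 0.2340), 4))  # ~0.7906 (inference, as reported)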


Generalization performance

All results are computed on the evaluation dataset.

Table columns: Submission label | Technical report | Official system rank | Rank value | Overall accuracy | Overall log loss | Unseen-device accuracy | Unseen-device log loss | Seen-device accuracy | Seen-device log loss | Unseen-city accuracy | Unseen-city log loss | Seen-city accuracy | Seen-city log loss
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 1.920 47.0 2.370 56.1 1.544 51.6 1.931 52.3 1.905
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 1.996 42.4 2.391 54.1 1.667 49.2 1.847 48.8 2.005
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 1.364 45.3 1.528 55.4 1.227 52.5 1.302 50.7 1.372
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 1.825 39.5 2.335 55.0 1.399 44.2 1.944 48.9 1.799
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 1.791 40.6 2.225 55.6 1.430 44.0 1.817 50.0 1.794
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 1.174 50.7 1.347 61.5 1.029 54.9 1.202 57.1 1.167
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 1.246 51.8 1.369 59.8 1.143 55.1 1.275 56.5 1.242
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 1.241 51.9 1.356 59.1 1.144 55.3 1.255 56.0 1.241
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 1.252 51.3 1.358 58.7 1.164 55.7 1.259 55.3 1.255
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 1.307 49.1 1.390 54.2 1.237 50.9 1.325 52.1 1.306
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 1.292 49.7 1.389 55.0 1.211 53.5 1.285 52.5 1.298
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 1.223 51.6 1.331 58.0 1.133 56.2 1.218 55.0 1.226
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 1.241 53.5 1.385 59.9 1.121 56.8 1.259 57.5 1.234
Fei_vv_task1_1 Fei2023 15 24.75 55.2 1.282 51.7 1.393 58.2 1.190 54.0 1.298 55.8 1.274
Fei_vv_task1_2 Fei2023 13 21.75 54.5 1.290 50.9 1.410 57.5 1.191 52.9 1.296 55.2 1.281
Fei_vv_task1_3 Fei2023 26 29.25 53.2 1.349 49.9 1.457 56.1 1.259 51.5 1.373 54.1 1.337
Fei_vv_task1_4 Fei2023 23 28.25 51.8 1.370 48.8 1.460 54.2 1.295 49.9 1.389 52.6 1.359
Han_SZU_task1_1 Han2023 39 37.75 50.5 2.012 44.9 2.565 55.2 1.550 49.2 1.966 51.1 2.020
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 1.847 49.6 2.329 60.1 1.445 55.4 1.884 55.6 1.849
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 2.083 48.9 2.681 60.1 1.584 54.1 2.134 55.5 2.074
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 1.933 49.6 2.447 60.5 1.504 54.9 1.937 56.0 1.938
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 1.402 44.9 1.704 58.9 1.150 51.3 1.408 53.0 1.397
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 1.230 52.3 1.349 60.9 1.130 56.1 1.239 57.4 1.227
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 1.242 51.9 1.353 60.4 1.150 55.5 1.265 57.1 1.235
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 4.354 10.0 4.358 9.8 4.350 9.9 4.360 9.9 4.349
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 3.224 9.8 3.228 9.9 3.221 10.0 3.225 9.8 3.223
DCASE2023 baseline 52 46.75 44.8 1.523 38.0 1.730 50.5 1.351 42.4 1.563 45.6 1.513
Park_KT_task1_1 Kim2023 10 20.75 56.3 1.495 49.9 1.827 61.7 1.218 54.4 1.543 57.1 1.471
Park_KT_task1_2 Kim2023 29 30.75 53.0 1.660 46.2 2.056 58.7 1.329 51.7 1.720 53.9 1.629
Park_KT_task1_3 Kim2023 26 29.25 54.2 2.230 47.1 2.840 60.2 1.721 53.6 2.240 54.7 2.212
Park_KT_task1_4 Kim2023 20 27.00 49.2 1.510 45.8 1.573 52.0 1.457 49.2 1.480 49.4 1.512
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 1.313 52.3 1.395 56.1 1.244 53.7 1.318 54.5 1.317
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 1.256 54.6 1.445 62.1 1.099 59.4 1.215 58.8 1.264
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 1.153 57.8 1.318 64.4 1.015 60.8 1.133 61.8 1.156
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 1.117 57.4 1.334 67.1 0.936 61.2 1.152 63.4 1.107
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 1.322 51.0 1.465 59.6 1.203 55.3 1.336 56.0 1.322
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 1.337 50.8 1.488 59.5 1.210 55.6 1.347 55.8 1.336
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 1.398 47.6 1.562 57.0 1.261 52.3 1.408 52.9 1.396
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 1.482 44.8 1.650 53.8 1.342 50.0 1.489 50.1 1.477
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 1.508 39.7 1.720 53.3 1.331 45.8 1.509 47.6 1.503
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 1.461 41.4 1.634 54.4 1.317 47.4 1.466 49.2 1.454
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 1.492 40.1 1.632 51.5 1.375 44.2 1.517 47.1 1.482
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 1.192 55.7 1.420 65.1 1.002 59.3 1.249 61.5 1.180
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 1.444 45.8 1.726 56.6 1.208 50.9 1.481 52.1 1.434
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 1.441 48.6 1.675 57.5 1.247 52.9 1.487 53.9 1.431
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 1.525 45.5 1.843 55.4 1.259 51.1 1.542 51.1 1.519
Vo_DU_task1_1 Vo2023 52 46.75 45.0 2.157 36.2 2.963 52.3 1.485 42.6 2.334 45.8 2.111
Vo_DU_task1_2 Vo2023 53 47.75 44.8 2.116 36.3 2.877 51.9 1.482 42.3 2.289 45.7 2.069
Vo_DU_task1_3 Vo2023 50 46.25 45.2 2.092 36.6 2.827 52.4 1.479 43.0 2.286 46.2 2.056
Vo_DU_task1_4 Vo2023 49 45.75 45.5 1.793 38.0 2.169 51.8 1.479 43.7 1.793 46.3 1.771
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 1.493 41.8 1.823 55.3 1.218 47.5 1.530 49.8 1.478
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 1.348 47.3 1.510 57.5 1.214 52.0 1.375 53.4 1.339
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 1.702 38.4 2.228 54.4 1.263 45.2 1.830 47.7 1.680
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 1.472 42.9 1.654 53.1 1.321 45.1 1.556 49.4 1.455
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 1.364 47.5 1.450 53.9 1.292 50.3 1.377 51.0 1.365
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 1.355 47.4 1.455 55.1 1.271 51.0 1.369 51.6 1.357
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 1.395 45.5 1.509 53.8 1.301 49.5 1.408 50.2 1.396
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 1.367 47.6 1.450 54.1 1.298 49.4 1.400 51.6 1.363
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 1.280 52.6 1.409 58.0 1.172 55.6 1.278 55.9 1.276
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 1.241 53.2 1.352 58.2 1.149 56.0 1.242 56.3 1.237
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 1.279 52.1 1.406 57.6 1.172 56.7 1.253 55.2 1.283
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 1.259 52.4 1.364 57.5 1.172 56.8 1.226 55.2 1.263
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 1.757 33.3 2.083 51.6 1.485 40.8 1.831 44.0 1.736
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 1.533 39.7 1.813 54.7 1.300 46.4 1.570 48.2 1.530
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 3.248 39.2 5.085 56.8 1.717 48.3 3.481 49.2 3.229
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 4.213 40.6 6.653 57.7 2.179 48.7 4.433 50.4 4.145
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 1.704 41.8 2.237 60.3 1.261 50.7 1.823 52.4 1.679
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 1.542 41.6 1.802 57.6 1.325 49.8 1.585 50.7 1.532
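With five unseen devices (D, S7-S10) and six seen devices (A, B, C, S1-S3) contributing equal amounts of evaluation data, the overall accuracy matches the device-count-weighted mean of the two group accuracies. A quick check assuming that weighting:

def overall_from_groups(unseen_acc: float, seen_acc: float,
                        n_unseen: int = 5, n_seen: int = 6) -> float:
    # Weighted mean, assuming equal amounts of data per device.
    return (n_unseen * unseen_acc + n_seen * seen_acc) / (n_unseen + n_seen)

# Schmid_CPJKU_task1_4: unseen 57.4, seen 67.1 -> overall 62.7 (as in the table)
print(round(overall_from_groups(57.4, 67.1), 1))  # 62.7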

Class-wise performance

Accuracy

Table columns: Submission label | Technical report | Official system rank | Rank value | Overall accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 45.9 72.9 46.4 44.6 73.2 26.7 59.7 27.9 57.3 64.7
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 40.8 72.2 51.1 40.0 73.1 21.3 43.6 31.8 59.7 54.5
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 49.4 67.4 52.5 38.1 82.0 28.4 39.2 31.3 56.5 63.2
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 25.5 60.8 51.2 51.7 69.0 22.9 52.3 24.9 65.8 55.0
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 29.4 67.1 49.8 44.5 70.1 25.9 51.9 22.9 70.2 56.2
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 40.5 59.7 61.8 57.8 79.9 35.6 54.8 35.3 73.4 67.4
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 34.4 64.5 58.6 62.0 83.4 36.0 56.7 31.1 68.8 66.3
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 34.4 64.9 54.2 56.9 81.7 33.6 57.1 36.3 69.9 69.3
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 34.6 60.5 55.1 58.6 78.9 35.5 54.0 36.1 71.9 68.5
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 34.8 66.1 46.4 43.4 79.2 28.4 61.7 33.8 67.4 57.7
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 31.6 66.0 50.0 48.1 78.7 31.1 61.4 33.6 66.4 58.5
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 39.7 64.2 56.0 49.4 80.2 33.7 64.2 34.0 64.7 64.7
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 44.7 68.7 57.6 52.5 83.0 36.8 52.4 34.5 70.5 69.4
Fei_vv_task1_1 Fei2023 15 24.75 55.2 45.6 71.9 58.1 45.8 76.3 31.0 59.7 31.7 67.0 65.3
Fei_vv_task1_2 Fei2023 13 21.75 54.5 37.1 73.4 54.7 45.1 77.7 28.1 60.4 36.6 67.5 64.4
Fei_vv_task1_3 Fei2023 26 29.25 53.2 57.9 84.5 54.4 42.5 73.1 24.9 54.2 27.8 62.0 51.3
Fei_vv_task1_4 Fei2023 23 28.25 51.8 56.6 83.2 54.8 38.8 79.2 21.2 59.5 18.2 61.7 44.4
Han_SZU_task1_1 Han2023 39 37.75 50.5 39.7 60.8 54.6 43.5 80.6 28.0 51.6 32.1 62.8 51.8
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 50.1 80.2 52.2 40.9 74.9 43.8 57.8 23.6 69.4 60.4
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 35.6 79.9 44.5 54.3 82.3 33.4 58.0 32.0 68.4 61.7
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 47.7 77.1 48.6 52.8 78.1 38.0 56.8 27.0 67.0 62.9
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 38.8 72.7 47.9 45.1 74.8 27.2 56.8 36.5 56.2 69.7
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 34.8 72.3 60.5 55.4 85.2 37.0 51.0 35.3 68.9 69.6
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 34.8 69.7 54.0 57.8 81.4 37.0 51.2 40.2 69.3 70.0
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 12.0 11.7 8.7 9.4 12.4 8.2 11.6 12.3 7.4 5.2
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 16.6 10.3 7.2 9.3 11.9 6.5 13.0 4.8 9.8 9.1
DCASE2023 baseline 52 46.75 44.8 40.5 37.8 48.8 38.9 58.2 22.4 54.1 29.8 58.0 59.7
Park_KT_task1_1 Kim2023 10 20.75 56.3 36.2 80.8 59.8 55.1 70.1 29.6 52.6 37.4 69.5 72.1
Park_KT_task1_2 Kim2023 29 30.75 53.0 26.6 75.3 53.1 49.7 66.5 29.7 62.2 35.1 63.8 68.3
Park_KT_task1_3 Kim2023 26 29.25 54.2 38.1 78.5 57.0 52.5 73.1 29.4 45.4 33.3 67.9 67.1
Park_KT_task1_4 Kim2023 20 27.00 49.2 48.7 75.7 48.3 28.2 76.2 29.5 46.4 30.3 49.9 59.0
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 56.4 72.1 58.0 44.2 82.9 32.1 58.7 14.7 70.2 54.3
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 49.9 74.0 59.4 48.6 87.2 35.7 59.9 26.4 70.7 75.4
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 50.4 78.0 65.6 60.3 84.7 35.1 65.7 33.6 69.6 71.3
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 48.0 88.1 70.1 62.8 84.3 35.7 56.9 35.7 75.6 69.8
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 40.0 66.2 59.3 53.7 78.8 36.8 56.3 31.7 71.9 62.2
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 32.5 65.4 61.1 55.1 86.9 32.3 47.7 41.8 69.0 64.1
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 34.0 67.6 59.7 51.8 78.1 25.4 53.3 37.1 64.5 55.2
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 31.5 59.8 61.0 49.2 68.5 31.2 67.8 17.3 57.3 53.5
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 33.8 63.3 41.0 39.7 62.8 21.8 58.3 28.5 61.4 60.3
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 35.8 71.5 44.0 37.1 70.5 23.1 60.7 25.7 61.7 54.7
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 40.6 55.1 45.2 48.2 75.9 13.7 58.4 14.1 63.6 48.5
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 42.7 76.8 62.5 49.7 83.6 44.3 60.1 43.4 73.5 71.7
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 37.7 70.8 47.8 42.0 75.3 32.7 50.0 32.5 67.0 61.3
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 38.5 70.9 49.3 39.8 75.0 40.0 56.4 34.0 68.4 62.5
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 35.5 79.1 42.7 49.4 74.3 31.3 42.9 38.1 65.7 50.2
Vo_DU_task1_1 Vo2023 52 46.75 45.0 45.6 58.1 40.8 39.1 65.9 20.0 39.6 26.7 62.5 51.3
Vo_DU_task1_2 Vo2023 53 47.75 44.8 42.6 59.5 40.9 40.2 64.9 22.0 35.8 28.1 65.2 48.8
Vo_DU_task1_3 Vo2023 50 46.25 45.2 45.2 60.1 38.6 42.3 68.4 21.9 40.9 23.8 61.8 49.2
Vo_DU_task1_4 Vo2023 49 45.75 45.5 41.5 50.4 57.0 29.1 71.7 16.6 43.8 34.0 61.7 49.5
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 52.9 68.7 43.2 29.2 71.3 23.1 42.3 34.5 58.5 67.7
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 58.1 63.7 44.0 28.7 71.7 35.4 54.1 34.1 63.1 75.8
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 24.5 66.6 62.7 69.1 68.4 21.1 36.7 24.4 63.3 34.5
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 45.9 48.3 37.3 49.7 79.3 23.5 55.1 23.2 65.8 56.6
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 16.2 59.7 52.8 54.0 81.3 30.4 55.8 42.3 65.2 52.4
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 22.0 73.9 47.4 57.2 79.9 33.5 53.5 33.9 60.7 53.8
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 22.4 67.5 50.1 56.4 82.3 31.6 52.1 26.4 59.0 52.6
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 24.2 67.2 46.5 59.4 75.2 31.8 58.2 33.0 64.5 51.5
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 40.3 62.4 60.1 50.2 77.2 33.2 60.6 32.8 73.9 64.7
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 40.8 60.9 58.9 48.9 78.9 34.7 61.2 32.6 74.1 68.2
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 42.1 74.0 52.2 43.3 82.7 36.9 62.2 26.5 68.9 62.4
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 44.7 72.8 49.4 42.4 82.2 35.3 61.5 28.0 70.4 65.1
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 26.0 54.3 49.7 43.6 62.7 18.8 50.2 17.1 60.8 49.6
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 26.9 71.4 44.0 35.1 79.9 32.3 46.5 33.9 59.1 49.7
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 35.6 76.1 39.6 36.0 77.7 33.4 50.4 34.5 57.4 47.1
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 37.5 72.7 46.6 39.9 76.1 29.9 51.9 31.8 58.6 54.7
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 36.7 78.5 48.9 39.8 82.0 31.3 54.1 38.9 54.9 53.9
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 33.6 75.0 42.7 37.4 84.9 29.4 52.1 41.7 55.3 51.0
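The overall accuracy agrees with the unweighted mean of the ten class-wise accuracies, as expected for a class-balanced evaluation set. A quick check against the AI4EDGE_IPL_task1_1 row:

# Class-wise accuracies for AI4EDGE_IPL_task1_1 (airport ... tram).
classwise = [45.9, 72.9, 46.4, 44.6, 73.2, 26.7, 59.7, 27.9, 57.3, 64.7]
print(round(sum(classwise) / len(classwise), 2))  # 51.93 -> reported as 51.9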

Log loss

Table columns: Submission label | Technical report | Official system rank | Rank value | Overall log loss | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 1.920 1.521 1.162 1.712 2.025 2.107 2.513 2.220 2.167 2.708 1.062
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 1.996 1.979 1.132 1.686 1.710 1.728 2.664 3.129 2.401 2.236 1.295
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 1.364 1.382 0.939 1.269 1.773 0.632 1.946 1.623 1.773 1.293 1.007
Bai_JLESS_task1_1 Du2023 54 50.25 1.825 2.324 1.215 1.402 1.546 1.724 2.876 1.531 2.857 1.436 1.336
Bai_JLESS_task1_2 Du2023 47 43.75 1.791 2.241 1.049 1.574 1.965 1.239 2.638 1.684 2.946 1.194 1.379
Cai_TENCENT_task1_1 Cai2023 17 25.75 1.174 1.446 1.060 0.971 1.180 0.646 1.776 1.219 1.695 0.862 0.883
Cai_TENCENT_task1_2 Cai2023 11 21.25 1.246 1.628 0.988 1.103 1.149 0.629 1.805 1.266 1.903 1.039 0.948
Cai_TENCENT_task1_3 Cai2023 14 22.25 1.241 1.598 0.986 1.188 1.275 0.666 1.775 1.240 1.770 1.013 0.894
Cai_TENCENT_task1_4 Cai2023 11 21.25 1.252 1.599 1.096 1.161 1.244 0.766 1.723 1.307 1.752 0.954 0.918
Cai_XJTLU_task1_1 Cai2023a 9 19.75 1.307 1.535 1.008 1.317 1.596 0.750 1.859 1.153 1.715 1.005 1.129
Cai_XJTLU_task1_2 Cai2023a 8 18.25 1.292 1.609 0.974 1.221 1.451 0.750 1.826 1.150 1.761 1.077 1.096
Cai_XJTLU_task1_3 Cai2023a 6 14.00 1.223 1.446 0.998 1.103 1.394 0.655 1.824 1.043 1.729 1.107 0.932
Cai_XJTLU_task1_4 Cai2023a 3 11.50 1.241 1.502 0.908 1.126 1.429 0.562 1.873 1.320 1.889 0.955 0.849
Fei_vv_task1_1 Fei2023 15 24.75 1.282 1.418 0.835 1.178 1.567 0.753 1.959 1.141 1.884 1.100 0.986
Fei_vv_task1_2 Fei2023 13 21.75 1.290 1.636 0.796 1.238 1.551 0.729 1.977 1.146 1.719 1.117 0.995
Fei_vv_task1_3 Fei2023 26 29.25 1.349 1.592 0.695 1.491 1.644 0.930 2.276 1.159 1.603 1.031 1.069
Fei_vv_task1_4 Fei2023 23 28.25 1.370 1.671 0.818 1.469 1.730 0.702 2.297 0.941 1.847 1.003 1.226
Han_SZU_task1_1 Han2023 39 37.75 2.012 2.062 1.571 1.586 2.090 0.780 3.228 2.380 2.773 2.029 1.616
LAM_AEV_task1_1 Pham2023 25 29.00 1.847 1.943 0.801 1.551 2.243 1.235 2.526 1.813 3.416 1.577 1.364
LAM_AEV_task1_2 Pham2023 31 31.50 2.083 2.826 0.828 2.244 1.996 0.842 3.829 2.003 3.330 1.515 1.416
LAM_AEV_task1_3 Pham2023 20 27.00 1.933 2.060 0.936 1.820 1.912 1.074 3.087 1.947 3.500 1.652 1.339
Liang_NTES_task1_1 Liang2023 40 38.50 1.402 1.610 0.783 1.515 1.710 0.845 2.014 1.272 1.822 1.588 0.861
MALACH23_JKU_task1_1 Pichler2023 8 18.25 1.230 1.676 0.883 1.110 1.313 0.596 1.672 1.381 1.755 0.997 0.913
MALACH23_JKU_task1_2 Pichler2023 7 16.75 1.242 1.661 0.953 1.239 1.262 0.716 1.637 1.350 1.659 1.000 0.945
MALACH23_JKU_task1_3 Pichler2023 42 39.50 4.354 4.368 4.889 4.315 3.618 5.426 3.395 4.872 3.433 4.845 4.376
MALACH23_JKU_task1_4 Pichler2023 43 39.75 3.224 3.160 3.690 3.201 2.570 3.634 2.692 3.373 2.840 3.927 3.157
DCASE2023 baseline 52 46.75 1.523 1.487 1.585 1.294 1.689 1.452 2.085 1.360 1.817 1.365 1.100
Park_KT_task1_1 Kim2023 10 20.75 1.495 2.157 0.591 1.287 1.530 1.067 2.541 1.623 2.210 1.106 0.836
Park_KT_task1_2 Kim2023 29 30.75 1.660 2.786 0.766 1.556 1.807 1.128 2.542 1.342 2.335 1.342 0.993
Park_KT_task1_3 Kim2023 26 29.25 2.230 2.871 0.933 2.170 2.357 1.302 3.469 2.722 3.498 1.467 1.509
Park_KT_task1_4 Kim2023 20 27.00 1.510 1.618 1.014 1.459 1.883 1.047 1.889 1.598 1.797 1.535 1.254
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 1.313 1.219 0.901 1.115 1.634 0.584 2.156 1.080 2.250 0.989 1.198
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 1.256 1.421 0.842 1.175 1.596 0.425 2.134 1.140 2.266 0.906 0.657
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 1.153 1.364 0.697 1.009 1.193 0.470 2.095 0.960 1.970 0.988 0.783
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 1.117 1.498 0.388 0.873 1.087 0.518 2.034 1.233 1.950 0.774 0.814
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 1.322 1.770 1.142 1.253 1.397 0.802 1.839 1.263 1.801 0.949 1.007
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 1.337 1.930 1.175 1.199 1.374 0.583 1.982 1.532 1.568 1.041 0.980
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 1.398 1.837 1.101 1.207 1.449 0.845 2.227 1.311 1.671 1.191 1.137
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 1.482 2.030 1.351 1.158 1.477 1.132 2.101 1.012 2.060 1.370 1.127
Tan_NTU_task1_1 Tan2023 33 33.50 1.508 1.675 1.089 1.389 1.687 1.253 2.095 1.402 2.048 1.316 1.124
Tan_NTU_task1_2 Tan2023 35 34.00 1.461 1.614 0.956 1.376 1.719 1.078 2.011 1.408 1.973 1.282 1.192
Tan_NTU_task1_3 Tan2023 37 37.00 1.492 1.547 1.311 1.363 1.519 1.044 2.122 1.392 2.074 1.181 1.363
Tan_SCUT_task1_1 Tan2023a 4 13.50 1.192 1.627 0.664 1.014 1.608 0.537 1.663 1.148 2.026 0.818 0.818
Tan_SCUT_task1_2 Tan2023a 30 31.00 1.444 1.789 0.889 1.294 1.656 0.864 1.979 1.537 2.237 1.116 1.075
Tan_SCUT_task1_3 Tan2023a 16 25.50 1.441 1.767 0.869 1.312 1.922 0.912 1.908 1.320 2.234 1.055 1.112
Tan_SCUT_task1_4 Tan2023a 32 33.00 1.525 1.938 0.654 1.470 1.528 0.993 2.129 1.728 2.201 1.202 1.405
Vo_DU_task1_1 Vo2023 52 46.75 2.157 1.492 1.532 2.737 1.898 1.042 3.724 2.305 4.132 1.336 1.373
Vo_DU_task1_2 Vo2023 53 47.75 2.116 1.603 1.474 2.640 1.871 1.086 3.194 2.548 4.103 1.230 1.412
Vo_DU_task1_3 Vo2023 50 46.25 2.092 1.541 1.508 2.746 1.762 0.993 3.112 2.224 4.321 1.325 1.387
Vo_DU_task1_4 Vo2023 49 45.75 1.793 1.633 1.567 1.534 2.211 0.976 3.063 1.866 2.395 1.295 1.390
Wang_SCUT_task1_1 Wang2023 31 31.50 1.493 1.234 0.904 1.480 2.014 1.130 2.168 1.615 2.120 1.312 0.957
Wang_SCUT_task1_2 Wang2023 18 26.50 1.348 1.130 0.989 1.427 2.059 1.186 1.579 1.191 1.818 1.317 0.788
Wang_SCUT_task1_3 Wang2023 46 43.50 1.702 2.422 0.984 1.010 0.940 1.218 2.515 2.111 2.914 1.289 1.614
Wang_SCUT_task1_4 Wang2023 48 45.00 1.472 1.363 1.565 1.891 1.454 0.760 1.956 1.241 2.119 1.053 1.321
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 1.364 1.867 1.240 1.337 1.412 0.716 1.842 1.278 1.569 1.140 1.235
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 1.355 1.827 0.902 1.344 1.339 0.746 1.851 1.326 1.764 1.279 1.170
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 1.395 1.845 1.033 1.291 1.324 0.678 1.955 1.348 1.963 1.314 1.202
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 1.367 1.743 1.083 1.413 1.299 0.875 1.877 1.214 1.738 1.177 1.255
Yang_GZHU_task1_1 Weng2023 15 24.75 1.280 1.519 1.091 1.098 1.464 0.741 2.027 1.113 1.916 0.861 0.968
Yang_GZHU_task1_2 Weng2023 14 22.25 1.241 1.467 1.099 1.100 1.475 0.680 1.920 1.111 1.867 0.845 0.849
Yang_GZHU_task1_3 Weng2023 21 27.25 1.279 1.443 0.753 1.287 1.630 0.558 1.981 1.092 2.067 0.970 1.005
Yang_GZHU_task1_4 Weng2023 19 26.75 1.259 1.378 0.795 1.340 1.627 0.591 1.937 1.120 1.972 0.913 0.920
Zhang_NCUT_task1_1 Zhang2023 49 45.75 1.757 2.159 1.382 1.482 1.701 1.165 2.575 1.538 2.788 1.292 1.484
Zhang_NCUT_task1_2 Zhang2023 51 46.50 1.533 2.046 0.927 1.587 1.903 0.735 2.025 1.641 1.830 1.301 1.337
Zhang_SATLab_task1_1 Bing2023 26 29.25 3.248 4.626 1.072 2.657 4.508 1.552 3.713 3.538 5.323 3.042 2.446
Zhang_SATLab_task1_2 Bing2023 34 33.75 4.213 4.392 1.559 3.338 4.560 2.582 6.580 4.340 7.767 4.126 2.881
Zhang_SATLab_task1_3 Bing2023 27 30.00 1.704 1.810 0.713 1.606 2.119 0.817 2.119 1.555 1.766 2.093 2.443
Zhang_SATLab_task1_4 Bing2023 22 27.50 1.542 1.876 0.816 1.767 1.824 0.643 2.127 1.511 1.712 1.551 1.595

Device-wise performance

Accuracy

Table columns: Submission label | Technical report | Official system rank | Rank value | Overall accuracy | Accuracy / Unseen | Accuracy / Seen | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3 (devices D and S7-S10 are unseen; devices A, B, C, and S1-S3 are seen)
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 47.0 56.1 32.2 51.5 50.8 50.5 49.8 59.3 55.9 58.6 56.5 51.1 55.1
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 42.4 54.1 30.3 46.6 44.9 44.1 46.1 56.0 55.1 56.2 52.8 50.8 53.9
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 45.3 55.4 42.7 46.4 45.9 44.6 46.9 58.2 55.2 56.5 54.4 52.2 55.9
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 39.5 55.0 29.8 48.8 47.3 34.3 37.3 64.5 56.0 60.6 49.3 49.3 50.0
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 40.6 55.6 31.5 48.6 47.8 35.1 40.0 64.6 56.6 61.8 49.9 49.8 50.8
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 50.7 61.5 46.0 57.2 58.0 45.1 47.4 69.1 60.3 65.0 57.7 57.9 59.2
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 51.8 59.8 37.6 57.7 57.6 53.4 52.9 66.0 57.6 61.5 57.0 58.3 58.4
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 51.9 59.1 39.3 58.0 57.1 52.5 52.5 65.8 56.3 61.7 56.3 56.3 58.4
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 51.3 58.7 39.2 56.5 55.8 52.8 52.4 65.1 56.7 61.2 56.1 56.0 57.3
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 49.1 54.2 44.6 54.3 52.7 46.5 47.3 60.6 52.7 56.2 52.7 50.4 52.7
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 49.7 55.0 44.0 54.5 54.5 47.6 47.7 62.6 51.6 56.5 53.6 51.3 54.1
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 51.6 58.0 42.9 57.3 57.2 50.4 50.3 65.0 54.7 58.7 57.2 55.6 56.7
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 53.5 59.9 46.2 58.3 58.4 52.7 51.9 66.5 57.5 59.8 57.6 59.0 59.2
Fei_vv_task1_1 Fei2023 15 24.75 55.2 51.7 58.2 44.9 57.6 53.7 53.1 49.4 64.8 57.1 60.4 56.0 55.1 55.8
Fei_vv_task1_2 Fei2023 13 21.75 54.5 50.9 57.5 42.8 56.8 54.0 52.3 48.4 64.8 55.1 59.4 56.7 53.9 55.2
Fei_vv_task1_3 Fei2023 26 29.25 53.2 49.9 56.1 42.6 55.6 53.3 50.8 47.0 62.2 55.9 58.8 54.1 53.0 52.4
Fei_vv_task1_4 Fei2023 23 28.25 51.8 48.8 54.2 41.2 54.2 53.7 48.8 46.3 60.5 52.1 56.8 53.5 50.7 51.5
Han_SZU_task1_1 Han2023 39 37.75 50.5 44.9 55.2 41.0 51.3 50.0 38.3 43.8 65.0 52.6 59.0 51.9 50.0 53.1
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 49.6 60.1 43.6 57.7 57.3 39.4 50.3 67.8 56.6 64.3 58.7 53.8 59.3
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 48.9 60.1 44.7 56.0 54.5 38.8 50.6 68.0 56.4 63.8 56.9 56.4 58.9
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 49.6 60.5 44.6 58.7 55.4 39.7 49.8 69.2 56.4 64.6 57.9 56.0 59.1
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 44.9 58.9 32.5 54.4 53.9 39.8 44.0 67.5 58.2 63.7 57.0 52.2 55.0
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 52.3 60.9 44.7 57.7 59.1 49.5 50.6 68.4 57.3 63.5 58.0 57.9 60.2
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 51.9 60.4 42.9 56.6 58.3 51.4 50.4 67.7 56.7 63.1 57.7 57.7 59.7
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 10.0 9.8 9.8 10.1 9.6 9.9 10.6 10.1 9.9 9.8 9.9 9.5 9.8
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 9.8 9.9 9.3 10.2 9.9 9.6 10.0 9.9 9.9 9.8 9.8 9.6 10.2
DCASE2023 baseline 52 46.75 44.8 38.0 50.5 34.7 46.2 43.6 31.1 34.2 60.4 50.1 54.9 45.1 45.0 47.7
Park_KT_task1_1 Kim2023 10 20.75 56.3 49.9 61.7 33.3 60.3 59.3 50.3 46.4 68.8 60.7 63.8 59.0 57.6 60.2
Park_KT_task1_2 Kim2023 29 30.75 53.0 46.2 58.7 34.6 54.9 52.5 45.7 43.2 64.1 58.4 61.3 56.2 56.2 56.2
Park_KT_task1_3 Kim2023 26 29.25 54.2 47.1 60.2 32.6 56.6 56.2 47.0 43.2 67.3 59.0 61.3 57.3 56.8 59.3
Park_KT_task1_4 Kim2023 20 27.00 49.2 45.8 52.0 41.9 50.0 48.8 44.2 44.3 60.4 48.8 52.7 51.9 46.8 51.5
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 52.3 56.1 42.9 56.8 55.9 53.1 53.0 59.7 53.7 57.5 56.0 53.8 55.5
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 54.6 62.1 47.9 59.6 58.5 52.4 54.6 66.9 59.5 63.1 61.0 60.7 61.6
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 57.8 64.4 48.1 63.3 62.1 58.7 57.0 68.8 61.2 64.5 64.7 63.3 64.0
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 57.4 67.1 45.1 63.9 64.1 56.4 57.6 71.0 64.9 67.9 66.4 66.6 66.0
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 51.0 59.6 40.5 57.5 56.2 51.0 49.6 66.5 57.6 63.1 56.6 56.7 57.3
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 50.8 59.5 38.4 55.5 56.3 52.2 51.7 66.8 56.2 62.3 56.9 56.5 58.4
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 47.6 57.0 37.7 55.2 54.2 44.4 46.3 64.5 54.5 61.0 53.0 53.6 55.0
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 44.8 53.8 38.1 52.2 51.6 38.7 43.6 61.7 52.2 55.5 53.1 48.7 51.3
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 39.7 53.3 36.5 49.3 40.5 35.8 36.4 63.3 51.4 57.7 48.4 47.6 51.2
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 41.4 54.4 36.0 50.1 42.2 37.3 41.4 64.2 50.6 58.8 50.0 49.1 53.5
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 40.1 51.5 29.5 47.0 45.8 37.8 40.4 61.0 45.5 55.9 47.7 48.3 50.6
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 55.7 65.1 42.8 62.6 60.0 56.3 57.1 70.0 61.5 66.2 64.1 64.1 64.6
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 45.8 56.6 33.5 55.4 54.2 41.3 44.8 62.9 52.5 57.9 55.0 55.1 56.1
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 48.6 57.5 36.2 56.9 55.6 47.8 46.5 62.8 53.8 58.7 56.3 55.6 58.0
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 45.5 55.4 30.9 54.6 52.8 42.2 46.9 59.4 51.7 57.1 53.9 54.4 56.1
Vo_DU_task1_1 Vo2023 52 46.75 45.0 36.2 52.3 26.8 44.0 41.0 33.3 35.8 62.3 53.2 56.9 48.3 46.4 46.6
Vo_DU_task1_2 Vo2023 53 47.75 44.8 36.3 51.9 28.4 43.5 40.0 33.8 35.5 62.3 52.8 57.1 47.4 46.1 45.8
Vo_DU_task1_3 Vo2023 50 46.25 45.2 36.6 52.4 27.6 44.3 41.2 34.7 35.4 62.1 53.2 57.7 47.3 47.0 46.9
Vo_DU_task1_4 Vo2023 49 45.75 45.5 38.0 51.8 29.5 43.2 44.7 36.5 35.9 61.7 51.3 56.9 46.3 46.4 48.3
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 41.8 55.3 26.1 51.3 48.0 39.3 44.1 61.5 52.6 57.5 54.8 52.9 52.6
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 47.3 57.5 38.8 55.9 53.5 42.1 46.3 61.8 54.8 60.1 57.0 56.2 55.3
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 38.4 54.4 22.8 49.1 48.1 39.3 32.9 61.9 53.1 56.8 48.6 53.5 52.2
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 42.9 53.1 33.1 50.5 50.1 38.0 42.9 59.8 50.2 53.4 53.2 51.6 50.5
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 47.5 53.9 41.4 51.3 52.3 46.4 46.2 58.0 53.6 57.2 52.5 50.1 52.1
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 47.4 55.1 43.5 50.2 52.7 45.0 45.4 58.4 55.7 58.8 53.5 51.4 52.7
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 45.5 53.8 41.5 48.8 50.0 42.0 45.2 58.5 53.6 57.7 51.3 49.6 52.0
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 47.6 54.1 43.4 50.5 50.6 47.2 46.2 59.4 55.3 58.0 51.6 48.9 51.4
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 52.6 58.0 46.9 58.4 55.6 51.8 50.3 64.1 56.4 60.6 55.6 55.6 55.7
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 53.2 58.2 47.2 58.7 56.4 52.5 51.2 63.5 56.8 61.0 56.2 55.6 55.8
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 52.1 57.6 42.2 57.3 56.2 54.7 50.0 62.5 55.7 61.4 56.5 54.7 55.0
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 52.4 57.5 43.4 57.5 56.1 54.6 50.6 62.3 56.2 61.1 56.1 54.3 54.7
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 33.3 51.6 34.0 38.8 38.0 27.6 27.8 63.2 55.0 57.6 45.1 45.9 43.0
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 39.7 54.7 37.9 48.3 48.0 27.0 37.4 63.0 54.8 61.0 49.4 50.0 50.0
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 39.2 56.8 32.2 46.7 37.9 27.7 51.4 64.0 54.1 61.1 54.0 53.6 53.8
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 40.6 57.7 28.6 50.5 47.5 30.0 46.7 64.7 57.5 62.9 54.3 53.3 53.8
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 41.8 60.3 27.2 51.8 46.9 32.7 50.6 67.5 59.4 65.2 57.0 56.4 56.4
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 41.6 57.6 34.0 47.2 43.7 33.3 49.6 65.0 55.7 62.4 54.0 54.0 54.5
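Likewise, the unseen and seen group accuracies are the unweighted means over their member devices. A check against the AI4EDGE_IPL_task1_1 row:

# AI4EDGE_IPL_task1_1 device-wise accuracies.
unseen = [32.2, 51.5, 50.8, 50.5, 49.8]        # D, S7-S10
seen = [59.3, 55.9, 58.6, 56.5, 51.1, 55.1]    # A, B, C, S1-S3
print(round(sum(unseen) / len(unseen), 1))  # 47.0, as in Accuracy / Unseen
print(round(sum(seen) / len(seen), 1))      # 56.1, as in Accuracy / Seen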

Log loss

Table columns: Submission label | Technical report | Official system rank | Rank value | Overall log loss | Log loss / Unseen | Log loss / Seen | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3 (devices D and S7-S10 are unseen; devices A, B, C, and S1-S3 are seen)
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 1.920 2.370 1.544 4.428 1.659 1.969 1.831 1.965 1.325 1.601 1.421 1.480 1.815 1.622
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 1.996 2.391 1.667 3.613 1.860 2.071 2.198 2.211 1.523 1.672 1.468 1.729 1.848 1.762
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 1.364 1.528 1.227 1.568 1.485 1.534 1.565 1.490 1.156 1.215 1.183 1.261 1.319 1.225
Bai_JLESS_task1_1 Du2023 54 50.25 1.825 2.335 1.399 3.859 1.607 1.631 2.131 2.448 1.095 1.349 1.163 1.579 1.650 1.560
Bai_JLESS_task1_2 Du2023 47 43.75 1.791 2.225 1.430 2.609 1.820 1.623 2.561 2.510 1.129 1.303 1.160 1.654 1.669 1.662
Cai_TENCENT_task1_1 Cai2023 17 25.75 1.174 1.347 1.029 1.463 1.168 1.138 1.513 1.454 0.850 1.071 0.929 1.113 1.123 1.088
Cai_TENCENT_task1_2 Cai2023 11 21.25 1.246 1.369 1.143 1.843 1.221 1.198 1.258 1.326 1.003 1.191 1.107 1.214 1.177 1.166
Cai_TENCENT_task1_3 Cai2023 14 22.25 1.241 1.356 1.144 1.768 1.195 1.191 1.296 1.332 0.996 1.200 1.094 1.206 1.208 1.159
Cai_TENCENT_task1_4 Cai2023 11 21.25 1.252 1.358 1.164 1.695 1.230 1.218 1.301 1.345 1.019 1.200 1.112 1.234 1.219 1.200
Cai_XJTLU_task1_1 Cai2023a 9 19.75 1.307 1.390 1.237 1.538 1.248 1.267 1.465 1.431 1.082 1.284 1.187 1.290 1.317 1.265
Cai_XJTLU_task1_2 Cai2023a 8 18.25 1.292 1.389 1.211 1.605 1.247 1.231 1.432 1.430 1.029 1.275 1.158 1.263 1.304 1.235
Cai_XJTLU_task1_3 Cai2023a 6 14.00 1.223 1.331 1.133 1.620 1.165 1.166 1.336 1.367 0.957 1.222 1.107 1.153 1.198 1.163
Cai_XJTLU_task1_4 Cai2023a 3 11.50 1.241 1.385 1.121 1.779 1.192 1.200 1.358 1.398 0.953 1.193 1.130 1.169 1.140 1.144
Fei_vv_task1_1 Fei2023 15 24.75 1.282 1.393 1.190 1.613 1.189 1.302 1.339 1.519 0.995 1.218 1.112 1.257 1.298 1.261
Fei_vv_task1_2 Fei2023 13 21.75 1.290 1.410 1.191 1.650 1.212 1.304 1.361 1.522 0.990 1.246 1.128 1.232 1.313 1.236
Fei_vv_task1_3 Fei2023 26 29.25 1.349 1.457 1.259 1.617 1.263 1.406 1.415 1.583 1.067 1.268 1.175 1.329 1.377 1.339
Fei_vv_task1_4 Fei2023 23 28.25 1.370 1.460 1.295 1.614 1.297 1.419 1.418 1.555 1.083 1.355 1.236 1.321 1.432 1.346
Han_SZU_task1_1 Han2023 39 37.75 2.012 2.565 1.550 2.860 1.664 1.923 4.114 2.263 1.370 1.748 1.420 1.574 1.721 1.469
LAM_AEV_task1_1 Pham2023 25 29.00 1.847 2.329 1.445 3.322 1.521 1.618 3.158 2.026 1.282 1.655 1.289 1.463 1.567 1.416
LAM_AEV_task1_2 Pham2023 31 31.50 2.083 2.681 1.584 3.367 1.903 2.028 3.894 2.215 1.350 1.745 1.425 1.688 1.729 1.566
LAM_AEV_task1_3 Pham2023 20 27.00 1.933 2.447 1.504 3.117 1.577 1.816 3.558 2.166 1.218 1.781 1.304 1.570 1.631 1.521
Liang_NTES_task1_1 Liang2023 40 38.50 1.402 1.704 1.150 2.196 1.328 1.364 1.938 1.694 0.922 1.154 0.995 1.207 1.357 1.266
MALACH23_JKU_task1_1 Pichler2023 8 18.25 1.230 1.349 1.130 1.513 1.211 1.194 1.428 1.399 0.957 1.207 1.081 1.195 1.212 1.130
MALACH23_JKU_task1_2 Pichler2023 7 16.75 1.242 1.353 1.150 1.558 1.236 1.205 1.365 1.402 0.974 1.226 1.099 1.220 1.218 1.161
MALACH23_JKU_task1_3 Pichler2023 42 39.50 4.354 4.358 4.350 4.363 4.339 4.366 4.391 4.330 4.340 4.341 4.367 4.359 4.339 4.355
MALACH23_JKU_task1_4 Pichler2023 43 39.75 3.224 3.228 3.221 3.245 3.203 3.226 3.249 3.216 3.215 3.224 3.214 3.225 3.225 3.225
DCASE2023 baseline 52 46.75 1.523 1.730 1.351 1.847 1.476 1.535 1.929 1.862 1.101 1.366 1.262 1.448 1.500 1.430
Park_KT_task1_1 Kim2023 10 20.75 1.495 1.827 1.218 2.779 1.275 1.410 1.746 1.927 1.024 1.210 1.103 1.310 1.356 1.302
Park_KT_task1_2 Kim2023 29 30.75 1.660 2.056 1.329 2.983 1.509 1.699 1.941 2.150 1.190 1.315 1.198 1.454 1.396 1.423
Park_KT_task1_3 Kim2023 26 29.25 2.230 2.840 1.721 4.363 2.054 2.076 2.688 3.021 1.436 1.762 1.608 1.816 1.888 1.816
Park_KT_task1_4 Kim2023 20 27.00 1.510 1.573 1.457 1.614 1.496 1.530 1.642 1.581 1.276 1.526 1.422 1.488 1.561 1.468
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 1.313 1.395 1.244 1.822 1.216 1.244 1.341 1.353 1.120 1.360 1.191 1.226 1.311 1.255
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 1.256 1.445 1.099 1.833 1.188 1.267 1.502 1.437 0.962 1.202 1.075 1.128 1.123 1.101
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 1.153 1.318 1.015 1.840 1.056 1.153 1.252 1.290 0.892 1.134 1.000 1.018 1.038 1.010
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 1.117 1.334 0.936 1.902 1.044 1.045 1.310 1.366 0.847 1.001 0.919 0.949 0.935 0.966
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 1.322 1.465 1.203 1.816 1.269 1.317 1.461 1.461 1.039 1.258 1.125 1.281 1.279 1.240
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 1.337 1.488 1.210 1.972 1.322 1.321 1.405 1.420 1.038 1.292 1.140 1.274 1.277 1.241
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 1.398 1.562 1.261 1.961 1.323 1.344 1.620 1.562 1.083 1.332 1.177 1.355 1.329 1.288
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 1.482 1.650 1.342 1.946 1.407 1.434 1.785 1.677 1.143 1.390 1.314 1.357 1.451 1.395
Tan_NTU_task1_1 Tan2023 33 33.50 1.508 1.720 1.331 1.789 1.427 1.696 1.879 1.809 1.107 1.387 1.260 1.406 1.446 1.379
Tan_NTU_task1_2 Tan2023 35 34.00 1.461 1.634 1.317 1.706 1.405 1.630 1.758 1.671 1.097 1.387 1.245 1.400 1.425 1.346
Tan_NTU_task1_3 Tan2023 37 37.00 1.492 1.632 1.375 1.868 1.461 1.489 1.692 1.650 1.180 1.473 1.302 1.445 1.462 1.388
Tan_SCUT_task1_1 Tan2023a 4 13.50 1.192 1.420 1.002 2.215 1.096 1.205 1.304 1.279 0.899 1.100 0.952 1.018 1.027 1.019
Tan_SCUT_task1_2 Tan2023a 30 31.00 1.444 1.726 1.208 2.504 1.262 1.341 1.902 1.619 1.037 1.335 1.164 1.254 1.248 1.214
Tan_SCUT_task1_3 Tan2023a 16 25.50 1.441 1.675 1.247 2.407 1.278 1.366 1.662 1.661 1.095 1.395 1.192 1.261 1.322 1.214
Tan_SCUT_task1_4 Tan2023a 32 33.00 1.525 1.843 1.259 2.915 1.307 1.422 1.954 1.620 1.134 1.416 1.195 1.285 1.273 1.253
Vo_DU_task1_1 Vo2023 52 46.75 2.157 2.963 1.485 6.412 1.811 1.767 2.425 2.399 1.240 1.438 1.346 1.608 1.639 1.641
Vo_DU_task1_2 Vo2023 53 47.75 2.116 2.877 1.482 6.182 1.769 1.759 2.385 2.292 1.237 1.443 1.349 1.584 1.624 1.652
Vo_DU_task1_3 Vo2023 50 46.25 2.092 2.827 1.479 5.800 1.808 1.747 2.280 2.501 1.228 1.432 1.318 1.600 1.651 1.647
Vo_DU_task1_4 Vo2023 49 45.75 1.793 2.169 1.479 2.920 1.752 1.705 2.256 2.214 1.276 1.484 1.376 1.590 1.592 1.556
Wang_SCUT_task1_1 Wang2023 31 31.50 1.493 1.823 1.218 2.972 1.326 1.431 1.818 1.569 1.056 1.320 1.165 1.228 1.263 1.277
Wang_SCUT_task1_2 Wang2023 18 26.50 1.348 1.510 1.214 1.798 1.264 1.302 1.632 1.553 1.086 1.309 1.168 1.241 1.235 1.243
Wang_SCUT_task1_3 Wang2023 46 43.50 1.702 2.228 1.263 4.002 1.475 1.499 1.759 2.405 1.039 1.276 1.159 1.467 1.274 1.366
Wang_SCUT_task1_4 Wang2023 48 45.00 1.472 1.654 1.321 1.966 1.384 1.415 1.845 1.657 1.126 1.395 1.309 1.341 1.381 1.376
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 1.364 1.450 1.292 1.630 1.351 1.345 1.455 1.469 1.220 1.297 1.248 1.317 1.363 1.307
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 1.355 1.455 1.271 1.553 1.380 1.347 1.487 1.510 1.222 1.255 1.209 1.300 1.352 1.291
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 1.395 1.509 1.301 1.654 1.418 1.391 1.551 1.528 1.225 1.287 1.239 1.340 1.385 1.329
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 1.367 1.450 1.298 1.575 1.363 1.371 1.442 1.501 1.215 1.281 1.224 1.331 1.408 1.331
Yang_GZHU_task1_1 Weng2023 15 24.75 1.280 1.409 1.172 1.705 1.169 1.276 1.423 1.472 0.999 1.254 1.084 1.213 1.234 1.248
Yang_GZHU_task1_2 Weng2023 14 22.25 1.241 1.352 1.149 1.661 1.141 1.206 1.348 1.403 1.001 1.213 1.077 1.184 1.207 1.215
Yang_GZHU_task1_3 Weng2023 21 27.25 1.279 1.406 1.172 1.941 1.188 1.204 1.272 1.426 1.017 1.272 1.087 1.175 1.236 1.246
Yang_GZHU_task1_4 Weng2023 19 26.75 1.259 1.364 1.172 1.794 1.180 1.189 1.260 1.396 1.031 1.241 1.097 1.178 1.243 1.243
Zhang_NCUT_task1_1 Zhang2023 49 45.75 1.757 2.083 1.485 2.026 1.802 1.898 2.387 2.301 1.178 1.376 1.295 1.655 1.661 1.743
Zhang_NCUT_task1_2 Zhang2023 51 46.50 1.533 1.813 1.300 1.788 1.506 1.501 2.399 1.869 1.087 1.275 1.127 1.455 1.426 1.431
Zhang_SATLab_task1_1 Bing2023 26 29.25 3.248 5.085 1.717 9.657 2.122 3.929 7.783 1.935 1.724 1.700 1.609 1.798 1.808 1.662
Zhang_SATLab_task1_2 Bing2023 34 33.75 4.213 6.653 2.179 12.855 2.905 3.678 10.376 3.450 2.070 2.147 2.139 2.077 2.308 2.335
Zhang_SATLab_task1_3 Bing2023 27 30.00 1.704 2.237 1.261 4.419 1.475 1.685 2.116 1.487 1.139 1.288 1.124 1.332 1.352 1.329
Zhang_SATLab_task1_4 Bing2023 22 27.50 1.542 1.802 1.325 2.165 1.581 1.735 2.027 1.504 1.145 1.385 1.206 1.415 1.408 1.392

System characteristics

General characteristics

Rank Submission label Technical Report Official system rank Rank value Accuracy (Eval) Logloss (Eval) Sampling rate Data augmentation Features Embeddings
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 1.920 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 1.996 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 1.364 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 1.825 44.1kHz FMix log-mel energies
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 1.791 44.1kHz FMix log-mel energies
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 1.174 32kHz mixup, mixstyle log-mel energies
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 1.246 32kHz mixup, mixstyle log-mel energies
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 1.241 32kHz mixup, mixstyle log-mel energies
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 1.252 32kHz mixup, mixstyle log-mel energies
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 1.307 32kHz device simulation, mixup, mixstyle log-mel energies
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 1.292 32kHz device simulation, mixup, mixstyle log-mel energies
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 1.223 32kHz device simulation, mixup, mixstyle log-mel energies
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 1.241 32kHz device simulation, mixup, mixstyle log-mel energies
Fei_vv_task1_1 Fei2023 15 24.75 55.2 1.282 16kHz mixup log-mel energies
Fei_vv_task1_2 Fei2023 13 21.75 54.5 1.290 16kHz mixup log-mel energies
Fei_vv_task1_3 Fei2023 26 29.25 53.2 1.349 16kHz mixup, time stretching log-mel energies
Fei_vv_task1_4 Fei2023 23 28.25 51.8 1.370 16kHz mixup, time stretching log-mel energies
Han_SZU_task1_1 Han2023 39 37.75 50.5 2.012 44.1kHz mixup, time masking, frequency masking, pitch shifting, random noise log-mel energies, spectral envelope, spectrum fine structure
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 1.847 44.1kHz mixup CQT, Gammatonegram, Mel
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 2.083 44.1kHz mixup CQT, Gammatonegram, Mel
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 1.933 44.1kHz mixup CQT, Gammatonegram, Mel
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 1.402 44.1kHz random cropping, SpecAugment, mixup log-mel energies
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 1.230 32kHz mixstyle log-mel energies
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 1.242 32kHz mixstyle log-mel energies
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 4.354 32kHz mixup log-mel energies
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 3.224 32kHz mixup log-mel energies
DCASE2023 baseline 52 46.75 44.8 1.523 44.1kHz log-mel energies
Park_KT_task1_1 Kim2023 10 20.75 56.3 1.495 16kHz mixup, frequency masking, temporal masking log-mel energies
Park_KT_task1_2 Kim2023 29 30.75 53.0 1.660 16kHz mixup, frequency masking, temporal masking log-mel energies
Park_KT_task1_3 Kim2023 26 29.25 54.2 2.230 16kHz mixup, frequency masking, temporal masking log-mel energies
Park_KT_task1_4 Kim2023 20 27.00 49.2 1.510 16kHz mixup, frequency masking, temporal masking log-mel energies
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 1.313 32kHz device impulse response augmentation, mixup, freq-mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 1.256 32kHz device impulse response augmentation, mixup, freq-mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 1.153 32kHz device impulse response augmentation, mixup, freq-mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 1.117 32kHz device impulse response augmentation, mixup, freq-mixstyle, pitch shifting log-mel energies
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 1.322 32kHz random cutoff, mixstyle, pitch shifting log-mel energies
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 1.337 32kHz random cutoff, mixstyle, pitch shifting log-mel energies
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 1.398 32kHz random cutoff, mixstyle, pitch shifting log-mel energies
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 1.482 32kHz random cutoff, mixstyle, pitch shifting log-mel energies
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 1.508 44.1kHz log-mel energies
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 1.461 44.1kHz log-mel energies
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 1.492 44.1kHz log-mel energies
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 1.192 44.1kHz mel-spectrogram
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 1.444 44.1kHz mel-spectrogram
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 1.441 44.1kHz mel-spectrogram
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 1.525 44.1kHz mel-spectrogram
Vo_DU_task1_1 Vo2023 52 46.75 45.0 2.157 44.1kHz mixup, SpecAugment log-mel spectrogram
Vo_DU_task1_2 Vo2023 53 47.75 44.8 2.116 44.1kHz mixup, SpecAugment log-mel spectrogram
Vo_DU_task1_3 Vo2023 50 46.25 45.2 2.092 44.1kHz mixup, SpecAugment log-mel spectrogram
Vo_DU_task1_4 Vo2023 49 45.75 45.5 1.793 44.1kHz mixup, SpecAugment log-mel spectrogram
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 1.493 44.1kHz mixup, SpecAugment and spectral modulation log-mel energies
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 1.348 44.1kHz mixup, SpecAugment and spectral modulation log-mel energies
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 1.702 44.1kHz mixup, SpecAugment and spectral modulation log-mel energies
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 1.472 44.1kHz mixup, SpecAugment and spectral modulation log-mel energies
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 1.364 32kHz timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle log-mel energies
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 1.355 32kHz timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle log-mel energies
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 1.395 32kHz timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle log-mel energies
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 1.367 32kHz timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle log-mel energies
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 1.280 32kHz Conv_IR, time shifting, time-frequency masking, mixstyle log-mel energies
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 1.241 32kHz Conv_IR, time shifting, time-frequency masking, mixstyle log-mel energies
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 1.279 32kHz Conv_IR, time shifting, time-frequency masking, mixstyle log-mel energies
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 1.259 32kHz Conv_IR, time shifting, time-frequency masking, mixstyle log-mel energies
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 1.757 44.1kHz mixup, specaugment log-mel energies, delta and delta-delta
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 1.533 44.1kHz mixup, specaugment log-mel energies, delta and delta-delta
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 3.248 44.1kHz mixup, temporal crop, time masking, frequency masking log-mel energies
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 4.213 44.1kHz mixup, temporal crop, time masking, frequency masking log-mel energies
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 1.704 44.1kHz mixup, temporal crop, time masking, frequency masking log-mel energies
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 1.542 44.1kHz mixup, temporal crop, time masking, frequency masking log-mel energies



Machine learning characteristics

Rank Code Technical Report Official system rank Rank value Accuracy (Eval) Logloss (Eval) External data usage External data sources Model complexity Model MACs Classifier Ensemble subsystems Decision making Framework Pipeline
AI4EDGE_IPL_task1_1 Almeida2023 28 30.25 51.9 1.920 52852 25475456 CNN, ensemble 10 keras/tensorflow pretraining, training, adaptation, knowledge distillation, weight quantization
AI4EDGE_IPL_task1_2 Almeida2023 47 43.75 48.8 1.996 68996 29304736 CNN, ensemble 10 keras/tensorflow pretraining, training, adaptation, knowledge distillation, weight quantization
AI4EDGE_IPL_task1_3 Almeida2023 38 37.25 50.8 1.364 65192 26711936 CNN, ensemble 10 keras/tensorflow pretraining, training, adaptation, knowledge distillation, weight quantization
Bai_JLESS_task1_1 Du2023 54 50.25 47.9 1.825 78252 27931612 CNN, ResNet, Transformer, CBAM pytorch pretraining, training, adaptation, pruning, weight quantization
Bai_JLESS_task1_2 Du2023 47 43.75 48.8 1.791 60458 14130372 CNN, ResNet, Transformer, CBAM pytorch pretraining, training, adaptation, pruning, weight quantization
Cai_TENCENT_task1_1 Cai2023 17 25.75 56.6 1.174 pre-trained model Audioset 127684 28840396 CNN pytorch pretraining, teacher model training, student model training, weight quantization
Cai_TENCENT_task1_2 Cai2023 11 21.25 56.2 1.246 pre-trained model Audioset 79942 21990724 CNN pytorch pretraining, teacher model training, student model training, weight quantization
Cai_TENCENT_task1_3 Cai2023 14 22.25 55.8 1.241 pre-trained model Audioset 79942 21990724 CNN pytorch pretraining, teacher model training, student model training, weight quantization
Cai_TENCENT_task1_4 Cai2023 11 21.25 55.4 1.252 pre-trained model Audioset 63558 19533124 CNN pytorch pretraining, teacher model training, student model training, weight quantization
Cai_XJTLU_task1_1 Cai2023a 9 19.75 51.9 1.307 device simulation MicIRP 6828 1649349 CNN, TF-SepNet pytorch training, weight quantization
Cai_XJTLU_task1_2 Cai2023a 8 18.25 52.5 1.292 device simulation MicIRP 6828 1649349 CNN, TF-SepNet pytorch training, weight quantization
Cai_XJTLU_task1_3 Cai2023a 6 14.00 55.1 1.223 device simulation MicIRP 15890 3424245 CNN, TF-SepNet pytorch training, weight quantization
Cai_XJTLU_task1_4 Cai2023a 3 11.50 57.0 1.241 device simulation MicIRP 54260 10219540 CNN, TF-SepNet pytorch training, weight quantization
Fei_vv_task1_1 Fei2023 15 24.75 55.2 1.282 pre-trained model 123636 13402932 SERFR-CNN-32 pytorch pretraining, training, adaptation, pruning, weight quantization
Fei_vv_task1_2 Fei2023 13 21.75 54.5 1.290 pre-trained model 70588 7802348 SERFR-CNN-24 pytorch pretraining, training, adaptation, pruning, weight quantization
Fei_vv_task1_3 Fei2023 26 29.25 53.2 1.349 pre-trained model 123636 13402932 SERFR-CNN-32 pytorch pretraining, training, adaptation, pruning, weight quantization
Fei_vv_task1_4 Fei2023 23 28.25 51.8 1.370 pre-trained model 70588 7802348 SERFR-CNN-24 pytorch pretraining, training, adaptation, pruning, weight quantization
Han_SZU_task1_1 Han2023 39 37.75 50.5 2.012 80845 29349000 CNN keras/tensorflow cepstrum analysis, extract features, train teacher_network, knowledge distillation, train student_network
LAM_AEV_task1_1 Pham2023 25 29.00 55.3 1.847 22962 29267550 CNN 3 Late fusion of predicted probabilities keras/tensorflow training
LAM_AEV_task1_2 Pham2023 31 31.50 55.0 2.083 22962 29267550 CNN 3 Late fusion of predicted probabilities keras/tensorflow training
LAM_AEV_task1_3 Pham2023 20 27.00 55.6 1.933 22962 29267550 CNN 3 Late fusion of predicted probabilities keras/tensorflow training
Liang_NTES_task1_1 Liang2023 40 38.50 52.6 1.402 31260 29591778 CNN pytorch training teacher, training student
MALACH23_JKU_task1_1 Pichler2023 8 18.25 57.0 1.230 59804 14686940 CNN maximum likelihood pytorch mel spectrogram, mixstyle, training
MALACH23_JKU_task1_2 Pichler2023 7 16.75 56.6 1.242 43580 10819292 CNN maximum likelihood pytorch mel spectrogram, mixstyle, training
MALACH23_JKU_task1_3 Pichler2023 42 39.50 9.9 4.354 116648 572340 S4 maximum likelihood pytorch mel spectrogram, mixup, training
MALACH23_JKU_task1_4 Pichler2023 43 39.75 9.8 3.224 15994 214420 S4 maximum likelihood pytorch mel spectrogram, mixup, training
DCASE2023 baseline 52 46.75 44.8 1.523 embeddings 46512 29234920 CNN keras/tensorflow pretraining, training, adaptation, pruning, weight quantization
Park_KT_task1_1 Kim2023 10 20.75 56.3 1.495 embeddings 92070 19556096 BCRes2Net maximum likelihood pytorch training, weight quantization
Park_KT_task1_2 Kim2023 29 30.75 53.0 1.660 embeddings 92070 19556096 BCRes2Net maximum likelihood pytorch training, weight quantization
Park_KT_task1_3 Kim2023 26 29.25 54.2 2.230 embeddings 92070 19556096 BCRes2Net maximum likelihood pytorch training, weight quantization
Park_KT_task1_4 Kim2023 20 27.00 49.2 1.510 embeddings 20516 617000 BCRes2Net maximum likelihood pytorch training, weight quantization
Schmid_CPJKU_task1_1 Schmid2023 5 13.75 54.4 1.313 pre-trained model PaSST, MicIRP 5722 1582336 RF-regularized CNNs, PaSST transformer pytorch train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training
Schmid_CPJKU_task1_2 Schmid2023 1 5.25 58.7 1.256 pre-trained model PaSST, MicIRP 12310 4354304 RF-regularized CNNs, PaSST transformer pytorch train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training
Schmid_CPJKU_task1_3 Schmid2023 2 7.00 61.4 1.153 pre-trained model PaSST, MicIRP 30106 9638144 RF-regularized CNNs, PaSST transformer pytorch train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training
Schmid_CPJKU_task1_4 Schmid2023 3 11.50 62.7 1.117 pre-trained model PaSST, MicIRP 54182 16803072 RF-regularized CNNs, PaSST transformer pytorch train teachers, ensemble teacher logits, train student using knowledge distillation, pruning, quantization-aware training
Schmidt_FAU_task1_1 Schmidt2023 26 29.25 55.7 1.322 pre-trained model 127988 28931380 ensemble, RF-regularized CNNs, PaSST transformer 8 generalized mean pytorch pretraining, training, adaptation, pruning, weight quantization
Schmidt_FAU_task1_2 Schmidt2023 12 21.50 55.6 1.337 pre-trained model 68456 19910080 ensemble, RF-regularized CNNs, PaSST transformer 8 weighted generalized mean pytorch pretraining, training, adaptation, pruning, weight quantization
Schmidt_FAU_task1_3 Schmidt2023 19 26.75 52.7 1.398 pre-trained model 74700 9996775 ensemble, RF-regularized CNNs, PaSST transformer 8 generalized mean pytorch pretraining, training, adaptation, pruning, weight quantization
Schmidt_FAU_task1_4 Schmidt2023 24 28.75 49.7 1.482 pre-trained model 34616 4938255 ensemble, RF-regularized CNNs, PaSST transformer 8 generalized mean pytorch pretraining, training, adaptation, pruning, weight quantization
Tan_NTU_task1_1 Tan2023 33 33.50 47.1 1.508 37434 2960384 CNN keras/tensorflow training, weight quantization
Tan_NTU_task1_2 Tan2023 35 34.00 48.5 1.461 54242 6462656 CNN keras/tensorflow training, weight quantization
Tan_NTU_task1_3 Tan2023 37 37.00 46.3 1.492 54242 6462656 CNN keras/tensorflow training, weight quantization
Tan_SCUT_task1_1 Tan2023a 4 13.50 60.8 1.192 embeddings 73386 13180000 CNN pytorch training teacher, training student, knowledge distillation, weight quantization
Tan_SCUT_task1_2 Tan2023a 30 31.00 51.7 1.444 embeddings 73386 13180000 CNN pytorch training teacher, training student, knowledge distillation, weight quantization
Tan_SCUT_task1_3 Tan2023a 16 25.50 53.5 1.441 embeddings 73386 13180000 CNN pytorch training teacher, training student, knowledge distillation, weight quantization
Tan_SCUT_task1_4 Tan2023a 32 33.00 50.9 1.525 embeddings 73386 13180000 CNN pytorch training teacher, training student, knowledge distillation, weight quantization
Vo_DU_task1_1 Vo2023 52 46.75 45.0 2.157 pre-trained model 119526 15600000 CNN, Transformer, Knowledge Distillation pytorch pretraining, training, adaptation, weight quantization
Vo_DU_task1_2 Vo2023 53 47.75 44.8 2.116 pre-trained model 119526 15600000 CNN, Transformer, Knowledge Distillation pytorch pretraining, training, adaptation, weight quantization
Vo_DU_task1_3 Vo2023 50 46.25 45.2 2.092 pre-trained model 119526 15600000 CNN, Transformer, Knowledge Distillation pytorch pretraining, training, adaptation, weight quantization
Vo_DU_task1_4 Vo2023 49 45.75 45.5 1.793 pre-trained model 119526 15600000 CNN, Transformer, Knowledge Distillation pytorch pretraining, training, adaptation, weight quantization
Wang_SCUT_task1_1 Wang2023 31 31.50 49.1 1.493 45164 8646000 CNN pytorch training, adaptation, pruning, weight quantization
Wang_SCUT_task1_2 Wang2023 18 26.50 52.9 1.348 56172 16746000 CNN pytorch training, adaptation, pruning, weight quantization
Wang_SCUT_task1_3 Wang2023 46 43.50 47.1 1.702 56556 25442000 CNN pytorch training, adaptation, pruning, weight quantization
Wang_SCUT_task1_4 Wang2023 48 45.00 48.5 1.472 121812 20902000 CNN pytorch training, adaptation, pruning, weight quantization
XuQianHu_BIT&NUDT_task1_1 Yu2023 45 40.50 51.0 1.364 pre-trained model 52288 23803968 CNN + Transformer pytorch pretraining, training, adaptation, distillation, weight quantization
XuQianHu_BIT&NUDT_task1_2 Yu2023 44 40.00 51.6 1.355 pre-trained model 51648 28400320 CNN + Transformer pytorch pretraining, training, adaptation, distillation, weight quantization
XuQianHu_BIT&NUDT_task1_3 Yu2023 41 39.25 50.0 1.395 pre-trained model 57392 13402688 CNN + Transformer pytorch pretraining, training, adaptation, distillation, weight quantization
XuQianHu_BIT&NUDT_task1_4 Yu2023 36 35.50 51.1 1.367 pre-trained model 66114 11878580 CNN + Transformer pytorch pretraining, training, adaptation, distillation, weight quantization
Yang_GZHU_task1_1 Weng2023 15 24.75 55.5 1.280 pre-trained model, convolution with IRs from MicIRP Microphone Impulse Response Project 76906 23970000 CNN pytorch pretraining, DML training, KD fine-tuning, weight quantization
Yang_GZHU_task1_2 Weng2023 14 22.25 55.9 1.241 pre-trained model, convolution with IRs from MicIRP Microphone Impulse Response Project 76906 23970000 CNN average pytorch pretraining, DML training, KD fine-tuning, weight quantization
Yang_GZHU_task1_3 Weng2023 21 27.25 55.1 1.279 pre-trained model, convolution with IRs from MicIRP Microphone Impulse Response Project 76906 23970000 CNN pytorch pretraining, DML training, weight quantization
Yang_GZHU_task1_4 Weng2023 19 26.75 55.2 1.259 pre-trained model, convolution with IRs from MicIRP Microphone Impulse Response Project 76906 23970000 CNN average pytorch pretraining, DML training, weight quantization
Zhang_NCUT_task1_1 Zhang2023 49 45.75 43.3 1.757 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 123648 7375000 GhostNet average keras/tensorflow training, weight quantization
Zhang_NCUT_task1_2 Zhang2023 51 46.50 47.9 1.533 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 76224 28461000 FHR_Mobilenet average keras/tensorflow training, weight quantization
Zhang_SATLab_task1_1 Bing2023 26 29.25 48.8 3.248 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 7946 3972096 MobileNet keras/tensorflow training, weight quantization
Zhang_SATLab_task1_2 Bing2023 34 33.75 50.0 4.213 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 46232 19466240 mini-SegNet keras/tensorflow training, pruning, retraining, weight quantization
Zhang_SATLab_task1_3 Bing2023 27 30.00 51.9 1.704 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 54178 23438336 ensemble, MobileNet, mini-SegNet 2 keras/tensorflow training, pruning, weight quantization, ensemble
Zhang_SATLab_task1_4 Bing2023 22 27.50 50.3 1.542 TAU Urban Acoustic Scenes 2022 Mobile, Development dataset 15892 7944192 MobileNet, ensemble 2 keras/tensorflow training, weight quantization, ensemble

Technical reports

Ai4edgept Submission to DCASE 2023 Low Complexity Acoustic Scene Classification Task1

Carlos Almeida1, Piovesan Federico2, Luis Bento1 and Mónica Figueiredo1
1Electrotechnical Engineering, Instituto Politécnico de Leiria, Leiria, Portugal, 2Electrotechnical Engineering, Politecnico di Torino, Torino, Italy

Abstract

The DCASE task 1 challenge aims to classify acoustic scenes using devices with low computational power and memory. The DCASE2023 challenge gives further importance to model size and multiply-accumulate operation count (MACs). This report describes our submission to this challenge, following our research group's previous work in this field and the model submitted to DCASE 2022. We use a one-versus-all ten-network ensemble model and propose a custom knowledge distillation method to reduce model complexity. The ensemble model is used as the teacher network, distilling knowledge to the student. The student has three variations: the first is a tuned version of the DCASE2022 baseline architecture; the second is a slightly larger version of the first; and the third is a larger version of the second that uses structured pruning to further reduce model complexity. Data preprocessing is also conducted to further improve performance. Results show that the proposed knowledge distillation methods improve accuracy significantly.
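
As a rough illustration of the teacher-student scheme described above, the sketch below blends hard-label cross-entropy with softened teacher targets in PyTorch. The temperature, weighting, and function names are illustrative assumptions, not values from the report.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft targets from the teacher.

    T (softmax temperature) and alpha (loss weighting) are illustrative
    defaults, not values taken from the report.
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-target gradients keep their magnitude
    return alpha * hard + (1.0 - alpha) * soft
```

Here teacher_logits would come from the ten-network ensemble acting as the teacher.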

System characteristics
Sampling rate 8kHz
Data augmentation pitch shifting, time stretching, mixup, time masking, frequency masking
Features log-mel energies
Classifier CNN, ensemble
Complexity management weight quantization; weight quantization, pruning
PDF

Mini-SegNet and Low-Complexity MobileNet for Acoustic Scene Classification

Ge-Ge Bing1, Yun-Fei Shao1, Zhi Zhang2 and Wei-Qiang Zhang1
1Department of Electronic Engineering, Tsinghua University, Beijing, China, 2School of Computer Science, Beijing Institute of Technology, Beijing, China

Abstract

This report details the architecture we used to address task 1 of the DCASE2023 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. Our architecture is based on (1) SegNet, applying structured pruning and quantization to reduce model complexity; and (2) MobileNet with an additional frequency split block. Log-mel spectrograms, delta, and delta-delta features are extracted to train the acoustic scene classification model. Mixup, random crop, and time- and frequency-domain masking are used for data augmentation. The proposed system achieves higher classification accuracies and lower log loss than the baseline system. After model compression, our single MobileNet model achieves an average accuracy of 51.3% with only 7.946K parameters and 3.972M multiply-accumulate operations (MACs), while the pruned SegNet achieves an average accuracy of 54.46% with 46.232K parameters and 19.466M MACs.
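
A minimal NumPy sketch of the mixup step listed above: pairs of examples and their one-hot labels are blended with a Beta-distributed weight. The alpha value is an illustrative assumption, not taken from the report.

```python
import numpy as np

def mixup(batch_x, batch_y, alpha=0.3):
    """Blend each example (and its one-hot label) with a shuffled partner."""
    lam = np.random.beta(alpha, alpha)
    idx = np.random.permutation(len(batch_x))
    x_mix = lam * batch_x + (1.0 - lam) * batch_x[idx]
    y_mix = lam * batch_y + (1.0 - lam) * batch_y[idx]
    return x_mix, y_mix
```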

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, temporal crop, time masking, frequency masking; mixup, temporal crop, time masking, frequency masking
Features log-mel energies
Classifier MobileNet; mini-SegNet; ensemble, MobileNet, mini-SegNet; MobileNet, ensemble
Complexity management weight quantization; pruning, weight quantization
PDF

Tencent Submission to DCASE23 Task1: Low-Complexity Deep Learning Solution for Acoustic Scene Classification

Weicheng Cai, Zhang Mingyuan and Zhang Xiang
Tencent Inc., Beijing, China

Abstract

In this technical report, we present the Tencent team's entry for Task 1 Low-Complexity Acoustic Scene Classification in the DCASE 2023 challenge. We mainly follow the DCASE 2022 1st place solution from the CP-JKU team and have made some adjustments to meet this year's requirements. Our approach involves employing knowledge distillation to train low-complexity CNN student models using Patchout Spectrogram Transformer (PaSST) models as teachers. We initially train the PaSST models on Audioset and then fine-tune them using the TAU Urban Acoustic Scenes 2022 Mobile development dataset. Lastly, we quantize the student models to enable 8-bit integer-based inference computations to meet the low-complexity constraints of edge devices.
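
The final quantization step can be sketched with PyTorch's post-training utilities. The snippet below applies dynamic INT8 quantization to a stand-in model; the report's CNN students would in practice typically use static quantization or quantization-aware training instead, since dynamic quantization only covers layer types such as nn.Linear.

```python
import torch
import torch.nn as nn

# Stand-in model; the actual students are CNNs distilled from PaSST teachers.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10)).eval()

# Weights of the listed layer types are stored as 8-bit integers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```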

System characteristics
Sampling rate 32kHz
Data augmentation mixup, mixstyle
Features log-mel energies
Classifier CNN
Complexity management weight quantization
PDF

DCASE2023 Task1 Submission: Device Simulation and Time-Frequency Separable Convolution for Acoustic Scene Classification

Yiqiang Cai1, Minyu Lin1, Chenyang Zhu2, Shengchen Li1 and Xi Shao2
1School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China, 2College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China

Abstract

Task 1 of the DCASE 2023 Challenge incorporates a weighted average ranking of accuracy and complexity, which encourages participants to build efficient systems for acoustic scene classification (ASC). In this report, we propose TF-SepNet, a low-complexity ASC model based on time-frequency separable convolution. Our network architecture consists of a series of separable convolutional layers that exploit the time and frequency domains. We also improve the performance of ResNorm by adding a few learnable parameters. Furthermore, knowledge distillation is employed to transfer knowledge from a large model to a smaller one. Additionally, device simulation is introduced for data augmentation in the device domain. Overall, we evaluate the performance of our model on the DCASE 2023 Task 1 development dataset following the official cross-validation setup and achieve a classification accuracy of 53.9% with 6.83K parameters and 1.65M MACs.
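
A minimal sketch of the time-frequency separable idea, assuming a (batch, channels, frequency, time) layout: the 2-D convolution is factored into a frequency-wise and a time-wise depthwise pass followed by a pointwise mix. Kernel sizes and the block structure are illustrative, not the published TF-SepNet block.

```python
import torch.nn as nn

class TFSeparableConv(nn.Module):
    """Depthwise convolutions along frequency, then time, then a 1x1 mix."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.freq_conv = nn.Conv2d(channels, channels, (kernel_size, 1),
                                   padding=(pad, 0), groups=channels, bias=False)
        self.time_conv = nn.Conv2d(channels, channels, (1, kernel_size),
                                   padding=(0, pad), groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):  # x: (batch, channels, freq, time)
        return self.pointwise(self.time_conv(self.freq_conv(x)))
```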

System characteristics
Sampling rate 32kHz
Data augmentation device simulation, mixup, mixstyle
Features log-mel energies
Classifier CNN, TF-SepNet
Complexity management knowledge distillation, weight quantization; weight quantization
PDF

How Information on Soft Labels and Hard Labels Mutually Benefits Sound Event Detection Tasks

Yutong Du1, Jisheng Bai1,2, Pu Zijun1 and Chen Jianfeng1,2
1Joint Laboratory of Environmental Sound Sensing, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China, 2LianFeng Acoustic Technologies Co., Xi'an, China

Abstract

In this technical report, we describe our proposed system for DCASE Task 1: Low-Complexity Acoustic Scene Classification. First, to obtain better performance than the baseline, we choose ResNet as the basic model and add several self-attention blocks, including CBAM and MHSA, to extract more fine-grained features and temporal features, respectively, from the spectrogram. To attend to detailed information, we add the CBAM block between the two convolution layers in the ResNet block; the MHSA block captures temporal context relationships in the spectrum. Another requirement of this task is low complexity; thus, the regular convolution module is replaced by a depthwise separable convolution module in the proposed model. During the experiments, we use FMix as data augmentation to improve generalization. Moreover, we use a hard-task training strategy in the training process.
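
The depthwise separable replacement mentioned above can be sketched as a per-channel depthwise convolution followed by a 1x1 pointwise convolution; the module below is a generic illustration, not the exact block from the report.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per input channel) + pointwise channel mix."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```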

System characteristics
Sampling rate 44.1kHz
Data augmentation FMix
Features log-mel energies
Classifier CNN, ResNet, Transformer, CBAM
Complexity management model compression
PDF

Acoustic Scene Classification Based on Multi-Teacher Knowledge Distillation and SERFR-CNN

Hongbo Fei, Xing Li and Jie Jia
vivo Mobile Commun co Ltd, Hangzhou, China

Abstract

In this technical report, we describe our low-complexity acoustic scene classification algorithm submitted to DCASE 2023 Task 1a. We focus on knowledge distillation strategy and network innovation: a multi-teacher knowledge distillation method and SERFR-CNN are proposed, addressing the insufficient classification accuracy and adaptability of current models. Building on the traditional knowledge distillation method combined with a model ensemble strategy, a multi-teacher knowledge distillation method is proposed. For audio feature extraction, we use log-mel spectrograms and a time-frequency masking algorithm. To further improve system performance, virtual data generation technology is adopted. Finally, the trained model is used for transfer learning. With the proposed systems, we achieved a classification accuracy of 59.3% on the officially provided evaluation dataset, 16.4 percentage points higher than the baseline system.
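
The time-frequency masking step can be sketched with torchaudio's built-in transforms; the mask widths and feature shape below are illustrative assumptions.

```python
import torch
import torchaudio

freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=8)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=20)

log_mel = torch.randn(1, 64, 431)          # (channel, mel bins, frames) stand-in
augmented = time_mask(freq_mask(log_mel))  # zeroes one mel band and one time span
```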

System characteristics
Sampling rate 16kHz
Data augmentation mixup; mixup, time stretching
Features log-mel energies
Classifier SERFR-CNN-32; SERFR-CNN-24
Complexity management weight quantization
PDF

Submission to DCASE 2023 Task 1: Low-Complexity Acoustic Scene Classification Using Cepstral Analysis

Yaojun Han and Nengheng Zheng
College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China

Abstract

This technical report describes the submitted system for Task 1 of the DCASE 2023 challenge. The goal of this task is to design an acoustic scene classification system for device-imbalanced datasets under the constraints of low complexity. We apply cepstrum analysis to filter out the channel information contained in the raw audio signals before the feature extraction stage. Moreover, we separate the spectral envelope and the fine structure of the spectrum in the cepstrum domain, and analyze the impact of the two on the classification results. Due to the low-complexity constraints, we use knowledge distillation to allow a simpler student model to learn from complex teacher models. In addition, we experimented with different augmentation techniques, such as mixup, random noise, pitch shifting, and time-frequency masking, to expand the diversity of the dataset. According to the NeSsi tool, our model requires 80.845K of memory, with 29.349M MACs, and its accuracy on the development dataset is 51.4%.
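
The envelope/fine-structure separation described above can be sketched with low-quefrency liftering in NumPy: the first few cepstral coefficients reconstruct the spectral envelope, and the residual log spectrum is treated as the fine structure. The cutoff value is an illustrative assumption.

```python
import numpy as np

def split_spectrum(frame, cutoff=30):
    """Split one frame's log-magnitude spectrum into envelope + fine structure."""
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-10)
    cepstrum = np.fft.irfft(log_mag, n=n)
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[-cutoff + 1:] = 1.0           # keep the symmetric mirror half
    envelope = np.fft.rfft(cepstrum * lifter, n=n).real
    return envelope, log_mag - envelope  # envelope, fine structure
```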

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, time masking, frequency masking, pitch shifting, random noise
Features log-mel energies, spectral envelope, spectrum fine structure
Classifier CNN
Complexity management knowledge distillation
PDF

Dual-Strategy Enhancement of Acoustic Scene and Event Classification: Integrating Res2Net, GhostNet, and MobileFormer Architectures

TaeSoo Kim, Daniel Rho, GaHui Lee and Jae Han Park
Computing Sciences, KT Corporation, Seoul, Korea

Abstract

In this technical report, we investigate the balance between accuracy and efficiency in the low-complexity acoustic scene classification (ASC) task for the DCASE 2023 challenge. We explore two approaches: the first prioritizes accuracy using Res2Net and GhostNet, while the second emphasizes efficiency using MobileFormer. Our study highlights the trade-offs between accuracy and efficiency in ASC models and contributes to the ongoing research on developing robust and lightweight models suitable for embedded systems.

System characteristics
Sampling rate 16kHz
Data augmentation mixup, frequency masking, temporal masking
Features log-mel energies
Classifier BCRes2Net
Decision making maximum likelihood
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Based on Depthwise Separable CNN

Zhicong Liang, Pengyuan Xie, Zhe Wang and Wenbo Cai
NetEase, Guangzhou, China

Abstract

This report outlines our submission for DCASE2023 Task 1, which focuses on Low-Complexity Acoustic Scene Classification. To meet this requirement, we implemented the depthwise separable CNN method to construct our model. This approach significantly reduces model size while improving accuracy. Additionally, we applied SpecAugment and mixup as data augmentation techniques. To further enhance our model's performance, we employed Knowledge Distillation, distilling the submitted model from larger models. Overall, these techniques enable us to achieve better results on the task.

System characteristics
Sampling rate 44.1kHz
Data augmentation random cropping, SpecAugment, mixup
Features log-mel energies
Classifier CNN
PDF

Low-Complexity Deep Learning System for Acoustic Scene Classification Using Teacher-Student Scheme and Multiple Spectrograms

Lam Pham1, Ngo Dat2, Le Cam3, Anahid Naghibzadeh-Jalali1 and Alexander Schindler1
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2Computing Sciences, University of Essex, Colchester, UK, 3Computing Sciences, Ho Chi Minh University of Technology, HCM city, Vietnam

Abstract

In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) training a teacher network, and (Phase II) training a student network using knowledge distilled from the teacher. In the first phase, the teacher, a large-footprint model, is trained. After training, the embeddings, which are the feature map of the teacher's second-to-last layer, are extracted. In the second phase, the student network, a low-complexity model, is trained with the embeddings extracted from the teacher. Our experiments conducted on the DCASE 2023 Task 1 development dataset fulfil the low-complexity requirement and achieve a best classification accuracy of 57.4%, improving on the DCASE baseline by 14.5 percentage points.
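
A minimal sketch of Phase II as described: the student is supervised both by the labels and by matching the teacher's embeddings, here with an MSE term. The loss weighting, and the assumption that both embeddings share a dimension, are illustrative.

```python
import torch.nn.functional as F

def phase2_loss(student_logits, student_emb, teacher_emb, labels, beta=1.0):
    """Hard-label cross-entropy plus embedding matching against the teacher."""
    return (F.cross_entropy(student_logits, labels)
            + beta * F.mse_loss(student_emb, teacher_emb))
```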

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features CQT, Gammatonegram, Mel
Classifier CNN
Decision making Late fusion of predicted probabilities
PDF

Malach23 Submission to DCASE 2023: Acoustic Scene Classification with Receptive-Field Regularized Convolution Neural Networks and State Space Models

Noah Pichler, Jonathan Greif, Christian Willdoner and David Fleischanderl
Johannes Kepler University, Linz, Austria

Abstract

This report describes our approach to Task 1 of the DCASE (Detection and Classification of Acoustic Scenes and Events) Challenge. To classify urban acoustic scenes from short audio samples, we experiment with Receptive-Field Regularized Convolutional Neural Networks and S4 models as classifiers. To stay within the allowed model-complexity limits of the challenge, we use a Convolutional Neural Network (CNN) with 13 layers plus one classification layer, and one CNN layer followed by 3 S4 blocks, respectively. Additionally, we augment the mel spectrograms with the MixStyle [4] or Mixup [5] method. Our experiments significantly surpass the baseline, and, in particular, the S4 model stands out due to its low number of multiply-accumulate operations.
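
MixStyle can be sketched as mixing per-instance channel statistics across a shuffled batch; the Beta parameter below is an illustrative default rather than the value used in the report.

```python
import torch

def mixstyle(x, alpha=0.3, eps=1e-6):
    """x: (batch, channels, freq, time); returns style-mixed features."""
    b = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    x_norm = (x - mu) / sig
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1)).to(x.device)
    perm = torch.randperm(b, device=x.device)
    # Re-style each instance with interpolated statistics of a random partner.
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```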

System characteristics
Sampling rate 32kHz
Data augmentation mixstyle; mixup
Features log-mel energies
Classifier CNN; S4
Decision making maximum likelihood
Complexity management efficient models
PDF

CP-JKU Submission to DCASE23: Efficient Acoustic Scene Classification with CP-Mobile

Florian Schmid, Tobias Morocutti, Shahed Masoudian, Khaled Koutini and Gerhard Widmer
Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria

Abstract

In this technical report, we describe the CP-JKU team's submission for Task 1 Low-Complexity Acoustic Scene Classification of the DCASE 23 challenge. We introduce a novel architecture, CP-Mobile, with a regularized receptive field and residual inverted bottleneck blocks. We use Knowledge Distillation to teach CP-Mobile from an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) and CP-ResNet models. To enhance cross-device generalization performance, Freq-MixStyle and Device Impulse Response (DIR) augmentation are applied while training teachers and students. CP-Mobile is fine-tuned using Quantization-Aware Training and then quantized to perform computations in 8-bit precision. The improved teacher ensemble, the efficient student architecture and DIR augmentation improve the results on the TAU Urban Acoustic Scenes 2022 Mobile development dataset by around 5 percentage points in accuracy compared to the top-ranked submission for Task 1 of the DCASE 22 challenge.
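
Device Impulse Response (DIR) augmentation amounts to convolving a clean waveform with a measured device IR. A minimal SciPy sketch follows; truncating to the input length and renormalizing energy are illustrative choices, not details from the report.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_device_ir(waveform, impulse_response):
    """Simulate a recording device by convolution with its impulse response."""
    wet = fftconvolve(waveform, impulse_response)[: len(waveform)]
    return wet * (np.linalg.norm(waveform) / (np.linalg.norm(wet) + 1e-10))
```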

System characteristics
Sampling rate 32kHz
Data augmentation device impulse response augmentation, mixup, freq-mixstyle, pitch shifting
Features log-mel energies
Classifier RF-regularized CNNs, PaSST transformer
Complexity management knowledge distillation, weight quantization; knowledge distillation, weight quantization, structured pruning
PDF

Submission to DCASE 2023 Task 1: Device Invariant Training with Structured Filter Pruning for Low Complexity Acoustic Scene Classification

Lorenz Schmidt, Beran Kiliç and Nils Peters
International Audio Laboratories, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Abstract

This technical report describes our contribution to the DCASE 2023 challenge, Acoustic Scene Classification Task 1. We apply Inverse Contrastive Learning to regularize models so that they generalize better to unseen devices. First we construct a teacher ensemble by fine-tuning several PaSST models, and then train student models under different hard constraints on the multiply-accumulate count (MACs). This yields four different models with approximately 30, 20, 10, and 5 MMACs. Finally, each model is quantized to 8 bits in order to fulfill the memory requirements of the challenge.
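
The (weighted) generalized-mean decision making listed below reduces to a power mean over the subsystem probabilities; a small sketch with an illustrative exponent (r = 1 recovers the plain average):

```python
import numpy as np

def generalized_mean(probs, r=2.0):
    """Fuse ensemble predictions: probs has shape (n_models, n_classes)."""
    return np.mean(probs ** r, axis=0) ** (1.0 / r)
```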

System characteristics
Sampling rate 32kHz
Data augmentation random cutoff, mixstyle, pitch shifting
Features log-mel energies
Classifier ensemble, RF-regularized CNNs, PaSST transformer
Decision making generalized mean; weighted generalized mean
Complexity management weight quantization, pruning
PDF

Low-Complexity Acoustic Scene Classification Using Convolution Neural Network

Ee-Leng Tan, Jin Jie Yeo, Santi Peksi and Woon-Seng Gan
EEE, Nanyang Technological University, Singapore, Singapore

Abstract

In this technical report, we describe the CISS-NTU team's submission for Task 1 Low-Complexity Acoustic Scene Classification of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge [1]. We explored and adapted the hyperparameters of the baseline (BL) system provided in this challenge. The TAU Urban Acoustic Scenes 2022 Mobile development dataset [2] has been used to train and validate our models. Each audio sample is transformed into 160 log-mel energies. Three models are submitted: two trained using the development dataset and one trained using the development dataset combined with augmented samples. The best-performing model achieves an accuracy of 52.1% and a log loss of 1.372, and requires only 6.46M multiply-accumulate (MAC) operations with a memory usage of 54.30 kB.
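
The 160-band log-mel front end can be sketched with torchaudio; only the band count comes from the report, while the FFT size and hop length below are illustrative assumptions.

```python
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=44100, n_fft=2048, hop_length=1024, n_mels=160)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, 44100)   # 1-second stand-in clip
log_mel = to_db(mel(waveform))     # (1, 160, frames)
```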

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Using Blueprint Separable Convolution and Knowledge Distillation

Jiaxin Tan and Yanxiong Li
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China

Abstract

This technical report describes our proposed system for Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge. We design a teacher model based on blueprint separable convolution (BSConv), with reference to the middle layers of the blueprint separable residual network. To meet the system complexity requirements, we adopt knowledge distillation to train student models from the teacher model. Data augmentations (e.g., MixStyle, SpecAugment, and spectrum modulation) are applied to prevent overfitting. When evaluated on the development data, one of the proposed systems obtains an accuracy of 54.9% with 73,386 parameters and 13.18 million multiply-accumulate operations.
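
Blueprint separable convolution in its unconstrained (BSConv-U) form reverses the usual depthwise-separable order: a pointwise projection first, then a depthwise convolution. A minimal PyTorch sketch with illustrative sizes:

```python
import torch.nn as nn

class BSConvU(nn.Module):
    """Pointwise 1x1 projection followed by a depthwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size,
                                   padding=padding, groups=out_ch, bias=False)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))
```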

System characteristics
Sampling rate 44.1kHz
Features mel-spectrogram
Classifier CNN
Complexity management weight quantization
PDF

Hierarchical Knowledge Distillation: A Multi-Stage Learning Approach

Quoc Vo and David Han
Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA

Abstract

This technical report details our approach to Task 1 of the 2023 Detection and Classification of Acoustic Scenes and Events challenge (DCASE2023), which focuses on the classification of recorded audio for acoustic scene recognition. The task calls for a quantized model with no more than 128KB of memory allowance for model parameters and a maximum of 30 million multiply-accumulate operations (MMACs) per inference. Our solution exploits log-mel spectrogram features and leverages multiple data augmentations. Our proposed methodology utilizes an audio spectrogram transformer (AST) as the teacher model and multiple Convolutional Neural Network (CNN) models as students in a hierarchical knowledge distillation (KD) framework. This approach aids in bridging the substantial parameter disparity between the teacher model, which has over 86 million parameters, and our compact CNN-based model limited to just 119,526 parameters. Upon completion of network training, the weights are converted to INT8 to meet the size constraints. Our INT8 model achieves a log loss of 1.59 and an accuracy of 46.01% on the TAU Urban Acoustic Scenes 2022 Mobile development dataset's standard test set, signifying the efficacy of our framework. Our proposed method demonstrates the potential of distillation strategies in optimizing smaller models without compromising their learning ability in a hierarchical approach.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, SpecAugment
Features log-mel spectrogram
Classifier CNN, Transformer, Knowledge Distillation
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Using Deep Space Separable Distillation Module and Multi-Label Learning

Kangli Wang, Yiling Wu and Yanxiong Li
South China University of Technology, Guangzhou, China

Abstract

This technical report describes our system for Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge. We propose a deep space separable distillation block as the basic unit of the model, using its strong block-processing ability to repeatedly split the high-frequency and low-frequency parts of the log-mel spectrogram. Accuracy is improved by multi-scale embedding and multi-task learning methods. To prevent overfitting, we adopt data augmentation methods such as mixup, SpecAugment, and spectral modulation. Quantization-aware training is adopted to quantize the model to meet the requirements of edge devices under low-complexity constraints. The proposed system achieves 53.3% accuracy on the development dataset with a parameter memory of only 45.16 kB and 8.64M MACs.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, SpecAugment and spectral modulation
Features log-mel energies
Classifier CNN
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Using Deep Mutual Learning and Knowledge Distillation Fine-Tuning

Shilong Weng, Liu Yang and Binghong Xu
School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China

Abstract

This technical report describes our submission for Task 1, low-complexity acoustic scene classification, of the DCASE 2023 challenge. To enhance generalization to unseen devices, the reassembled 10-second audio is convolved with a microphone impulse response randomly selected from the Microphone Impulse Response Project library before being fed into the models. A ResNet38 teacher model pre-trained on AudioSet and three low-complexity BC-Res2Net student models are then involved in Deep Mutual Learning to further improve the performance of the teacher model and to obtain a well-initialized student model. Next, we use Knowledge Distillation fine-tuning so the student model learns from the well-performing teacher while maintaining the teacher's predictive performance. Finally, the student model is quantized by Post-Training Static Quantization to implement inference computations using 8-bit integers.
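
The Deep Mutual Learning stage can be sketched, for a simplified two-network cohort, as cross-entropy plus a KL term pulling each network toward the other's current predictions; the equal loss weighting is an illustrative assumption.

```python
import torch.nn.functional as F

def dml_losses(logits_a, logits_b, labels):
    """Return the per-network losses for one mutual-learning step."""
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=-1),
                    F.softmax(logits_b.detach(), dim=-1), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=-1),
                    F.softmax(logits_a.detach(), dim=-1), reduction="batchmean")
    return (F.cross_entropy(logits_a, labels) + kl_a,
            F.cross_entropy(logits_b, labels) + kl_b)
```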

System characteristics
Sampling rate 32kHz
Data augmentation Conv_IR, time shifting, time-frequency masking, mixstyle
Features log-mel energies
Classifier CNN
Decision making average
Complexity management weight quantization
PDF

Tiny Audio Spectrogram Transformer: MobileViT for Low-Complexity Acoustic Scene Classification with Decoupled Knowledge Distillation

Jinyang Yu1, Zikai Song2,3, Jiahao Ji2,3, Lixian Zhu2,3, Kele Xu1, Kun Qian2,3, Yong Dou1 and Bin Hu2,3
1Computer Department, National University of Defense Technology, Changsha, P.R. China, 2Ministry of Education (Beijing Institute of Technology), Key Laboratory of Brain Health Intelligent Evaluation and Intervention, P.R. China, 3Beijing Institute of Technology, School of Medical Technology, P.R. China

Abstract

This report presents the BIT&NUDT submissions to DCASE2023 challenge Task 1, which targets acoustic scene classification (ASC) with low complexity. Several vision transformers adapted to audio classification tasks have proven more robust than CNNs due to their global representations. However, considering the complexity of self-attention, they seem ill-suited to lightweight edge devices. In our submission, we transfer a lightweight vision transformer, MobileViT, from image tasks to ASC. By inserting the MobileViT block into a CNN, our network can benefit from both the global representations of attention and the spatial representations of CNNs. Under the parameter memory limitation of 128KB, we apply quantization and convert part of the parameters to INT8 to balance complexity and accuracy. Furthermore, we use Decoupled Knowledge Distillation to take advantage of PaSST teacher models, which performed strongly in previous DCASE challenges.

System characteristics
Sampling rate 32kHz
Data augmentation timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle
Features log-mel energies
Classifier CNN + Transformer
Complexity management weight quantization
PDF

Acoustic Scene Classification Based on Pruned_GhostNet and FHR_MobileNet

Lin Zhang, Hongxia Dong, Menglong Wu and Xichang Cai
Electronic and Communication Engineering, North China University of Technology, Beijing, China

Abstract

This technical report describes our submission for Task 1 of the DCASE2023 challenge. We computed the logarithmic mel spectrogram for each audio segment at the original sampling rate of 44.1 kHz. In addition, to obtain richer feature information, we also computed the first-order and second-order differences on top of the logarithmic mel spectrogram. The resulting spectrogram has 128 frequency bins, 43 time bins, and 3 channels. The feature maps were then fed into classification networks, where we employed two schemes, namely Pruned_GhostNet and FHR_MobileNet. The achieved accuracies were 47% and 52.8%, respectively, with model parameter counts of 123.648K and 76.224K, and MACs of 7.375M and 28.461M.
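
The three-channel feature described above (log-mel plus first- and second-order differences) can be sketched with librosa; the file path is a placeholder, and apart from the stated 128 mel bins the settings are assumptions.

```python
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=44100)   # placeholder path
log_mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
delta = librosa.feature.delta(log_mel)
delta2 = librosa.feature.delta(log_mel, order=2)
features = np.stack([log_mel, delta, delta2])  # (3, 128, frames)
```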

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, specaugment
Features log-mel energies, delta and delta-delta
Classifier GhostNet; FHR_Mobilenet
Decision making average
Complexity management weight quantization
PDF