Task description
The goal of acoustic scene classification is to classify a test recording into one of ten predefined acoustic scene classes. This task targets acoustic scene classification on devices with low computational and memory allowance, which impose limits on model complexity, expressed as the model’s number of parameters and its multiply-accumulate operations (MACs) count. In addition to low complexity, the aim is generalization across a number of different devices. For this purpose, the task uses audio data recorded and simulated with a variety of devices.
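As a rough illustration of the two complexity measures named above, the following sketch counts the parameters and multiply-accumulate operations of a single 2D convolution layer. The function name `conv2d_cost` and the example layer sizes are hypothetical and are not taken from the challenge tooling; the sketch also ignores padding and stride bookkeeping by taking the output size as an input.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w, groups=1, bias=True):
    """Return (parameters, MACs) for one 2D convolution layer.

    Assumes the output feature map size (out_h, out_w) is already known.
    """
    k_h, k_w = kernel
    # Each output channel sees in_ch / groups input channels.
    weights_per_filter = (in_ch // groups) * k_h * k_w
    params = out_ch * weights_per_filter + (out_ch if bias else 0)
    # One multiply-accumulate per weight per output position.
    macs = out_ch * out_h * out_w * weights_per_filter
    return params, macs


# Hypothetical layer: 16 -> 32 channels, 3x3 kernel, 20x20 output map.
standard = conv2d_cost(16, 32, (3, 3), 20, 20)
# Depthwise variant of the same kernel size (groups == channels).
depthwise = conv2d_cost(16, 16, (3, 3), 20, 20, groups=16)
print(standard, depthwise)
```

Summing these per-layer figures over a whole network gives the kind of totals that a complexity limit would be checked against.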
The development dataset consists of recordings from 10 European cities using 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C, and S1-S6 consists of randomly selected segments from the simultaneous recordings, therefore all overlap with the data from device A, but not necessarily with each other. The total amount of audio in the development set is 64 hours.
The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. There are five new devices (not available in the development set): real device D and simulated devices S7-S11. The evaluation data contains 22 hours of audio.
Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.
A more detailed task description can be found on the task description page.
Systems ranking
Submission label | Name | Technical Report | Official system rank | Rank value | Performance rank | Memory rank | MACs rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss with 95% confidence interval (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | AI4EDGE_1 | Almeida2023 | 28 | 30.25 | 34 | 14 | 39 | 51.9 (51.7 - 52.2) | 1.920 (1.786 - 2.054) | 67.5 | 0.932 | |
AI4EDGE_IPL_task1_2 | AI4EDGE_2 | Almeida2023 | 47 | 43.75 | 51 | 25 | 48 | 48.8 (48.5 - 49.1) | 1.996 (1.859 - 2.133) | 69.9 | 0.872 | |
AI4EDGE_IPL_task1_3 | AI4EDGE_3 | Almeida2023 | 38 | 37.25 | 43 | 23 | 40 | 50.8 (50.5 - 51.1) | 1.364 (1.308 - 1.420) | 66.0 | 0.969 | |
Bai_JLESS_task1_1 | JLESS | Du2023 | 54 | 50.25 | 56 | 48 | 41 | 47.9 (47.6 - 48.2) | 1.825 (1.727 - 1.923) | 50.8 | 1.436 | |
Bai_JLESS_task1_2 | JLESS | Du2023 | 47 | 43.75 | 52 | 47 | 24 | 48.8 (48.5 - 49.1) | 1.791 (1.691 - 1.891) | 50.5 | 1.467 | |
Cai_TENCENT_task1_1 | Cai_1 | Cai2023 | 17 | 25.75 | 7 | 45 | 44 | 56.6 (56.3 - 56.9) | 1.174 (1.122 - 1.225) | 57.5 | 1.147 | |
Cai_TENCENT_task1_2 | Cai_2 | Cai2023 | 11 | 21.25 | 10 | 31 | 34 | 56.2 (55.9 - 56.5) | 1.246 (1.196 - 1.295) | 58.1 | 1.178 | |
Cai_TENCENT_task1_3 | Cai_3 | Cai2023 | 14 | 22.25 | 12 | 31 | 34 | 55.8 (55.5 - 56.1) | 1.241 (1.194 - 1.288) | 57.4 | 1.190 | |
Cai_TENCENT_task1_4 | Cai_4 | Cai2023 | 11 | 21.25 | 17 | 21 | 30 | 55.4 (55.1 - 55.7) | 1.252 (1.207 - 1.297) | 57.0 | 1.198 | |
Cai_XJTLU_task1_1 | TFSepNet1 | Cai2023a | 9 | 19.75 | 36 | 2 | 5 | 51.9 (51.6 - 52.2) | 1.307 (1.261 - 1.352) | 53.9 | 1.257 | |
Cai_XJTLU_task1_2 | TFSepNet2 | Cai2023a | 8 | 18.25 | 33 | 2 | 5 | 52.5 (52.3 - 52.8) | 1.292 (1.246 - 1.337) | 51.9 | 1.307 | |
Cai_XJTLU_task1_3 | TFSepNet3 | Cai2023a | 6 | 14.00 | 22 | 5 | 7 | 55.1 (54.8 - 55.4) | 1.223 (1.173 - 1.273) | 57.5 | 1.160 | |
Cai_XJTLU_task1_4 | TFSepNet4 | Cai2023a | 3 | 11.50 | 5 | 18 | 18 | 57.0 (56.7 - 57.3) | 1.241 (1.184 - 1.299) | 64.3 | 0.989 | |
Fei_vv_task1_1 | vv_1 | Fei2023 | 15 | 24.75 | 19 | 38 | 23 | 55.2 (55.0 - 55.5) | 1.282 (1.223 - 1.341) | 59.3 | 1.145 | |
Fei_vv_task1_2 | vv_2 | Fei2023 | 13 | 21.75 | 24 | 26 | 13 | 54.5 (54.2 - 54.8) | 1.290 (1.232 - 1.348) | 56.7 | 1.204 | |
Fei_vv_task1_3 | vv_3 | Fei2023 | 26 | 29.25 | 28 | 38 | 23 | 53.2 (53.0 - 53.5) | 1.349 (1.286 - 1.412) | 58.0 | 1.231 | |
Fei_vv_task1_4 | vv_4 | Fei2023 | 23 | 28.25 | 37 | 26 | 13 | 51.8 (51.5 - 52.0) | 1.370 (1.306 - 1.434) | 55.4 | 1.284 | |
Han_SZU_task1_1 | Han_SZU_1 | Han2023 | 39 | 37.75 | 44 | 13 | 50 | 50.5 (50.3 - 50.8) | 2.011 (1.886 - 2.137) | 51.4 | 1.378 | |
LAM_AEV_task1_1 | AEV_sys_1 | Pham2023 | 25 | 29.00 | 18 | 33 | 47 | 55.3 (55.1 - 55.6) | 1.847 (1.726 - 1.968) | 56.8 | 1.349 | |
LAM_AEV_task1_2 | AEV_sys_2 | Pham2023 | 31 | 31.50 | 23 | 33 | 47 | 55.0 (54.7 - 55.3) | 2.083 (1.941 - 2.224) | 57.4 | 1.333 | |
LAM_AEV_task1_3 | AEV_sys_3 | Pham2023 | 20 | 27.00 | 14 | 33 | 47 | 55.6 (55.3 - 55.9) | 1.933 (1.810 - 2.055) | 57.4 | 1.333 | |
Liang_NTES_task1_1 | NTES_1 | Liang2023 | 40 | 38.50 | 32 | 41 | 49 | 52.6 (52.3 - 52.8) | 1.402 (1.339 - 1.465) | 54.9 | 1.293 | |
MALACH23_JKU_task1_1 | RFR-CNN-1 | Pichler2023 | 8 | 18.25 | 6 | 36 | 25 | 57.0 (56.7 - 57.3) | 1.230 (1.180 - 1.279) | 55.2 | 1.280 | |
MALACH23_JKU_task1_2 | RFR-CNN-2 | Pichler2023 | 7 | 16.75 | 8 | 32 | 19 | 56.6 (56.3 - 56.8) | 1.242 (1.196 - 1.288) | 53.5 | 1.323 | |
MALACH23_JKU_task1_3 | S4-1 | Pichler2023 | 42 | 39.50 | 67 | 22 | 2 | 9.9 (9.7 - 10.1) | 4.354 (4.289 - 4.418) | 46.7 | 1.496 | |
MALACH23_JKU_task1_4 | S4-2 | Pichler2023 | 43 | 39.75 | 68 | 22 | 1 | 9.8 (9.7 - 10.0) | 3.224 (3.184 - 3.265) | 45.1 | 1.509 | |
DCASE2023 baseline | Baseline | | 52 | 46.75 | 64 | 13 | 46 | 44.8 (44.5 - 45.1) | 1.523 (1.478 - 1.568) | 42.9 | 1.575 | |
Park_KT_task1_1 | KT_1 | Kim2023 | 10 | 20.75 | 9 | 34 | 31 | 56.3 (56.0 - 56.6) | 1.495 (1.410 - 1.580) | 72.5 | 0.824 | |
Park_KT_task1_2 | KT_2 | Kim2023 | 29 | 30.75 | 29 | 34 | 31 | 53.0 (52.7 - 53.3) | 1.660 (1.569 - 1.751) | 56.1 | 1.446 | |
Park_KT_task1_3 | KT_3 | Kim2023 | 26 | 29.25 | 26 | 34 | 31 | 54.2 (54.0 - 54.5) | 2.230 (2.107 - 2.353) | 70.5 | 1.167 | |
Park_KT_task1_4 | KT_MF | Kim2023 | 20 | 27.00 | 49 | 7 | 3 | 49.2 (48.9 - 49.5) | 1.510 (1.469 - 1.550) | 54.2 | 1.427 | |
Schmid_CPJKU_task1_1 | CPM bc=8 | Schmid2023 | 5 | 13.75 | 25 | 1 | 4 | 54.4 (54.1 - 54.6) | 1.313 (1.245 - 1.380) | 52.6 | 1.370 | |
Schmid_CPJKU_task1_2 | CPM bc=16 | Schmid2023 | 1 | 5.25 | 4 | 4 | 9 | 58.7 (58.4 - 59.0) | 1.256 (1.181 - 1.332) | 58.4 | 1.200 | |
Schmid_CPJKU_task1_3 | CPM bc=24 | Schmid2023 | 2 | 7.00 | 2 | 8 | 16 | 61.4 (61.2 - 61.7) | 1.153 (1.085 - 1.221) | 61.8 | 1.090 | |
Schmid_CPJKU_task1_4 | CPM bc=32 | Schmid2023 | 3 | 11.50 | 1 | 16 | 28 | 62.7 (62.4 - 63.0) | 1.117 (1.047 - 1.187) | 64.1 | 1.007 | |
Schmidt_FAU_task1_1 | 30mmacs | Schmidt2023 | 26 | 29.25 | 13 | 46 | 45 | 55.7 (55.4 - 56.0) | 1.322 (1.271 - 1.373) | 57.5 | 1.217 | |
Schmidt_FAU_task1_2 | 20mmacs | Schmidt2023 | 12 | 21.50 | 15 | 24 | 32 | 55.6 (55.3 - 55.9) | 1.337 (1.281 - 1.392) | 57.2 | 1.211 | |
Schmidt_FAU_task1_3 | 10mmacs | Schmidt2023 | 19 | 26.75 | 31 | 28 | 17 | 52.7 (52.4 - 53.0) | 1.398 (1.346 - 1.450) | 54.5 | 1.298 | |
Schmidt_FAU_task1_4 | 5mmacs | Schmidt2023 | 24 | 28.75 | 48 | 9 | 10 | 49.7 (49.4 - 50.0) | 1.482 (1.427 - 1.536) | 50.6 | 1.417 | |
Tan_NTU_task1_1 | TYPG_T1_1 | Tan2023 | 33 | 33.50 | 59 | 10 | 6 | 47.1 (46.8 - 47.4) | 1.508 (1.461 - 1.554) | 50.3 | 1.397 | |
Tan_NTU_task1_2 | TYPG_T1_2 | Tan2023 | 35 | 34.00 | 54 | 17 | 11 | 48.5 (48.2 - 48.8) | 1.461 (1.417 - 1.505) | 52.1 | 1.372 | |
Tan_NTU_task1_3 | TYPG_T1_3 | Tan2023 | 37 | 37.00 | 60 | 17 | 11 | 46.3 (46.1 - 46.6) | 1.492 (1.449 - 1.534) | 50.0 | 1.381 | |
Tan_SCUT_task1_1 | BSConv1_1 | Tan2023a | 4 | 13.50 | 3 | 27 | 21 | 60.8 (60.6 - 61.1) | 1.192 (1.119 - 1.265) | 55.7 | 1.318 | |
Tan_SCUT_task1_2 | BSConv1_2 | Tan2023a | 30 | 31.00 | 38 | 27 | 21 | 51.7 (51.4 - 52.0) | 1.444 (1.378 - 1.509) | 54.3 | 1.243 | |
Tan_SCUT_task1_3 | BSConv1_3 | Tan2023a | 16 | 25.50 | 27 | 27 | 21 | 53.5 (53.2 - 53.8) | 1.441 (1.370 - 1.513) | 55.5 | 1.320 | |
Tan_SCUT_task1_4 | BSConv1_4 | Tan2023a | 32 | 33.00 | 42 | 27 | 21 | 50.9 (50.6 - 51.2) | 1.525 (1.455 - 1.595) | 54.5 | 1.305 | |
Vo_DU_task1_1 | HKD-MLA-1 | Vo2023 | 52 | 46.75 | 63 | 35 | 26 | 45.0 (44.7 - 45.2) | 2.157 (2.035 - 2.279) | 46.0 | 1.591 | |
Vo_DU_task1_2 | HKD-MLA-2 | Vo2023 | 53 | 47.75 | 65 | 35 | 26 | 44.8 (44.5 - 45.1) | 2.116 (2.003 - 2.229) | 45.4 | 1.640 | |
Vo_DU_task1_3 | HKD-MLA-3 | Vo2023 | 50 | 46.25 | 62 | 35 | 26 | 45.2 (44.9 - 45.5) | 2.092 (1.973 - 2.211) | 45.8 | 1.624 | |
Vo_DU_task1_4 | HKD-MLA-4 | Vo2023 | 49 | 45.75 | 61 | 35 | 26 | 45.5 (45.2 - 45.8) | 1.793 (1.717 - 1.869) | 46.7 | 1.617 | |
Wang_SCUT_task1_1 | DSSDM1 | Wang2023 | 31 | 31.50 | 50 | 11 | 15 | 49.1 (48.9 - 49.4) | 1.493 (1.434 - 1.553) | 53.3 | 1.287 | |
Wang_SCUT_task1_2 | DSSDM2 | Wang2023 | 18 | 26.50 | 30 | 19 | 27 | 52.9 (52.6 - 53.2) | 1.348 (1.300 - 1.397) | 56.4 | 1.191 | |
Wang_SCUT_task1_3 | DSSDM3 | Wang2023 | 46 | 43.50 | 58 | 20 | 38 | 47.1 (46.8 - 47.4) | 1.702 (1.621 - 1.782) | 50.8 | 1.477 | |
Wang_SCUT_task1_4 | DSSDM4 | Wang2023 | 48 | 45.00 | 55 | 37 | 33 | 48.5 (48.2 - 48.8) | 1.472 (1.416 - 1.529) | 52.4 | 1.368 | |
XuQianHu_BIT&NUDT_task1_1 | DYXS_t1_1 | Yu2023 | 45 | 40.50 | 41 | 44 | 36 | 51.0 (50.7 - 51.3) | 1.364 (1.319 - 1.409) | 59.0 | 1.164 | |
XuQianHu_BIT&NUDT_task1_2 | DYXS_t1_2 | Yu2023 | 44 | 40.00 | 39 | 40 | 42 | 51.6 (51.3 - 51.9) | 1.355 (1.308 - 1.401) | 60.6 | 1.141 | |
XuQianHu_BIT&NUDT_task1_3 | DYXS_t1_3 | Yu2023 | 41 | 39.25 | 46 | 43 | 22 | 50.0 (49.8 - 50.3) | 1.395 (1.346 - 1.445) | 59.6 | 1.168 | |
XuQianHu_BIT&NUDT_task1_4 | DYXS_t1_4 | Yu2023 | 36 | 35.50 | 40 | 42 | 20 | 51.1 (50.9 - 51.4) | 1.367 (1.324 - 1.411) | 61.3 | 1.139 | |
Yang_GZHU_task1_1 | dml_kd | Weng2023 | 15 | 24.75 | 16 | 30 | 37 | 55.5 (55.3 - 55.8) | 1.280 (1.220 - 1.339) | 59.7 | 1.151 | |
Yang_GZHU_task1_2 | dml_kd_tta | Weng2023 | 14 | 22.25 | 11 | 30 | 37 | 55.9 (55.6 - 56.2) | 1.241 (1.184 - 1.298) | 59.9 | 1.115 | |
Yang_GZHU_task1_3 | dml | Weng2023 | 21 | 27.25 | 21 | 30 | 37 | 55.1 (54.8 - 55.4) | 1.279 (1.217 - 1.340) | 57.9 | 1.170 | |
Yang_GZHU_task1_4 | dml_tta | Weng2023 | 19 | 26.75 | 20 | 30 | 37 | 55.2 (54.9 - 55.5) | 1.259 (1.201 - 1.318) | 58.0 | 1.163 | |
Zhang_NCUT_task1_1 | Zhang1_NCUT | Zhang2023 | 49 | 45.75 | 66 | 39 | 12 | 43.3 (43.0 - 43.6) | 1.757 (1.692 - 1.821) | 47.0 | 1.671 | |
Zhang_NCUT_task1_2 | Zhang2_NCUT | Zhang2023 | 51 | 46.50 | 57 | 29 | 43 | 47.9 (47.6 - 48.2) | 1.533 (1.476 - 1.590) | 52.8 | 1.347 | |
Zhang_SATLab_task1_1 | SATLab_1 | Bing2023 | 26 | 29.25 | 53 | 3 | 8 | 48.8 (48.5 - 49.1) | 3.248 (3.031 - 3.465) | 65.4 | 0.941 | |
Zhang_SATLab_task1_2 | SATLab_2 | Bing2023 | 34 | 33.75 | 47 | 12 | 29 | 50.0 (49.7 - 50.2) | 4.213 (3.928 - 4.497) | 54.5 | 1.437 | |
Zhang_SATLab_task1_3 | SATLab_3 | Bing2023 | 27 | 30.00 | 35 | 15 | 35 | 51.9 (51.6 - 52.2) | 1.704 (1.620 - 1.789) | 57.0 | 1.342 | |
Zhang_SATLab_task1_4 | SATLab_4 | Bing2023 | 22 | 27.50 | 45 | 6 | 14 | 50.3 (50.0 - 50.6) | 1.542 (1.483 - 1.601) | 53.2 | 1.439 |
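The rank values listed in this table are consistent with a weighted average of the three component ranks, with the performance rank weighted 0.5 and the memory and MACs ranks weighted 0.25 each. The sketch below reproduces that arithmetic; the weights are inferred from the table rows rather than quoted from the task rules, and the function name `rank_value` is illustrative.

```python
def rank_value(performance_rank, memory_rank, macs_rank):
    """Weighted average of the component ranks (weights inferred from the table)."""
    return 0.5 * performance_rank + 0.25 * memory_rank + 0.25 * macs_rank


# Rows taken from the systems ranking table:
# (submission label, performance rank, memory rank, MACs rank)
rows = [
    ("Schmid_CPJKU_task1_2", 4, 4, 9),
    ("AI4EDGE_IPL_task1_1", 34, 14, 39),
    ("DCASE2023 baseline", 64, 13, 46),
]
for label, perf, mem, macs in rows:
    print(label, rank_value(perf, mem, macs))
```

A lower rank value is better, and systems with equal rank values share the same official system rank (for example, Cai_XJTLU_task1_4 and Schmid_CPJKU_task1_4 both have 11.50 and rank 3).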
Teams ranking
Submission label | Name | Technical Report | Official team rank | Rank value | Performance rank | Memory rank | MACs rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss with 95% confidence interval (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | AI4EDGE_1 | Almeida2023 | 13 | 30.25 | 34 | 14 | 39 | 51.9 (51.7 - 52.2) | 1.920 (1.786 - 2.054) | 67.5 | 0.932 | |
Bai_JLESS_task1_2 | JLESS | Du2023 | 18 | 43.75 | 52 | 47 | 24 | 48.8 (48.5 - 49.1) | 1.791 (1.691 - 1.891) | 50.5 | 1.467 | |
Cai_TENCENT_task1_2 | Cai_2 | Cai2023 | 6 | 21.25 | 10 | 31 | 34 | 56.2 (55.9 - 56.5) | 1.246 (1.196 - 1.295) | 58.1 | 1.178 | |
Cai_XJTLU_task1_4 | TFSepNet4 | Cai2023a | 2 | 11.50 | 5 | 18 | 18 | 57.0 (56.7 - 57.3) | 1.241 (1.184 - 1.299) | 64.3 | 0.989 | |
Fei_vv_task1_2 | vv_2 | Fei2023 | 8 | 21.75 | 24 | 26 | 13 | 54.5 (54.2 - 54.8) | 1.290 (1.232 - 1.348) | 56.7 | 1.204 | |
Han_SZU_task1_1 | Han_SZU_1 | Han2023 | 16 | 37.75 | 44 | 13 | 50 | 50.5 (50.3 - 50.8) | 2.011 (1.886 - 2.137) | 51.4 | 1.378 | |
LAM_AEV_task1_3 | AEV_sys_3 | Pham2023 | 11 | 27.00 | 14 | 33 | 47 | 55.6 (55.3 - 55.9) | 1.933 (1.810 - 2.055) | 57.4 | 1.333 | |
Liang_NTES_task1_1 | NTES_1 | Liang2023 | 17 | 38.50 | 32 | 41 | 49 | 52.6 (52.3 - 52.8) | 1.402 (1.339 - 1.465) | 54.9 | 1.293 | |
MALACH23_JKU_task1_2 | RFR-CNN-2 | Pichler2023 | 4 | 16.75 | 8 | 32 | 19 | 56.6 (56.3 - 56.8) | 1.242 (1.196 - 1.288) | 53.5 | 1.323 | |
DCASE2023 baseline | Baseline | | 21 | 46.75 | 64 | 13 | 46 | 44.8 (44.5 - 45.1) | 1.523 (1.478 - 1.568) | 42.9 | 1.575 | |
Park_KT_task1_1 | KT_1 | Kim2023 | 5 | 20.75 | 9 | 34 | 31 | 56.3 (56.0 - 56.6) | 1.495 (1.410 - 1.580) | 72.5 | 0.824 | |
Schmid_CPJKU_task1_2 | CPM bc=16 | Schmid2023 | 1 | 5.25 | 4 | 4 | 9 | 58.7 (58.4 - 59.0) | 1.256 (1.181 - 1.332) | 58.4 | 1.200 | |
Schmidt_FAU_task1_2 | 20mmacs | Schmidt2023 | 7 | 21.50 | 15 | 24 | 32 | 55.6 (55.3 - 55.9) | 1.337 (1.281 - 1.392) | 57.2 | 1.211 | |
Tan_NTU_task1_1 | TYPG_T1_1 | Tan2023 | 14 | 33.50 | 59 | 10 | 6 | 47.1 (46.8 - 47.4) | 1.508 (1.461 - 1.554) | 50.3 | 1.397 | |
Tan_SCUT_task1_1 | BSConv1_1 | Tan2023a | 3 | 13.50 | 3 | 27 | 21 | 60.8 (60.6 - 61.1) | 1.192 (1.119 - 1.265) | 55.7 | 1.318 | |
Vo_DU_task1_4 | HKD-MLA-4 | Vo2023 | 19 | 45.75 | 61 | 35 | 26 | 45.5 (45.2 - 45.8) | 1.793 (1.717 - 1.869) | 46.7 | 1.617 | |
Wang_SCUT_task1_2 | DSSDM2 | Wang2023 | 10 | 26.50 | 30 | 19 | 27 | 52.9 (52.6 - 53.2) | 1.348 (1.300 - 1.397) | 56.4 | 1.191 | |
XuQianHu_BIT&NUDT_task1_4 | DYXS_t1_4 | Yu2023 | 15 | 35.50 | 40 | 42 | 20 | 51.1 (50.9 - 51.4) | 1.367 (1.324 - 1.411) | 61.3 | 1.139 | |
Yang_GZHU_task1_2 | dml_kd_tta | Weng2023 | 9 | 22.25 | 11 | 30 | 37 | 55.9 (55.6 - 56.2) | 1.241 (1.184 - 1.298) | 59.9 | 1.115 | |
Zhang_NCUT_task1_1 | Zhang1_NCUT | Zhang2023 | 20 | 45.75 | 66 | 39 | 12 | 43.3 (43.0 - 43.6) | 1.757 (1.692 - 1.821) | 47.0 | 1.671 | |
Zhang_SATLab_task1_4 | SATLab_4 | Bing2023 | 12 | 27.50 | 45 | 6 | 14 | 50.3 (50.0 - 50.6) | 1.542 (1.483 - 1.601) | 53.2 | 1.439 |
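The teams table appears to list, for each team, the submission with the lowest rank value from the systems ranking. A minimal sketch of that selection follows. It assumes the team name is the part of the submission label before `_task1_`, and it breaks ties on rank value by the better performance rank; that tie-break is an assumption that happens to match the Cai_TENCENT entry, not a documented rule.

```python
def best_system_per_team(systems):
    """Pick one submission per team.

    systems: iterable of (submission_label, rank_value, performance_rank).
    Returns a dict mapping team name to the selected submission label.
    """
    best = {}
    for label, value, perf in systems:
        # Assumed convention: "<team>_task1_<n>"; other labels are kept whole.
        team = label.rsplit("_task1_", 1)[0]
        key = (value, perf)  # lower rank value first, then lower performance rank
        if team not in best or key < best[team][0]:
            best[team] = (key, label)
    return {team: label for team, (_, label) in best.items()}


# Values copied from the systems ranking table.
systems = [
    ("Cai_TENCENT_task1_1", 25.75, 7),
    ("Cai_TENCENT_task1_2", 21.25, 10),
    ("Cai_TENCENT_task1_3", 22.25, 12),
    ("Cai_TENCENT_task1_4", 21.25, 17),
    ("Schmid_CPJKU_task1_1", 13.75, 25),
    ("Schmid_CPJKU_task1_2", 5.25, 4),
    ("Schmid_CPJKU_task1_3", 7.00, 2),
    ("Schmid_CPJKU_task1_4", 11.50, 1),
]
print(best_system_per_team(systems))
```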
System complexity
Submission label | Technical Report | Official system rank | Rank value | Accuracy (Eval) | Logloss (Eval) | MACs | Memory use | Parameters | Non-zero parameters | Sparsity | Complexity management |
---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 1.920 | 25475456 | 62720 | 52852 | 51986 | 0.016385378036781972 | weight quantization | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 1.996 | 29304736 | 53760 | 68996 | 67826 | 0.016957504782885935 | weight quantization | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 1.364 | 26711936 | 49280 | 65192 | 64034 | 0.017762915695177295 | weight quantization, pruning | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 1.825 | 27931612 | 78252 | 78252 | 78252 | 0.0 | model compression | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 1.791 | 14130372 | 60458 | 60458 | 60458 | 0.0 | model compression | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 1.174 | 28840396 | 127684 | 127684 | 127684 | 0.0 | weight quantization | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 1.246 | 21990724 | 79942 | 79942 | 79942 | 0.0 | weight quantization | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 1.241 | 21990724 | 79942 | 79942 | 79942 | 0.0 | weight quantization | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 1.252 | 19533124 | 63558 | 63558 | 63558 | 0.0 | weight quantization | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 1.307 | 1649349 | 6828 | 6828 | 6828 | 0.0 | knowledge distillation, weight quantization | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 1.292 | 1649349 | 6828 | 6828 | 6828 | 0.0 | weight quantization | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 1.223 | 3424245 | 15890 | 15890 | 15890 | 0.0 | weight quantization | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 1.241 | 10219540 | 54260 | 54260 | 54260 | 0.0 | weight quantization | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 1.282 | 13402932 | 123636 | 123636 | 123636 | 0.0 | weight quantization | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 1.290 | 7802348 | 70588 | 70588 | 70588 | 0.0 | weight quantization | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 1.349 | 13402932 | 123636 | 123636 | 123636 | 0.0 | weight quantization | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 1.370 | 7802348 | 70588 | 70588 | 70588 | 0.0 | weight quantization | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 2.012 | 29.349M | 80845 | 80845 | 80845 | 0.0 | knowledge distillation | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 1.847 | 29267550 | 88704 | 22962 | 22176 | 0.034230467729291836 | ||
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 2.083 | 29267550 | 88704 | 22962 | 22176 | 0.034230467729291836 | ||
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 1.933 | 29267550 | 88704 | 22962 | 22176 | 0.034230467729291836 | ||
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 1.402 | 29591778 | 143345 | 31260 | 31260 | 0.0 | ||
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 1.230 | 14686940 | 119608 | 59804 | 59804 | 0.0 | ||
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 1.242 | 10819292 | 87160 | 43580 | 43580 | 0.0 | efficient models | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 4.354 | 572340 | 116008 | 116648 | 29162 | 0.75 | efficient models | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 3.224 | 214420 | 63592 | 15994 | 15994 | 0.0 | efficient models | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 1.523 | 29234920 | 65280 | 46512 | 46512 | 0.0 | weight quantization | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 1.495 | 19556096 | 250000 | 92070 | 92070 | 0.0 | weight quantization | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 1.660 | 19556096 | 250000 | 92070 | 92070 | 0.0 | weight quantization | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 2.230 | 19556096 | 250000 | 92070 | 92070 | 0.0 | weight quantization | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 1.510 | 617000 | 30000 | 20516 | 20516 | 0.0 | weight quantization | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 1.313 | 1582336 | 5722 | 5722 | 5722 | 0.0 | knowledge distillation, weight quantization | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 1.256 | 4354304 | 12310 | 12310 | 12310 | 0.0 | knowledge distillation, weight quantization | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 1.153 | 9638144 | 30106 | 30106 | 30106 | 0.0 | knowledge distillation, weight quantization | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 1.117 | 16803072 | 54182 | 54182 | 54182 | 0.0 | knowledge distillation, weight quantization, structured pruning | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 1.322 | 28931380 | 127988 | 127988 | 127988 | 0.0 | weight quantization, pruning | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 1.337 | 19910080 | 68456 | 68456 | 68456 | 0.0 | weight quantization, pruning | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 1.398 | 9996775 | 74700 | 74700 | 74700 | 0.0 | weight quantization, pruning | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 1.482 | 4938255 | 34616 | 34616 | 34616 | 0.0 | weight quantization, pruning | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 1.508 | 2960384 | 40960 | 37434 | 37306 | 0.0034193513917828433 | weight quantization | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 1.461 | 6462656 | 54304 | 54242 | 54098 | 0.002654769366911225 | weight quantization | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 1.492 | 6462656 | 54304 | 54242 | 54098 | 0.002654769366911225 | weight quantization | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 1.192 | 13180000 | 73386 | 73386 | 73386 | 0.0 | weight quantization | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 1.444 | 13180000 | 73386 | 73386 | 73386 | 0.0 | weight quantization | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 1.441 | 13180000 | 73386 | 73386 | 73386 | 0.0 | weight quantization | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 1.525 | 13180000 | 73386 | 73386 | 73386 | 0.0 | weight quantization | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 2.157 | 15600000 | 503316 | 119526 | 119526 | 0.0 | weight quantization | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 2.116 | 15600000 | 503316 | 119526 | 119526 | 0.0 | weight quantization | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 2.092 | 15600000 | 503316 | 119526 | 119526 | 0.0 | weight quantization | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 1.793 | 15600000 | 503316 | 119526 | 119526 | 0.0 | weight quantization | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 1.493 | 8646000 | 62080 | 45164 | 45164 | 0.0 | weight quantization | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 1.348 | 16746000 | 81150 | 56172 | 56172 | 0.0 | weight quantization | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 1.702 | 25442000 | 82280 | 56556 | 56556 | 0.0 | weight quantization | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 1.472 | 20902000 | 148618 | 121812 | 121812 | 0.0 | weight quantization | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 1.364 | 23803968 | 125885 | 52288 | 52288 | 0.0 | weight quantization | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 1.355 | 28400320 | 123654 | 51648 | 51648 | 0.0 | weight quantization | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 1.395 | 13402688 | 125650 | 57392 | 57392 | 0.0 | weight quantization | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 1.367 | 11878580 | 125057 | 66114 | 66114 | 0.0 | weight quantization | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 1.280 | 23970000 | 76906 | 76906 | 76906 | 0.0 | weight quantization | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 1.241 | 23970000 | 76906 | 76906 | 76906 | 0.0 | weight quantization | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 1.279 | 23970000 | 76906 | 76906 | 76906 | 0.0 | weight quantization | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 1.259 | 23970000 | 76906 | 76906 | 76906 | 0.0 | weight quantization | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 1.757 | 7375000 | 574464 | 123648 | 123648 | 0.0 | weight quantization | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 1.533 | 28461000 | 1622016 | 76224 | 76224 | 0.0 | weight quantization | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 3.248 | 3972096 | 7946 | 7946 | 7434 | 0.06443493581676318 | weight quantization | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 4.213 | 19466240 | 46232 | 46232 | 45996 | 0.005104689392628536 | pruning, weight quantization | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 1.704 | 23438336 | 54178 | 54178 | 53430 | 0.013806342057661736 | pruning, weight quantization | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 1.542 | 7944192 | 15892 | 15892 | 14868 | 0.06443493581676318 | weight quantization |
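The sparsity column matches the fraction of parameters that are zero, that is, one minus the ratio of non-zero parameters to total parameters. The short sketch below checks this against two rows of the table; the function name `sparsity` is illustrative and the relationship is inferred from the listed values.

```python
def sparsity(parameters, non_zero_parameters):
    """Fraction of parameters that are zero."""
    return 1.0 - non_zero_parameters / parameters


# Rows copied from the table: (label, parameters, non-zero parameters)
rows = [
    ("MALACH23_JKU_task1_3", 116648, 29162),
    ("AI4EDGE_IPL_task1_1", 52852, 51986),
]
for label, total, non_zero in rows:
    print(label, sparsity(total, non_zero))
```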
Energy Consumption
Submission label | Technical Report | Official system rank | Rank value | Accuracy (Eval) | MACs | Memory use | Normalized energy consumption (training) | Normalized energy consumption (inference) | Energy consumption, training (kWh) | Energy consumption, inference (kWh) |
---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 25475456 | 62720 | 0.7283 | 0.7906 | 0.2540 | 0.2340 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 29304736 | 53760 | 0.6146 | 0.6401 | 0.3010 | 0.2890 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 26711936 | 49280 | 0.6336 | 0.6655 | 0.2920 | 0.2780 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 27931612 | 78252 | 2.5391 | 2.3548 | 0.1150 | 0.1240 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 14130372 | 60458 | 2.5614 | 2.4132 | 0.1140 | 0.1210 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 28840396 | 127684 | 0.0962 | 0.2854 | 0.8510 | 0.2870 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 21990724 | 79942 | 0.1091 | 0.2925 | 0.7510 | 0.2800 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 21990724 | 79942 | 0.1091 | 0.2925 | 0.7510 | 0.2800 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 19533124 | 63558 | 0.1185 | 0.3302 | 0.6910 | 0.2480 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 1649349 | 6828 | 0.0010 | 0.2500 | 2.0300 | 0.0080 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 1649349 | 6828 | 0.0013 | 0.2500 | 1.5910 | 0.0080 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 3424245 | 15890 | 0.0012 | 0.1250 | 1.7280 | 0.0160 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 10219540 | 54260 | 0.0011 | 0.0952 | 1.7550 | 0.0210 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 13402932 | 123636 | 0.6742 | 8.0000 | 1.7800 | 0.1500 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 7802348 | 70588 | 0.7595 | 8.0000 | 1.5800 | 0.1500 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 13402932 | 123636 | 0.6742 | 8.0000 | 1.7800 | 0.1500 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 7802348 | 70588 | 0.7595 | 8.0000 | 1.5800 | 0.1500 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 29.349M | 80845 | 1.0845 | 308.0000 | 0.2840 | 0.0010 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 29267550 | 88704 | 0.0276 | 0.5152 | 82.7400 | 4.4270 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 29267550 | 88704 | 0.0138 | 0.5152 | 164.7400 | 4.4270 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 29267550 | 88704 | 0.0138 | 0.5152 | 164.7400 | 4.4270 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 29591778 | 143345 | 5.3234 | 0.1670 | |||
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 14686940 | 119608 | 0.0152 | 0.9375 | 2.9640 | 0.0480 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 10819292 | 87160 | 0.0185 | 0.9574 | 2.4310 | 0.0470 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 572340 | 116008 | 0.0507 | 5.7273 | 1.2420 | 0.0110 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 214420 | 63592 | 0.0098 | 1.1538 | 1.5300 | 0.0130 | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 29234920 | 65280 | 0.9669 | 1.0000 | 0.3020 | 0.2920 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 19556096 | 250000 | 4.4282 | 56.5613 | 0.0530 | 0.0042 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 19556096 | 250000 | 4.4282 | 56.5613 | 0.0530 | 0.0042 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 19556096 | 250000 | 4.4282 | 56.5613 | 0.0530 | 0.0042 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 617000 | 30000 | 248.0005 | 1287.1071 | 0.0009 | 0.0002 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 1582336 | 5722 | 0.1249 | 7.1515 | 1.8890 | 0.0330 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 4354304 | 12310 | 0.1204 | 6.7429 | 1.9600 | 0.0350 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 9638144 | 30106 | 0.1111 | 6.3784 | 2.1250 | 0.0370 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 16803072 | 54182 | 0.0663 | 6.5556 | 3.5600 | 0.0360 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 28931380 | 127988 | 0.0678 | 50.1029 | 5.2587 | 0.0071 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 19910080 | 68456 | 0.0747 | 49.8043 | 4.7693 | 0.0072 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 9996775 | 74700 | 0.0745 | 44.6759 | 4.7830 | 0.0080 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 4938255 | 34616 | 0.0726 | 53.7101 | 4.9099 | 0.0066 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 2960384 | 40960 | 1.7114 | 8.8426 | 0.2692 | 0.0521 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 6462656 | 54304 | 1.0784 | 4.3258 | 0.4272 | 0.1065 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 6462656 | 54304 | 0.8352 | 4.3340 | 0.5516 | 0.1063 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 13180000 | 73386 | 0.0543 | 7.6842 | 5.3790 | 0.0380 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 13180000 | 73386 | 0.1614 | 9.6689 | 1.8090 | 0.0302 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 13180000 | 73386 | 0.0553 | 8.5882 | 5.2810 | 0.0340 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 13180000 | 73386 | 0.1614 | 9.1250 | 1.8090 | 0.0320 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 15600000 | 503316 | 0.0064 | 0.6503 | 3.3220 | 0.0326 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 15600000 | 503316 | 0.0068 | 0.9021 | 3.1220 | 0.0235 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 15600000 | 503316 | 0.0071 | 1.8120 | 2.9780 | 0.0117 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 15600000 | 503316 | 0.0121 | 17.6667 | 1.7560 | 0.0012 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 8646000 | 62080 | 1.0735 | 1.1542 | 0.2720 | 0.2530 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 16746000 | 81150 | 0.7449 | 0.7871 | 0.3920 | 0.3710 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 25442000 | 82280 | 0.8044 | 0.8439 | 0.3630 | 0.3460 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 20902000 | 148618 | 0.6173 | 0.8957 | 0.4730 | 0.3260 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 23803968 | 125885 | 0.0622 | 2.0603 | 1.3576 | 0.0410 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 28400320 | 123654 | 0.0504 | 2.0093 | 1.6751 | 0.0420 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 13402688 | 125650 | 0.0439 | 2.2005 | 1.9233 | 0.0384 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 11878580 | 125057 | 0.0439 | 2.2438 | 1.9208 | 0.0376 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 23970000 | 76906 | 0.0615 | 31.4878 | 21.0000 | 0.0410 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 23970000 | 76906 | 0.0615 | 5.3792 | 21.0000 | 0.2400 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 23970000 | 76906 | 0.0759 | 31.4878 | 17.0000 | 0.0410 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 23970000 | 76906 | 0.0759 | 5.3792 | 17.0000 | 0.2400 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 7375000 | 574464 | 0.8202 | 18.8125 | 0.3670 | 0.0160 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 28461000 | 1622016 | 0.7679 | 6.5435 | 0.3920 | 0.0460 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 3972096 | 7946 | 1.8354 | 5.8000 | 0.0790 | 0.0250 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 19466240 | 46232 | 0.2757 | 0.5142 | 0.5260 | 0.2820 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 23438336 | 54178 | 0.9477 | 3.1522 | 0.1530 | 0.0460 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 7944192 | 15892 | 0.2397 | 0.4723 | 0.6050 | 0.3070 |
Generalization performance
All results are computed on the evaluation dataset.
Submission label | Technical Report | Official system rank | Rank value | Accuracy (overall) | Logloss (overall) | Accuracy / unseen devices | Logloss / unseen devices | Accuracy / seen devices | Logloss / seen devices | Accuracy / unseen cities | Logloss / unseen cities | Accuracy / seen cities | Logloss / seen cities |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 1.920 | 47.0 | 2.370 | 56.1 | 1.544 | 51.6 | 1.931 | 52.3 | 1.905 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 1.996 | 42.4 | 2.391 | 54.1 | 1.667 | 49.2 | 1.847 | 48.8 | 2.005 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 1.364 | 45.3 | 1.528 | 55.4 | 1.227 | 52.5 | 1.302 | 50.7 | 1.372 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 1.825 | 39.5 | 2.335 | 55.0 | 1.399 | 44.2 | 1.944 | 48.9 | 1.799 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 1.791 | 40.6 | 2.225 | 55.6 | 1.430 | 44.0 | 1.817 | 50.0 | 1.794 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 1.174 | 50.7 | 1.347 | 61.5 | 1.029 | 54.9 | 1.202 | 57.1 | 1.167 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 1.246 | 51.8 | 1.369 | 59.8 | 1.143 | 55.1 | 1.275 | 56.5 | 1.242 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 1.241 | 51.9 | 1.356 | 59.1 | 1.144 | 55.3 | 1.255 | 56.0 | 1.241 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 1.252 | 51.3 | 1.358 | 58.7 | 1.164 | 55.7 | 1.259 | 55.3 | 1.255 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 1.307 | 49.1 | 1.390 | 54.2 | 1.237 | 50.9 | 1.325 | 52.1 | 1.306 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 1.292 | 49.7 | 1.389 | 55.0 | 1.211 | 53.5 | 1.285 | 52.5 | 1.298 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 1.223 | 51.6 | 1.331 | 58.0 | 1.133 | 56.2 | 1.218 | 55.0 | 1.226 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 1.241 | 53.5 | 1.385 | 59.9 | 1.121 | 56.8 | 1.259 | 57.5 | 1.234 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 1.282 | 51.7 | 1.393 | 58.2 | 1.190 | 54.0 | 1.298 | 55.8 | 1.274 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 1.290 | 50.9 | 1.410 | 57.5 | 1.191 | 52.9 | 1.296 | 55.2 | 1.281 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 1.349 | 49.9 | 1.457 | 56.1 | 1.259 | 51.5 | 1.373 | 54.1 | 1.337 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 1.370 | 48.8 | 1.460 | 54.2 | 1.295 | 49.9 | 1.389 | 52.6 | 1.359 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 2.012 | 44.9 | 2.565 | 55.2 | 1.550 | 49.2 | 1.966 | 51.1 | 2.020 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 1.847 | 49.6 | 2.329 | 60.1 | 1.445 | 55.4 | 1.884 | 55.6 | 1.849 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 2.083 | 48.9 | 2.681 | 60.1 | 1.584 | 54.1 | 2.134 | 55.5 | 2.074 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 1.933 | 49.6 | 2.447 | 60.5 | 1.504 | 54.9 | 1.937 | 56.0 | 1.938 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 1.402 | 44.9 | 1.704 | 58.9 | 1.150 | 51.3 | 1.408 | 53.0 | 1.397 | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 1.230 | 52.3 | 1.349 | 60.9 | 1.130 | 56.1 | 1.239 | 57.4 | 1.227 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 1.242 | 51.9 | 1.353 | 60.4 | 1.150 | 55.5 | 1.265 | 57.1 | 1.235 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 4.354 | 10.0 | 4.358 | 9.8 | 4.350 | 9.9 | 4.360 | 9.9 | 4.349 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 3.224 | 9.8 | 3.228 | 9.9 | 3.221 | 10.0 | 3.225 | 9.8 | 3.223 | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 1.523 | 38.0 | 1.730 | 50.5 | 1.351 | 42.4 | 1.563 | 45.6 | 1.513 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 1.495 | 49.9 | 1.827 | 61.7 | 1.218 | 54.4 | 1.543 | 57.1 | 1.471 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 1.660 | 46.2 | 2.056 | 58.7 | 1.329 | 51.7 | 1.720 | 53.9 | 1.629 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 2.230 | 47.1 | 2.840 | 60.2 | 1.721 | 53.6 | 2.240 | 54.7 | 2.212 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 1.510 | 45.8 | 1.573 | 52.0 | 1.457 | 49.2 | 1.480 | 49.4 | 1.512 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 1.313 | 52.3 | 1.395 | 56.1 | 1.244 | 53.7 | 1.318 | 54.5 | 1.317 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 1.256 | 54.6 | 1.445 | 62.1 | 1.099 | 59.4 | 1.215 | 58.8 | 1.264 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 1.153 | 57.8 | 1.318 | 64.4 | 1.015 | 60.8 | 1.133 | 61.8 | 1.156 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 1.117 | 57.4 | 1.334 | 67.1 | 0.936 | 61.2 | 1.152 | 63.4 | 1.107 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 1.322 | 51.0 | 1.465 | 59.6 | 1.203 | 55.3 | 1.336 | 56.0 | 1.322 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 1.337 | 50.8 | 1.488 | 59.5 | 1.210 | 55.6 | 1.347 | 55.8 | 1.336 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 1.398 | 47.6 | 1.562 | 57.0 | 1.261 | 52.3 | 1.408 | 52.9 | 1.396 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 1.482 | 44.8 | 1.650 | 53.8 | 1.342 | 50.0 | 1.489 | 50.1 | 1.477 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 1.508 | 39.7 | 1.720 | 53.3 | 1.331 | 45.8 | 1.509 | 47.6 | 1.503 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 1.461 | 41.4 | 1.634 | 54.4 | 1.317 | 47.4 | 1.466 | 49.2 | 1.454 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 1.492 | 40.1 | 1.632 | 51.5 | 1.375 | 44.2 | 1.517 | 47.1 | 1.482 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 1.192 | 55.7 | 1.420 | 65.1 | 1.002 | 59.3 | 1.249 | 61.5 | 1.180 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 1.444 | 45.8 | 1.726 | 56.6 | 1.208 | 50.9 | 1.481 | 52.1 | 1.434 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 1.441 | 48.6 | 1.675 | 57.5 | 1.247 | 52.9 | 1.487 | 53.9 | 1.431 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 1.525 | 45.5 | 1.843 | 55.4 | 1.259 | 51.1 | 1.542 | 51.1 | 1.519 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 2.157 | 36.2 | 2.963 | 52.3 | 1.485 | 42.6 | 2.334 | 45.8 | 2.111 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 2.116 | 36.3 | 2.877 | 51.9 | 1.482 | 42.3 | 2.289 | 45.7 | 2.069 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 2.092 | 36.6 | 2.827 | 52.4 | 1.479 | 43.0 | 2.286 | 46.2 | 2.056 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 1.793 | 38.0 | 2.169 | 51.8 | 1.479 | 43.7 | 1.793 | 46.3 | 1.771 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 1.493 | 41.8 | 1.823 | 55.3 | 1.218 | 47.5 | 1.530 | 49.8 | 1.478 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 1.348 | 47.3 | 1.510 | 57.5 | 1.214 | 52.0 | 1.375 | 53.4 | 1.339 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 1.702 | 38.4 | 2.228 | 54.4 | 1.263 | 45.2 | 1.830 | 47.7 | 1.680 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 1.472 | 42.9 | 1.654 | 53.1 | 1.321 | 45.1 | 1.556 | 49.4 | 1.455 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 1.364 | 47.5 | 1.450 | 53.9 | 1.292 | 50.3 | 1.377 | 51.0 | 1.365 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 1.355 | 47.4 | 1.455 | 55.1 | 1.271 | 51.0 | 1.369 | 51.6 | 1.357 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 1.395 | 45.5 | 1.509 | 53.8 | 1.301 | 49.5 | 1.408 | 50.2 | 1.396 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 1.367 | 47.6 | 1.450 | 54.1 | 1.298 | 49.4 | 1.400 | 51.6 | 1.363 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 1.280 | 52.6 | 1.409 | 58.0 | 1.172 | 55.6 | 1.278 | 55.9 | 1.276 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 1.241 | 53.2 | 1.352 | 58.2 | 1.149 | 56.0 | 1.242 | 56.3 | 1.237 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 1.279 | 52.1 | 1.406 | 57.6 | 1.172 | 56.7 | 1.253 | 55.2 | 1.283 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 1.259 | 52.4 | 1.364 | 57.5 | 1.172 | 56.8 | 1.226 | 55.2 | 1.263 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 1.757 | 33.3 | 2.083 | 51.6 | 1.485 | 40.8 | 1.831 | 44.0 | 1.736 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 1.533 | 39.7 | 1.813 | 54.7 | 1.300 | 46.4 | 1.570 | 48.2 | 1.530 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 3.248 | 39.2 | 5.085 | 56.8 | 1.717 | 48.3 | 3.481 | 49.2 | 3.229 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 4.213 | 40.6 | 6.653 | 57.7 | 2.179 | 48.7 | 4.433 | 50.4 | 4.145 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 1.704 | 41.8 | 2.237 | 60.3 | 1.261 | 50.7 | 1.823 | 52.4 | 1.679 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 1.542 | 41.6 | 1.802 | 57.6 | 1.325 | 49.8 | 1.585 | 50.7 | 1.532 |
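
One way to read the table above is to compare accuracy on seen and unseen devices for a given system. The following is a minimal sketch, assuming a simple seen-minus-unseen difference in percentage points as the gap measure; this measure and the helper name `device_gap` are illustrative and not part of the official evaluation. The three rows are copied from the table.

```python
# Rows copied from the generalization table:
# (submission label, accuracy on unseen devices, accuracy on seen devices), in percent.
rows = [
    ("Schmid_CPJKU_task1_4", 57.4, 67.1),
    ("Cai_XJTLU_task1_4", 53.5, 59.9),
    ("DCASE2023 baseline", 38.0, 50.5),
]


def device_gap(unseen_acc, seen_acc):
    """Seen-minus-unseen accuracy gap in percentage points (illustrative measure)."""
    return round(seen_acc - unseen_acc, 1)


# Map each submission label to its gap.
gaps = {label: device_gap(unseen, seen) for label, unseen, seen in rows}

# Print systems from smallest to largest gap.
for label, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{label}: {gap:.1f} points")
```

A smaller gap suggests that a system loses less accuracy when moving to devices it was not trained on, although it says nothing about the absolute accuracy level.
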
Class-wise performance
Accuracy
Submission label | Technical Report | Official system rank | Rank value | Accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 45.9 | 72.9 | 46.4 | 44.6 | 73.2 | 26.7 | 59.7 | 27.9 | 57.3 | 64.7 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 40.8 | 72.2 | 51.1 | 40.0 | 73.1 | 21.3 | 43.6 | 31.8 | 59.7 | 54.5 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 49.4 | 67.4 | 52.5 | 38.1 | 82.0 | 28.4 | 39.2 | 31.3 | 56.5 | 63.2 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 25.5 | 60.8 | 51.2 | 51.7 | 69.0 | 22.9 | 52.3 | 24.9 | 65.8 | 55.0 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 29.4 | 67.1 | 49.8 | 44.5 | 70.1 | 25.9 | 51.9 | 22.9 | 70.2 | 56.2 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 40.5 | 59.7 | 61.8 | 57.8 | 79.9 | 35.6 | 54.8 | 35.3 | 73.4 | 67.4 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 34.4 | 64.5 | 58.6 | 62.0 | 83.4 | 36.0 | 56.7 | 31.1 | 68.8 | 66.3 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 34.4 | 64.9 | 54.2 | 56.9 | 81.7 | 33.6 | 57.1 | 36.3 | 69.9 | 69.3 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 34.6 | 60.5 | 55.1 | 58.6 | 78.9 | 35.5 | 54.0 | 36.1 | 71.9 | 68.5 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 34.8 | 66.1 | 46.4 | 43.4 | 79.2 | 28.4 | 61.7 | 33.8 | 67.4 | 57.7 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 31.6 | 66.0 | 50.0 | 48.1 | 78.7 | 31.1 | 61.4 | 33.6 | 66.4 | 58.5 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 39.7 | 64.2 | 56.0 | 49.4 | 80.2 | 33.7 | 64.2 | 34.0 | 64.7 | 64.7 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 44.7 | 68.7 | 57.6 | 52.5 | 83.0 | 36.8 | 52.4 | 34.5 | 70.5 | 69.4 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 45.6 | 71.9 | 58.1 | 45.8 | 76.3 | 31.0 | 59.7 | 31.7 | 67.0 | 65.3 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 37.1 | 73.4 | 54.7 | 45.1 | 77.7 | 28.1 | 60.4 | 36.6 | 67.5 | 64.4 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 57.9 | 84.5 | 54.4 | 42.5 | 73.1 | 24.9 | 54.2 | 27.8 | 62.0 | 51.3 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 56.6 | 83.2 | 54.8 | 38.8 | 79.2 | 21.2 | 59.5 | 18.2 | 61.7 | 44.4 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 39.7 | 60.8 | 54.6 | 43.5 | 80.6 | 28.0 | 51.6 | 32.1 | 62.8 | 51.8 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 50.1 | 80.2 | 52.2 | 40.9 | 74.9 | 43.8 | 57.8 | 23.6 | 69.4 | 60.4 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 35.6 | 79.9 | 44.5 | 54.3 | 82.3 | 33.4 | 58.0 | 32.0 | 68.4 | 61.7 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 47.7 | 77.1 | 48.6 | 52.8 | 78.1 | 38.0 | 56.8 | 27.0 | 67.0 | 62.9 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 38.8 | 72.7 | 47.9 | 45.1 | 74.8 | 27.2 | 56.8 | 36.5 | 56.2 | 69.7 | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 34.8 | 72.3 | 60.5 | 55.4 | 85.2 | 37.0 | 51.0 | 35.3 | 68.9 | 69.6 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 34.8 | 69.7 | 54.0 | 57.8 | 81.4 | 37.0 | 51.2 | 40.2 | 69.3 | 70.0 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 12.0 | 11.7 | 8.7 | 9.4 | 12.4 | 8.2 | 11.6 | 12.3 | 7.4 | 5.2 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 16.6 | 10.3 | 7.2 | 9.3 | 11.9 | 6.5 | 13.0 | 4.8 | 9.8 | 9.1 | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 40.5 | 37.8 | 48.8 | 38.9 | 58.2 | 22.4 | 54.1 | 29.8 | 58.0 | 59.7 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 36.2 | 80.8 | 59.8 | 55.1 | 70.1 | 29.6 | 52.6 | 37.4 | 69.5 | 72.1 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 26.6 | 75.3 | 53.1 | 49.7 | 66.5 | 29.7 | 62.2 | 35.1 | 63.8 | 68.3 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 38.1 | 78.5 | 57.0 | 52.5 | 73.1 | 29.4 | 45.4 | 33.3 | 67.9 | 67.1 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 48.7 | 75.7 | 48.3 | 28.2 | 76.2 | 29.5 | 46.4 | 30.3 | 49.9 | 59.0 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 56.4 | 72.1 | 58.0 | 44.2 | 82.9 | 32.1 | 58.7 | 14.7 | 70.2 | 54.3 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 49.9 | 74.0 | 59.4 | 48.6 | 87.2 | 35.7 | 59.9 | 26.4 | 70.7 | 75.4 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 50.4 | 78.0 | 65.6 | 60.3 | 84.7 | 35.1 | 65.7 | 33.6 | 69.6 | 71.3 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 48.0 | 88.1 | 70.1 | 62.8 | 84.3 | 35.7 | 56.9 | 35.7 | 75.6 | 69.8 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 40.0 | 66.2 | 59.3 | 53.7 | 78.8 | 36.8 | 56.3 | 31.7 | 71.9 | 62.2 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 32.5 | 65.4 | 61.1 | 55.1 | 86.9 | 32.3 | 47.7 | 41.8 | 69.0 | 64.1 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 34.0 | 67.6 | 59.7 | 51.8 | 78.1 | 25.4 | 53.3 | 37.1 | 64.5 | 55.2 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 31.5 | 59.8 | 61.0 | 49.2 | 68.5 | 31.2 | 67.8 | 17.3 | 57.3 | 53.5 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 33.8 | 63.3 | 41.0 | 39.7 | 62.8 | 21.8 | 58.3 | 28.5 | 61.4 | 60.3 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 35.8 | 71.5 | 44.0 | 37.1 | 70.5 | 23.1 | 60.7 | 25.7 | 61.7 | 54.7 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 40.6 | 55.1 | 45.2 | 48.2 | 75.9 | 13.7 | 58.4 | 14.1 | 63.6 | 48.5 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 42.7 | 76.8 | 62.5 | 49.7 | 83.6 | 44.3 | 60.1 | 43.4 | 73.5 | 71.7 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 37.7 | 70.8 | 47.8 | 42.0 | 75.3 | 32.7 | 50.0 | 32.5 | 67.0 | 61.3 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 38.5 | 70.9 | 49.3 | 39.8 | 75.0 | 40.0 | 56.4 | 34.0 | 68.4 | 62.5 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 35.5 | 79.1 | 42.7 | 49.4 | 74.3 | 31.3 | 42.9 | 38.1 | 65.7 | 50.2 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 45.6 | 58.1 | 40.8 | 39.1 | 65.9 | 20.0 | 39.6 | 26.7 | 62.5 | 51.3 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 42.6 | 59.5 | 40.9 | 40.2 | 64.9 | 22.0 | 35.8 | 28.1 | 65.2 | 48.8 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 45.2 | 60.1 | 38.6 | 42.3 | 68.4 | 21.9 | 40.9 | 23.8 | 61.8 | 49.2 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 41.5 | 50.4 | 57.0 | 29.1 | 71.7 | 16.6 | 43.8 | 34.0 | 61.7 | 49.5 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 52.9 | 68.7 | 43.2 | 29.2 | 71.3 | 23.1 | 42.3 | 34.5 | 58.5 | 67.7 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 58.1 | 63.7 | 44.0 | 28.7 | 71.7 | 35.4 | 54.1 | 34.1 | 63.1 | 75.8 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 24.5 | 66.6 | 62.7 | 69.1 | 68.4 | 21.1 | 36.7 | 24.4 | 63.3 | 34.5 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 45.9 | 48.3 | 37.3 | 49.7 | 79.3 | 23.5 | 55.1 | 23.2 | 65.8 | 56.6 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 16.2 | 59.7 | 52.8 | 54.0 | 81.3 | 30.4 | 55.8 | 42.3 | 65.2 | 52.4 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 22.0 | 73.9 | 47.4 | 57.2 | 79.9 | 33.5 | 53.5 | 33.9 | 60.7 | 53.8 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 22.4 | 67.5 | 50.1 | 56.4 | 82.3 | 31.6 | 52.1 | 26.4 | 59.0 | 52.6 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 24.2 | 67.2 | 46.5 | 59.4 | 75.2 | 31.8 | 58.2 | 33.0 | 64.5 | 51.5 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 40.3 | 62.4 | 60.1 | 50.2 | 77.2 | 33.2 | 60.6 | 32.8 | 73.9 | 64.7 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 40.8 | 60.9 | 58.9 | 48.9 | 78.9 | 34.7 | 61.2 | 32.6 | 74.1 | 68.2 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 42.1 | 74.0 | 52.2 | 43.3 | 82.7 | 36.9 | 62.2 | 26.5 | 68.9 | 62.4 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 44.7 | 72.8 | 49.4 | 42.4 | 82.2 | 35.3 | 61.5 | 28.0 | 70.4 | 65.1 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 26.0 | 54.3 | 49.7 | 43.6 | 62.7 | 18.8 | 50.2 | 17.1 | 60.8 | 49.6 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 26.9 | 71.4 | 44.0 | 35.1 | 79.9 | 32.3 | 46.5 | 33.9 | 59.1 | 49.7 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 35.6 | 76.1 | 39.6 | 36.0 | 77.7 | 33.4 | 50.4 | 34.5 | 57.4 | 47.1 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 37.5 | 72.7 | 46.6 | 39.9 | 76.1 | 29.9 | 51.9 | 31.8 | 58.6 | 54.7 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 36.7 | 78.5 | 48.9 | 39.8 | 82.0 | 31.3 | 54.1 | 38.9 | 54.9 | 53.9 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 33.6 | 75.0 | 42.7 | 37.4 | 84.9 | 29.4 | 52.1 | 41.7 | 55.3 | 51.0 |
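
For the rows checked below, the overall accuracy column agrees with the unweighted mean of the ten class-wise accuracies, up to the one-decimal rounding used in the table. The sketch below reproduces that check for two rows copied from the table; the helper name `macro_average` is illustrative, and the agreement is an observation from these numbers rather than a statement of the official metric definition.

```python
# Scene classes in the column order of the class-wise accuracy table.
CLASSES = [
    "Airport", "Bus", "Metro", "Metro station", "Park",
    "Public square", "Shopping mall", "Street pedestrian",
    "Street traffic", "Tram",
]

# Class-wise accuracies (percent) copied from two rows of the table.
class_accuracy = {
    "AI4EDGE_IPL_task1_1": [45.9, 72.9, 46.4, 44.6, 73.2, 26.7, 59.7, 27.9, 57.3, 64.7],
    "Schmid_CPJKU_task1_4": [48.0, 88.1, 70.1, 62.8, 84.3, 35.7, 56.9, 35.7, 75.6, 69.8],
}

# Overall accuracies (percent) reported in the same rows.
reported = {"AI4EDGE_IPL_task1_1": 51.9, "Schmid_CPJKU_task1_4": 62.7}


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


for label, values in class_accuracy.items():
    mean = macro_average(values)
    print(f"{label}: class mean {mean:.2f}, reported {reported[label]:.1f}")
```
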
Log loss
Submission label | Technical Report | Official system rank | Rank value | Logloss | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 1.920 | 1.521 | 1.162 | 1.712 | 2.025 | 2.107 | 2.513 | 2.220 | 2.167 | 2.708 | 1.062 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 1.996 | 1.979 | 1.132 | 1.686 | 1.710 | 1.728 | 2.664 | 3.129 | 2.401 | 2.236 | 1.295 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 1.364 | 1.382 | 0.939 | 1.269 | 1.773 | 0.632 | 1.946 | 1.623 | 1.773 | 1.293 | 1.007 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 1.825 | 2.324 | 1.215 | 1.402 | 1.546 | 1.724 | 2.876 | 1.531 | 2.857 | 1.436 | 1.336 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 1.791 | 2.241 | 1.049 | 1.574 | 1.965 | 1.239 | 2.638 | 1.684 | 2.946 | 1.194 | 1.379 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 1.174 | 1.446 | 1.060 | 0.971 | 1.180 | 0.646 | 1.776 | 1.219 | 1.695 | 0.862 | 0.883 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 1.246 | 1.628 | 0.988 | 1.103 | 1.149 | 0.629 | 1.805 | 1.266 | 1.903 | 1.039 | 0.948 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 1.241 | 1.598 | 0.986 | 1.188 | 1.275 | 0.666 | 1.775 | 1.240 | 1.770 | 1.013 | 0.894 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 1.252 | 1.599 | 1.096 | 1.161 | 1.244 | 0.766 | 1.723 | 1.307 | 1.752 | 0.954 | 0.918 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 1.307 | 1.535 | 1.008 | 1.317 | 1.596 | 0.750 | 1.859 | 1.153 | 1.715 | 1.005 | 1.129 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 1.292 | 1.609 | 0.974 | 1.221 | 1.451 | 0.750 | 1.826 | 1.150 | 1.761 | 1.077 | 1.096 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 1.223 | 1.446 | 0.998 | 1.103 | 1.394 | 0.655 | 1.824 | 1.043 | 1.729 | 1.107 | 0.932 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 1.241 | 1.502 | 0.908 | 1.126 | 1.429 | 0.562 | 1.873 | 1.320 | 1.889 | 0.955 | 0.849 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 1.282 | 1.418 | 0.835 | 1.178 | 1.567 | 0.753 | 1.959 | 1.141 | 1.884 | 1.100 | 0.986 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 1.290 | 1.636 | 0.796 | 1.238 | 1.551 | 0.729 | 1.977 | 1.146 | 1.719 | 1.117 | 0.995 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 1.349 | 1.592 | 0.695 | 1.491 | 1.644 | 0.930 | 2.276 | 1.159 | 1.603 | 1.031 | 1.069 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 1.370 | 1.671 | 0.818 | 1.469 | 1.730 | 0.702 | 2.297 | 0.941 | 1.847 | 1.003 | 1.226 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 2.012 | 2.062 | 1.571 | 1.586 | 2.090 | 0.780 | 3.228 | 2.380 | 2.773 | 2.029 | 1.616 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 1.847 | 1.943 | 0.801 | 1.551 | 2.243 | 1.235 | 2.526 | 1.813 | 3.416 | 1.577 | 1.364 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 2.083 | 2.826 | 0.828 | 2.244 | 1.996 | 0.842 | 3.829 | 2.003 | 3.330 | 1.515 | 1.416 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 1.933 | 2.060 | 0.936 | 1.820 | 1.912 | 1.074 | 3.087 | 1.947 | 3.500 | 1.652 | 1.339 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 1.402 | 1.610 | 0.783 | 1.515 | 1.710 | 0.845 | 2.014 | 1.272 | 1.822 | 1.588 | 0.861 | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 1.230 | 1.676 | 0.883 | 1.110 | 1.313 | 0.596 | 1.672 | 1.381 | 1.755 | 0.997 | 0.913 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 1.242 | 1.661 | 0.953 | 1.239 | 1.262 | 0.716 | 1.637 | 1.350 | 1.659 | 1.000 | 0.945 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 4.354 | 4.368 | 4.889 | 4.315 | 3.618 | 5.426 | 3.395 | 4.872 | 3.433 | 4.845 | 4.376 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 3.224 | 3.160 | 3.690 | 3.201 | 2.570 | 3.634 | 2.692 | 3.373 | 2.840 | 3.927 | 3.157 | |
DCASE2023 baseline | | 52 | 46.75 | 1.523 | 1.487 | 1.585 | 1.294 | 1.689 | 1.452 | 2.085 | 1.360 | 1.817 | 1.365 | 1.100 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 1.495 | 2.157 | 0.591 | 1.287 | 1.530 | 1.067 | 2.541 | 1.623 | 2.210 | 1.106 | 0.836 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 1.660 | 2.786 | 0.766 | 1.556 | 1.807 | 1.128 | 2.542 | 1.342 | 2.335 | 1.342 | 0.993 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 2.230 | 2.871 | 0.933 | 2.170 | 2.357 | 1.302 | 3.469 | 2.722 | 3.498 | 1.467 | 1.509 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 1.510 | 1.618 | 1.014 | 1.459 | 1.883 | 1.047 | 1.889 | 1.598 | 1.797 | 1.535 | 1.254 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 1.313 | 1.219 | 0.901 | 1.115 | 1.634 | 0.584 | 2.156 | 1.080 | 2.250 | 0.989 | 1.198 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 1.256 | 1.421 | 0.842 | 1.175 | 1.596 | 0.425 | 2.134 | 1.140 | 2.266 | 0.906 | 0.657 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 1.153 | 1.364 | 0.697 | 1.009 | 1.193 | 0.470 | 2.095 | 0.960 | 1.970 | 0.988 | 0.783 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 1.117 | 1.498 | 0.388 | 0.873 | 1.087 | 0.518 | 2.034 | 1.233 | 1.950 | 0.774 | 0.814 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 1.322 | 1.770 | 1.142 | 1.253 | 1.397 | 0.802 | 1.839 | 1.263 | 1.801 | 0.949 | 1.007 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 1.337 | 1.930 | 1.175 | 1.199 | 1.374 | 0.583 | 1.982 | 1.532 | 1.568 | 1.041 | 0.980 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 1.398 | 1.837 | 1.101 | 1.207 | 1.449 | 0.845 | 2.227 | 1.311 | 1.671 | 1.191 | 1.137 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 1.482 | 2.030 | 1.351 | 1.158 | 1.477 | 1.132 | 2.101 | 1.012 | 2.060 | 1.370 | 1.127 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 1.508 | 1.675 | 1.089 | 1.389 | 1.687 | 1.253 | 2.095 | 1.402 | 2.048 | 1.316 | 1.124 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 1.461 | 1.614 | 0.956 | 1.376 | 1.719 | 1.078 | 2.011 | 1.408 | 1.973 | 1.282 | 1.192 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 1.492 | 1.547 | 1.311 | 1.363 | 1.519 | 1.044 | 2.122 | 1.392 | 2.074 | 1.181 | 1.363 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 1.192 | 1.627 | 0.664 | 1.014 | 1.608 | 0.537 | 1.663 | 1.148 | 2.026 | 0.818 | 0.818 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 1.444 | 1.789 | 0.889 | 1.294 | 1.656 | 0.864 | 1.979 | 1.537 | 2.237 | 1.116 | 1.075 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 1.441 | 1.767 | 0.869 | 1.312 | 1.922 | 0.912 | 1.908 | 1.320 | 2.234 | 1.055 | 1.112 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 1.525 | 1.938 | 0.654 | 1.470 | 1.528 | 0.993 | 2.129 | 1.728 | 2.201 | 1.202 | 1.405 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 2.157 | 1.492 | 1.532 | 2.737 | 1.898 | 1.042 | 3.724 | 2.305 | 4.132 | 1.336 | 1.373 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 2.116 | 1.603 | 1.474 | 2.640 | 1.871 | 1.086 | 3.194 | 2.548 | 4.103 | 1.230 | 1.412 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 2.092 | 1.541 | 1.508 | 2.746 | 1.762 | 0.993 | 3.112 | 2.224 | 4.321 | 1.325 | 1.387 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 1.793 | 1.633 | 1.567 | 1.534 | 2.211 | 0.976 | 3.063 | 1.866 | 2.395 | 1.295 | 1.390 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 1.493 | 1.234 | 0.904 | 1.480 | 2.014 | 1.130 | 2.168 | 1.615 | 2.120 | 1.312 | 0.957 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 1.348 | 1.130 | 0.989 | 1.427 | 2.059 | 1.186 | 1.579 | 1.191 | 1.818 | 1.317 | 0.788 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 1.702 | 2.422 | 0.984 | 1.010 | 0.940 | 1.218 | 2.515 | 2.111 | 2.914 | 1.289 | 1.614 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 1.472 | 1.363 | 1.565 | 1.891 | 1.454 | 0.760 | 1.956 | 1.241 | 2.119 | 1.053 | 1.321 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 1.364 | 1.867 | 1.240 | 1.337 | 1.412 | 0.716 | 1.842 | 1.278 | 1.569 | 1.140 | 1.235 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 1.355 | 1.827 | 0.902 | 1.344 | 1.339 | 0.746 | 1.851 | 1.326 | 1.764 | 1.279 | 1.170 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 1.395 | 1.845 | 1.033 | 1.291 | 1.324 | 0.678 | 1.955 | 1.348 | 1.963 | 1.314 | 1.202 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 1.367 | 1.743 | 1.083 | 1.413 | 1.299 | 0.875 | 1.877 | 1.214 | 1.738 | 1.177 | 1.255 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 1.280 | 1.519 | 1.091 | 1.098 | 1.464 | 0.741 | 2.027 | 1.113 | 1.916 | 0.861 | 0.968 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 1.241 | 1.467 | 1.099 | 1.100 | 1.475 | 0.680 | 1.920 | 1.111 | 1.867 | 0.845 | 0.849 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 1.279 | 1.443 | 0.753 | 1.287 | 1.630 | 0.558 | 1.981 | 1.092 | 2.067 | 0.970 | 1.005 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 1.259 | 1.378 | 0.795 | 1.340 | 1.627 | 0.591 | 1.937 | 1.120 | 1.972 | 0.913 | 0.920 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 1.757 | 2.159 | 1.382 | 1.482 | 1.701 | 1.165 | 2.575 | 1.538 | 2.788 | 1.292 | 1.484 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 1.533 | 2.046 | 0.927 | 1.587 | 1.903 | 0.735 | 2.025 | 1.641 | 1.830 | 1.301 | 1.337 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 3.248 | 4.626 | 1.072 | 2.657 | 4.508 | 1.552 | 3.713 | 3.538 | 5.323 | 3.042 | 2.446 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 4.213 | 4.392 | 1.559 | 3.338 | 4.560 | 2.582 | 6.580 | 4.340 | 7.767 | 4.126 | 2.881 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 1.704 | 1.810 | 0.713 | 1.606 | 2.119 | 0.817 | 2.119 | 1.555 | 1.766 | 2.093 | 2.443 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 1.542 | 1.876 | 0.816 | 1.767 | 1.824 | 0.643 | 2.127 | 1.511 | 1.712 | 1.551 | 1.595 |
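
The class-wise log loss values can also be used to see which scene classes a system handles best and worst. The following is a small sketch for two rows copied from the table above; the helper name `extremes` is hypothetical, and lower log loss is treated as better.

```python
# Scene classes in the column order of the class-wise log loss table.
CLASSES = [
    "Airport", "Bus", "Metro", "Metro station", "Park",
    "Public square", "Shopping mall", "Street pedestrian",
    "Street traffic", "Tram",
]

# Class-wise log loss values copied from two rows of the table.
class_logloss = {
    "Schmid_CPJKU_task1_4": [1.498, 0.388, 0.873, 1.087, 0.518, 2.034, 1.233, 1.950, 0.774, 0.814],
    "DCASE2023 baseline": [1.487, 1.585, 1.294, 1.689, 1.452, 2.085, 1.360, 1.817, 1.365, 1.100],
}


def extremes(losses):
    """Return ((easiest class, loss), (hardest class, loss)); lower loss is better."""
    paired = list(zip(CLASSES, losses))
    easiest = min(paired, key=lambda pair: pair[1])
    hardest = max(paired, key=lambda pair: pair[1])
    return easiest, hardest


for label, losses in class_logloss.items():
    easiest, hardest = extremes(losses)
    print(f"{label}: easiest {easiest[0]} ({easiest[1]:.3f}), "
          f"hardest {hardest[0]} ({hardest[1]:.3f})")
```
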
Device-wise performance
Accuracy
Submission label | Technical Report | Official system rank | Rank value | Accuracy | Accuracy / unseen devices | Accuracy / seen devices | D (unseen) | S7 (unseen) | S8 (unseen) | S9 (unseen) | S10 (unseen) | A (seen) | B (seen) | C (seen) | S1 (seen) | S2 (seen) | S3 (seen) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 47.0 | 56.1 | 32.2 | 51.5 | 50.8 | 50.5 | 49.8 | 59.3 | 55.9 | 58.6 | 56.5 | 51.1 | 55.1 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 42.4 | 54.1 | 30.3 | 46.6 | 44.9 | 44.1 | 46.1 | 56.0 | 55.1 | 56.2 | 52.8 | 50.8 | 53.9 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 45.3 | 55.4 | 42.7 | 46.4 | 45.9 | 44.6 | 46.9 | 58.2 | 55.2 | 56.5 | 54.4 | 52.2 | 55.9 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 39.5 | 55.0 | 29.8 | 48.8 | 47.3 | 34.3 | 37.3 | 64.5 | 56.0 | 60.6 | 49.3 | 49.3 | 50.0 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 40.6 | 55.6 | 31.5 | 48.6 | 47.8 | 35.1 | 40.0 | 64.6 | 56.6 | 61.8 | 49.9 | 49.8 | 50.8 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 50.7 | 61.5 | 46.0 | 57.2 | 58.0 | 45.1 | 47.4 | 69.1 | 60.3 | 65.0 | 57.7 | 57.9 | 59.2 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 51.8 | 59.8 | 37.6 | 57.7 | 57.6 | 53.4 | 52.9 | 66.0 | 57.6 | 61.5 | 57.0 | 58.3 | 58.4 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 51.9 | 59.1 | 39.3 | 58.0 | 57.1 | 52.5 | 52.5 | 65.8 | 56.3 | 61.7 | 56.3 | 56.3 | 58.4 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 51.3 | 58.7 | 39.2 | 56.5 | 55.8 | 52.8 | 52.4 | 65.1 | 56.7 | 61.2 | 56.1 | 56.0 | 57.3 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 49.1 | 54.2 | 44.6 | 54.3 | 52.7 | 46.5 | 47.3 | 60.6 | 52.7 | 56.2 | 52.7 | 50.4 | 52.7 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 49.7 | 55.0 | 44.0 | 54.5 | 54.5 | 47.6 | 47.7 | 62.6 | 51.6 | 56.5 | 53.6 | 51.3 | 54.1 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 51.6 | 58.0 | 42.9 | 57.3 | 57.2 | 50.4 | 50.3 | 65.0 | 54.7 | 58.7 | 57.2 | 55.6 | 56.7 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 53.5 | 59.9 | 46.2 | 58.3 | 58.4 | 52.7 | 51.9 | 66.5 | 57.5 | 59.8 | 57.6 | 59.0 | 59.2 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 51.7 | 58.2 | 44.9 | 57.6 | 53.7 | 53.1 | 49.4 | 64.8 | 57.1 | 60.4 | 56.0 | 55.1 | 55.8 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 50.9 | 57.5 | 42.8 | 56.8 | 54.0 | 52.3 | 48.4 | 64.8 | 55.1 | 59.4 | 56.7 | 53.9 | 55.2 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 49.9 | 56.1 | 42.6 | 55.6 | 53.3 | 50.8 | 47.0 | 62.2 | 55.9 | 58.8 | 54.1 | 53.0 | 52.4 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 48.8 | 54.2 | 41.2 | 54.2 | 53.7 | 48.8 | 46.3 | 60.5 | 52.1 | 56.8 | 53.5 | 50.7 | 51.5 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 44.9 | 55.2 | 41.0 | 51.3 | 50.0 | 38.3 | 43.8 | 65.0 | 52.6 | 59.0 | 51.9 | 50.0 | 53.1 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 49.6 | 60.1 | 43.6 | 57.7 | 57.3 | 39.4 | 50.3 | 67.8 | 56.6 | 64.3 | 58.7 | 53.8 | 59.3 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 48.9 | 60.1 | 44.7 | 56.0 | 54.5 | 38.8 | 50.6 | 68.0 | 56.4 | 63.8 | 56.9 | 56.4 | 58.9 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 49.6 | 60.5 | 44.6 | 58.7 | 55.4 | 39.7 | 49.8 | 69.2 | 56.4 | 64.6 | 57.9 | 56.0 | 59.1 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 44.9 | 58.9 | 32.5 | 54.4 | 53.9 | 39.8 | 44.0 | 67.5 | 58.2 | 63.7 | 57.0 | 52.2 | 55.0 | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 52.3 | 60.9 | 44.7 | 57.7 | 59.1 | 49.5 | 50.6 | 68.4 | 57.3 | 63.5 | 58.0 | 57.9 | 60.2 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 51.9 | 60.4 | 42.9 | 56.6 | 58.3 | 51.4 | 50.4 | 67.7 | 56.7 | 63.1 | 57.7 | 57.7 | 59.7 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 10.0 | 9.8 | 9.8 | 10.1 | 9.6 | 9.9 | 10.6 | 10.1 | 9.9 | 9.8 | 9.9 | 9.5 | 9.8 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 9.8 | 9.9 | 9.3 | 10.2 | 9.9 | 9.6 | 10.0 | 9.9 | 9.9 | 9.8 | 9.8 | 9.6 | 10.2 | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 38.0 | 50.5 | 34.7 | 46.2 | 43.6 | 31.1 | 34.2 | 60.4 | 50.1 | 54.9 | 45.1 | 45.0 | 47.7 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 49.9 | 61.7 | 33.3 | 60.3 | 59.3 | 50.3 | 46.4 | 68.8 | 60.7 | 63.8 | 59.0 | 57.6 | 60.2 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 46.2 | 58.7 | 34.6 | 54.9 | 52.5 | 45.7 | 43.2 | 64.1 | 58.4 | 61.3 | 56.2 | 56.2 | 56.2 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 47.1 | 60.2 | 32.6 | 56.6 | 56.2 | 47.0 | 43.2 | 67.3 | 59.0 | 61.3 | 57.3 | 56.8 | 59.3 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 45.8 | 52.0 | 41.9 | 50.0 | 48.8 | 44.2 | 44.3 | 60.4 | 48.8 | 52.7 | 51.9 | 46.8 | 51.5 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 52.3 | 56.1 | 42.9 | 56.8 | 55.9 | 53.1 | 53.0 | 59.7 | 53.7 | 57.5 | 56.0 | 53.8 | 55.5 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 54.6 | 62.1 | 47.9 | 59.6 | 58.5 | 52.4 | 54.6 | 66.9 | 59.5 | 63.1 | 61.0 | 60.7 | 61.6 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 57.8 | 64.4 | 48.1 | 63.3 | 62.1 | 58.7 | 57.0 | 68.8 | 61.2 | 64.5 | 64.7 | 63.3 | 64.0 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 57.4 | 67.1 | 45.1 | 63.9 | 64.1 | 56.4 | 57.6 | 71.0 | 64.9 | 67.9 | 66.4 | 66.6 | 66.0 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 51.0 | 59.6 | 40.5 | 57.5 | 56.2 | 51.0 | 49.6 | 66.5 | 57.6 | 63.1 | 56.6 | 56.7 | 57.3 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 50.8 | 59.5 | 38.4 | 55.5 | 56.3 | 52.2 | 51.7 | 66.8 | 56.2 | 62.3 | 56.9 | 56.5 | 58.4 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 47.6 | 57.0 | 37.7 | 55.2 | 54.2 | 44.4 | 46.3 | 64.5 | 54.5 | 61.0 | 53.0 | 53.6 | 55.0 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 44.8 | 53.8 | 38.1 | 52.2 | 51.6 | 38.7 | 43.6 | 61.7 | 52.2 | 55.5 | 53.1 | 48.7 | 51.3 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 39.7 | 53.3 | 36.5 | 49.3 | 40.5 | 35.8 | 36.4 | 63.3 | 51.4 | 57.7 | 48.4 | 47.6 | 51.2 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 41.4 | 54.4 | 36.0 | 50.1 | 42.2 | 37.3 | 41.4 | 64.2 | 50.6 | 58.8 | 50.0 | 49.1 | 53.5 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 40.1 | 51.5 | 29.5 | 47.0 | 45.8 | 37.8 | 40.4 | 61.0 | 45.5 | 55.9 | 47.7 | 48.3 | 50.6 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 55.7 | 65.1 | 42.8 | 62.6 | 60.0 | 56.3 | 57.1 | 70.0 | 61.5 | 66.2 | 64.1 | 64.1 | 64.6 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 45.8 | 56.6 | 33.5 | 55.4 | 54.2 | 41.3 | 44.8 | 62.9 | 52.5 | 57.9 | 55.0 | 55.1 | 56.1 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 48.6 | 57.5 | 36.2 | 56.9 | 55.6 | 47.8 | 46.5 | 62.8 | 53.8 | 58.7 | 56.3 | 55.6 | 58.0 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 45.5 | 55.4 | 30.9 | 54.6 | 52.8 | 42.2 | 46.9 | 59.4 | 51.7 | 57.1 | 53.9 | 54.4 | 56.1 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 36.2 | 52.3 | 26.8 | 44.0 | 41.0 | 33.3 | 35.8 | 62.3 | 53.2 | 56.9 | 48.3 | 46.4 | 46.6 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 36.3 | 51.9 | 28.4 | 43.5 | 40.0 | 33.8 | 35.5 | 62.3 | 52.8 | 57.1 | 47.4 | 46.1 | 45.8 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 36.6 | 52.4 | 27.6 | 44.3 | 41.2 | 34.7 | 35.4 | 62.1 | 53.2 | 57.7 | 47.3 | 47.0 | 46.9 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 38.0 | 51.8 | 29.5 | 43.2 | 44.7 | 36.5 | 35.9 | 61.7 | 51.3 | 56.9 | 46.3 | 46.4 | 48.3 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 41.8 | 55.3 | 26.1 | 51.3 | 48.0 | 39.3 | 44.1 | 61.5 | 52.6 | 57.5 | 54.8 | 52.9 | 52.6 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 47.3 | 57.5 | 38.8 | 55.9 | 53.5 | 42.1 | 46.3 | 61.8 | 54.8 | 60.1 | 57.0 | 56.2 | 55.3 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 38.4 | 54.4 | 22.8 | 49.1 | 48.1 | 39.3 | 32.9 | 61.9 | 53.1 | 56.8 | 48.6 | 53.5 | 52.2 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 42.9 | 53.1 | 33.1 | 50.5 | 50.1 | 38.0 | 42.9 | 59.8 | 50.2 | 53.4 | 53.2 | 51.6 | 50.5 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 47.5 | 53.9 | 41.4 | 51.3 | 52.3 | 46.4 | 46.2 | 58.0 | 53.6 | 57.2 | 52.5 | 50.1 | 52.1 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 47.4 | 55.1 | 43.5 | 50.2 | 52.7 | 45.0 | 45.4 | 58.4 | 55.7 | 58.8 | 53.5 | 51.4 | 52.7 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 45.5 | 53.8 | 41.5 | 48.8 | 50.0 | 42.0 | 45.2 | 58.5 | 53.6 | 57.7 | 51.3 | 49.6 | 52.0 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 47.6 | 54.1 | 43.4 | 50.5 | 50.6 | 47.2 | 46.2 | 59.4 | 55.3 | 58.0 | 51.6 | 48.9 | 51.4 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 52.6 | 58.0 | 46.9 | 58.4 | 55.6 | 51.8 | 50.3 | 64.1 | 56.4 | 60.6 | 55.6 | 55.6 | 55.7 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 53.2 | 58.2 | 47.2 | 58.7 | 56.4 | 52.5 | 51.2 | 63.5 | 56.8 | 61.0 | 56.2 | 55.6 | 55.8 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 52.1 | 57.6 | 42.2 | 57.3 | 56.2 | 54.7 | 50.0 | 62.5 | 55.7 | 61.4 | 56.5 | 54.7 | 55.0 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 52.4 | 57.5 | 43.4 | 57.5 | 56.1 | 54.6 | 50.6 | 62.3 | 56.2 | 61.1 | 56.1 | 54.3 | 54.7 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 33.3 | 51.6 | 34.0 | 38.8 | 38.0 | 27.6 | 27.8 | 63.2 | 55.0 | 57.6 | 45.1 | 45.9 | 43.0 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 39.7 | 54.7 | 37.9 | 48.3 | 48.0 | 27.0 | 37.4 | 63.0 | 54.8 | 61.0 | 49.4 | 50.0 | 50.0 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 39.2 | 56.8 | 32.2 | 46.7 | 37.9 | 27.7 | 51.4 | 64.0 | 54.1 | 61.1 | 54.0 | 53.6 | 53.8 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 40.6 | 57.7 | 28.6 | 50.5 | 47.5 | 30.0 | 46.7 | 64.7 | 57.5 | 62.9 | 54.3 | 53.3 | 53.8 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 41.8 | 60.3 | 27.2 | 51.8 | 46.9 | 32.7 | 50.6 | 67.5 | 59.4 | 65.2 | 57.0 | 56.4 | 56.4 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 41.6 | 57.6 | 34.0 | 47.2 | 43.7 | 33.3 | 49.6 | 65.0 | 55.7 | 62.4 | 54.0 | 54.0 | 54.5 |
Log loss
Devices D, S7, S8, S9 and S10 are unseen devices; devices A, B, C, S1, S2 and S3 are seen devices.
Submission label | Technical Report | Official system rank | Rank value | Log loss | Log loss / Unseen | Log loss / Seen | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 1.920 | 2.370 | 1.544 | 4.428 | 1.659 | 1.969 | 1.831 | 1.965 | 1.325 | 1.601 | 1.421 | 1.480 | 1.815 | 1.622 | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 1.996 | 2.391 | 1.667 | 3.613 | 1.860 | 2.071 | 2.198 | 2.211 | 1.523 | 1.672 | 1.468 | 1.729 | 1.848 | 1.762 | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 1.364 | 1.528 | 1.227 | 1.568 | 1.485 | 1.534 | 1.565 | 1.490 | 1.156 | 1.215 | 1.183 | 1.261 | 1.319 | 1.225 | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 1.825 | 2.335 | 1.399 | 3.859 | 1.607 | 1.631 | 2.131 | 2.448 | 1.095 | 1.349 | 1.163 | 1.579 | 1.650 | 1.560 | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 1.791 | 2.225 | 1.430 | 2.609 | 1.820 | 1.623 | 2.561 | 2.510 | 1.129 | 1.303 | 1.160 | 1.654 | 1.669 | 1.662 | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 1.174 | 1.347 | 1.029 | 1.463 | 1.168 | 1.138 | 1.513 | 1.454 | 0.850 | 1.071 | 0.929 | 1.113 | 1.123 | 1.088 | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 1.246 | 1.369 | 1.143 | 1.843 | 1.221 | 1.198 | 1.258 | 1.326 | 1.003 | 1.191 | 1.107 | 1.214 | 1.177 | 1.166 | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 1.241 | 1.356 | 1.144 | 1.768 | 1.195 | 1.191 | 1.296 | 1.332 | 0.996 | 1.200 | 1.094 | 1.206 | 1.208 | 1.159 | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 1.252 | 1.358 | 1.164 | 1.695 | 1.230 | 1.218 | 1.301 | 1.345 | 1.019 | 1.200 | 1.112 | 1.234 | 1.219 | 1.200 | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 1.307 | 1.390 | 1.237 | 1.538 | 1.248 | 1.267 | 1.465 | 1.431 | 1.082 | 1.284 | 1.187 | 1.290 | 1.317 | 1.265 | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 1.292 | 1.389 | 1.211 | 1.605 | 1.247 | 1.231 | 1.432 | 1.430 | 1.029 | 1.275 | 1.158 | 1.263 | 1.304 | 1.235 | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 1.223 | 1.331 | 1.133 | 1.620 | 1.165 | 1.166 | 1.336 | 1.367 | 0.957 | 1.222 | 1.107 | 1.153 | 1.198 | 1.163 | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 1.241 | 1.385 | 1.121 | 1.779 | 1.192 | 1.200 | 1.358 | 1.398 | 0.953 | 1.193 | 1.130 | 1.169 | 1.140 | 1.144 | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 1.282 | 1.393 | 1.190 | 1.613 | 1.189 | 1.302 | 1.339 | 1.519 | 0.995 | 1.218 | 1.112 | 1.257 | 1.298 | 1.261 | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 1.290 | 1.410 | 1.191 | 1.650 | 1.212 | 1.304 | 1.361 | 1.522 | 0.990 | 1.246 | 1.128 | 1.232 | 1.313 | 1.236 | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 1.349 | 1.457 | 1.259 | 1.617 | 1.263 | 1.406 | 1.415 | 1.583 | 1.067 | 1.268 | 1.175 | 1.329 | 1.377 | 1.339 | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 1.370 | 1.460 | 1.295 | 1.614 | 1.297 | 1.419 | 1.418 | 1.555 | 1.083 | 1.355 | 1.236 | 1.321 | 1.432 | 1.346 | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 2.012 | 2.565 | 1.550 | 2.860 | 1.664 | 1.923 | 4.114 | 2.263 | 1.370 | 1.748 | 1.420 | 1.574 | 1.721 | 1.469 | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 1.847 | 2.329 | 1.445 | 3.322 | 1.521 | 1.618 | 3.158 | 2.026 | 1.282 | 1.655 | 1.289 | 1.463 | 1.567 | 1.416 | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 2.083 | 2.681 | 1.584 | 3.367 | 1.903 | 2.028 | 3.894 | 2.215 | 1.350 | 1.745 | 1.425 | 1.688 | 1.729 | 1.566 | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 1.933 | 2.447 | 1.504 | 3.117 | 1.577 | 1.816 | 3.558 | 2.166 | 1.218 | 1.781 | 1.304 | 1.570 | 1.631 | 1.521 | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 1.402 | 1.704 | 1.150 | 2.196 | 1.328 | 1.364 | 1.938 | 1.694 | 0.922 | 1.154 | 0.995 | 1.207 | 1.357 | 1.266 | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 1.230 | 1.349 | 1.130 | 1.513 | 1.211 | 1.194 | 1.428 | 1.399 | 0.957 | 1.207 | 1.081 | 1.195 | 1.212 | 1.130 | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 1.242 | 1.353 | 1.150 | 1.558 | 1.236 | 1.205 | 1.365 | 1.402 | 0.974 | 1.226 | 1.099 | 1.220 | 1.218 | 1.161 | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 4.354 | 4.358 | 4.350 | 4.363 | 4.339 | 4.366 | 4.391 | 4.330 | 4.340 | 4.341 | 4.367 | 4.359 | 4.339 | 4.355 | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 3.224 | 3.228 | 3.221 | 3.245 | 3.203 | 3.226 | 3.249 | 3.216 | 3.215 | 3.224 | 3.214 | 3.225 | 3.225 | 3.225 | |
DCASE2023 baseline | | 52 | 46.75 | 1.523 | 1.730 | 1.351 | 1.847 | 1.476 | 1.535 | 1.929 | 1.862 | 1.101 | 1.366 | 1.262 | 1.448 | 1.500 | 1.430 | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 1.495 | 1.827 | 1.218 | 2.779 | 1.275 | 1.410 | 1.746 | 1.927 | 1.024 | 1.210 | 1.103 | 1.310 | 1.356 | 1.302 | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 1.660 | 2.056 | 1.329 | 2.983 | 1.509 | 1.699 | 1.941 | 2.150 | 1.190 | 1.315 | 1.198 | 1.454 | 1.396 | 1.423 | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 2.230 | 2.840 | 1.721 | 4.363 | 2.054 | 2.076 | 2.688 | 3.021 | 1.436 | 1.762 | 1.608 | 1.816 | 1.888 | 1.816 | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 1.510 | 1.573 | 1.457 | 1.614 | 1.496 | 1.530 | 1.642 | 1.581 | 1.276 | 1.526 | 1.422 | 1.488 | 1.561 | 1.468 | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 1.313 | 1.395 | 1.244 | 1.822 | 1.216 | 1.244 | 1.341 | 1.353 | 1.120 | 1.360 | 1.191 | 1.226 | 1.311 | 1.255 | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 1.256 | 1.445 | 1.099 | 1.833 | 1.188 | 1.267 | 1.502 | 1.437 | 0.962 | 1.202 | 1.075 | 1.128 | 1.123 | 1.101 | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 1.153 | 1.318 | 1.015 | 1.840 | 1.056 | 1.153 | 1.252 | 1.290 | 0.892 | 1.134 | 1.000 | 1.018 | 1.038 | 1.010 | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 1.117 | 1.334 | 0.936 | 1.902 | 1.044 | 1.045 | 1.310 | 1.366 | 0.847 | 1.001 | 0.919 | 0.949 | 0.935 | 0.966 | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 1.322 | 1.465 | 1.203 | 1.816 | 1.269 | 1.317 | 1.461 | 1.461 | 1.039 | 1.258 | 1.125 | 1.281 | 1.279 | 1.240 | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 1.337 | 1.488 | 1.210 | 1.972 | 1.322 | 1.321 | 1.405 | 1.420 | 1.038 | 1.292 | 1.140 | 1.274 | 1.277 | 1.241 | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 1.398 | 1.562 | 1.261 | 1.961 | 1.323 | 1.344 | 1.620 | 1.562 | 1.083 | 1.332 | 1.177 | 1.355 | 1.329 | 1.288 | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 1.482 | 1.650 | 1.342 | 1.946 | 1.407 | 1.434 | 1.785 | 1.677 | 1.143 | 1.390 | 1.314 | 1.357 | 1.451 | 1.395 | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 1.508 | 1.720 | 1.331 | 1.789 | 1.427 | 1.696 | 1.879 | 1.809 | 1.107 | 1.387 | 1.260 | 1.406 | 1.446 | 1.379 | |
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 1.461 | 1.634 | 1.317 | 1.706 | 1.405 | 1.630 | 1.758 | 1.671 | 1.097 | 1.387 | 1.245 | 1.400 | 1.425 | 1.346 | |
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 1.492 | 1.632 | 1.375 | 1.868 | 1.461 | 1.489 | 1.692 | 1.650 | 1.180 | 1.473 | 1.302 | 1.445 | 1.462 | 1.388 | |
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 1.192 | 1.420 | 1.002 | 2.215 | 1.096 | 1.205 | 1.304 | 1.279 | 0.899 | 1.100 | 0.952 | 1.018 | 1.027 | 1.019 | |
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 1.444 | 1.726 | 1.208 | 2.504 | 1.262 | 1.341 | 1.902 | 1.619 | 1.037 | 1.335 | 1.164 | 1.254 | 1.248 | 1.214 | |
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 1.441 | 1.675 | 1.247 | 2.407 | 1.278 | 1.366 | 1.662 | 1.661 | 1.095 | 1.395 | 1.192 | 1.261 | 1.322 | 1.214 | |
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 1.525 | 1.843 | 1.259 | 2.915 | 1.307 | 1.422 | 1.954 | 1.620 | 1.134 | 1.416 | 1.195 | 1.285 | 1.273 | 1.253 | |
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 2.157 | 2.963 | 1.485 | 6.412 | 1.811 | 1.767 | 2.425 | 2.399 | 1.240 | 1.438 | 1.346 | 1.608 | 1.639 | 1.641 | |
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 2.116 | 2.877 | 1.482 | 6.182 | 1.769 | 1.759 | 2.385 | 2.292 | 1.237 | 1.443 | 1.349 | 1.584 | 1.624 | 1.652 | |
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 2.092 | 2.827 | 1.479 | 5.800 | 1.808 | 1.747 | 2.280 | 2.501 | 1.228 | 1.432 | 1.318 | 1.600 | 1.651 | 1.647 | |
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 1.793 | 2.169 | 1.479 | 2.920 | 1.752 | 1.705 | 2.256 | 2.214 | 1.276 | 1.484 | 1.376 | 1.590 | 1.592 | 1.556 | |
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 1.493 | 1.823 | 1.218 | 2.972 | 1.326 | 1.431 | 1.818 | 1.569 | 1.056 | 1.320 | 1.165 | 1.228 | 1.263 | 1.277 | |
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 1.348 | 1.510 | 1.214 | 1.798 | 1.264 | 1.302 | 1.632 | 1.553 | 1.086 | 1.309 | 1.168 | 1.241 | 1.235 | 1.243 | |
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 1.702 | 2.228 | 1.263 | 4.002 | 1.475 | 1.499 | 1.759 | 2.405 | 1.039 | 1.276 | 1.159 | 1.467 | 1.274 | 1.366 | |
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 1.472 | 1.654 | 1.321 | 1.966 | 1.384 | 1.415 | 1.845 | 1.657 | 1.126 | 1.395 | 1.309 | 1.341 | 1.381 | 1.376 | |
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 1.364 | 1.450 | 1.292 | 1.630 | 1.351 | 1.345 | 1.455 | 1.469 | 1.220 | 1.297 | 1.248 | 1.317 | 1.363 | 1.307 | |
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 1.355 | 1.455 | 1.271 | 1.553 | 1.380 | 1.347 | 1.487 | 1.510 | 1.222 | 1.255 | 1.209 | 1.300 | 1.352 | 1.291 | |
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 1.395 | 1.509 | 1.301 | 1.654 | 1.418 | 1.391 | 1.551 | 1.528 | 1.225 | 1.287 | 1.239 | 1.340 | 1.385 | 1.329 | |
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 1.367 | 1.450 | 1.298 | 1.575 | 1.363 | 1.371 | 1.442 | 1.501 | 1.215 | 1.281 | 1.224 | 1.331 | 1.408 | 1.331 | |
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 1.280 | 1.409 | 1.172 | 1.705 | 1.169 | 1.276 | 1.423 | 1.472 | 0.999 | 1.254 | 1.084 | 1.213 | 1.234 | 1.248 | |
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 1.241 | 1.352 | 1.149 | 1.661 | 1.141 | 1.206 | 1.348 | 1.403 | 1.001 | 1.213 | 1.077 | 1.184 | 1.207 | 1.215 | |
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 1.279 | 1.406 | 1.172 | 1.941 | 1.188 | 1.204 | 1.272 | 1.426 | 1.017 | 1.272 | 1.087 | 1.175 | 1.236 | 1.246 | |
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 1.259 | 1.364 | 1.172 | 1.794 | 1.180 | 1.189 | 1.260 | 1.396 | 1.031 | 1.241 | 1.097 | 1.178 | 1.243 | 1.243 | |
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 1.757 | 2.083 | 1.485 | 2.026 | 1.802 | 1.898 | 2.387 | 2.301 | 1.178 | 1.376 | 1.295 | 1.655 | 1.661 | 1.743 | |
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 1.533 | 1.813 | 1.300 | 1.788 | 1.506 | 1.501 | 2.399 | 1.869 | 1.087 | 1.275 | 1.127 | 1.455 | 1.426 | 1.431 | |
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 3.248 | 5.085 | 1.717 | 9.657 | 2.122 | 3.929 | 7.783 | 1.935 | 1.724 | 1.700 | 1.609 | 1.798 | 1.808 | 1.662 | |
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 4.213 | 6.653 | 2.179 | 12.855 | 2.905 | 3.678 | 10.376 | 3.450 | 2.070 | 2.147 | 2.139 | 2.077 | 2.308 | 2.335 | |
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 1.704 | 2.237 | 1.261 | 4.419 | 1.475 | 1.685 | 2.116 | 1.487 | 1.139 | 1.288 | 1.124 | 1.332 | 1.352 | 1.329 | |
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 1.542 | 1.802 | 1.325 | 2.165 | 1.581 | 1.735 | 2.027 | 1.504 | 1.145 | 1.385 | 1.206 | 1.415 | 1.408 | 1.392 |
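As a reading aid for the log loss figures above, here is a minimal sketch of the metric, assuming it is the standard multiclass cross-entropy (the mean negative log-probability assigned to the correct scene). The function name `multiclass_log_loss` and the toy inputs are hypothetical and are not taken from the challenge's evaluation code. Under that assumption, a system that outputs a uniform distribution over the ten scene classes scores ln(10) ≈ 2.303, so values well above that level suggest confident but wrong predictions.

```python
import math


def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probs  : list of per-class probability lists, one per recording
    labels : list of true class indices, one per recording
    eps    : lower clip (an assumption here) to avoid log(0)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        clipped = min(max(p[y], eps), 1.0)
        total += -math.log(clipped)
    return total / len(labels)


n_classes = 10  # ten acoustic scene classes

# A uniform guess assigns 1/10 to every class, whatever the true label is.
uniform = [[1.0 / n_classes] * n_classes for _ in range(4)]
chance_level = multiclass_log_loss(uniform, [0, 3, 7, 9])

# A confident but wrong prediction: 0.91 on class 1, 0.01 on each other class.
wrong = [[0.01, 0.91] + [0.01] * 8]
confident_wrong = multiclass_log_loss(wrong, [0])

print(f"uniform guess:   {chance_level:.3f}")
print(f"confident wrong: {confident_wrong:.3f}")
```

The clipping constant only matters when a system assigns zero probability to the correct class; without it, a single such recording would make the average infinite.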
System characteristics
General characteristics
Submission label | Technical Report | Official system rank | Rank value | Accuracy (Eval) | Logloss (Eval) | Sampling rate | Data augmentation | Features | Embeddings |
---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 1.920 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 1.996 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 1.364 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 1.825 | 44.1kHz | FMix | log-mel energies | ||
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 1.791 | 44.1kHz | FMix | log-mel energies | ||
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 1.174 | 32kHz | mixup, mixstyle | log-mel energies | ||
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 1.246 | 32kHz | mixup, mixstyle | log-mel energies | ||
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 1.241 | 32kHz | mixup, mixstyle | log-mel energies | ||
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 1.252 | 32kHz | mixup, mixstyle | log-mel energies | ||
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 1.307 | 32kHz | device simulation, mixup, mixstyle | log-mel energies | ||
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 1.292 | 32kHz | device simulation, mixup, mixstyle | log-mel energies | ||
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 1.223 | 32kHz | device simulation, mixup, mixstyle | log-mel energies | ||
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 1.241 | 32kHz | device simulation, mixup, mixstyle | log-mel energies | ||
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 1.282 | 16kHz | mixup | log-mel energies | ||
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 1.290 | 16kHz | mixup | log-mel energies | ||
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 1.349 | 16kHz | mixup, time stretching | log-mel energies | ||
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 1.370 | 16kHz | mixup, time stretching | log-mel energies | ||
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 2.012 | 44.1kHz | mixup, time masking, frequency masking, pitch shifting, random noise | log-mel energies, spectral envelope, spectrum fine structure | ||
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 1.847 | 44.1kHz | mixup | CQT, Gammatonegram, Mel | ||
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 2.083 | 44.1kHz | mixup | CQT, Gammatonegram, Mel | ||
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 1.933 | 44.1kHz | mixup | CQT, Gammatonegram, Mel | ||
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 1.402 | 44.1kHz | random cropping, SpecAugment, mixup | log-mel energies | ||
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 1.230 | 32kHz | mixstyle | log-mel energies | ||
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 1.242 | 32kHz | mixstyle | log-mel energies | ||
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 4.354 | 32kHz | mixup | log-mel energies | ||
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 3.224 | 32kHz | mixup | log-mel energies | ||
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 1.523 | 44.1kHz | | log-mel energies | ||
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 1.495 | 16kHz | mixup, frequency masking, temporal masking | log-mel energies | ||
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 1.660 | 16kHz | mixup, frequency masking, temporal masking | log-mel energies | ||
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 2.230 | 16kHz | mixup, frequency masking, temporal masking | log-mel energies | ||
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 1.510 | 16kHz | mixup, frequency masking, temporal masking | log-mel energies | ||
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 1.313 | 32kHz | device impulse response augmentation, mixup, freq-mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 1.256 | 32kHz | device impulse response augmentation, mixup, freq-mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 1.153 | 32kHz | device impulse response augmentation, mixup, freq-mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 1.117 | 32kHz | device impulse response augmentation, mixup, freq-mixstyle, pitch shifting | log-mel energies | ||
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 1.322 | 32kHz | random cutoff, mixstyle, pitch shifting | log-mel energies | ||
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 1.337 | 32kHz | random cutoff, mixstyle, pitch shifting | log-mel energies | ||
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 1.398 | 32kHz | random cutoff, mixstyle, pitch shifting | log-mel energies | ||
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 1.482 | 32kHz | random cutoff, mixstyle, pitch shifting | log-mel energies | ||
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 1.508 | 44.1kHz | | log-mel energies | ||
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 1.461 | 44.1kHz | | log-mel energies | ||
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 1.492 | 44.1kHz | | log-mel energies | ||
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 1.192 | 44.1kHz | | mel-spectrogram | ||
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 1.444 | 44.1kHz | | mel-spectrogram | ||
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 1.441 | 44.1kHz | | mel-spectrogram | ||
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 1.525 | 44.1kHz | | mel-spectrogram | ||
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 2.157 | 44.1kHz | mixup, SpecAugment | log-mel spectrogram | ||
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 2.116 | 44.1kHz | mixup, SpecAugment | log-mel spectrogram | ||
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 2.092 | 44.1kHz | mixup, SpecAugment | log-mel spectrogram | ||
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 1.793 | 44.1kHz | mixup, SpecAugment | log-mel spectrogram | ||
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 1.493 | 44.1kHz | mixup, speculation and spectral modulation | log-mel energies | ||
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 1.348 | 44.1kHz | mixup, speculation and spectral modulation | log-mel energies | ||
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 1.702 | 44.1kHz | mixup, speculation and spectral modulation | log-mel energies | ||
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 1.472 | 44.1kHz | mixup, speculation and spectral modulation | log-mel energies | ||
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 1.364 | 32kHz | timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle | log-mel energies | ||
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 1.355 | 32kHz | timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle | log-mel energies | ||
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 1.395 | 32kHz | timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle | log-mel energies | ||
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 1.367 | 32kHz | timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle | log-mel energies | ||
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 1.280 | 32kHz | Conv_IR, time shifting, time-frequency masking, mixstyle | log-mel energies | ||
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 1.241 | 32kHz | Conv_IR, time shifting, time-frequency masking, mixstyle | log-mel energies | ||
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 1.279 | 32kHz | Conv_IR, time shifting, time-frequency masking, mixstyle | log-mel energies | ||
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 1.259 | 32kHz | Conv_IR, time shifting, time-frequency masking, mixstyle | log-mel energies | ||
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 1.757 | 44.1kHz | mixup, specaugment | log-mel energies, delta and delta-delta | ||
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 1.533 | 44.1kHz | mixup, specaugment | log-mel energies, delta and delta-delta | ||
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 3.248 | 44.1kHz | mixup, temporal crop, time masking, frequency masking | log-mel energies | ||
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 4.213 | 44.1kHz | mixup, temporal crop, time masking, frequency masking | log-mel energies | ||
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 1.704 | 44.1kHz | mixup, temporal crop, time masking, frequency masking | log-mel energies | ||
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 1.542 | 44.1kHz | mixup, temporal crop, time masking, frequency masking | log-mel energies |
Machine learning characteristics
Submission label | Technical Report | Official system rank | Rank value | Accuracy (Eval) | Logloss (Eval) | External data usage | External data sources | Model complexity | Model MACs | Classifier | Ensemble subsystems | Decision making | Framework | Pipeline |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AI4EDGE_IPL_task1_1 | Almeida2023 | 28 | 30.25 | 51.9 | 1.920 | | | 52852 | 25475456 | CNN, ensemble | 10 | | keras/tensorflow | pretraining, training, adaptation, knowledge distillation, weight quantization | |
AI4EDGE_IPL_task1_2 | Almeida2023 | 47 | 43.75 | 48.8 | 1.996 | | | 68996 | 29304736 | CNN, ensemble | 10 | | keras/tensorflow | pretraining, training, adaptation, knowledge distillation, weight quantization | |
AI4EDGE_IPL_task1_3 | Almeida2023 | 38 | 37.25 | 50.8 | 1.364 | | | 65192 | 26711936 | CNN, ensemble | 10 | | keras/tensorflow | pretraining, training, adaptation, knowledge distillation, weight quantization | |
Bai_JLESS_task1_1 | Du2023 | 54 | 50.25 | 47.9 | 1.825 | | | 78252 | 27931612 | CNN, ResNet, Transformer, CBAM | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Bai_JLESS_task1_2 | Du2023 | 47 | 43.75 | 48.8 | 1.791 | | | 60458 | 14130372 | CNN, ResNet, Transformer, CBAM | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Cai_TENCENT_task1_1 | Cai2023 | 17 | 25.75 | 56.6 | 1.174 | pre-trained model | Audioset | 127684 | 28840396 | CNN | | | pytorch | pretraining, teacher model training, student model training, weight quantization | |
Cai_TENCENT_task1_2 | Cai2023 | 11 | 21.25 | 56.2 | 1.246 | pre-trained model | Audioset | 79942 | 21990724 | CNN | | | pytorch | pretraining, teacher model training, student model training, weight quantization | |
Cai_TENCENT_task1_3 | Cai2023 | 14 | 22.25 | 55.8 | 1.241 | pre-trained model | Audioset | 79942 | 21990724 | CNN | | | pytorch | pretraining, teacher model training, student model training, weight quantization | |
Cai_TENCENT_task1_4 | Cai2023 | 11 | 21.25 | 55.4 | 1.252 | pre-trained model | Audioset | 63558 | 19533124 | CNN | | | pytorch | pretraining, teacher model training, student model training, weight quantization | |
Cai_XJTLU_task1_1 | Cai2023a | 9 | 19.75 | 51.9 | 1.307 | device simulation | MicIRP | 6828 | 1649349 | CNN, TF-SepNet | | | pytorch | training, weight quantization | |
Cai_XJTLU_task1_2 | Cai2023a | 8 | 18.25 | 52.5 | 1.292 | device simulation | MicIRP | 6828 | 1649349 | CNN, TF-SepNet | | | pytorch | training, weight quantization | |
Cai_XJTLU_task1_3 | Cai2023a | 6 | 14.00 | 55.1 | 1.223 | device simulation | MicIRP | 15890 | 3424245 | CNN, TF-SepNet | | | pytorch | training, weight quantization | |
Cai_XJTLU_task1_4 | Cai2023a | 3 | 11.50 | 57.0 | 1.241 | device simulation | MicIRP | 54260 | 10219540 | CNN, TF-SepNet | | | pytorch | training, weight quantization | |
Fei_vv_task1_1 | Fei2023 | 15 | 24.75 | 55.2 | 1.282 | pre-trained model | | 123636 | 13402932 | SERFR-CNN-32 | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Fei_vv_task1_2 | Fei2023 | 13 | 21.75 | 54.5 | 1.290 | pre-trained model | | 70588 | 7802348 | SERFR-CNN-24 | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Fei_vv_task1_3 | Fei2023 | 26 | 29.25 | 53.2 | 1.349 | pre-trained model | | 123636 | 13402932 | SERFR-CNN-32 | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Fei_vv_task1_4 | Fei2023 | 23 | 28.25 | 51.8 | 1.370 | pre-trained model | | 70588 | 7802348 | SERFR-CNN-24 | | | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Han_SZU_task1_1 | Han2023 | 39 | 37.75 | 50.5 | 2.012 | | | 80845 | 29.349M | CNN | | | keras/tensorflow | cepstrum analysis, extract features, train teacher_network, knowledge distillation, train student_network | |
LAM_AEV_task1_1 | Pham2023 | 25 | 29.00 | 55.3 | 1.847 | | | 22962 | 29267550 | CNN | 3 | Late fusion of predicted probabilities | keras/tensorflow | training | |
LAM_AEV_task1_2 | Pham2023 | 31 | 31.50 | 55.0 | 2.083 | | | 22962 | 29267550 | CNN | 3 | Late fusion of predicted probabilities | keras/tensorflow | training | |
LAM_AEV_task1_3 | Pham2023 | 20 | 27.00 | 55.6 | 1.933 | | | 22962 | 29267550 | CNN | 3 | Late fusion of predicted probabilities | keras/tensorflow | training | |
Liang_NTES_task1_1 | Liang2023 | 40 | 38.50 | 52.6 | 1.402 | | | 31260 | 29591778 | CNN | | | pytorch | training teacher, training student | |
MALACH23_JKU_task1_1 | Pichler2023 | 8 | 18.25 | 57.0 | 1.230 | | | 59804 | 14686940 | CNN | | maximum likelihood | pytorch | mel spectrogram, mixstyle, training | |
MALACH23_JKU_task1_2 | Pichler2023 | 7 | 16.75 | 56.6 | 1.242 | | | 43580 | 10819292 | CNN | | maximum likelihood | pytorch | mel spectrogram, mixstyle, training | |
MALACH23_JKU_task1_3 | Pichler2023 | 42 | 39.50 | 9.9 | 4.354 | | | 116648 | 572340 | S4 | | maximum likelihood | pytorch | mel spectrogram, mixup, training | |
MALACH23_JKU_task1_4 | Pichler2023 | 43 | 39.75 | 9.8 | 3.224 | | | 15994 | 214420 | S4 | | maximum likelihood | pytorch | mel spectrogram, mixup, training | |
DCASE2023 baseline | | 52 | 46.75 | 44.8 | 1.523 | embeddings | | 46512 | 29234920 | CNN | | | keras/tensorflow | pretraining, training, adaptation, pruning, weight quantization | |
Park_KT_task1_1 | Kim2023 | 10 | 20.75 | 56.3 | 1.495 | embeddings | | 92070 | 19556096 | BCRes2Net | | maximum likelihood | pytorch | training, weight quantization | |
Park_KT_task1_2 | Kim2023 | 29 | 30.75 | 53.0 | 1.660 | embeddings | | 92070 | 19556096 | BCRes2Net | | maximum likelihood | pytorch | training, weight quantization | |
Park_KT_task1_3 | Kim2023 | 26 | 29.25 | 54.2 | 2.230 | embeddings | | 92070 | 19556096 | BCRes2Net | | maximum likelihood | pytorch | training, weight quantization | |
Park_KT_task1_4 | Kim2023 | 20 | 27.00 | 49.2 | 1.510 | embeddings | | 20516 | 617000 | BCRes2Net | | maximum likelihood | pytorch | training, weight quantization | |
Schmid_CPJKU_task1_1 | Schmid2023 | 5 | 13.75 | 54.4 | 1.313 | pre-trained model | PaSST, MicIRP | 5722 | 1582336 | RF-regularized CNNs, PaSST transformer | | | pytorch | train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training | |
Schmid_CPJKU_task1_2 | Schmid2023 | 1 | 5.25 | 58.7 | 1.256 | pre-trained model | PaSST, MicIRP | 12310 | 4354304 | RF-regularized CNNs, PaSST transformer | | | pytorch | train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training | |
Schmid_CPJKU_task1_3 | Schmid2023 | 2 | 7.00 | 61.4 | 1.153 | pre-trained model | PaSST, MicIRP | 30106 | 9638144 | RF-regularized CNNs, PaSST transformer | | | pytorch | train teachers, ensemble teacher logits, train student using knowledge distillation, quantization-aware training | |
Schmid_CPJKU_task1_4 | Schmid2023 | 3 | 11.50 | 62.7 | 1.117 | pre-trained model | PaSST, MicIRP | 54182 | 16803072 | RF-regularized CNNs, PaSST transformer | | | pytorch | train teachers, ensemble teacher logits, train student using knowledge distillation, pruning, quantization-aware training | |
Schmidt_FAU_task1_1 | Schmidt2023 | 26 | 29.25 | 55.7 | 1.322 | pre-trained model | | 127988 | 28931380 | ensemble, RF-regularized CNNs, PaSST transformer | 8 | generalized mean | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Schmidt_FAU_task1_2 | Schmidt2023 | 12 | 21.50 | 55.6 | 1.337 | pre-trained model | | 68456 | 19910080 | ensemble, RF-regularized CNNs, PaSST transformer | 8 | weighted generalized mean | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Schmidt_FAU_task1_3 | Schmidt2023 | 19 | 26.75 | 52.7 | 1.398 | pre-trained model | | 74700 | 9996775 | ensemble, RF-regularized CNNs, PaSST transformer | 8 | generalized mean | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Schmidt_FAU_task1_4 | Schmidt2023 | 24 | 28.75 | 49.7 | 1.482 | pre-trained model | | 34616 | 4938255 | ensemble, RF-regularized CNNs, PaSST transformer | 8 | generalized mean | pytorch | pretraining, training, adaptation, pruning, weight quantization | |
Tan_NTU_task1_1 | Tan2023 | 33 | 33.50 | 47.1 | 1.508 | 37434 | 2960384 | CNN | keras/tensorflow | training, weight quantization | |||||
Tan_NTU_task1_2 | Tan2023 | 35 | 34.00 | 48.5 | 1.461 | 54242 | 6462656 | CNN | keras/tensorflow | training, weight quantization | |||||
Tan_NTU_task1_3 | Tan2023 | 37 | 37.00 | 46.3 | 1.492 | 54242 | 6462656 | CNN | keras/tensorflow | training, weight quantization | |||||
Tan_SCUT_task1_1 | Tan2023a | 4 | 13.50 | 60.8 | 1.192 | embeddings | 73386 | 13180000 | CNN | pytorch | training teacher, training student, knowledge distillation, weight quantization | ||||
Tan_SCUT_task1_2 | Tan2023a | 30 | 31.00 | 51.7 | 1.444 | embeddings | 73386 | 13180000 | CNN | pytorch | training teacher, training student, knowledge distillation, weight quantization | ||||
Tan_SCUT_task1_3 | Tan2023a | 16 | 25.50 | 53.5 | 1.441 | embeddings | 73386 | 13180000 | CNN | pytorch | training teacher, training student, knowledge distillation, weight quantization | ||||
Tan_SCUT_task1_4 | Tan2023a | 32 | 33.00 | 50.9 | 1.525 | embeddings | 73386 | 13180000 | CNN | pytorch | training teacher, training student, knowledge distillation, weight quantization | ||||
Vo_DU_task1_1 | Vo2023 | 52 | 46.75 | 45.0 | 2.157 | pre-trained model | 119526 | 15600000 | CNN, Transformer, Knowledge Distillation | pytorch | pretraining, training, adaptation, weight quantization | ||||
Vo_DU_task1_2 | Vo2023 | 53 | 47.75 | 44.8 | 2.116 | pre-trained model | 119526 | 15600000 | CNN, Transformer, Knowledge Distillation | pytorch | pretraining, training, adaptation, weight quantization | ||||
Vo_DU_task1_3 | Vo2023 | 50 | 46.25 | 45.2 | 2.092 | pre-trained model | 119526 | 15600000 | CNN, Transformer, Knowledge Distillation | pytorch | pretraining, training, adaptation, weight quantization | ||||
Vo_DU_task1_4 | Vo2023 | 49 | 45.75 | 45.5 | 1.793 | pre-trained model | 119526 | 15600000 | CNN, Transformer, Knowledge Distillation | pytorch | pretraining, training, adaptation, weight quantization | ||||
Wang_SCUT_task1_1 | Wang2023 | 31 | 31.50 | 49.1 | 1.493 | 45164 | 8646000 | CNN | pytorch | training, adaptation, pruning, weight quantization | |||||
Wang_SCUT_task1_2 | Wang2023 | 18 | 26.50 | 52.9 | 1.348 | 56172 | 16746000 | CNN | pytorch | training, adaptation, pruning, weight quantization | |||||
Wang_SCUT_task1_3 | Wang2023 | 46 | 43.50 | 47.1 | 1.702 | 56556 | 25442000 | CNN | pytorch | training, adaptation, pruning, weight quantization | |||||
Wang_SCUT_task1_4 | Wang2023 | 48 | 45.00 | 48.5 | 1.472 | 121812 | 20902000 | CNN | pytorch | training, adaptation, pruning, weight quantization | |||||
XuQianHu_BIT&NUDT_task1_1 | Yu2023 | 45 | 40.50 | 51.0 | 1.364 | pre-trained model | 52288 | 23803968 | CNN + Transformer | pytorch | pretraining, training, adaptation, distillation, weight quantization | ||||
XuQianHu_BIT&NUDT_task1_2 | Yu2023 | 44 | 40.00 | 51.6 | 1.355 | pre-trained model | 51648 | 28400320 | CNN + Transformer | pytorch | pretraining, training, adaptation, distillation, weight quantization | ||||
XuQianHu_BIT&NUDT_task1_3 | Yu2023 | 41 | 39.25 | 50.0 | 1.395 | pre-trained model | 57392 | 13402688 | CNN + Transformer | pytorch | pretraining, training, adaptation, distillation, weight quantization | ||||
XuQianHu_BIT&NUDT_task1_4 | Yu2023 | 36 | 35.50 | 51.1 | 1.367 | pre-trained model | 66114 | 11878580 | CNN + Transformer | pytorch | pretraining, training, adaptation, distillation, weight quantization | ||||
Yang_GZHU_task1_1 | Weng2023 | 15 | 24.75 | 55.5 | 1.280 | pre-trained model, convolution with IRs from MicIRP | Microphone Impulse Response Project | 76906 | 23970000 | CNN | pytorch | pretraining, DML training, KD fine-tuning, weight quantization | |||
Yang_GZHU_task1_2 | Weng2023 | 14 | 22.25 | 55.9 | 1.241 | pre-trained model, convolution with IRs from MicIRP | Microphone Impulse Response Project | 76906 | 23970000 | CNN | average | pytorch | pretraining, DML training, KD fine-tuning, weight quantization | ||
Yang_GZHU_task1_3 | Weng2023 | 21 | 27.25 | 55.1 | 1.279 | pre-trained model, convolution with IRs from MicIRP | Microphone Impulse Response Project | 76906 | 23970000 | CNN | pytorch | pretraining, DML training, weight quantization | |||
Yang_GZHU_task1_4 | Weng2023 | 19 | 26.75 | 55.2 | 1.259 | pre-trained model, convolution with IRs from MicIRP | Microphone Impulse Response Project | 76906 | 23970000 | CNN | average | pytorch | pretraining, DML training, weight quantization | ||
Zhang_NCUT_task1_1 | Zhang2023 | 49 | 45.75 | 43.3 | 1.757 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 123648 | 7375000 | GhostNet | average | keras/tensorflow | training, weight quantization | |||
Zhang_NCUT_task1_2 | Zhang2023 | 51 | 46.50 | 47.9 | 1.533 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 76224 | 28461000 | FHR_Mobilenet | average | keras/tensorflow | training, weight quantization | |||
Zhang_SATLab_task1_1 | Bing2023 | 26 | 29.25 | 48.8 | 3.248 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 7946 | 3972096 | MobileNet | keras/tensorflow | training, weight quantization | ||||
Zhang_SATLab_task1_2 | Bing2023 | 34 | 33.75 | 50.0 | 4.213 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 46232 | 19466240 | mini-SegNet | keras/tensorflow | training, pruning, retraining, weight quantization | ||||
Zhang_SATLab_task1_3 | Bing2023 | 27 | 30.00 | 51.9 | 1.704 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 54178 | 23438336 | ensemble, MobileNet, mini-SegNet | 2 | keras/tensorflow | training, pruning, weight quantization, ensemble | |||
Zhang_SATLab_task1_4 | Bing2023 | 22 | 27.50 | 50.3 | 1.542 | TAU Urban Acoustic Scenes 2022 Mobile, Development dataset | 15892 | 7944192 | MobileNet, ensemble | 2 | keras/tensorflow | training, weight quantization, ensemble |
Technical reports
Ai4edgept Submission to DCASE 2023 Low Complexity Acoustic Scene Classification Task1
Carlos Almeida1, Piovesan Federico2, Luis Bento1 and Mónica Figueiredo1
1Electrotechnical Engineering, Instituto Politécnico de Leiria, Leiria, Portugal, 2Electrotechnical Engineering, Politecnico di Torino, Torino, Italy
AI4EDGE_IPL_task1_1 AI4EDGE_IPL_task1_2 AI4EDGE_IPL_task1_3
Abstract
The DCASE task 1 challenge aims to classify acoustic scenes using devices with low computational power and memory. The DCASE2023 challenge gives further importance to model size and multiply-accumulate operation (MAC) count. This report describes our submission to this challenge, following our research group’s previous work in this field and the model submitted to DCASE 2022. We use a one-versus-all ten-network ensemble model and propose a custom knowledge distillation method to reduce model complexity. The ensemble model is used as the teacher network, distilling knowledge to the student. The student has three variations: the first model is a tuned version of the DCASE2022 baseline architecture, the second is a slightly larger version of the first, and the third is a larger version of the second that uses structured pruning to further reduce model complexity. Data preprocessing is also conducted to further improve performance. Results show that the proposed knowledge distillation methods improved the accuracy significantly.
System characteristics
Sampling rate | 8kHz |
Data augmentation | pitch shifting, time stretching, mixup, time masking, frequency masking |
Features | log-mel energies |
Classifier | CNN, ensemble |
Complexity management | weight quantization; weight quantization, pruning |
Mini-Segnet and Low-Complexity Mobilenet for Acoustic Scene Classification
Ge-Ge Bing1, Yun-Fei Shao1, Zhi Zhang2 and Wei-Qiang Zhang1
1Department of Electronic Engineering, Tsinghua University, Beijing, China, 2School of Computer Science, Beijing Institute of Technology, Beijing, China
Zhang_SATLab_task1_1 Zhang_SATLab_task1_2 Zhang_SATLab_task1_3 Zhang_SATLab_task1_4
Abstract
This report details the architecture we used to address task 1 of the DCASE2023 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. Our architecture is based on (1) SegNet, applying structured pruning and quantization to reduce model complexity; (2) MobileNet with an additional frequency split block. Log-mel spectrograms, delta, and delta-delta features are extracted to train the acoustic scene classification model. Mixup, random crop, time and frequency domain masking are used for data augmentation. The proposed system achieves higher classification accuracies and lower log loss than the baseline system. After model compression, our single MobileNet model achieves an average accuracy of 51.3% with only 7.946K parameters, and 3.972M Multiply–Accumulate Operations (MACs), while pruned SegNet gets to an average accuracy of 54.46% with 46.232K parameters and 19.466M MACs.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal crop, time masking, frequency masking |
Features | log-mel energies |
Classifier | MobileNet; mini-SegNet; ensemble, MobileNet, mini-SegNet; MobileNet, ensemble |
Complexity management | weight quantization; pruning, weight quantization |
Tencent Submission to Dcase23 Task1: Low-Complexity Deep Learning Solution for Acoustic Scene Classification
Weicheng Cai, Zhang Mingyuan and Zhang Xiang
Tencent Inc., Beijing, China
Cai_TENCENT_task1_1 Cai_TENCENT_task1_2 Cai_TENCENT_task1_3 Cai_TENCENT_task1_4
Abstract
In this technical report, we present the Tencent team’s entry for Task 1 Low-Complexity Acoustic Scene Classification in the DCASE 2023 challenge. We mainly follow the DCASE 2022 1st place solution from the CP-JKU team and have made some adjustments to meet the requirement of this year. Our approach involves employing knowledge distillation to train low-complexity CNN student models using Patchout Spectrogram Transformer (PaSST) models as teachers. We initially train the PaSST models on Audioset and then fine-tune them using the TAU Urban Acoustic Scenes 2022 Mobile development dataset. Lastly, we quantize the student models to enable 8-bit integer-based inference computations to meet the low-complexity constraints in edge devices.
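The teacher-student training described in this abstract can be illustrated with a minimal sketch of a distillation loss. This is not the submission's code: the temperature, the weighting factor `alpha`, and the function names are assumptions chosen for illustration, and the loss shown is the common combination of hard-label cross-entropy with a temperature-scaled KL divergence to the teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    temperature and alpha are illustrative defaults, not values from the report.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label term remains.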
System characteristics
Sampling rate | 32kHz |
Data augmentation | mixup, mixstyle |
Features | log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
Dcase2023 Task1 Submission: Device Simulation and Time-Frequency Separable Convolution for Acoustic Scene Classification
Yiqiang Cai1, Minyu Lin1, Chenyang Zhu2, Shengchen Li1 and Xi Shao2
1School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China, 2College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
Cai_XJTLU_task1_1 Cai_XJTLU_task1_2 Cai_XJTLU_task1_3 Cai_XJTLU_task1_4
Abstract
Task 1 of the DCASE 2023 Challenge incorporates a weighted average ranking of accuracy and complexity, which encourages participants to build efficient systems for acoustic scene classification (ASC). In this report, we propose TF-SepNet, a low-complexity ASC model based on Time-Frequency Separable Convolution. Our network architecture consists of a series of separable convolutional layers that exploit time and frequency domains. We also improve the performance of ResNorm by adding a few learnable parameters. Furthermore, knowledge distillation is employed to transfer knowledge from a large model to a smaller one. Additionally, device simulation is introduced for data augmentation in the device domain. Overall, we evaluate the performance of our model on the DCASE 2023 Task 1 development dataset following the official cross-validation setup and achieve a classification accuracy of 53.9% with 6.83K parameters and 1.65M MACs.
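The weighted average ranking mentioned in this abstract can be sketched as a weighted sum of the three per-system ranks listed in the results table (performance, memory, MACs). The weights below (0.5, 0.25, 0.25) are an assumption for illustration, since this section does not state them, and the ranks in the usage note are hypothetical.

```python
def rank_value(performance_rank, memory_rank, macs_rank,
               weights=(0.5, 0.25, 0.25)):
    """Weighted average of the three ranks; a lower value is better.

    The default weights are an assumption, not taken from this page.
    """
    w_perf, w_mem, w_macs = weights
    return (w_perf * performance_rank
            + w_mem * memory_rank
            + w_macs * macs_rank)


# Hypothetical system ranked 4th in performance, 10th in memory, 8th in MACs.
example = rank_value(4, 10, 8)
```

Under these assumed weights, a system can trade some accuracy for a better overall rank by being markedly smaller or cheaper to run.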
System characteristics
Sampling rate | 32kHz |
Data augmentation | device simulation, mixup, mixstyle |
Features | log-mel energies |
Classifier | CNN, TF-SepNet |
Complexity management | knowledge distillation, weight quantization; weight quantization |
How Information on Soft Labels and Hard Labels Mutually Benefits Sound Event Detection Tasks
Yutong Du1, Jisheng Bai1,2, Pu Zijun1 and Chen Jianfeng1,2
1Joint Laboratory of Environmental Sound Sensing, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China, 2LianFeng Acoustic Technologies Co., Xi'an, China
Bai_JLESS_task1_1 Bai_JLESS_task1_2
Abstract
In this technical report, we describe our proposed system for DCASE Task 1: Low-Complexity Acoustic Scene Classification. To obtain better performance than the baseline, we choose ResNet as the basic model and add several self-attention blocks, including CBAM and MHSA, to extract more fine-grained features and temporal features, respectively, from the spectrogram. To attend to detailed information, we add the CBAM block between two convolution layers in the ResNet block. The MHSA block captures temporal context relationships in the spectrum. Because the task also requires low complexity, the regular convolution module is replaced by the depthwise separable convolution module in the proposed model. During experiments, we use FMix as data augmentation to improve generalization. Moreover, we use a hard-task training strategy in the training process.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | FMix |
Features | log-mel energies |
Classifier | CNN, ResNet, Transformer, CBAM |
Complexity management | model compression |
Acoustic Scene Classification Based on Multi-Teacher Knowledge Distillation and Serfr-CNN
Hongbo Fei, Xing Li and Jie Jia
vivo Mobile Commun co Ltd, Hangzhou, China
Fei_vv_task1_1 Fei_vv_task1_2 Fei_vv_task1_3 Fei_vv_task1_4
Abstract
In this technical report, we describe our low-complexity acoustic scene classification algorithm submitted to DCASE 2023 Task 1a. We focus on knowledge distillation strategy and network innovation, and propose a multi-teacher knowledge distillation method and SERFR-CNN, which target the insufficient classification accuracy and adaptability of current models. The multi-teacher knowledge distillation method combines the traditional knowledge distillation method with a model ensemble strategy. For audio feature extraction, we use log-mel spectrograms and a time-frequency masking algorithm. To further improve system performance, virtual data generation is adopted. Finally, the trained model is used for transfer learning. Using the proposed systems, we achieved a classification accuracy of 59.3% on the officially provided evaluation dataset, which is 16.4% higher than the baseline system.
System characteristics
Sampling rate | 16kHz |
Data augmentation | mixup; mixup, time stretching |
Features | log-mel energies |
Classifier | SERFR-CNN-32; SERFR-CNN-24 |
Complexity management | weight quantization |
Submission to DCASE 2023 Task 1: Low-Complexity Acoustic Scene Classification Using Cepstral Analysis
Yaojun Han and Nengheng Zheng
College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Han_SZU_task1_1
Abstract
This technical report describes the submitted system for task 1 of the DCASE 2023 challenge. The goal of this task is to design an acoustic scene classification system for device-imbalanced datasets under the constraints of low complexity. We applied cepstrum analysis to filter out the channel information contained in the raw audio signals before the feature extraction stage. Moreover, we separate the spectral envelope and the fine structure of the spectrum in the cepstrum domain and briefly analyze the impact of each on the classification results. Due to the low-complexity constraints, we use knowledge distillation to let the simpler student model learn from complex teacher models. In addition, we experimented with different augmentation techniques such as Mixup, random noise, pitch shifting, and time-frequency masking to expand the diversity of the dataset. As calculated with the NeSsi tool, our model requires 80.845K of memory and 29.349M MACs. The accuracy of the model on the development dataset is 51.4%.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time masking, frequency masking, pitch shifting, random noise |
Features | log-mel energies, spectral envelope, spectrum fine structure |
Classifier | CNN |
Complexity management | knowledge distillation |
Dual-Strategy Enhancement of Acoustic Scene and Event Classification: Integrating Res2net, Ghostnet, and Mobileformer Architectures
TaeSoo Kim, Daniel Rho, GaHui Lee and Jae Han Park
Computing Sciences, KT Corporation, Seoul, Korea
Park_KT_task1_1 Park_KT_task1_2 Park_KT_task1_3 Park_KT_task1_4
Abstract
In this technical report, we investigate the balance between accuracy and efficiency in the low-complexity acoustic scene classification (ASC) task for the DCASE 2023 challenge. We explore two approaches: the first prioritizes accuracy using Res2Net and GhostNet, while the second emphasizes efficiency using MobileFormer. Our study highlights the trade-offs between accuracy and efficiency in ASC models and contributes to the ongoing research on developing robust and lightweight models suitable for embedded systems.
System characteristics
Sampling rate | 16kHz |
Data augmentation | mixup, frequency masking, temporal masking |
Features | log-mel energies |
Classifier | BCRes2Net |
Decision making | maximum likelihood |
Complexity management | weight quantization |
Low-Complexity Acoustic Scene Classification Base on Depthwise Separable CNN
Zhicong Liang, Pengyuan Xie, Zhe Wang and Wenbo Cai
NetEase, Guangzhou, China
Liang_NTES_task1_1
Abstract
This report outlines our submission for DCASE2023 Task1, which focuses on Low Complexity Acoustic Scene Classification. To meet this requirement, we implemented the Depthwise Separable CNN method to construct our model. This approach significantly reduces model size while improving accuracy. Additionally, we applied SpecAugment and mixup as data augmentation techniques. To further enhance our model's performance, we employed Knowledge Distillation, teaching the submission model from larger models. Overall, these techniques enable us to achieve better results on the task.
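The size reduction claimed for depthwise separable convolution can be made concrete by counting weights. The sketch below compares a standard 2-D convolution with its depthwise separable counterpart (a per-channel k x k filter followed by a 1 x 1 pointwise convolution). Biases are ignored and the layer sizes are hypothetical, not taken from the report.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 64 output channels, 3 x 3 kernel.
standard = standard_conv_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
reduction = standard / separable
```

For this hypothetical layer the separable variant needs 2,336 weights instead of 18,432. The MAC count per output position shrinks by the same factor, because each weight is used once per position in both cases.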
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | random cropping, SpecAugment, mixup |
Features | log-mel energies |
Classifier | CNN |
Low-Complexity Deep Learning System for Acoustic Scene Classification Using Teacher-Student Scheme and Multiple Spectrograms
Lam Pham1, Ngo Dat2, Le Cam3, Anahid Naghibzadeh-Jalali1 and Alexander Schindler1
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2Computing Sciences, University of Essex, Colchester, UK, 3Computing Sciences, Ho Chi Minh University of Technology, HCM city, Vietnam
LAM_AEV_task1_1 LAM_AEV_task1_2 LAM_AEV_task1_3
Abstract
In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher. In the first phase, the teacher, which is a large-footprint model, is trained. After training the teacher, the embeddings, which are the feature map of the second-to-last layer of the teacher, are extracted. In the second phase, the student network, which is a low-complexity model, is trained with the embeddings extracted from the teacher. Our experiments conducted on the DCASE 2023 Task 1 Development dataset show that the system fulfills the low-complexity requirement and achieves a best classification accuracy of 57.4%, improving on the DCASE baseline by 14.5%.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | CQT, Gammatonegram, Mel |
Classifier | CNN |
Decision making | Late fusion of predicted probabilities |
Malach23 Submission to DCASE 2023: Acoustic Scene Classification with Receptive-Field Regularized Convolution Neural Networks and State Space Models
Noah Pichler, Jonathan Greif, Christian Willdoner and David Fleischanderl
Johannes Kepler University, Linz, Austria
MALACH23_JKU_task1_1 MALACH23_JKU_task1_2 MALACH23_JKU_task1_3 MALACH23_JKU_task1_4
Abstract
This report describes our approach to Task 1 of the DCASE (Detection and Classification of Acoustic Scenes and Events) Challenge. To classify urban acoustic scenes through short audio samples, we experiment with Receptive Field Regularized Convolutional Neural Networks and S4 models as classifiers. To stay within the allowed model-complexity limits of the challenge, we use a Convolutional Neural Network (CNN) with 13 layers plus one classification layer, and one CNN layer followed by 3 S4 blocks, respectively. Additionally, we augment the mel spectrograms using the MixStyle [4] or Mixup [5] method. We surpass the baseline significantly in our experiments, and, in particular, the S4 model stands out due to its low number of multiply-accumulate operations.
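Mixup, one of the two augmentations named in this abstract, blends two training examples and their labels with the same coefficient. The sketch below is a minimal illustration on flat feature lists and one-hot labels. The Beta parameter `alpha` and the function names are assumptions, not settings from the report.

```python
import random


def sample_lambda(alpha=0.3, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is illustrative."""
    return rng.betavariate(alpha, alpha)


def mixup(features_a, label_a, features_b, label_b, lam):
    """Return the convex combination of two (features, one-hot label) pairs."""
    mixed_features = [lam * a + (1.0 - lam) * b
                      for a, b in zip(features_a, features_b)]
    mixed_label = [lam * a + (1.0 - lam) * b
                   for a, b in zip(label_a, label_b)]
    return mixed_features, mixed_label


# Two toy examples with two features and two classes each.
x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

In practice the same operation is applied to whole spectrogram batches, with one coefficient drawn per pair or per batch.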
System characteristics
Sampling rate | 32kHz |
Data augmentation | mixstyle; mixup |
Features | log-mel energies |
Classifier | CNN; S4 |
Decision making | maximum likelihood |
Complexity management | efficient models |
CP-JKU Submission to Dcase23: Efficient Acoustic Scene Classification with Cp-Mobile
Florian Schmid, Tobias Morocutti, Shahed Masoudian, Khaled Koutini and Gerhard Widmer
Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria
Schmid_CPJKU_task1_1 Schmid_CPJKU_task1_2 Schmid_CPJKU_task1_3 Schmid_CPJKU_task1_4
Abstract
In this technical report, we describe the CP-JKU team’s submission for Task 1 Low-Complexity Acoustic Scene Classification of the DCASE 23 challenge. We introduce a novel architecture, CP-Mobile, with regularized receptive field and residual inverted bottleneck blocks. We use Knowledge Distillation to teach CP-Mobile from an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) and CP-ResNet models. To enhance cross-device generalization performance, Freq-MixStyle and Device Impulse Response (DIR) augmentation are applied while training teachers and students. CP-Mobile is fine-tuned using Quantization Aware Training and then quantized to perform computations in 8-bit precision. The improved teacher ensemble, the efficient student architecture and DIR augmentation improve the results on the TAU Urban Acoustic Scenes 2022 Mobile development dataset by around 5 percentage points in accuracy compared to the top-ranked submission for Task 1 of the DCASE 22 challenge.
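The final step of mapping weights to 8-bit precision can be illustrated with a minimal sketch of symmetric per-tensor quantization. This shows only the rounding step under an assumed scheme. It is not Quantization Aware Training, which additionally simulates this rounding during fine-tuning, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Toy weight vector; the reconstruction error is at most half a step.
original = [-1.0, 0.0, 0.3, 1.0]
codes, step = quantize_int8(original)
restored = dequantize(codes, step)
```

Each weight then occupies one byte instead of four, which is why 8-bit quantization appears in almost every submission's complexity management column.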
System characteristics
Sampling rate | 32kHz |
Data augmentation | device impulse response augmentation, mixup, freq-mixstyle, pitch shifting |
Features | log-mel energies |
Classifier | RF-regularized CNNs, PaSST transformer |
Complexity management | knowledge distillation, weight quantization; knowledge distillation, weight quantization, structured pruning |
Submission to DCASE 2023 Task 1: Device Invariant Training with Structured Filter Pruning for Low Complexity Acoustic Scene Classification
Lorenz Schmidt, Beran Kiliç and Nils Peters
International Audio Laboratories, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Schmidt_FAU_task1_1 Schmidt_FAU_task1_2 Schmidt_FAU_task1_3 Schmidt_FAU_task1_4
Abstract
This technical report describes our contribution to the DCASE 2023 challenge Acoustic Scene Classification Task 1. We apply Inverse Contrastive Learning to regularize models so that they generalize better to unseen devices. First, we construct a teacher ensemble by fine-tuning several PaSST models and then train student models under different hard constraints on multiply-accumulate operations (MACs). This yields four different models with approximately 30, 20, 10 and 5 MMACs. Finally, each model is quantized to 8 bits in order to fulfill the memory requirements of the challenge.
System characteristics
Sampling rate | 32kHz |
Data augmentation | random cutoff, mixstyle, pitch shifting |
Features | log-mel energies |
Classifier | ensemble, RF-regularized CNNs, PaSST transformer |
Decision making | generalized mean; weighted generalized mean |
Complexity management | weight quantization, pruning |
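The "generalized mean" and "weighted generalized mean" decision rules listed above can be sketched as a power mean over the class probabilities predicted by the ensemble members. The exponent `p`, the weights, and the final renormalization are assumptions for illustration. The report's actual settings are not given in this section.

```python
import math


def generalized_mean(values, p, weights=None):
    """Weighted power mean of positive values; p = 0 gives the geometric mean."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    if p == 0:
        return math.exp(
            sum(w * math.log(v) for w, v in zip(weights, values)) / total)
    return (sum(w * v ** p for w, v in zip(weights, values)) / total) ** (1.0 / p)


def fuse_predictions(model_probs, p=1.0, weights=None):
    """Fuse per-model class probabilities and renormalize so they sum to one."""
    n_classes = len(model_probs[0])
    fused = [generalized_mean([probs[c] for probs in model_probs], p, weights)
             for c in range(n_classes)]
    norm = sum(fused)
    return [f / norm for f in fused]
```

With `p = 1` this reduces to plain averaging. Smaller exponents penalize classes that any single ensemble member considers unlikely.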
Low-Complexity Acoustic Scene Classification Using Convolution Neural Network
Ee-Leng Tan, Jin Jie Yeo, Santi Peksi and Woon-Seng Gan
EEE, Nanyang Technological University, Singapore, Singapore
Tan_NTU_task1_1 Tan_NTU_task1_2 Tan_NTU_task1_3
Abstract
In this technical report, we describe the CISS-NTU team’s submission for Task 1 Low-Complexity Acoustic Scene Classification of the detection and classification of acoustic scenes and events (DCASE) 2023 challenge [1]. We have explored and adapted the hyperparameters of the baseline (BL) system provided in this challenge. The TAU Urban Acoustic Scenes 2022 Mobile development dataset [2] has been used to train and validate our models. Each audio sample is transformed into 160 log-mel energies. Three models are submitted: two trained using the development dataset and one trained using the development dataset combined with augmented samples. The best-performing model achieves an accuracy of 52.1% and a log loss of 1.372, requires only 6.46 M multiply-and-accumulate (MAC) operations, and has a memory usage of 54.30 KB.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
Low-Complexity Acoustic Scene Classification Using Blueprint Separable Convolution and Knowledge Distillation
Jiaxin Tan and Yanxiong Li
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
Tan_SCUT_task1_1 Tan_SCUT_task1_2 Tan_SCUT_task1_3 Tan_SCUT_task1_4
Abstract
This technical report describes our proposed system for Task 1 in Detection and Classification of Acoustic Scenes and Events (DCASE) 2023. We design a teacher model based on blueprint separable convolution (BSConv) with reference to the middle layer of the blueprint separable residual network. To meet the requirements of system complexity, we adopt knowledge distillation to teach student models from the teacher model. Data augmentations (e.g., Mixstyle, SpecAugment, and spectrum modulation) are applied to prevent overfitting. When evaluated on the development data, one of the proposed systems obtains an accuracy score of 54.9% and has 73,386 parameters with 13.18 million multiply-and-accumulate operations.
System characteristics
Sampling rate | 44.1kHz |
Features | mel-spectrogram |
Classifier | CNN |
Complexity management | weight quantization |
Hierarchical Knowledge Distillation: A Multi-Stage Learning Approach
Quoc Vo and David Han
Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Vo_DU_task1_1 Vo_DU_task1_2 Vo_DU_task1_3 Vo_DU_task1_4
Abstract
This technical report details our approach to Task 1 of the 2023 Detection and Classification of Acoustic Scenes and Events challenge (DCASE2023), which focuses on the classification of recorded audio for acoustic scene recognition. The task calls for a quantized model with a memory allowance of no more than 128KB for model parameters and a maximum of 30 million multiply-accumulate operations (MMACs) per inference. Our solution exploits log-mel spectrogram features and leverages multiple data augmentations. Our proposed methodology utilizes an audio spectrogram transformer (AST) as the teacher model and multiple Convolutional Neural Network (CNN) models as students in a hierarchical knowledge distillation (KD) framework. This approach aids in bridging the substantial parameter disparity between the teacher model, which has over 86 million parameters, and our compact CNN-based model limited to just 119,526 parameters. Upon completion of network training, the weights are converted to INT8 to meet the size constraints. Our INT8 model achieves a log-loss of 1.59 and an accuracy of 46.01% on the TAU Urban Acoustic Scenes 2022 Mobile Development dataset’s standard test set, signifying the efficacy of our framework. Our proposed method demonstrates the potential of hierarchical distillation strategies in optimizing smaller models without compromising their learning ability.
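The two limits quoted in this abstract (128KB for parameters and 30 MMACs per inference) can be checked with simple arithmetic, sketched below. The helper names are hypothetical, and treating 1 KB as 1000 bytes is an assumption. The example uses the parameter and MAC counts listed for the Vo_DU systems in the table above.

```python
import math

MEMORY_LIMIT_BYTES = 128 * 1000   # assumption: 1 KB = 1000 bytes
MACS_LIMIT = 30_000_000           # 30 million MACs per inference


def model_size_bytes(n_params, bits_per_param=8):
    """Storage needed for the parameters at the given numeric precision."""
    return math.ceil(n_params * bits_per_param / 8)


def within_limits(n_params, macs, bits_per_param=8):
    """True if both the memory and the MACs budget are respected."""
    return (model_size_bytes(n_params, bits_per_param) <= MEMORY_LIMIT_BYTES
            and macs <= MACS_LIMIT)


# 119,526 parameters and 15.6 MMACs, as listed for Vo_DU_task1_1.
fits_int8 = within_limits(119_526, 15_600_000, bits_per_param=8)
fits_fp32 = within_limits(119_526, 15_600_000, bits_per_param=32)
```

The same parameter count fits the budget at 8 bits per weight but exceeds it at 32 bits, which illustrates why the abstract describes the INT8 conversion as the step that meets the size constraint.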
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment |
Features | log-mel spectrogram |
Classifier | CNN, Transformer, Knowledge Distillation |
Complexity management | weight quantization |
Low-Complexity Acoustic Scene Classification Using Deep Space Separable Distillation Module and Multi-Label Learning
Kangli Wang, Yiling Wu and Yanxiong Li
South China University of Technology, Guangzhou, China
Abstract
This technical report describes our system for Task 1 in Detection and Classification of Acoustic Scenes and Events (DCASE) 2023. We propose a deep space separable distillation block as the basic unit of the model, using its strong block processing ability to continuously cut the high-frequency and low-frequency parts of the log-Mel spectrogram. Accuracy is improved by multi-scale embedding and multi-task learning. To prevent overfitting, we adopt data augmentation methods such as mixup, speculation and spectral modulation. Quantization-aware training is adopted to quantize the model so that it meets the low-complexity constraints of edge devices. The proposed system achieves 53.3% accuracy on the development dataset with a parameter count of only 45.16 kB and 8.64 M MACs.
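Quantization-aware training typically inserts a quantize-dequantize ("fake quantization") step into the forward pass so that the network learns weights that survive rounding. The sketch below shows that step for a list of weights. The symmetric per-tensor INT8 scheme and the function name are assumptions for illustration; the report does not specify its quantization scheme.

```python
def fake_quantize(weights, num_bits=8):
    """Round weights to a signed integer grid and map them back to floats.

    Returns the dequantized weights and the scale that was used.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0   # nothing to scale
    scale = max_abs / qmax
    # Quantize: divide by the scale, round, and clamp to the integer range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Dequantize: the forward pass sees these rounded values.
    return [q * scale for q in quantized], scale


weights = [0.5, -1.27, 0.003, 0.0]
dequantized, scale = fake_quantize(weights)
print(dequantized, scale)
```

Each dequantized weight differs from the original by at most half a quantization step. In training frameworks the rounding is usually paired with a straight-through gradient estimator, which is not shown here.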
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, speculation and spectral modulation |
Features | log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
Low-Complexity Acoustic Scene Classification Using Deep Mutual Learning and Knowledge Distillation Fine-Tuning
Shilong Weng, Liu Yang and Binghong Xu
School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
Yang_GZHU_task1_1 Yang_GZHU_task1_2 Yang_GZHU_task1_3 Yang_GZHU_task1_4
Abstract
This technical report describes our submission for Task 1, low-complexity acoustic scene classification, of the DCASE 2023 challenge. To enhance generalization to unseen devices, the reassembled 10-second audio is convolved with a microphone impulse response randomly selected from the Microphone Impulse Response Project library before being fed into the models. Then a ResNet38 teacher model pre-trained on AudioSet and three low-complexity BC-Res2Net student models are trained together with Deep Mutual Learning, which further improves the performance of the teacher model and also yields a well-initialized student model. Next, we use Knowledge Distillation fine-tuning so that the student model learns from the well-performing teacher model while the predictive performance of the teacher model is maintained. Finally, the student model is quantized by Post-Training Static Quantization so that inference is computed with 8-bit integers.
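The device-simulation step described here, convolving audio with a randomly chosen microphone impulse response, can be sketched as follows. The function names, the trimming to the input length, and the toy impulse responses are assumptions for illustration; the report does not describe its implementation.

```python
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def augment_with_random_ir(signal, ir_library, rng=random):
    """Convolve with a randomly chosen IR and trim back to the input length."""
    impulse_response = rng.choice(ir_library)
    return convolve(signal, impulse_response)[:len(signal)]


# Toy data standing in for real audio and measured impulse responses.
audio = [1.0, 0.0, 0.0, 0.5]
ir_library = [[1.0], [0.6, 0.3, 0.1]]
print(augment_with_random_ir(audio, ir_library))
```

The direct double loop is only practical for short sequences. For 10-second clips an FFT-based routine such as `scipy.signal.fftconvolve` would be the usual choice.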
System characteristics
Sampling rate | 32kHz |
Data augmentation | Conv_IR, time shifting, time-frequency masking, mixstyle |
Features | log-mel energies |
Classifier | CNN |
Decision making | average |
Complexity management | weight quantization |
Tiny Audio Spectrogram Transformer: Mobilevit for Low-Complexity Acoustic Scene Classification with Decoupled Knowledge Distillation
Jinyang Yu1, Zikai Song2,3, Jiahao Ji2,3, Lixian Zhu2,3, Kele Xu1, Kun Qian2,3, Yong Dou1 and Bin Hu2,3
1Computer Department, National University of Defense Technology, Changsha, P.R. China, 2Ministry of Education (Beijing Institute of Technology), Key Laboratory of Brain Health Intelligent Evaluation and Intervention, P.R. China, 3Beijing Institute of Technology, School of Medical Technology, P.R. China
XuQianHu_BIT&NUDT_task1_1 XuQianHu_BIT&NUDT_task1_2 XuQianHu_BIT&NUDT_task1_3 XuQianHu_BIT&NUDT_task1_4
Abstract
This report presents the BIT&NUDT submissions to DCASE2023 challenge Task 1, which targets low-complexity acoustic scene classification (ASC). Several vision transformers adapted to audio classification tasks have been shown to be more robust than CNNs due to their global representations. However, given the complexity of self-attention, they seem ill-suited to lightweight edge devices. In our submission, we transfer a lightweight vision transformer, MobileViT, from image tasks to ASC. By inserting the MobileViT block into a CNN, our network can benefit from both the global representations of attention and the spatial representations of CNNs. Under the parameter memory limit of 128KB, we apply quantization and convert part of the parameters to INT8 to balance complexity and accuracy. Furthermore, we use Decoupled Knowledge Distillation to take advantage of PaSST teacher models, which performed strongly in the previous DCASE challenge.
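Decoupled Knowledge Distillation, as commonly formulated, splits the usual teacher-student KL divergence into a target-class term and a non-target-class term that are weighted separately. The sketch below follows that general formulation for a single example. The weights `alpha` and `beta`, the temperature, and all names are illustrative assumptions and are not values from the report.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def dkd_loss(student_logits, teacher_logits, target,
             alpha=1.0, beta=8.0, temperature=4.0):
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    # Target-class term: binary split into "target" versus "everything else".
    tckd = kl_divergence(
        [p_teacher[target], 1.0 - p_teacher[target]],
        [p_student[target], 1.0 - p_student[target]],
    )
    # Non-target term: distributions renormalised over the non-target classes.
    rest_teacher = [p / (1.0 - p_teacher[target])
                    for i, p in enumerate(p_teacher) if i != target]
    rest_student = [p / (1.0 - p_student[target])
                    for i, p in enumerate(p_student) if i != target]
    nckd = kl_divergence(rest_teacher, rest_student)
    return (alpha * tckd + beta * nckd) * temperature ** 2


print(dkd_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0], target=0))
```

The sketch assumes moderate logits; if a softmax output saturates to exactly 1.0 for the target class, the renormalisation would divide by zero and needs a small epsilon.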
System characteristics
Sampling rate | 32kHz |
Data augmentation | timerolling, pitch shifting, gaussian noise, specaugment, mixup, mixstyle |
Features | log-mel energies |
Classifier | CNN + Transformer |
Complexity management | weight quantization |
Acoustic Scene Classification Based on Pruned_ghostnet and Fhr_mobilenet
Lin Zhang, Hongxia Dong, Menglong Wu and Xichang Cai
Electronic and Communication Engineering, North China University of Technology, Beijing, China
Zhang_NCUT_task1_1 Zhang_NCUT_task1_2
Abstract
This technical report describes our submission for Task 1 of the DCASE2023 challenge. We computed the log-mel spectrogram for each audio segment at the original sampling rate of 44.1 kHz. In addition, to obtain richer feature information, we also computed the first-order and second-order differences of the log-mel spectrogram. The resulting features have 128 frequency bins, 43 time bins, and 3 channels. The feature maps were then fed into classification networks, for which we employed two schemes, namely Pruned_GhostNet and FHR_MobileNet. The achieved accuracies were 47% and 52.8%, respectively, with 123.648K and 76.224K model parameters and 7.375M and 28.461M MACs.
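The three-channel input described here can be sketched by stacking a log-mel spectrogram with its first-order and second-order differences along time. The plain frame-to-frame difference, the zero padding of the first frame, and the channels-first layout are assumptions for illustration; the report does not state how its differences were computed, and toolkits often use a smoothed delta instead.

```python
def time_difference(spec):
    """Frame-to-frame difference along time, zero-padded at the first frame."""
    return [[0.0] + [row[t] - row[t - 1] for t in range(1, len(row))]
            for row in spec]


def stack_with_deltas(log_mel):
    """Return [log_mel, delta, delta-delta] as a channels-first feature."""
    delta = time_difference(log_mel)
    delta2 = time_difference(delta)
    return [log_mel, delta, delta2]


# Placeholder spectrogram with the stated shape: 128 mel bins x 43 frames.
log_mel = [[float(t * t) for t in range(43)] for _ in range(128)]
features = stack_with_deltas(log_mel)
print(len(features), len(features[0]), len(features[0][0]))  # 3 128 43
```

The padding keeps all three channels the same size, so the stacked feature matches the 128 × 43 × 3 shape quoted in the abstract up to axis order.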
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, specaugment |
Features | log-mel energies, delta and delta-delta |
Classifier | GhostNet; FHR_Mobilenet |
Decision making | average |
Complexity management | weight quantization |