Task description
The goal of acoustic scene classification is to classify a test recording into one of ten predefined acoustic scene classes. This task targets devices with low computational and memory budgets, which impose limits on model complexity such as the number of parameters and the multiply-accumulate operation (MAC) count. In addition to low complexity, the aim is generalization across different recording devices; for this purpose, the task uses audio data recorded and simulated with a variety of devices.
The development dataset consists of recordings from 10 European cities made with 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C, and S1-S6 consists of randomly selected segments of the recordings made simultaneously with device A; these segments therefore all overlap with the device A data, but not necessarily with each other. The development set contains 64 hours of audio in total.
The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. Five of these devices are new (not present in the development set): real device D and simulated devices S7-S11. The evaluation set contains 22 hours of audio.
Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.
A more detailed task description can be found on the task description page.
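Systems in the tables below are ranked by multiclass log loss on the evaluation set, with accuracy reported alongside it. As a rough illustration (not the official evaluation code), the metric and a normal-approximation 95% confidence interval over per-sample losses can be computed as follows; the toy arrays are made up for the example:

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the correct class.

    y_true: integer class indices, shape (n,)
    probs:  predicted class probabilities, shape (n, n_classes)
    """
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1.0)
    return -np.log(p).mean()

def normal_ci(values, z=1.96):
    """Approximate 95% confidence interval for the mean of per-sample values."""
    m = values.mean()
    half = z * values.std(ddof=1) / np.sqrt(len(values))
    return m - half, m + half

# Toy example: 4 samples, 3 classes
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.80, 0.10],
                  [0.30, 0.30, 0.40],
                  [0.25, 0.50, 0.25]])
y_true = np.array([0, 1, 2, 1])
print(multiclass_log_loss(y_true, probs))  # about 0.547
```

A lower log loss is better; unlike accuracy, it also penalizes overconfident wrong predictions, which is why some systems below rank well on accuracy but poorly overall.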
Systems ranking
Submission label | Name | Technical report | Rank | Logloss (eval, 95% CI) | Accuracy % (eval, 95% CI) | Logloss (dev) | Accuracy % (dev)
---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | AI4EDGE_1 | Anastcio2022 | 42 | 2.414 (2.264 - 2.564) | 47.0 (46.7 - 47.3) | 0.742 | 75.6 | |
AI4EDGE_IPL_task1_2 | AI4EDGE_2 | Anastcio2022 | 41 | 2.365 (2.226 - 2.504) | 46.7 (46.4 - 46.9) | 0.791 | 73.5 | |
AI4EDGE_IPL_task1_3 | AI4EDGE_3 | Anastcio2022 | 17 | 1.398 (1.343 - 1.454) | 49.4 (49.1 - 49.7) | 1.347 | 50.5 | |
AI4EDGE_IPL_task1_4 | AI4EDGE_4 | Anastcio2022 | 11 | 1.330 (1.281 - 1.378) | 51.6 (51.3 - 51.9) | 1.103 | 60.5 | |
AIT_Essex_task1_1 | AIT_Essex | Pham2022 | 34 | 1.636 (1.535 - 1.737) | 53.0 (52.7 - 53.3) | 1.719 | 55.6 | |
AIT_Essex_task1_2 | AIT_Essex | Pham2022 | 36 | 1.787 (1.680 - 1.894) | 51.9 (51.6 - 52.2) | | 51.4 |
AIT_Essex_task1_3 | AIT_Essex | Pham2022 | 37 | 1.808 (1.689 - 1.928) | 55.2 (55.0 - 55.5) | 1.306 | 60.1 | |
Cai_XJTLU_task1_1 | DW_S | Cai2022 | 26 | 1.515 (1.454 - 1.575) | 47.8 (47.5 - 48.0) | 1.578 | 46.2 | |
Cai_XJTLU_task1_2 | DW | Cai2022 | 30 | 1.580 (1.519 - 1.642) | 46.4 (46.1 - 46.7) | 1.551 | 45.6 | |
Cai_XJTLU_task1_3 | DW_AUG_S | Cai2022 | 33 | 1.635 (1.566 - 1.704) | 45.2 (44.9 - 45.5) | 1.437 | 48.3 | |
Cai_XJTLU_task1_4 | DW_AUG | Cai2022 | 27 | 1.564 (1.501 - 1.627) | 48.0 (47.7 - 48.3) | 1.327 | 49.3 | |
Cao_SCUT_task1_1 | KDResCG | Cao2022 | 45 | 2.795 (2.623 - 2.967) | 48.7 (48.4 - 48.9) | 1.441 | 51.1 | |
Chang_HYU_task1_1 | JH_PM_HYU1 | Lee2022 | 5 | 1.147 (1.081 - 1.214) | 60.8 (60.6 - 61.1) | 0.835 | 70.1 | |
Chang_HYU_task1_2 | JH_PM_HYU2 | Lee2022 | 6 | 1.187 (1.125 - 1.249) | 59.2 (58.9 - 59.5) | 1.065 | 62.6 | |
Chang_HYU_task1_3 | JH_PM_HYU3 | Lee2022 | 8 | 1.190 (1.130 - 1.251) | 59.4 (59.1 - 59.6) | 1.005 | 64.9 | |
Chang_HYU_task1_4 | JH_PM_HYU4 | Lee2022a | 7 | 1.187 (1.126 - 1.248) | 59.3 (59.1 - 59.6) | 1.072 | 62.2 | |
Dong_NCUT_task1_1 | Dong1_NCUT | Dong2022 | 29 | 1.568 (1.512 - 1.623) | 48.0 (47.7 - 48.3) | 1.378 | 53.9 | |
Houyb_XDU_task1_1 | Houyb_XDU | Hou2022 | 22 | 1.481 (1.416 - 1.547) | 49.3 (49.0 - 49.5) | 1.449 | 49.7 | |
Liang_UESTC_task1_1 | BC-ResNet1 | Liang2022 | 38 | 1.934 (1.830 - 2.038) | 41.3 (41.0 - 41.5) | 1.263 | 53.8 | |
Liang_UESTC_task1_2 | BC-ResNet2 | Liang2022 | 47 | 2.916 (2.751 - 3.081) | 29.9 (29.6 - 30.2) | 1.267 | 55.9 | |
Liang_UESTC_task1_3 | BC-ResNet3 | Liang2022 | 43 | 2.701 (2.566 - 2.836) | 28.5 (28.2 - 28.7) | 1.236 | 56.2 | |
Liang_UESTC_task1_4 | MobileNet2 | Liang2022 | 32 | 1.612 (1.560 - 1.663) | 44.1 (43.8 - 44.4) | 1.556 | 45.9 | |
DCASE2022 baseline | Baseline | | | 1.532 (1.490 - 1.574) | 44.2 (44.0 - 44.5) | 1.575 | 42.9 |
Morocutti_JKU_task1_1 | jku_stu_1 | Morocutti2022 | 12 | 1.339 (1.278 - 1.399) | 53.8 (53.5 - 54.1) | 1.279 | 53.3 | |
Morocutti_JKU_task1_2 | jku_stu_2 | Morocutti2022 | 13 | 1.355 (1.296 - 1.414) | 53.0 (52.7 - 53.2) | 1.280 | 53.4 | |
Morocutti_JKU_task1_3 | jku_stu_3 | Morocutti2022 | 10 | 1.320 (1.256 - 1.383) | 54.7 (54.4 - 55.0) | 1.291 | 52.8 | |
Morocutti_JKU_task1_4 | jku_stu_4 | Morocutti2022 | 9 | 1.311 (1.253 - 1.369) | 54.5 (54.2 - 54.8) | 1.288 | 52.7 | |
Olisaemeka_ARU_task1_1 | DepSepConv | Olisaemeka2022 | 39 | 2.055 (1.991 - 2.119) | 36.4 (36.1 - 36.6) | 1.878 | 39.0 | |
Park_KT_task1_1 | MConvNet | Kim2022 | 25 | 1.504 (1.431 - 1.576) | 51.7 (51.4 - 52.0) | 1.259 | 54.0 | |
Park_KT_task1_2 | MConvNet | Kim2022 | 19 | 1.431 (1.364 - 1.498) | 52.7 (52.4 - 53.0) | 1.259 | 54.0 | |
Schmid_CPJKU_task1_1 | t10sec | Schmid2022 | 2 | 1.092 (1.043 - 1.141) | 59.7 (59.5 - 60.0) | 1.115 | 58.6 | |
Schmid_CPJKU_task1_2 | mixstyleR8 | Schmid2022 | 4 | 1.105 (1.057 - 1.153) | 59.6 (59.3 - 59.9) | 1.110 | 59.1 | |
Schmid_CPJKU_task1_3 | mixstyleR5 | Schmid2022 | 1 | 1.091 (1.040 - 1.141) | 59.6 (59.4 - 59.9) | 1.139 | 58.0 | |
Schmid_CPJKU_task1_4 | audiosetR5 | Schmid2022 | 3 | 1.102 (1.054 - 1.151) | 59.4 (59.1 - 59.7) | 1.163 | 57.6 | |
Schmidt_FAU_task1_1 | final | Schmidt2022 | 35 | 1.731 (1.657 - 1.805) | 47.5 (47.2 - 47.8) | 1.581 | 49.0 | |
Singh_Surrey_task1_1 | Surrey_4M | Singh2022 | 28 | 1.565 (1.508 - 1.623) | 44.6 (44.3 - 44.9) | 1.449 | 46.6 | |
Singh_Surrey_task1_2 | Surrey_5M | Singh2022 | 31 | 1.606 (1.547 - 1.664) | 44.3 (44.1 - 44.6) | 1.475 | 45.9 | |
Singh_Surrey_task1_3 | Surrey_19M | Singh2022 | 23 | 1.492 (1.441 - 1.544) | 45.9 (45.6 - 46.2) | 1.392 | 47.3 | |
Singh_Surrey_task1_4 | Surrey_20M | Singh2022 | 24 | 1.499 (1.447 - 1.551) | 45.9 (45.6 - 46.2) | 1.389 | 47.5 | |
Sugahara_RION_task1_1 | RION1 | Sugahara2022 | 18 | 1.405 (1.337 - 1.473) | 51.5 (51.2 - 51.7) | 1.199 | 56.3 | |
Sugahara_RION_task1_2 | RION2 | Sugahara2022 | 15 | 1.389 (1.325 - 1.454) | 51.6 (51.3 - 51.9) | 1.179 | 56.5 | |
Sugahara_RION_task1_3 | RION3 | Sugahara2022 | 14 | 1.366 (1.305 - 1.426) | 51.7 (51.4 - 51.9) | 1.182 | 56.5 | |
Sugahara_RION_task1_4 | RION4 | Sugahara2022 | 16 | 1.397 (1.328 - 1.466) | 52.7 (52.5 - 53.0) | 1.214 | 57.1 | |
Yu_XIAOMI_task1_1 | YLSSD | Yu2022 | 21 | 1.456 (1.409 - 1.504) | 46.2 (46.0 - 46.5) | 1.305 | 51.7 | |
Zaragoza-Paredes_UPV_task1_1 | Conv_Sep_CNN_48 | Zaragoza_Paredes2022 | 44 | 2.709 (2.517 - 2.901) | 43.8 (43.6 - 44.1) | 1.440 | 50.6 | |
Zaragoza-Paredes_UPV_task1_2 | Conv_Sep_CNN_48 | Zaragoza_Paredes2022 | 46 | 2.904 (2.690 - 3.118) | 41.9 (41.7 - 42.2) | 1.440 | 50.6 | |
Zhang_THUEE_task1_1 | THUEE | Shao2022 | 40 | 2.096 (1.913 - 2.280) | 54.9 (54.7 - 55.2) | 1.360 | 54.1 | |
Zhang_THUEE_task1_2 | THUEE | Shao2022 | 48 | 3.068 (2.775 - 3.361) | 54.4 (54.1 - 54.7) | 1.360 | 53.1 | |
Zou_PKU_task1_1 | SepCNN | Xin2022 | 20 | 1.442 (1.362 - 1.521) | 56.3 (56.0 - 56.6) | 1.295 | 60.3 |
Teams ranking
The table below includes only the best-performing system from each submitting team.
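The team ranking is a simple best-by-log-loss selection over each team's submissions. A minimal sketch of that selection, using a toy subset of values from the systems-ranking table above:

```python
# (team, system name, evaluation-set log loss) for a few submissions
systems = [
    ("Schmid_CPJKU", "t10sec",     1.092),
    ("Schmid_CPJKU", "mixstyleR5", 1.091),
    ("Chang_HYU",    "JH_PM_HYU1", 1.147),
]

# Keep each team's system with the lowest log loss.
best = {}
for team, name, logloss in systems:
    if team not in best or logloss < best[team][1]:
        best[team] = (name, logloss)

print(best)
```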
Submission label | Name | Technical report | System rank | Team rank | Logloss (eval, 95% CI) | Accuracy % (eval, 95% CI) | Logloss (dev) | Accuracy % (dev)
---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_4 | AI4EDGE_4 | Anastcio2022 | 11 | 4 | 1.330 (1.281 - 1.378) | 51.6 (51.3 - 51.9) | 1.103 | 60.5 | |
AIT_Essex_task1_1 | AIT_Essex | Pham2022 | 34 | 14 | 1.636 (1.535 - 1.737) | 53.0 (52.7 - 53.3) | 1.719 | 55.6 | |
Cai_XJTLU_task1_1 | DW_S | Cai2022 | 26 | 11 | 1.515 (1.454 - 1.575) | 47.8 (47.5 - 48.0) | 1.578 | 46.2 | |
Cao_SCUT_task1_1 | KDResCG | Cao2022 | 45 | 19 | 2.795 (2.623 - 2.967) | 48.7 (48.4 - 48.9) | 1.441 | 51.1 | |
Chang_HYU_task1_1 | JH_PM_HYU1 | Lee2022 | 5 | 2 | 1.147 (1.081 - 1.214) | 60.8 (60.6 - 61.1) | 0.835 | 70.1 | |
Dong_NCUT_task1_1 | Dong1_NCUT | Dong2022 | 29 | 12 | 1.568 (1.512 - 1.623) | 48.0 (47.7 - 48.3) | 1.378 | 53.9 | |
Houyb_XDU_task1_1 | Houyb_XDU | Hou2022 | 22 | 9 | 1.481 (1.416 - 1.547) | 49.3 (49.0 - 49.5) | 1.449 | 49.7 | |
Liang_UESTC_task1_4 | MobileNet2 | Liang2022 | 32 | 13 | 1.612 (1.560 - 1.663) | 44.1 (43.8 - 44.4) | 1.556 | 45.9 | |
DCASE2022 baseline | Baseline | | | | 1.532 (1.490 - 1.574) | 44.2 (44.0 - 44.5) | 1.575 | 42.9 |
Morocutti_JKU_task1_4 | jku_stu_4 | Morocutti2022 | 9 | 3 | 1.311 (1.253 - 1.369) | 54.5 (54.2 - 54.8) | 1.288 | 52.7 | |
Olisaemeka_ARU_task1_1 | DepSepConv | Olisaemeka2022 | 39 | 16 | 2.055 (1.991 - 2.119) | 36.4 (36.1 - 36.6) | 1.878 | 39.0 | |
Park_KT_task1_2 | MConvNet | Kim2022 | 19 | 6 | 1.431 (1.364 - 1.498) | 52.7 (52.4 - 53.0) | 1.259 | 54.0 | |
Schmid_CPJKU_task1_3 | mixstyleR5 | Schmid2022 | 1 | 1 | 1.091 (1.040 - 1.141) | 59.6 (59.4 - 59.9) | 1.139 | 58.0 | |
Schmidt_FAU_task1_1 | final | Schmidt2022 | 35 | 15 | 1.731 (1.657 - 1.805) | 47.5 (47.2 - 47.8) | 1.581 | 49.0 | |
Singh_Surrey_task1_3 | Surrey_19M | Singh2022 | 23 | 10 | 1.492 (1.441 - 1.544) | 45.9 (45.6 - 46.2) | 1.392 | 47.3 | |
Sugahara_RION_task1_3 | RION3 | Sugahara2022 | 14 | 5 | 1.366 (1.305 - 1.426) | 51.7 (51.4 - 51.9) | 1.182 | 56.5 | |
Yu_XIAOMI_task1_1 | YLSSD | Yu2022 | 21 | 8 | 1.456 (1.409 - 1.504) | 46.2 (46.0 - 46.5) | 1.305 | 51.7 | |
Zaragoza-Paredes_UPV_task1_1 | Conv_Sep_CNN_48 | Zaragoza_Paredes2022 | 44 | 18 | 2.709 (2.517 - 2.901) | 43.8 (43.6 - 44.1) | 1.440 | 50.6 | |
Zhang_THUEE_task1_1 | THUEE | Shao2022 | 40 | 17 | 2.096 (1.913 - 2.280) | 54.9 (54.7 - 55.2) | 1.360 | 54.1 | |
Zou_PKU_task1_1 | SepCNN | Xin2022 | 20 | 7 | 1.442 (1.362 - 1.521) | 56.3 (56.0 - 56.6) | 1.295 | 60.3 |
System complexity
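The complexity figures below report each model's MAC count and parameter count, the quantities the task's complexity limits are defined on. As a back-of-the-envelope sketch (not the official complexity calculator), both can be counted layer by layer; for a plain 2D convolution:

```python
def conv2d_macs_and_params(in_ch, out_ch, k, out_h, out_w, bias=True):
    """MACs and parameter count for a plain 2D convolution layer.

    Each output element needs in_ch * k * k multiply-accumulates;
    parameters are the kernel weights plus an optional bias per filter.
    """
    params = out_ch * (in_ch * k * k + (1 if bias else 0))
    macs = out_ch * out_h * out_w * in_ch * k * k
    return macs, params

# Toy layer: 16 -> 32 channels, 3x3 kernel, 20x32 output feature map
macs, params = conv2d_macs_and_params(16, 32, 3, 20, 32)
print(macs, params)  # 2949120 4640
```

This also illustrates why MACs and parameters rank systems differently: MACs scale with the feature-map size, while the parameter count does not.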
Submission label | Technical report | Rank | Logloss (eval) | Accuracy % (eval) | MACs | Memory use | Parameters | Non-zero parameters | Sparsity | Complexity management
---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 47.0 | 21127552 | 70612 | 68918 | 68918 | 0.0 | weight quantization | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 46.7 | 21127552 | 70612 | 68918 | 68918 | 0.0 | weight quantization | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 49.4 | 25475456 | 52852 | 51986 | 51986 | 0.0 | knowledge distillation, weight quantization | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 51.6 | 25475456 | 52852 | 51986 | 51986 | 0.0 | knowledge distillation, weight quantization | |
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 53.0 | 900000 | 33822 | 33822 | 32382 | 0.042575838211814765 | channel restriction, decomposed convolution, quantization | |
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 51.9 | 750000 | 31902 | 31902 | 30558 | 0.04212902012413011 | channel restriction, decomposed convolution, quantization | |
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 55.2 | 900000 | 115998 | 115998 | 113118 | 0.024828014276108257 | channel restriction, decomposed convolution, quantization | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 47.8 | 6287030 | | 25526 | 25526 | 0.0 | group convolution |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 46.4 | 6287030 | | 25526 | 25526 | 0.0 | group convolution |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 45.2 | 7337718 | | 35926 | 35926 | 0.0 | group convolution |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 48.0 | 7337718 | | 35926 | 35926 | 0.0 | group convolution |
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 48.7 | 8637250 | 125330 | 125330 | 125330 | 0.0 | weight quantization, knowledge distillation | |
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 60.8 | 26763000 | | 126580 | 126580 | 0.0 | weight quantization, knowledge distillation |
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 59.2 | 26763000 | | 126580 | 126580 | 0.0 | weight quantization |
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 59.4 | 26763000 | | 126580 | 126580 | 0.0 | weight quantization, knowledge distillation |
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 59.3 | 26763000 | | 126580 | 126580 | 0.0 | weight quantization |
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 48.0 | 28461216 | 540672 | 70608 | 70608 | 0.0 | weight quantization | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 49.3 | 28513000 | 78140 | 57957 | 57957 | 0.0 | weight quantization | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 41.3 | 20500000 | 85800 | 85800 | 85800 | 0.0 | knowledge distillation, weight quantization | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 29.9 | 20500000 | 85800 | 85800 | 85800 | 0.0 | knowledge distillation, weight quantization | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 28.5 | 20500000 | 85800 | 85800 | 85800 | 0.0 | knowledge distillation, weight quantization | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 44.1 | 11186000 | 110452 | 110452 | 110452 | 0.0 | weight quantization | |
DCASE2022 baseline | | | 1.532 | 44.2 | 29234920 | 65280 | 46512 | 46512 | 0.0 | weight quantization |
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 53.8 | 29325000 | 3510000 | 65790 | 65790 | 0.0 | weight quantization | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 53.0 | 29325000 | 3510000 | 65790 | 65790 | 0.0 | weight quantization | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 54.7 | 29325000 | 3510000 | 65790 | 65790 | 0.0 | weight quantization | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 54.5 | 29325000 | 3510000 | 65790 | 65790 | 0.0 | weight quantization | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 36.4 | 3283692 | 65280 | 96473 | 96473 | 0.0 | weight quantization | |
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 51.7 | 29481000 | 262319 | 113378 | 113378 | 0.0 | weight quantization | |
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 52.7 | 29481000 | 262319 | 113378 | 113378 | 0.0 | weight quantization | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 59.7 | 29056324 | | 127046 | 127046 | 0.0 | knowledge distillation, weight quantization |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 59.6 | 29056324 | | 127046 | 127046 | 0.0 | knowledge distillation, weight quantization |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 59.6 | 28240924 | | 121610 | 121610 | 0.0 | knowledge distillation, weight quantization |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 59.4 | 28240924 | | 121610 | 121610 | 0.0 | knowledge distillation, weight quantization |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 47.5 | 15163468 | 5288960 | 127943 | 127943 | 0.0 | weight quantization, structured filter pruning | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 44.6 | 4129320 | 261120 | 13138 | 13138 | 0.0 | weight quantization, pruning | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 44.3 | 5404520 | 261120 | 14886 | 14886 | 0.0 | weight quantization, pruning | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 45.9 | 18585480 | 261120 | 59570 | 59570 | 0.0 | weight quantization, pruning | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 45.9 | 19831880 | 261120 | 60958 | 60958 | 0.0 | weight quantization, pruning | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 51.5 | 26607000 | | 120229 | 120229 | 0.0 | weight quantization |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 51.6 | 26607000 | | 120229 | 120229 | 0.0 | weight quantization |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 51.7 | 26607000 | | 120229 | 120229 | 0.0 | weight quantization |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 52.7 | 26610000 | | 123346 | 123346 | 0.0 | weight quantization, knowledge distillation |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 46.2 | 16081000 | 5934 | 6306 | 6306 | 0.0 | weight quantization | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 43.8 | 28570080 | 1253376 | 28320 | 28320 | 0.0 | weight quantization | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 41.9 | 28570080 | 1253376 | 28320 | 28320 | 0.0 | weight quantization | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 54.9 | 28228320 | 2322560 | 127160 | 127160 | 0.0 | pruning, weight quantization, knowledge distillation | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 54.4 | 28098645 | 2322560 | 126078 | 126078 | 0.0 | pruning, weight quantization, knowledge distillation | |
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 56.3 | 28823618 | 68140 | 75562 | 75562 | 0.0 | weight quantization |
Generalization performance
All results are on the evaluation dataset.
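The seen/unseen columns below are the same metrics restricted to subsets of the evaluation data (devices or cities present in, or absent from, the development set). A minimal sketch of such a breakdown, on hypothetical per-sample results (the device names and values are made up for illustration):

```python
import numpy as np

# Hypothetical per-sample results: (device, per-sample log loss, correct?)
results = [
    ("a",  0.4, True),  ("s7", 1.9, False),
    ("b",  0.8, True),  ("s8", 2.2, False),
    ("s1", 1.1, True),  ("d",  1.5, True),
]
# Devices present in the development set (A, B, C, S1-S6)
seen_devices = {"a", "b", "c", "s1", "s2", "s3", "s4", "s5", "s6"}

def breakdown(rows, keep):
    """Mean log loss and accuracy (%) over the rows whose device passes `keep`."""
    sub = [r for r in rows if keep(r[0])]
    logloss = np.mean([r[1] for r in sub])
    acc = 100.0 * np.mean([r[2] for r in sub])
    return round(float(logloss), 3), round(float(acc), 1)

print("seen:  ", breakdown(results, lambda d: d in seen_devices))
print("unseen:", breakdown(results, lambda d: d not in seen_devices))
```

The same grouping over recording cities yields the seen/unseen-cities columns.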
Submission label | Technical report | Rank | Logloss (overall) | Accuracy % (overall) | Logloss (unseen devices) | Accuracy % (unseen devices) | Logloss (seen devices) | Accuracy % (seen devices) | Logloss (unseen cities) | Accuracy % (unseen cities) | Logloss (seen cities) | Accuracy % (seen cities)
---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 47.0 | 2.820 | 41.6 | 2.076 | 51.5 | 2.218 | 48.8 | 2.447 | 46.8 | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 46.7 | 2.923 | 41.2 | 1.900 | 51.2 | 2.211 | 48.2 | 2.390 | 46.7 | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 49.4 | 1.588 | 42.5 | 1.240 | 55.1 | 1.337 | 50.2 | 1.405 | 49.4 | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 51.6 | 1.467 | 46.5 | 1.215 | 55.9 | 1.268 | 53.0 | 1.336 | 51.5 | |
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 53.0 | 1.806 | 50.1 | 1.494 | 55.4 | 1.631 | 51.5 | 1.642 | 53.4 | |
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 51.9 | 2.138 | 47.2 | 1.494 | 55.8 | 1.810 | 50.0 | 1.792 | 52.5 | |
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 55.2 | 2.258 | 49.9 | 1.434 | 59.7 | 1.837 | 53.6 | 1.805 | 55.7 | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 47.8 | 1.847 | 40.7 | 1.238 | 53.7 | 1.553 | 45.1 | 1.500 | 48.7 | |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 46.4 | 1.920 | 40.2 | 1.297 | 51.6 | 1.611 | 44.3 | 1.567 | 47.0 | |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 45.2 | 2.059 | 36.6 | 1.282 | 52.3 | 1.674 | 42.8 | 1.617 | 46.0 | |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 48.0 | 1.916 | 41.9 | 1.270 | 53.1 | 1.631 | 46.0 | 1.546 | 48.7 | |
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 48.7 | 3.746 | 44.0 | 2.003 | 52.5 | 2.926 | 47.6 | 2.775 | 48.8 | |
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 60.8 | 1.377 | 55.1 | 0.956 | 65.6 | 1.114 | 60.6 | 1.153 | 60.9 | |
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 59.2 | 1.426 | 52.0 | 0.987 | 65.2 | 1.210 | 58.1 | 1.175 | 59.9 | |
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 59.4 | 1.428 | 52.6 | 0.992 | 65.0 | 1.224 | 57.6 | 1.183 | 60.0 | |
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 59.3 | 1.433 | 51.9 | 0.982 | 65.5 | 1.207 | 58.4 | 1.176 | 60.0 | |
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 48.0 | 1.872 | 38.8 | 1.314 | 55.6 | 1.638 | 45.2 | 1.555 | 48.8 | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 49.3 | 1.740 | 42.8 | 1.265 | 54.6 | 1.547 | 47.1 | 1.468 | 49.9 | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 41.3 | 2.289 | 36.2 | 1.637 | 45.5 | 1.919 | 42.4 | 1.944 | 41.1 | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 29.9 | 3.346 | 26.4 | 2.557 | 32.9 | 2.928 | 30.6 | 2.929 | 29.7 | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 28.5 | 3.063 | 24.6 | 2.400 | 31.7 | 2.670 | 29.2 | 2.709 | 28.3 | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 44.1 | 1.690 | 41.8 | 1.546 | 46.1 | 1.654 | 43.3 | 1.608 | 44.2 | |
DCASE2022 baseline | | | 1.532 | 44.2 | 1.725 | 38.1 | 1.372 | 49.4 | 1.552 | 43.4 | 1.530 | 44.7 |
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 53.8 | 1.509 | 48.6 | 1.197 | 58.1 | 1.328 | 54.3 | 1.338 | 53.9 | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 53.0 | 1.512 | 47.8 | 1.224 | 57.3 | 1.342 | 53.7 | 1.360 | 53.1 | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 54.7 | 1.508 | 48.8 | 1.162 | 59.5 | 1.310 | 54.7 | 1.321 | 54.9 | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 54.5 | 1.480 | 48.8 | 1.170 | 59.2 | 1.302 | 55.4 | 1.311 | 54.5 | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 36.4 | 2.515 | 28.4 | 1.671 | 43.0 | 2.180 | 32.8 | 2.021 | 37.3 | |
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 51.7 | 1.768 | 46.3 | 1.284 | 56.1 | 1.551 | 49.8 | 1.487 | 52.3 | |
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 52.7 | 1.624 | 48.4 | 1.270 | 56.2 | 1.442 | 51.9 | 1.417 | 53.2 | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 59.7 | 1.236 | 54.5 | 0.972 | 64.1 | 1.122 | 58.3 | 1.085 | 60.3 | |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 59.6 | 1.218 | 55.3 | 1.011 | 63.2 | 1.126 | 58.6 | 1.103 | 59.9 | |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 59.6 | 1.231 | 54.8 | 0.974 | 63.7 | 1.113 | 58.9 | 1.087 | 60.1 | |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 59.4 | 1.229 | 54.9 | 0.997 | 63.1 | 1.129 | 58.4 | 1.097 | 59.7 | |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 47.5 | 2.139 | 40.7 | 1.390 | 53.2 | 1.775 | 46.7 | 1.720 | 47.9 | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 44.6 | 1.835 | 37.8 | 1.341 | 50.3 | 1.640 | 42.6 | 1.545 | 45.1 | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 44.3 | 1.898 | 37.8 | 1.362 | 49.8 | 1.672 | 43.1 | 1.585 | 45.0 | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 45.9 | 1.728 | 39.1 | 1.296 | 51.5 | 1.532 | 44.3 | 1.480 | 46.5 | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 45.9 | 1.754 | 38.8 | 1.287 | 51.9 | 1.540 | 44.5 | 1.486 | 46.6 | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 51.5 | 1.576 | 46.9 | 1.261 | 55.3 | 1.444 | 49.4 | 1.401 | 51.9 | |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 51.6 | 1.534 | 47.8 | 1.269 | 54.8 | 1.431 | 49.6 | 1.388 | 52.1 | |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 51.7 | 1.496 | 47.9 | 1.257 | 54.8 | 1.405 | 49.6 | 1.364 | 52.1 | |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 52.7 | 1.691 | 46.8 | 1.152 | 57.7 | 1.463 | 50.0 | 1.379 | 53.6 | |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 46.2 | 1.619 | 40.6 | 1.321 | 51.0 | 1.473 | 46.0 | 1.454 | 46.5 | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 43.8 | 3.181 | 40.1 | 2.315 | 47.0 | 2.665 | 43.4 | 2.733 | 44.0 | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 41.9 | 3.254 | 38.8 | 2.613 | 44.6 | 2.824 | 42.1 | 2.930 | 42.0 | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 54.9 | 2.435 | 47.0 | 1.814 | 61.5 | 2.293 | 53.1 | 2.056 | 55.6 | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 54.4 | 4.008 | 45.7 | 2.284 | 61.6 | 3.149 | 51.5 | 2.986 | 55.2 | |
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 56.3 | 1.842 | 48.6 | 1.108 | 62.7 | 1.530 | 53.0 | 1.409 | 57.5 |
Class-wise performance
Log loss
Submission label | Technical report | Rank | Logloss (overall) | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
---|---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 2.559 | 1.332 | 1.923 | 2.808 | 0.996 | 4.097 | 2.567 | 4.302 | 1.554 | 2.003 | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 2.293 | 1.227 | 1.818 | 2.520 | 1.026 | 4.030 | 3.000 | 4.196 | 1.628 | 1.912 | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 1.587 | 0.806 | 1.519 | 1.992 | 0.814 | 1.662 | 1.537 | 1.879 | 1.213 | 0.975 | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 1.429 | 0.959 | 1.193 | 1.549 | 0.901 | 1.877 | 1.314 | 1.722 | 1.160 | 1.192 | |
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 1.557 | 0.777 | 1.535 | 2.490 | 0.603 | 2.772 | 1.033 | 2.702 | 1.434 | 1.456 | |
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 1.810 | 0.544 | 2.138 | 2.715 | 1.037 | 2.748 | 1.077 | 2.948 | 1.564 | 1.288 | |
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 1.767 | 0.711 | 1.660 | 2.431 | 0.805 | 3.070 | 1.449 | 3.163 | 1.921 | 1.106 | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 1.590 | 1.258 | 1.274 | 1.845 | 1.217 | 2.132 | 1.300 | 1.986 | 1.414 | 1.131 | |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 1.591 | 1.457 | 1.342 | 1.837 | 1.356 | 2.090 | 1.317 | 2.113 | 1.624 | 1.078 | |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 1.583 | 1.091 | 1.290 | 1.979 | 1.346 | 2.552 | 1.521 | 2.157 | 1.761 | 1.070 | |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 1.409 | 1.318 | 1.307 | 1.954 | 1.852 | 2.006 | 1.153 | 1.997 | 1.559 | 1.083 | |
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 3.614 | 1.835 | 2.234 | 2.986 | 2.026 | 4.087 | 2.443 | 3.969 | 2.277 | 2.480 | |
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 1.647 | 0.539 | 1.055 | 1.100 | 0.555 | 1.867 | 1.291 | 1.837 | 0.820 | 0.763 | |
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 1.549 | 0.564 | 1.033 | 1.079 | 0.701 | 1.900 | 1.458 | 1.936 | 0.835 | 0.812 | |
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 1.504 | 0.645 | 1.041 | 1.205 | 0.610 | 1.903 | 1.375 | 1.889 | 0.907 | 0.822 | |
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 1.645 | 0.542 | 1.049 | 1.084 | 0.698 | 1.849 | 1.510 | 1.889 | 0.827 | 0.777 | |
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 1.976 | 0.936 | 1.390 | 1.595 | 0.909 | 2.288 | 1.607 | 2.043 | 1.441 | 1.489 | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 1.827 | 1.100 | 1.508 | 1.937 | 0.824 | 1.669 | 2.015 | 1.801 | 1.166 | 0.963 | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 3.818 | 1.085 | 1.029 | 1.451 | 1.927 | 2.831 | 2.010 | 2.775 | 1.460 | 0.950 | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 5.660 | 2.051 | 0.736 | 1.248 | 3.257 | 3.867 | 3.809 | 3.728 | 3.736 | 1.064 | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 3.537 | 2.557 | 0.693 | 0.817 | 4.344 | 3.929 | 2.739 | 3.305 | 3.690 | 1.403 | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 2.008 | 1.598 | 1.933 | 1.602 | 1.009 | 2.357 | 1.261 | 1.772 | 1.158 | 1.417 | |
DCASE2022 baseline | | | 1.532 | 1.596 | 1.368 | 1.489 | 1.692 | 1.635 | 1.943 | 1.289 | 1.891 | 1.219 | 1.202 |
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 1.409 | 0.841 | 1.335 | 1.714 | 0.736 | 1.742 | 1.607 | 1.953 | 1.107 | 0.943 | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 1.487 | 0.985 | 1.305 | 1.677 | 0.682 | 1.718 | 1.460 | 1.734 | 1.289 | 1.212 | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 1.537 | 0.673 | 1.247 | 1.701 | 0.712 | 1.715 | 1.427 | 1.959 | 1.309 | 0.916 | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 1.426 | 0.747 | 1.337 | 1.587 | 0.701 | 1.738 | 1.637 | 1.874 | 1.168 | 0.894 | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 1.970 | 2.535 | 1.331 | 2.020 | 2.054 | 2.868 | 1.795 | 2.608 | 1.604 | 1.765 | |
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 1.615 | 1.048 | 1.761 | 1.982 | 1.153 | 2.345 | 1.229 | 1.631 | 1.554 | 0.721 | |
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 1.166 | 1.093 | 1.448 | 2.117 | 1.210 | 1.734 | 1.198 | 2.071 | 1.416 | 0.855 | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 1.435 | 0.773 | 0.997 | 1.173 | 0.593 | 1.549 | 1.099 | 1.628 | 0.901 | 0.770 | |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 1.358 | 0.759 | 1.043 | 1.217 | 0.451 | 1.609 | 1.254 | 1.616 | 0.896 | 0.851 | |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 1.430 | 0.790 | 0.932 | 1.120 | 0.502 | 1.521 | 1.129 | 1.709 | 0.953 | 0.822 | |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 1.349 | 0.839 | 0.918 | 1.246 | 0.534 | 1.557 | 1.120 | 1.726 | 0.956 | 0.781 | |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 2.207 | 1.025 | 1.621 | 1.923 | 1.726 | 2.125 | 1.938 | 2.518 | 1.157 | 1.068 | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 1.919 | 1.420 | 1.175 | 1.754 | 1.398 | 2.065 | 1.270 | 2.082 | 1.350 | 1.219 | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 1.834 | 1.619 | 1.275 | 1.713 | 1.414 | 2.020 | 1.394 | 2.216 | 1.464 | 1.109 | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 1.656 | 1.604 | 1.242 | 1.696 | 1.289 | 1.893 | 1.239 | 1.898 | 1.363 | 1.041 | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 1.660 | 1.496 | 1.242 | 1.700 | 1.370 | 1.895 | 1.281 | 1.966 | 1.359 | 1.023 | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 1.421 | 1.028 | 1.349 | 1.426 | 0.352 | 2.227 | 1.235 | 2.167 | 1.676 | 1.161 | |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 1.468 | 1.220 | 1.377 | 1.324 | 0.365 | 1.918 | 1.311 | 2.078 | 1.559 | 1.274 | |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 1.440 | 1.201 | 1.410 | 1.326 | 0.379 | 1.830 | 1.276 | 2.045 | 1.476 | 1.273 | |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 1.545 | 0.723 | 1.148 | 1.447 | 1.024 | 2.212 | 1.118 | 2.483 | 1.150 | 1.122 | |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 1.732 | 1.177 | 1.425 | 1.789 | 1.008 | 1.808 | 1.380 | 1.792 | 1.255 | 1.200 | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 1.110 | 2.356 | 3.714 | 4.442 | 2.247 | 1.790 | 3.617 | 3.139 | 2.755 | 1.919 | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 0.995 | 2.967 | 4.176 | 5.117 | 2.167 | 1.582 | 4.277 | 3.045 | 2.690 | 2.026 | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 1.557 | 1.105 | 2.007 | 1.513 | 1.942 | 1.812 | 1.438 | 1.618 | 5.762 | 2.208 | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 2.807 | 2.400 | 1.595 | 1.378 | 3.292 | 10.393 | 1.345 | 1.799 | 4.168 | 1.503 | |
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 1.971 | 0.724 | 1.329 | 1.486 | 0.639 | 2.221 | 1.269 | 2.379 | 1.247 | 1.150 |
Accuracy
Submission label | Technical report | Rank | Accuracy (overall) | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
---|---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 47.0 | 41.3 | 66.9 | 51.3 | 35.3 | 75.2 | 26.7 | 40.3 | 22.3 | 60.4 | 50.4 | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 46.7 | 40.8 | 68.5 | 53.2 | 39.4 | 74.9 | 24.8 | 34.7 | 21.0 | 61.2 | 48.1 | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 49.4 | 38.2 | 72.9 | 41.2 | 33.2 | 75.2 | 38.4 | 44.1 | 25.6 | 59.2 | 65.8 | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 51.6 | 42.1 | 68.9 | 56.2 | 45.8 | 73.8 | 25.5 | 52.7 | 34.7 | 61.4 | 54.8 | |
AIT_Essex_task1_1 | Pham2022 | 34 | 53.0 | 48.6 | 78.3 | 49.0 | 31.3 | 84.2 | 30.3 | 65.9 | 26.0 | 66.1 | 50.2 | |
AIT_Essex_task1_2 | Pham2022 | 36 | 51.9 | 44.8 | 84.7 | 38.5 | 29.1 | 76.0 | 32.6 | 66.9 | 26.6 | 65.3 | 54.0 | |
AIT_Essex_task1_3 | Pham2022 | 37 | 55.2 | 51.2 | 81.4 | 49.2 | 38.2 | 83.1 | 31.8 | 61.7 | 29.4 | 60.0 | 66.4 | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 47.8 | 37.7 | 57.4 | 46.8 | 33.9 | 69.8 | 23.4 | 60.2 | 36.3 | 56.5 | 55.5 | |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 46.4 | 39.7 | 48.8 | 41.7 | 38.0 | 63.9 | 28.3 | 62.7 | 28.8 | 50.2 | 62.0 | |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 45.2 | 40.4 | 61.5 | 44.5 | 28.2 | 68.1 | 17.4 | 53.6 | 32.2 | 48.3 | 57.6 | |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 48.0 | 46.2 | 53.3 | 46.2 | 32.8 | 61.8 | 26.0 | 62.6 | 33.3 | 56.5 | 61.5 | |
Cao_SCUT_task1_1 | Cao2022 | 45 | 48.7 | 40.1 | 58.5 | 42.1 | 38.5 | 72.3 | 24.2 | 52.0 | 35.7 | 69.0 | 54.1 | |
Chang_HYU_task1_1 | Lee2022 | 5 | 60.8 | 39.7 | 81.8 | 61.4 | 62.3 | 83.5 | 35.1 | 58.0 | 39.5 | 74.4 | 72.5 | |
Chang_HYU_task1_2 | Lee2022 | 6 | 59.2 | 42.3 | 82.3 | 62.4 | 62.8 | 79.5 | 32.6 | 51.3 | 32.6 | 74.0 | 72.2 | |
Chang_HYU_task1_3 | Lee2022 | 8 | 59.4 | 43.0 | 79.9 | 63.5 | 59.6 | 81.9 | 34.2 | 52.1 | 35.3 | 72.2 | 71.9 | |
Chang_HYU_task1_4 | Lee2022a | 7 | 59.3 | 37.6 | 83.2 | 62.3 | 63.2 | 80.0 | 34.6 | 49.4 | 34.8 | 74.5 | 73.9 | |
Dong_NCUT_task1_1 | Dong2022 | 29 | 48.0 | 30.8 | 70.4 | 50.3 | 48.3 | 72.5 | 24.7 | 45.5 | 35.3 | 56.6 | 45.6 | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 49.3 | 33.2 | 59.9 | 44.8 | 39.1 | 74.3 | 40.1 | 33.7 | 37.7 | 64.1 | 65.7 | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 41.3 | 8.2 | 59.0 | 58.0 | 43.0 | 49.5 | 21.8 | 35.1 | 22.9 | 56.6 | 58.5 | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 29.9 | 3.3 | 35.8 | 71.6 | 48.8 | 23.8 | 16.2 | 9.0 | 14.8 | 20.0 | 55.9 | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 28.5 | 8.5 | 26.4 | 72.3 | 64.4 | 13.8 | 12.5 | 13.0 | 14.4 | 18.8 | 40.5 | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 44.1 | 25.2 | 44.2 | 29.7 | 43.1 | 69.3 | 17.1 | 57.1 | 39.9 | 63.7 | 51.7 | |
DCASE2022 baseline | 44.2 | 32.2 | 50.6 | 37.9 | 39.8 | 52.2 | 25.7 | 58.2 | 27.9 | 64.4 | 53.4 | |||
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 53.8 | 49.5 | 72.6 | 49.6 | 42.7 | 77.7 | 37.3 | 45.9 | 25.6 | 69.2 | 67.8 | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 53.0 | 41.1 | 68.7 | 51.4 | 43.4 | 80.3 | 36.3 | 48.8 | 33.2 | 66.0 | 60.5 | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 54.7 | 42.7 | 77.4 | 52.3 | 42.8 | 78.5 | 39.8 | 54.2 | 26.0 | 63.9 | 69.0 | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 54.5 | 48.5 | 75.0 | 48.0 | 47.0 | 79.4 | 37.5 | 43.5 | 27.3 | 68.3 | 70.1 | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 36.4 | 31.8 | 27.1 | 50.5 | 30.9 | 44.5 | 17.2 | 46.9 | 20.1 | 54.0 | 40.5 | |
Park_KT_task1_1 | Kim2022 | 25 | 51.7 | 44.0 | 64.2 | 42.5 | 39.8 | 67.1 | 25.5 | 58.0 | 42.2 | 58.5 | 74.9 | |
Park_KT_task1_2 | Kim2022 | 19 | 52.7 | 57.5 | 64.3 | 49.8 | 36.4 | 64.1 | 38.6 | 55.2 | 29.0 | 61.0 | 71.1 | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 59.7 | 48.0 | 76.8 | 63.8 | 58.3 | 82.7 | 43.2 | 57.1 | 32.0 | 68.9 | 66.6 | |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 59.6 | 51.2 | 78.2 | 62.6 | 58.3 | 88.5 | 41.3 | 52.1 | 30.8 | 69.5 | 63.7 | |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 59.6 | 47.4 | 75.5 | 66.1 | 60.4 | 85.4 | 43.6 | 55.1 | 31.8 | 66.8 | 64.4 | |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 59.4 | 53.1 | 74.7 | 67.5 | 53.8 | 85.8 | 42.6 | 56.9 | 27.4 | 66.5 | 65.7 | |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 47.5 | 31.6 | 66.4 | 48.1 | 40.9 | 53.5 | 33.3 | 44.4 | 26.2 | 66.1 | 64.2 | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 44.6 | 23.9 | 43.7 | 54.1 | 36.6 | 61.4 | 32.0 | 62.4 | 24.2 | 59.5 | 48.3 | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 44.3 | 29.3 | 37.3 | 48.8 | 39.5 | 62.9 | 37.2 | 58.3 | 17.9 | 56.8 | 55.1 | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 45.9 | 29.1 | 33.0 | 49.8 | 38.6 | 63.3 | 36.7 | 62.4 | 24.6 | 60.0 | 61.2 | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 45.9 | 29.8 | 37.3 | 49.2 | 39.0 | 61.5 | 37.1 | 61.7 | 22.7 | 60.0 | 60.8 | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 51.5 | 45.4 | 66.0 | 49.8 | 48.8 | 90.8 | 24.2 | 53.4 | 31.6 | 44.5 | 60.1 | |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 51.6 | 42.5 | 61.0 | 52.3 | 53.0 | 90.4 | 29.0 | 52.5 | 33.6 | 48.8 | 53.0 | |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 51.7 | 42.5 | 61.1 | 52.1 | 53.0 | 90.5 | 29.1 | 52.7 | 33.5 | 48.9 | 53.1 | |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 52.7 | 36.3 | 76.5 | 54.1 | 47.8 | 73.7 | 26.7 | 65.4 | 23.9 | 63.5 | 59.8 | |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 46.2 | 39.5 | 59.9 | 43.8 | 32.6 | 68.7 | 37.0 | 51.8 | 24.7 | 62.7 | 42.0 | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 43.8 | 64.2 | 48.8 | 29.9 | 30.8 | 64.9 | 54.4 | 23.3 | 14.1 | 58.9 | 49.0 | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 41.9 | 67.5 | 41.5 | 27.5 | 26.0 | 64.2 | 56.9 | 15.2 | 12.1 | 58.7 | 50.0 | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 54.9 | 39.9 | 74.9 | 55.4 | 47.3 | 82.2 | 36.0 | 48.2 | 37.9 | 62.6 | 64.9 | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 54.4 | 42.1 | 80.4 | 50.1 | 47.7 | 81.3 | 34.2 | 45.6 | 39.1 | 64.0 | 59.4 | |
Zou_PKU_task1_1 | Xin2022 | 20 | 56.3 | 37.6 | 74.9 | 55.9 | 56.2 | 81.6 | 33.1 | 62.1 | 34.3 | 67.0 | 60.1 |
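The class-wise figures above are per-class accuracies (the fraction of segments of each true class that were labeled correctly), and the overall accuracy is their average over the ten classes. A minimal sketch of the computation, with illustrative class names:

```python
from collections import defaultdict

def class_wise_accuracy(y_true, y_pred):
    """Per-class accuracy: correctly classified / total, for each true class."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / totals[c] for c in totals}

def macro_accuracy(y_true, y_pred):
    """Average of the per-class accuracies (balanced accuracy)."""
    per_class = class_wise_accuracy(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)

# Tiny illustrative example with two scene classes
acc = class_wise_accuracy(["bus", "bus", "park"], ["bus", "park", "park"])
print(acc)
```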
Device-wise performance
Log loss
Submission label | Technical Report | Official system rank | Log loss | Log loss / Unseen (D, S7-S10) | Log loss / Seen (A, B, C, S1-S3) | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 2.820 | 2.076 | 3.553 | 2.482 | 2.701 | 2.691 | 2.672 | 1.734 | 2.025 | 1.789 | 2.325 | 2.385 | 2.200 | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 2.923 | 1.900 | 3.623 | 2.602 | 2.706 | 2.707 | 2.977 | 1.582 | 1.797 | 1.684 | 2.160 | 2.182 | 1.995 | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 1.588 | 1.240 | 1.782 | 1.511 | 1.592 | 1.556 | 1.500 | 1.168 | 1.220 | 1.154 | 1.255 | 1.359 | 1.286 | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 1.467 | 1.215 | 1.775 | 1.350 | 1.465 | 1.374 | 1.370 | 1.123 | 1.214 | 1.153 | 1.211 | 1.378 | 1.213 | |
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 1.806 | 1.494 | 2.479 | 1.380 | 1.653 | 1.980 | 1.541 | 1.355 | 1.775 | 1.513 | 1.357 | 1.671 | 1.292 | |
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 2.138 | 1.494 | 2.958 | 1.699 | 1.937 | 2.136 | 1.961 | 1.323 | 1.580 | 1.331 | 1.532 | 1.713 | 1.485 | |
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 2.258 | 1.434 | 2.959 | 1.692 | 1.897 | 2.581 | 2.159 | 1.205 | 1.559 | 1.481 | 1.492 | 1.470 | 1.395 | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 1.847 | 1.238 | 2.264 | 1.361 | 1.539 | 2.193 | 1.876 | 1.025 | 1.324 | 1.125 | 1.288 | 1.355 | 1.313 | |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 1.920 | 1.297 | 2.219 | 1.372 | 1.600 | 2.504 | 1.906 | 1.089 | 1.366 | 1.175 | 1.346 | 1.422 | 1.386 | |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 2.059 | 1.282 | 2.418 | 1.414 | 1.759 | 2.704 | 1.998 | 1.049 | 1.397 | 1.149 | 1.329 | 1.400 | 1.368 | |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 1.916 | 1.270 | 2.789 | 1.323 | 1.454 | 2.081 | 1.932 | 1.045 | 1.372 | 1.112 | 1.341 | 1.429 | 1.322 | |
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 3.746 | 2.003 | 9.920 | 2.145 | 2.105 | 2.184 | 2.375 | 1.774 | 2.110 | 1.902 | 2.059 | 2.011 | 2.159 | |
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 1.377 | 0.956 | 1.744 | 1.065 | 1.185 | 1.418 | 1.475 | 0.839 | 0.977 | 0.879 | 1.008 | 1.015 | 1.016 | |
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 1.426 | 0.987 | 1.823 | 1.126 | 1.127 | 1.444 | 1.610 | 0.884 | 0.978 | 0.927 | 1.054 | 1.038 | 1.043 | |
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 1.428 | 0.992 | 1.976 | 1.080 | 1.186 | 1.374 | 1.526 | 0.882 | 0.985 | 0.928 | 1.028 | 1.083 | 1.045 | |
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 1.433 | 0.982 | 1.846 | 1.132 | 1.137 | 1.440 | 1.611 | 0.856 | 0.965 | 0.936 | 1.051 | 1.042 | 1.043 | |
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 1.872 | 1.314 | 2.012 | 1.547 | 1.630 | 2.315 | 1.857 | 1.083 | 1.239 | 1.170 | 1.433 | 1.494 | 1.463 | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 1.740 | 1.265 | 2.361 | 1.557 | 1.451 | 1.724 | 1.609 | 1.028 | 1.262 | 1.132 | 1.317 | 1.476 | 1.375 | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 2.289 | 1.637 | 3.535 | 1.861 | 1.950 | 1.726 | 2.373 | 1.291 | 1.428 | 1.493 | 2.011 | 1.789 | 1.813 | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 3.346 | 2.557 | 4.393 | 2.989 | 2.820 | 2.730 | 3.798 | 1.890 | 2.140 | 2.204 | 3.056 | 3.097 | 2.956 | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 3.063 | 2.400 | 3.483 | 2.931 | 2.782 | 2.601 | 3.516 | 1.805 | 2.093 | 1.973 | 2.992 | 2.757 | 2.780 | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 1.690 | 1.546 | 2.074 | 1.588 | 1.600 | 1.527 | 1.660 | 1.448 | 1.663 | 1.509 | 1.569 | 1.524 | 1.566 | |
DCASE2022 baseline | 1.532 | 1.725 | 1.372 | 1.894 | 1.485 | 1.573 | 1.864 | 1.807 | 1.108 | 1.360 | 1.299 | 1.478 | 1.528 | 1.460 | |||
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 1.509 | 1.197 | 1.884 | 1.259 | 1.379 | 1.602 | 1.420 | 1.028 | 1.320 | 1.156 | 1.215 | 1.275 | 1.189 | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 1.512 | 1.224 | 1.907 | 1.340 | 1.467 | 1.424 | 1.420 | 1.025 | 1.351 | 1.186 | 1.220 | 1.314 | 1.250 | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 1.508 | 1.162 | 1.982 | 1.232 | 1.372 | 1.541 | 1.414 | 0.995 | 1.263 | 1.107 | 1.192 | 1.235 | 1.182 | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 1.480 | 1.170 | 1.937 | 1.240 | 1.337 | 1.511 | 1.376 | 1.004 | 1.275 | 1.116 | 1.191 | 1.259 | 1.175 | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 2.515 | 1.671 | 2.871 | 2.236 | 2.247 | 2.652 | 2.567 | 1.389 | 1.583 | 1.488 | 1.822 | 1.915 | 1.832 | |
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 1.768 | 1.284 | 1.942 | 1.504 | 1.562 | 1.860 | 1.970 | 1.020 | 1.326 | 1.140 | 1.331 | 1.405 | 1.480 | |
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 1.624 | 1.270 | 1.979 | 1.339 | 1.418 | 1.657 | 1.728 | 1.060 | 1.323 | 1.081 | 1.387 | 1.456 | 1.313 | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 1.236 | 0.972 | 1.431 | 1.058 | 1.042 | 1.305 | 1.343 | 0.783 | 1.048 | 0.877 | 1.026 | 1.086 | 1.010 | |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 1.218 | 1.011 | 1.548 | 1.075 | 1.048 | 1.171 | 1.248 | 0.847 | 1.077 | 0.932 | 1.076 | 1.098 | 1.039 | |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 1.231 | 0.974 | 1.507 | 1.042 | 1.038 | 1.272 | 1.299 | 0.792 | 1.046 | 0.915 | 1.020 | 1.084 | 0.984 | |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 1.229 | 0.997 | 1.464 | 1.075 | 1.047 | 1.248 | 1.311 | 0.817 | 1.075 | 0.900 | 1.047 | 1.108 | 1.035 | |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 2.139 | 1.390 | 3.108 | 1.589 | 1.652 | 2.011 | 2.337 | 1.150 | 1.435 | 1.364 | 1.428 | 1.486 | 1.480 | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 1.835 | 1.341 | 1.909 | 1.511 | 1.505 | 2.079 | 2.168 | 1.081 | 1.313 | 1.230 | 1.465 | 1.488 | 1.468 | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 1.898 | 1.362 | 2.122 | 1.493 | 1.516 | 2.151 | 2.207 | 1.090 | 1.355 | 1.233 | 1.483 | 1.516 | 1.496 | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 1.728 | 1.296 | 1.808 | 1.413 | 1.445 | 1.972 | 1.999 | 1.040 | 1.300 | 1.166 | 1.407 | 1.448 | 1.416 | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 1.754 | 1.287 | 1.820 | 1.413 | 1.453 | 2.040 | 2.042 | 1.030 | 1.282 | 1.165 | 1.406 | 1.437 | 1.404 | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 1.576 | 1.261 | 1.866 | 1.461 | 1.216 | 1.378 | 1.960 | 1.032 | 1.353 | 1.180 | 1.505 | 1.236 | 1.263 | |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 1.534 | 1.269 | 1.798 | 1.373 | 1.204 | 1.414 | 1.880 | 1.016 | 1.422 | 1.174 | 1.431 | 1.275 | 1.298 | |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 1.496 | 1.257 | 1.757 | 1.344 | 1.196 | 1.396 | 1.785 | 1.011 | 1.409 | 1.172 | 1.401 | 1.269 | 1.280 | |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 1.691 | 1.152 | 2.857 | 1.225 | 1.235 | 1.467 | 1.670 | 1.028 | 1.248 | 1.124 | 1.166 | 1.178 | 1.171 | |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 1.619 | 1.321 | 1.708 | 1.383 | 1.470 | 1.974 | 1.561 | 1.040 | 1.388 | 1.143 | 1.419 | 1.540 | 1.396 | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 3.181 | 2.315 | 3.420 | 2.402 | 2.540 | 4.638 | 2.906 | 1.861 | 2.791 | 1.976 | 2.152 | 2.471 | 2.641 | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 3.254 | 2.613 | 3.502 | 2.669 | 2.861 | 4.235 | 3.004 | 2.122 | 3.252 | 2.281 | 2.326 | 2.752 | 2.943 | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 2.435 | 1.814 | 2.891 | 1.896 | 2.103 | 2.839 | 2.444 | 1.589 | 1.909 | 1.585 | 1.927 | 1.942 | 1.934 | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 4.008 | 2.284 | 4.883 | 2.930 | 3.463 | 5.471 | 3.294 | 2.096 | 2.320 | 2.148 | 2.332 | 2.533 | 2.277 | |
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 1.842 | 1.108 | 2.401 | 1.314 | 1.401 | 2.214 | 1.882 | 0.899 | 1.053 | 0.932 | 1.194 | 1.288 | 1.281 |
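The log loss reported above is the average multiclass cross-entropy of the submitted class probabilities over the evaluated segments; lower is better, and a confident wrong prediction dominates the average. A minimal sketch of the metric in plain Python (probability clipping is an assumption to avoid log(0), not a detail quoted from the evaluation setup):

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Average cross-entropy over samples.

    y_true: list of integer class indices.
    y_prob: list of per-sample probability distributions over the classes.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip the probability of the true class away from 0 and 1
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Two samples: one confident correct prediction, one moderately correct one
loss = multiclass_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
print(loss)
```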
Accuracy
Submission label | Technical Report | Official system rank | Accuracy | Accuracy / Unseen (D, S7-S10) | Accuracy / Seen (A, B, C, S1-S3) | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 47.0 | 41.6 | 51.5 | 36.2 | 43.3 | 43.1 | 41.8 | 43.4 | 54.4 | 52.1 | 53.8 | 48.8 | 49.0 | 51.0 | |
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 46.7 | 41.2 | 51.2 | 35.6 | 43.6 | 43.6 | 41.3 | 42.1 | 54.1 | 52.1 | 53.4 | 49.5 | 47.3 | 50.9 | |
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 49.4 | 42.5 | 55.1 | 34.1 | 44.9 | 44.4 | 43.9 | 45.2 | 58.1 | 55.6 | 58.2 | 54.7 | 50.7 | 53.4 | |
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 51.6 | 46.5 | 55.9 | 35.5 | 49.6 | 47.6 | 50.2 | 49.4 | 60.3 | 55.5 | 57.7 | 56.1 | 49.6 | 56.0 | |
AIT_Essex_task1_1 | Pham2022 | 34 | 53.0 | 50.1 | 55.4 | 42.2 | 56.7 | 52.2 | 45.8 | 53.6 | 62.6 | 49.1 | 56.2 | 56.7 | 50.1 | 57.8 | |
AIT_Essex_task1_2 | Pham2022 | 36 | 51.9 | 47.2 | 55.8 | 39.8 | 52.3 | 49.1 | 45.7 | 49.0 | 63.0 | 52.0 | 60.7 | 53.6 | 50.9 | 54.4 | |
AIT_Essex_task1_3 | Pham2022 | 37 | 55.2 | 49.9 | 59.7 | 42.9 | 56.4 | 52.6 | 46.0 | 51.5 | 67.3 | 56.0 | 61.1 | 58.4 | 56.3 | 59.2 | |
Cai_XJTLU_task1_1 | Cai2022 | 26 | 47.8 | 40.7 | 53.7 | 34.4 | 49.5 | 45.9 | 33.2 | 40.3 | 62.0 | 50.6 | 58.3 | 50.8 | 49.3 | 50.9 | |
Cai_XJTLU_task1_2 | Cai2022 | 30 | 46.4 | 40.2 | 51.6 | 37.5 | 49.3 | 45.1 | 29.5 | 39.6 | 60.0 | 49.2 | 56.0 | 49.1 | 46.0 | 49.2 | |
Cai_XJTLU_task1_3 | Cai2022 | 33 | 45.2 | 36.6 | 52.3 | 33.2 | 46.0 | 40.6 | 26.0 | 37.3 | 62.8 | 46.8 | 56.4 | 49.3 | 48.8 | 49.7 | |
Cai_XJTLU_task1_4 | Cai2022 | 27 | 48.0 | 41.9 | 53.1 | 35.2 | 50.3 | 49.4 | 35.9 | 38.9 | 62.2 | 49.3 | 58.0 | 49.7 | 48.2 | 51.1 | |
Cao_SCUT_task1_1 | Cao2022 | 45 | 48.7 | 44.0 | 52.5 | 29.2 | 49.4 | 47.5 | 47.4 | 46.6 | 61.2 | 48.6 | 55.2 | 50.9 | 50.4 | 48.9 | |
Chang_HYU_task1_1 | Lee2022 | 5 | 60.8 | 55.1 | 65.6 | 46.4 | 63.2 | 59.5 | 54.5 | 52.1 | 70.3 | 64.8 | 67.2 | 64.0 | 63.3 | 63.7 | |
Chang_HYU_task1_2 | Lee2022 | 6 | 59.2 | 52.0 | 65.2 | 39.7 | 60.3 | 59.3 | 51.8 | 48.9 | 68.8 | 65.1 | 66.8 | 63.4 | 63.6 | 63.5 | |
Chang_HYU_task1_3 | Lee2022 | 8 | 59.4 | 52.6 | 65.0 | 37.0 | 62.3 | 58.5 | 54.4 | 50.7 | 69.2 | 64.4 | 67.0 | 64.6 | 61.8 | 63.1 | |
Chang_HYU_task1_4 | Lee2022a | 7 | 59.3 | 51.9 | 65.5 | 39.9 | 60.2 | 59.4 | 52.1 | 48.1 | 70.2 | 65.4 | 66.8 | 63.4 | 63.4 | 63.6 | |
Dong_NCUT_task1_1 | Dong2022 | 29 | 48.0 | 38.8 | 55.6 | 31.9 | 48.5 | 44.7 | 28.6 | 40.6 | 64.7 | 57.8 | 60.3 | 51.0 | 49.9 | 50.0 | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 49.3 | 42.8 | 54.6 | 31.4 | 46.9 | 49.4 | 43.6 | 42.8 | 62.7 | 53.9 | 59.2 | 52.4 | 48.9 | 50.7 | |
Liang_UESTC_task1_1 | Liang2022 | 38 | 41.3 | 36.2 | 45.5 | 25.4 | 41.0 | 40.8 | 44.2 | 29.6 | 55.8 | 49.7 | 49.1 | 36.7 | 41.1 | 40.4 | |
Liang_UESTC_task1_2 | Liang2022 | 47 | 29.9 | 26.4 | 32.9 | 22.4 | 27.6 | 29.6 | 30.7 | 21.5 | 43.3 | 36.3 | 37.0 | 25.9 | 26.9 | 27.8 | |
Liang_UESTC_task1_3 | Liang2022 | 43 | 28.5 | 24.6 | 31.7 | 21.4 | 25.0 | 26.7 | 28.8 | 21.0 | 41.4 | 34.3 | 37.2 | 24.4 | 26.7 | 26.4 | |
Liang_UESTC_task1_4 | Liang2022 | 32 | 44.1 | 41.8 | 46.1 | 31.6 | 44.4 | 44.1 | 46.6 | 42.1 | 49.1 | 42.6 | 47.2 | 45.7 | 46.5 | 45.2 | |
DCASE2022 baseline | 44.2 | 38.1 | 49.4 | 33.2 | 45.0 | 42.6 | 34.0 | 35.4 | 59.9 | 50.3 | 53.5 | 44.3 | 43.4 | 44.8 | |||
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 53.8 | 48.6 | 58.1 | 37.8 | 54.9 | 52.1 | 48.2 | 49.8 | 64.3 | 53.2 | 60.6 | 57.0 | 54.9 | 58.5 | |
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 53.0 | 47.8 | 57.3 | 35.9 | 52.4 | 51.1 | 50.3 | 49.3 | 65.9 | 52.1 | 58.3 | 56.2 | 53.5 | 57.6 | |
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 54.7 | 48.8 | 59.5 | 35.1 | 55.3 | 53.7 | 48.9 | 51.2 | 66.0 | 55.8 | 61.2 | 58.5 | 56.8 | 59.0 | |
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 54.5 | 48.8 | 59.2 | 36.6 | 55.1 | 53.5 | 48.5 | 50.1 | 65.4 | 55.0 | 61.7 | 57.7 | 56.6 | 59.0 | |
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 36.4 | 28.4 | 43.0 | 24.9 | 33.7 | 34.7 | 23.6 | 25.3 | 54.1 | 44.4 | 48.0 | 37.7 | 36.1 | 37.5 | |
Park_KT_task1_1 | Kim2022 | 25 | 51.7 | 46.3 | 56.1 | 40.5 | 51.6 | 52.1 | 44.1 | 43.4 | 64.8 | 54.2 | 60.6 | 53.1 | 52.6 | 51.4 | |
Park_KT_task1_2 | Kim2022 | 19 | 52.7 | 48.4 | 56.2 | 40.9 | 54.8 | 53.5 | 46.1 | 46.8 | 62.9 | 53.4 | 62.6 | 52.4 | 51.4 | 54.7 | |
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 59.7 | 54.5 | 64.1 | 46.4 | 61.1 | 61.4 | 51.1 | 52.5 | 71.5 | 61.2 | 69.1 | 62.0 | 58.7 | 62.2 | |
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 59.6 | 55.3 | 63.2 | 43.8 | 60.7 | 61.7 | 56.0 | 54.5 | 69.5 | 60.3 | 66.7 | 61.0 | 59.7 | 61.9 | |
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 59.6 | 54.8 | 63.7 | 46.4 | 61.7 | 61.1 | 52.0 | 52.9 | 71.0 | 60.9 | 66.7 | 61.9 | 58.9 | 62.7 | |
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 59.4 | 54.9 | 63.1 | 47.3 | 60.6 | 61.9 | 52.2 | 52.8 | 70.5 | 60.0 | 67.2 | 61.1 | 58.4 | 61.2 | |
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 47.5 | 40.7 | 53.2 | 30.6 | 48.2 | 47.9 | 41.4 | 35.3 | 61.4 | 50.7 | 53.9 | 51.9 | 49.9 | 51.1 | |
Singh_Surrey_task1_1 | Singh2022 | 28 | 44.6 | 37.8 | 50.3 | 36.9 | 44.4 | 44.5 | 32.9 | 30.5 | 61.0 | 49.9 | 56.4 | 45.6 | 44.1 | 44.6 | |
Singh_Surrey_task1_2 | Singh2022 | 31 | 44.3 | 37.8 | 49.8 | 34.9 | 45.1 | 44.8 | 32.6 | 31.6 | 60.8 | 49.0 | 55.7 | 45.4 | 42.4 | 45.4 | |
Singh_Surrey_task1_3 | Singh2022 | 23 | 45.9 | 39.1 | 51.5 | 37.7 | 46.5 | 46.0 | 32.4 | 32.8 | 62.1 | 50.0 | 57.5 | 47.3 | 45.1 | 47.2 | |
Singh_Surrey_task1_4 | Singh2022 | 24 | 45.9 | 38.8 | 51.9 | 37.9 | 46.5 | 45.7 | 31.8 | 32.0 | 62.2 | 50.8 | 58.1 | 47.5 | 45.2 | 47.4 | |
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 51.5 | 46.9 | 55.3 | 39.3 | 50.7 | 55.3 | 49.6 | 39.5 | 63.0 | 51.8 | 56.9 | 49.1 | 56.0 | 54.9 | |
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 51.6 | 47.8 | 54.8 | 42.1 | 51.9 | 55.2 | 49.1 | 40.9 | 63.3 | 50.6 | 58.0 | 48.8 | 54.1 | 54.1 | |
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 51.7 | 47.9 | 54.8 | 42.1 | 51.9 | 55.2 | 49.1 | 41.0 | 63.3 | 50.6 | 58.0 | 48.8 | 54.1 | 54.2 | |
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 52.7 | 46.8 | 57.7 | 29.1 | 55.5 | 55.8 | 49.7 | 44.2 | 61.7 | 55.2 | 59.0 | 56.5 | 57.1 | 56.5 | |
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 46.2 | 40.6 | 51.0 | 34.9 | 48.3 | 44.5 | 32.0 | 43.1 | 62.1 | 47.7 | 58.1 | 47.3 | 43.1 | 47.6 | |
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 43.8 | 40.1 | 47.0 | 42.8 | 42.8 | 41.9 | 34.0 | 38.8 | 54.5 | 42.8 | 52.6 | 45.9 | 42.7 | 43.5 | |
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 41.9 | 38.8 | 44.6 | 41.4 | 40.5 | 40.0 | 35.0 | 37.1 | 51.8 | 40.1 | 49.7 | 44.3 | 40.8 | 40.7 | |
Zhang_THUEE_task1_1 | Shao2022 | 40 | 54.9 | 47.0 | 61.5 | 40.6 | 51.8 | 53.7 | 37.9 | 51.1 | 69.1 | 60.5 | 65.3 | 58.0 | 58.5 | 57.9 | |
Zhang_THUEE_task1_2 | Shao2022 | 48 | 54.4 | 45.7 | 61.6 | 38.4 | 53.1 | 51.0 | 35.0 | 51.2 | 67.5 | 60.0 | 65.7 | 57.8 | 58.9 | 59.9 | |
Zou_PKU_task1_1 | Xin2022 | 20 | 56.3 | 48.6 | 62.7 | 39.8 | 58.5 | 55.4 | 42.3 | 47.1 | 69.9 | 62.8 | 68.1 | 59.1 | 58.2 | 57.8 |
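The summary rankings report accuracy with a 95% confidence interval. One common way to obtain such an interval is the normal-approximation (Wald) interval over per-segment correctness; the sketch below assumes that method, since the organizers' exact interval computation is not specified here:

```python
import math

def accuracy_ci(correct, total, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for accuracy.

    correct: number of correctly classified segments.
    total:   number of evaluated segments.
    z:       critical value; 1.96 corresponds to a 95% interval.
    """
    p = correct / total
    half = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)

# Illustrative numbers, not taken from any submission
lo, hi = accuracy_ci(596, 1000)
print(f"{lo:.3f} .. {hi:.3f}")
```

The interval narrows with the square root of the number of evaluated segments, which is why the evaluation set's size matters for separating closely ranked systems.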
System characteristics
General characteristics
Submission label | Technical Report | Official system rank | Logloss (Eval) | Accuracy (Eval) | Sampling rate | Data augmentation | Features | Embeddings
---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 47.0 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 46.7 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 49.4 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 51.6 | 8kHz | pitch shifting, time stretching, mixup, time masking, frequency masking | log-mel energies | ||
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 53.0 | 44.1kHz | mixup, random cropping, SpecAugment | CQT, Gammatonegram, Mel | ||
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 51.9 | 44.1kHz | mixup, random cropping, SpecAugment | CQT, Gammatonegram, Mel | ||
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 55.2 | 44.1kHz | mixup, random cropping, SpecAugment | CQT, Gammatonegram, Mel | ||
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 47.8 | 44.1kHz | log-mel energies | |||
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 46.4 | 44.1kHz | log-mel energies | |||
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 45.2 | 44.1kHz | mixup, pitch shifting, spectrum correction | log-mel energies | ||
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 48.0 | 44.1kHz | mixup, pitch shifting, spectrum correction | log-mel energies | ||
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 48.7 | 44.1kHz | mixup, time stretching, pitch shifting, spectrum correction | log-mel energies | |
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 60.8 | 16kHz | mixup, SpecAugment, time masking, frequency masking, temporal shuffle | log-mel energies | ||
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 59.2 | 16kHz | mixup, SpecAugment, time masking, frequency masking, temporal shuffle | log-mel energies | ||
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 59.4 | 16kHz | mixup, SpecAugment, time masking, frequency masking, temporal shuffle | log-mel energies | ||
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 59.3 | 16kHz | mixup, SpecAugment, time masking, frequency masking, temporal shuffle | log-mel energies | ||
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 48.0 | 44.1kHz | mixup, SpecAugment | log-mel energies, delta and delta-delta | |
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 49.3 | 44.1kHz | SpecAugment, mixup | log-mel energies | ||
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 41.3 | 44.1kHz | time masking, frequency masking, time warping, mixup | log-mel energies | ||
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 29.9 | 44.1kHz | time masking, frequency masking, time warping, mixup | log-mel energies | ||
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 28.5 | 44.1kHz | time masking, frequency masking, time warping, frequency warping, mixup | log-mel energies | ||
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 44.1 | 44.1kHz | noise addition, pitch shifting, speed changing, time masking, mixup | log-mel energies | ||
DCASE2022 baseline | 1.532 | 44.2 | 44.1kHz | log-mel energies | |||||
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 53.8 | 22.05kHz | mixup, pitch shifting, time stretching, shifting, adding gaussian noise | mel-spectrogram | ||
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 53.0 | 22.05kHz | mixup, pitch shifting, time stretching, shifting, adding gaussian noise | mel-spectrogram | ||
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 54.7 | 22.05kHz | mixup, pitch shifting, time stretching, shifting, adding gaussian noise | mel-spectrogram | ||
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 54.5 | 22.05kHz | mixup, pitch shifting, time stretching, shifting, adding gaussian noise | mel-spectrogram | ||
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 36.4 | 44.1kHz | log-mel energies | |||
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 51.7 | 22.05kHz | SpecAugment | log-mel energies | ||
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 52.7 | 22.05kHz | SpecAugment | log-mel energies | ||
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 59.7 | 32.0kHz | mixup, mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 59.6 | 32.0kHz | mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 59.6 | 32.0kHz | mixstyle, pitch shifting | log-mel energies | ||
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 59.4 | 32.0kHz | mixup, mixstyle, pitch shifting | log-mel energies | ||
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 47.5 | 16kHz | mixup, rolling, SpecAugment | log-mel energies | ||
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 44.6 | 44.1kHz | log-mel energies | |||
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 44.3 | 44.1kHz | log-mel energies | |||
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 45.9 | 44.1kHz | log-mel energies | |||
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 45.9 | 44.1kHz | log-mel energies | |||
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 51.5 | 44.1kHz | mixup, SpecAugment, spectrum modulation | log-mel energies, deltas | ||
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 51.6 | 44.1kHz | mixup, SpecAugment, spectrum modulation | log-mel energies, deltas | ||
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 51.7 | 44.1kHz | mixup, SpecAugment, spectrum modulation | log-mel energies, deltas | ||
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 52.7 | 44.1kHz | mixup, SpecAugment, spectrum modulation | log-mel energies, deltas | ||
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 46.2 | 44.1kHz | log-mel energies, spectral entropy, spectral flatness | dilated-CNN | ||
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 43.8 | 44.1kHz | log-mel energies | |||
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 41.9 | 44.1kHz | log-mel energies | |||
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 54.9 | 44.1kHz | mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix | log-mel energies | ||
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 54.4 | 44.1kHz | mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix | log-mel energies | ||
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 56.3 | 44.1kHz | SpecAugment++, time shifting | spectrogram | CNN6 |
Machine learning characteristics
Submission label | Technical Report | Official system rank | Logloss (Eval) | Accuracy (Eval) | External data usage | External data sources | Model complexity | Model MACS | Classifier | Ensemble subsystems | Decision making | Framework | Pipeline
---|---|---|---|---|---|---|---|---|---|---|---|---|---
AI4EDGE_IPL_task1_1 | Anastcio2022 | 42 | 2.414 | 47.0 | 68918 | 21127552 | CNN, ensemble | 2 | keras/tensorflow | pretraining, ensemble, training, weight quantization | ||||
AI4EDGE_IPL_task1_2 | Anastcio2022 | 41 | 2.365 | 46.7 | 68918 | 21127552 | CNN, ensemble | 2 | keras/tensorflow | pretraining, ensemble, training, weight quantization | ||||
AI4EDGE_IPL_task1_3 | Anastcio2022 | 17 | 1.398 | 49.4 | 51986 | 25475456 | CNN, ensemble | 10 | keras/tensorflow | pretraining, ensemble, training, knowledge distillation, weight quantization | ||||
AI4EDGE_IPL_task1_4 | Anastcio2022 | 11 | 1.330 | 51.6 | 51986 | 25475456 | CNN, ensemble | 10 | keras/tensorflow | pretraining, ensemble, training, knowledge distillation, weight quantization | ||||
AIT_Essex_task1_1 | Pham2022 | 34 | 1.636 | 53.0 | 33822 | 900000 | CNN | 3 | late fusion of predicted probabilities | tensorflow | training | |||
AIT_Essex_task1_2 | Pham2022 | 36 | 1.787 | 51.9 | 31902 | 750000 | CNN | 3 | late fusion of predicted probabilities | tensorflow | training | |||
AIT_Essex_task1_3 | Pham2022 | 37 | 1.808 | 55.2 | 115998 | 900000 | CNN | 3 | late fusion of predicted probabilities | tensorflow | training | |||
Cai_XJTLU_task1_1 | Cai2022 | 26 | 1.515 | 47.8 | 25526 | 6287030 | CNN | pytorch | ||||||
Cai_XJTLU_task1_2 | Cai2022 | 30 | 1.580 | 46.4 | 25526 | 6287030 | CNN | pytorch | ||||||
Cai_XJTLU_task1_3 | Cai2022 | 33 | 1.635 | 45.2 | 35926 | 7337718 | CNN | pytorch | ||||||
Cai_XJTLU_task1_4 | Cai2022 | 27 | 1.564 | 48.0 | 35926 | 7337718 | CNN | pytorch | ||||||
Cao_SCUT_task1_1 | Cao2022 | 45 | 2.795 | 48.7 | embeddings, pre-trained model | 125330 | 8637250 | BC-ResNet, CNN | pytorch | pretraining, training, adaptation | ||||
Chang_HYU_task1_1 | Lee2022 | 5 | 1.147 | 60.8 | directly | 126580 | 26763000 | CNN, BC-Res2Net | categorical cross entropy | pytorch | pretraining, weight quantization, fine tuning, data-random-drop | |||
Chang_HYU_task1_2 | Lee2022 | 6 | 1.187 | 59.2 | directly | 126580 | 26763000 | CNN, BC-Res2Net | categorical cross entropy | pytorch | pretraining, weight quantization, fine tuning, data-random-drop | |||
Chang_HYU_task1_3 | Lee2022 | 8 | 1.190 | 59.4 | directly | 126580 | 26763000 | CNN, BC-Res2Net | categorical cross entropy | pytorch | pretraining, weight quantization, fine tuning, data-random-drop | |||
Chang_HYU_task1_4 | Lee2022a | 7 | 1.187 | 59.3 | directly | 126580 | 26763000 | CNN, BC-Res2Net | categorical cross entropy | pytorch | pretraining, weight quantization, fine tuning, data-random-drop | |||
Dong_NCUT_task1_1 | Dong2022 | 29 | 1.568 | 48.0 | 70608 | 28461216 | FHR_Mobilenet | average | keras/tensorflow | training, weight quantization | ||||
Houyb_XDU_task1_1 | Hou2022 | 22 | 1.481 | 49.3 | embeddings | 57957 | 28513000 | CNN | pytorch | data augment, training, adaptation, weight quantization | ||||
Liang_UESTC_task1_1 | Liang2022 | 38 | 1.934 | 41.3 | 85800 | 20500000 | BC-ResNet | pytorch | training, knowledge distillation, weight quantization | |||||
Liang_UESTC_task1_2 | Liang2022 | 47 | 2.916 | 29.9 | 85800 | 20500000 | BC-ResNet | pytorch | training, knowledge distillation, weight quantization | |||||
Liang_UESTC_task1_3 | Liang2022 | 43 | 2.701 | 28.5 | 85800 | 20500000 | BC-ResNet | pytorch | training, knowledge distillation, weight quantization | |||||
Liang_UESTC_task1_4 | Liang2022 | 32 | 1.612 | 44.1 | 110452 | 11186000 | MobileNetV2 | keras/tensorflow | training, adaptation, pruning, weight quantization | |||||
DCASE2022 baseline | 1.532 | 44.2 | 46512 | 29234920 | CNN | keras/tensorflow | pretraining, training, adaptation, pruning, weight quantization | |||||||
Morocutti_JKU_task1_1 | Morocutti2022 | 12 | 1.339 | 53.8 | 65790 | 29325000 | ensemble, CNN | 3 | average | pytorch | preprocessing, training teacher, training student, weight quantization | |||
Morocutti_JKU_task1_2 | Morocutti2022 | 13 | 1.355 | 53.0 | 65790 | 29325000 | ensemble, CNN | 3 | average | pytorch | preprocessing, training teacher, training student, weight quantization | |||
Morocutti_JKU_task1_3 | Morocutti2022 | 10 | 1.320 | 54.7 | 65790 | 29325000 | ensemble, CNN | 3 | average | pytorch | preprocessing, training teacher, training student, weight quantization | |||
Morocutti_JKU_task1_4 | Morocutti2022 | 9 | 1.311 | 54.5 | 65790 | 29325000 | ensemble, CNN | 3 | average | pytorch | preprocessing, training teacher, training student, weight quantization | |||
Olisaemeka_ARU_task1_1 | Olisaemeka2022 | 39 | 2.055 | 36.4 | 96473 | 3283692 | CNN | keras/tensorflow | pretraining, training, weight quantization | |||||
Park_KT_task1_1 | Kim2022 | 25 | 1.504 | 51.7 | 113378 | 29481000 | CNN | pytorch | training, weight quantization | |||||
Park_KT_task1_2 | Kim2022 | 19 | 1.431 | 52.7 | 113378 | 29481000 | CNN | pytorch | training, weight quantization | |||||
Schmid_CPJKU_task1_1 | Schmid2022 | 2 | 1.092 | 59.7 | pre-trained model | PaSST | 127046 | 29056324 | RF-regularized CNNs, PaSST transformer | pytorch | training teacher, training student, knowledge distillation, weight quantization | |||
Schmid_CPJKU_task1_2 | Schmid2022 | 4 | 1.105 | 59.6 | pre-trained model | PaSST | 127046 | 29056324 | RF-regularized CNNs, PaSST transformer | pytorch | training teacher, training student, knowledge distillation, weight quantization | |||
Schmid_CPJKU_task1_3 | Schmid2022 | 1 | 1.091 | 59.6 | pre-trained model | PaSST | 121610 | 28240924 | RF-regularized CNNs, PaSST transformer | pytorch | training teacher, training student, knowledge distillation, weight quantization | |||
Schmid_CPJKU_task1_4 | Schmid2022 | 3 | 1.102 | 59.4 | pre-trained model | PaSST, AudioSet | 121610 | 28240924 | RF-regularized CNNs, PaSST transformer | pytorch | training teacher, training student, knowledge distillation, weight quantization | |||
Schmidt_FAU_task1_1 | Schmidt2022 | 35 | 1.731 | 47.5 | 127943 | 15163468 | CNN, SVM | pytorch | pretraining, training, pruning, weight quantization | |||||
Singh_Surrey_task1_1 | Singh2022 | 28 | 1.565 | 44.6 | directly | 13138 | 4129320 | CNN | maximum likelihood | keras/tensorflow | training (from scratch), pruning, weight quantization | |||
Singh_Surrey_task1_2 | Singh2022 | 31 | 1.606 | 44.3 | directly | 14886 | 5404520 | CNN | maximum likelihood | keras/tensorflow | training (from scratch), pruning, weight quantization | |||
Singh_Surrey_task1_3 | Singh2022 | 23 | 1.492 | 45.9 | directly | 59570 | 18585480 | CNN | 5 | average | keras/tensorflow | training (from scratch), pruning, weight quantization | ||
Singh_Surrey_task1_4 | Singh2022 | 24 | 1.499 | 45.9 | directly | 60958 | 19831880 | CNN | 5 | average | keras/tensorflow | training (from scratch), pruning, weight quantization | ||
Sugahara_RION_task1_1 | Sugahara2022 | 18 | 1.405 | 51.5 | 120229 | 26607000 | MobileNet | weighted average | pytorch | training (from scratch), weight quantization | ||||
Sugahara_RION_task1_2 | Sugahara2022 | 15 | 1.389 | 51.6 | 120229 | 26607000 | MobileNet | weighted average | pytorch | training (from scratch), weight quantization | ||||
Sugahara_RION_task1_3 | Sugahara2022 | 14 | 1.366 | 51.7 | 120229 | 26607000 | MobileNet | weighted average | pytorch | training (from scratch), weight quantization | ||||
Sugahara_RION_task1_4 | Sugahara2022 | 16 | 1.397 | 52.7 | 123346 | 26610000 | MobileNet | pytorch | training (from scratch), weight quantization | |||||
Yu_XIAOMI_task1_1 | Yu2022 | 21 | 1.456 | 46.2 | embeddings | 6306 | 16081000 | CNN | keras/tensorflow | pretraining, training, weight quantization | ||||
Zaragoza-Paredes_UPV_task1_1 | Zaragoza_Paredes2022 | 44 | 2.709 | 43.8 | 28320 | 28570080 | CNN | keras/tensorflow | training, weight quantization | |||||
Zaragoza-Paredes_UPV_task1_2 | Zaragoza_Paredes2022 | 46 | 2.904 | 41.9 | 28320 | 28570080 | CNN | keras/tensorflow | training, weight quantization | |||||
Zhang_THUEE_task1_1 | Shao2022 | 40 | 2.096 | 54.9 | 127160 | 28228320 | Mini-SegNet | 3 | keras/tensorflow | training, pruning, quantization aware training, weight quantization, knowledge distillation | ||||
Zhang_THUEE_task1_2 | Shao2022 | 48 | 3.068 | 54.4 | 126078 | 28098645 | Mini-SegNet | 2 | keras/tensorflow | training, pruning, quantization aware training, weight quantization, knowledge distillation | ||||
Zou_PKU_task1_1 | Xin2022 | 20 | 1.442 | 56.3 | embeddings | 75562 | 28823618 | CNN | pytorch | training, weight quantization |
Technical reports
Ai4edgept Submission to DCASE 2022 Low Complexity Acoustic Scene Classification Task1
Ricardo Anastácio1, Luís Ferreira2, Figueiredo Mónica1,3 and Conde Bento Luís1,4
1electronic engineering, Politécnico de Leiria, Leiria, Portugal, 2University of Coimbra, Coimbra, Portugal, 3Instituto de Telecomunicações, Portugal, 4Institute of Systems and Robotics, Coimbra, Portugal
AI4EDGE_IPL_task1_1 AI4EDGE_IPL_task1_2 AI4EDGE_IPL_task1_3 AI4EDGE_IPL_task1_4
Abstract
This report details our submission to Task 1 of the DCASE2022 challenge. The task aims to classify acoustic scenes on devices with low computational power and memory. We propose two ensemble models for scene classification. The first model clusters the classes into two groups, with each network of a two-network ensemble responsible for intra-group discrimination, i.e., discriminating between the classes that are most confused with each other in the confusion matrix. The second model implements a canonical one-versus-all ten-network ensemble architecture followed by knowledge distillation, i.e., the ensemble model is used as the teacher network. The student is an optimised version of the DCASE2022 baseline architecture. In both models we use three data pre-processing techniques: audio downsampling, mel-spectrogram tuning, and data augmentation. We used the DCASE2022 baseline for all networks (two-network ensemble, ten-network ensemble, and student network), on which we conducted a hyperparameter search to identify the best-performing architecture while remaining compliant with the DCASE2022 complexity constraints. Results revealed that data pre-processing and knowledge distillation improve overall performance. Nevertheless, a simple two-network ensemble without knowledge distillation keeps the MAC count and parameter size low while achieving similar results.
System characteristics
Sampling rate | 8kHz |
Data augmentation | pitch shifting, time stretching, mixup, time masking, frequency masking |
Features | log-mel energies |
Classifier | CNN, ensemble |
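The knowledge-distillation step described in the abstract combines a hard-label loss with a softened teacher-student term. A minimal numpy sketch of such a loss (the temperature, weighting, and function names are our illustration, not the authors' settings):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target KL term."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    hard = -np.log(softmax(student_logits)[label] + 1e-12)
    return alpha * hard + (1 - alpha) * (T ** 2) * kl
```

With T > 1 the teacher's output is softened, so the student also learns the relative similarities between scene classes rather than only the argmax.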
Low-Complexity Model Based on Depthwise Separable CNN for Acoustic Scene Classification
Yiqiang Cai1, He Tang1, Chenyang Zhu2, Shengchen Li1 and Xi Shao3
1School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China, 2School of Artificial Intelligence and Computing Sciences, Jiangnan University, Wuxi, China, 3College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
Cai_XJTLU_task1_1 Cai_XJTLU_task1_2 Cai_XJTLU_task1_3 Cai_XJTLU_task1_4
Abstract
Task 1 of DCASE 2022 imposes stricter requirements on system complexity, and the new dataset also brings greater challenges. We tried to reproduce several models from previous years but did not obtain good performance. Therefore, we introduced depthwise separable convolutions into the baseline architecture, which successfully reduces the complexity and improves the accuracy. We also used three data augmentation methods, mixup, pitch shifting, and time stretching, to further improve the results.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, pitch shifting, spectrum correction |
Features | log-mel energies |
Classifier | CNN |
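Depthwise separable convolutions reduce complexity by factoring a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) mixing step. A quick parameter-count comparison (bias terms omitted; the example channel counts are illustrative):

```python
def conv_params(c_in, c_out, k):
    # Standard 2-D convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel; pointwise: 1x1 channel mixing.
    return c_in * k * k + c_in * c_out

print(conv_params(32, 64, 3))                 # 18432
print(depthwise_separable_params(32, 64, 3))  # 2336
```

In most frameworks the depthwise step is expressed as a grouped convolution with one group per input channel.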
Low-Complexity Acoustic Scene Classification Using Broadcasted ResNet and Data Augmentation
Wenchang Cao, Yanxiong Li, Qisheng Huang and Mingle Liu
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
Cao_SCUT_task1_1
Abstract
Acoustic scene classification (ASC) is the task of classifying each input audio recording into one of a set of predefined acoustic scene classes. As an important task in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, ASC has attracted much attention from researchers in the audio and acoustic signal processing community in recent years [1]-[4]. In this report, we focus on low-complexity ASC with multiple devices, namely Task 1 of the DCASE2022 challenge [5]. In this task, a low-complexity model is required to classify audio recordings made by multiple devices (real and simulated). In the proposed ASC method, BC-ResNet-Mod [6] is used as the backbone of our model, which is trained with the Cross-Gradient Training (CGT) strategy [7]. In addition, data augmentation techniques are adopted to further improve the performance of the proposed method. The size of our model is 125.33 KB after model compression, below the 128 KB limit. Evaluated on the development dataset, our system obtains a classification accuracy of 51.1%.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time stretching, pitch shifting, spectrum correction |
Features | log-mel energies |
Classifier | BC-ResNet, CNN |
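Among the augmentations listed above, mixup blends pairs of training examples and their labels with a Beta-distributed weight. A small numpy sketch (the `alpha` value is illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam
```

Small `alpha` values concentrate the weight near 0 or 1, so most mixed examples stay close to one of the originals.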
Acoustic Scene Classification Based on Fhr_mobilenet
Hongxia Dong1, Lin Zhang1, Xichang Cai1, Menglong Wu1, Ziling Qiao1, Yanggang Gan2 and Juan Wu2
1Electronic and Communication Engineering, North China University of Technology, Beijing, China, 2Electronic and Communication Engineering, North China University Of Technology, Beijing, China
Dong_NCUT_task1_1
Abstract
This technical report describes our submission for Task 1 of the DCASE2022 challenge. We calculated 128 log-mel energies at the original sampling rate of 44.1 kHz for each time slice, using 2048 FFT points with 50% overlap. Additionally, deltas and delta-deltas were calculated from the log-mel spectrogram and stacked along the channel axis. The resulting inputs were of size 128 frequency bins by 43 time frames with 3 channels, representing the log-mel spectrogram, its delta features, and its delta-delta features, respectively. The three-channel feature map is then fed into the MobileNet-based frequency high-resolution network. Finally, after 1 × 1 convolution and global average pooling, the classification results are obtained through a softmax output. The classification accuracy of our proposed model is 53.9% with a loss value of 1.378. The model has 70.608K parameters, each represented using int8, and 28.461M MACs.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment |
Features | log-mel energies, delta and delta-delta |
Classifier | FHR_Mobilenet |
Decision making | average |
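The delta and delta-delta channels stacked above are regression-based time derivatives of the log-mel spectrogram. A numpy sketch (the window width is illustrative):

```python
import numpy as np

def deltas(spec, width=2):
    """First-order regression deltas along the time axis of a (mels, frames) array."""
    n = np.arange(1, width + 1)
    denom = 2.0 * np.sum(n ** 2)
    padded = np.pad(spec, ((0, 0), (width, width)), mode="edge")
    out = np.empty_like(spec, dtype=float)
    for t in range(spec.shape[1]):
        win = padded[:, t:t + 2 * width + 1]
        out[:, t] = np.sum(n * (win[:, width + n] - win[:, width - n]), axis=1) / denom
    return out

def three_channel(logmel):
    # Stack log-mels with their deltas and delta-deltas into a 3-channel input.
    d = deltas(logmel)
    return np.stack([logmel, d, deltas(d)])
```

A spectrogram that rises linearly over time yields a constant delta of 1 per frame step, and a constant spectrogram yields zero deltas.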
Low-Complexity for DCASE 2022 Task 1A Challenge
YuanBo Hou
Telecommunications Engineering, xidian university, Xi'an, China
Abstract
This technical report describes the systems for Task 1, Subtask A of the DCASE 2022 challenge. To reduce the number of model parameters and improve accuracy, I use a simple neural network with causal convolution and a bottleneck structure. Log-mel spectrograms are extracted to train the acoustic scene classification model. Mixup and SpecAugment are used to augment the acoustic features. My system achieves higher classification accuracy and lower log loss on the development dataset than the baseline system.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | SpecAugment, mixup |
Features | log-mel energies |
Classifier | CNN |
Kt Submission for the DCASE 2022 Challenge: Modernized Convolutional Neural Networks for Acoustic Scene Classification
TaeSoo Kim, GaHui Lee and JaeHan Park
AI2XL, KT Corporation, Seoul, South Korea
Park_KT_task1_1 Park_KT_task1_2
Abstract
In this technical report, we present our team's submission for DCASE 2022 Task 1, low-complexity acoustic scene classification (ASC). We gradually modernized a neural network architecture design starting from the baseline model and discovered several key components that contribute to performance. To meet the model complexity constraints, the number of parameters and the number of MACs were considered while applying each design. As a result, our model achieves a 1.2593 log loss and 54.03% accuracy on the development set, while having fewer than 114k total parameters (including zero-valued ones) and 30 million MACs.
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | SpecAugment |
Features | log-mel energies |
Classifier | CNN |
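SpecAugment, listed above, zeroes out random frequency bands and time spans of the spectrogram. A simplified sketch (mask sizes and counts are illustrative, not the submission's settings):

```python
import numpy as np

def spec_augment(spec, max_f=8, max_t=10, n_masks=2, rng=None):
    """Zero out random frequency bands and time spans of a (mels, frames) array."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    for _ in range(n_masks):
        f = rng.integers(0, max_f + 1)          # frequency-mask height
        f0 = rng.integers(0, n_mels - f + 1)    # frequency-mask start bin
        out[f0:f0 + f, :] = 0.0
        t = rng.integers(0, max_t + 1)          # time-mask width
        t0 = rng.integers(0, n_frames - t + 1)  # time-mask start frame
        out[:, t0:t0 + t] = 0.0
    return out
```

Masking forces the classifier not to rely on any single frequency band or time region, which helps with device mismatch.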
Hyu Submission for the DCASE 2022: Efficient Fine-Tuning Method Using Device-Aware Data-Random-Drop for Device-Imbalanced Acoustic Scene Classification
Joo-Hyun Lee, Jeong-Hwan Choi, Pil Moo Byun and Joon-Hyuk Chang
Electronic Engineering, Hanyang University, Seoul, Republic of Korea
Chang_HYU_task1_1 Chang_HYU_task1_2 Chang_HYU_task1_3 Chang_HYU_task1_4
Abstract
This paper addresses the Hanyang University team's submission for the DCASE 2022 Challenge Low-Complexity Acoustic Scene Classification task. The task aims to design an audio scene classification system that generalizes across devices under low-complexity and short-input-duration conditions. We followed two strategies to achieve this goal: improving the model structure for short segmented audio, and adopting transfer learning methods that generalize to unknown devices. Based on BC-ResNet, which showed the best performance in the DCASE 2021 challenge, we incorporated a method proposed in the field of short-duration speaker verification to secure high accuracy. In addition, we proposed a novel fine-tuning method using device-aware data-random-drop to obtain a model generalized across multiple devices. Most of the training dataset is recorded with a single device, so we devised a fine-tuning method that gradually excludes data recorded with that device from mini-batches during training; this improves generalization performance. Following the official cross-validation setup of the TAU Urban Acoustic Scenes 2022 Mobile development dataset, we achieve 70.1% accuracy and a multi-class cross-entropy loss of 0.835.
System characteristics
Sampling rate | 16kHz |
Data augmentation | mixup, SpecAugment, time masking, frequency masking, temporal shuffle |
Features | log-mel energies |
Classifier | CNN, BC-Res2Net |
Decision making | categorical cross entropy |
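The device-aware data-random-drop idea, excluding samples of the dominant device from mini-batches with growing probability, can be illustrated with a toy batch filter (the function and its parameters are our sketch, not the authors' code):

```python
import numpy as np

def device_drop(batch, devices, drop_device="a", drop_prob=0.5, rng=None):
    """Drop each sample recorded by drop_device with probability drop_prob."""
    rng = rng or np.random.default_rng()
    return [(s, d) for s, d in zip(batch, devices)
            if d != drop_device or rng.random() >= drop_prob]
```

During fine-tuning, `drop_prob` would be raised over epochs so the dominant device gradually contributes less to each gradient step.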
Low-Complexity Acoustic Scene Classification Based on Residual Net
Jiangnan Liang, Cheng Zeng, Chuang Shi, Le Zhang, Yisen Zhou, Yuehong Li, Yanyu Zhou and Tianqi Tan
University of Electronic Science and Technology of China, Chengdu, China
Liang_UESTC_task1_1 Liang_UESTC_task1_2 Liang_UESTC_task1_3 Liang_UESTC_task1_4
Abstract
This technical report describes our submitted systems for Task 1 of the DCASE 2022 challenge. Log-mel energies, delta features, and delta-delta features were extracted to train the models. We adopted a total of eight data augmentation methods. BC-ResNet and MobileNetV2 were used as training models. We used knowledge distillation and quantization to compress the models. Our systems achieved lower log loss and higher accuracy on the development dataset than the baseline system.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | time masking, frequency masking, time warping, mixup; time masking, frequency masking, time warping, frequency warping, mixup; noise addition, pitch shifting, speed changing, time masking, mixup |
Features | log-mel energies |
Classifier | BC-ResNet; MobileNetV2 |
Receptive Field Regularized CNNs with Traditional Audio Augmentations
Tobias Morocutti and Diaaeldin Shalaby
Johannes Kepler University, Linz, Austria
Morocutti_JKU_task1_1 Morocutti_JKU_task1_2 Morocutti_JKU_task1_3 Morocutti_JKU_task1_4
Abstract
This technical report describes our system for Task 1 (Low-Complexity Acoustic Scene Classification) of the DCASE2022 Challenge. Due to the complexity limits on the submitted model, we use a teacher-student approach. The teacher is a Receptive Field (RF) regularized CNN and the student is a simpler 5-layer CNN with batch normalization, dropout, and max-pooling layers. In addition, data augmentation techniques such as adding Gaussian noise, shifting, pitch shifting, and time stretching are adopted to expand the diversity of the dataset. Our system achieves an accuracy of 53.4% and a multiclass cross-entropy (log loss) of 1.279 on the development dataset. The student model has 21,930 parameters and a multiply-accumulate count of 9.775 million.
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | mixup, pitch shifting, time stretching, shifting, adding gaussian noise |
Features | mel-spectrogram |
Classifier | ensemble, CNN |
Decision making | average |
Submission to DCASE 2022 Task 1: Depthwise Separable Convolutions for Low-Complexity Acoustic Scene Classification
Chukwuebuka Olisaemeka and Lakshmi Babu Saheer
Computing Sciences, Anglia Ruskin University, Cambridge, United Kingdom
Olisaemeka_ARU_task1_1
Abstract
This technical report describes the details of our Task 1 submission to the DCASE2022 challenge. The aim of this task is to design an acoustic scene classification system that targets devices with low memory and computational allowance; the task also aims at systems that generalize across multiple devices. To achieve this objective, a model using depthwise separable convolutional layers is proposed, which reduces the number of parameters and computations required compared to standard convolutional layers. This work further proposes the use of dilated kernels, which increase the receptive field of a convolutional layer without increasing the number of parameters to be learned. Finally, quantization is applied to reduce the model complexity. The proposed system achieves an average test accuracy of 39% and a log loss of 1.878 on the TAU Urban Acoustic Scenes 2022 Mobile development dataset, with a parameter count of 96.473k and 3.284 MMACs.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
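Dilated kernels enlarge the receptive field at no parameter cost: for a stack of stride-1 layers the receptive field grows by (k − 1) · d per layer. A quick helper (illustrative, not the submission's code):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked stride-1 convolutions with dilation."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3x3 layers with dilations 1, 2, 4 see 15 input positions,
# versus 7 for three plain 3x3 layers, with identical parameter counts.
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```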
Low-Complexity Deep Learning Frameworks for Acoustic Scene Classification
Lam Pham1, Ngo Dat2, Anahid Naghibzadeh-Jalali1 and Alexander Schindler1
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2School of computer science and electronic engineering, Essex University, UK
AIT_Essex_task1_1 AIT_Essex_task1_2 AIT_Essex_task1_3
Abstract
In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks comprise four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we first transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods of random cropping, SpecAugment, and mixup are applied to the spectrograms. The augmented spectrograms are then fed into deep-learning-based classifiers. Finally, the probabilities obtained from the three individual classifiers, each trained independently on one spectrogram type, are fused to achieve the best performance. Our experiments, conducted on the DCASE 2022 Task 1 development dataset, achieve low-complexity frameworks and a best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2 percentage points.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, random cropping, SpecAugment |
Features | CQT, Gammatonegram, Mel |
Classifier | CNN |
Decision making | late fusion of predicted probabilities |
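The late fusion step averages the class probabilities produced by the three spectrogram-specific classifiers. A minimal sketch (the example probability vectors are made up):

```python
import numpy as np

def late_fusion(prob_list):
    """Average class probabilities from several independently trained classifiers."""
    return np.mean(np.stack(prob_list), axis=0)

# Hypothetical outputs of the Mel-, Gammatone-, and CQT-based classifiers.
p_mel = np.array([0.6, 0.3, 0.1])
p_gamma = np.array([0.5, 0.4, 0.1])
p_cqt = np.array([0.7, 0.2, 0.1])
fused = late_fusion([p_mel, p_gamma, p_cqt])
```

Since each input sums to one, the average is again a valid probability distribution.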
CP-JKU Submission to Dcase22: Distilling Knowledge for Low-Complexity Convolutional Neural Networks From a Patchout Audio Transformer
Florian Schmid1,2, Shahed Masoudian2, Khaled Koutini2 and Gerhard Widmer1,2
1Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria, 2LIT Artificial Intelligence Lab, Johannes Kepler University (JKU) Linz, Linz, Austria
Schmid_CPJKU_task1_1 Schmid_CPJKU_task1_2 Schmid_CPJKU_task1_3 Schmid_CPJKU_task1_4
Judges’ award
Abstract
In this technical report, we describe the CP-JKU team's submission for Task 1, Low-Complexity Acoustic Scene Classification, of the DCASE 22 challenge [1]. We use knowledge distillation to teach low-complexity CNN student models from Patchout Spectrogram Transformer (PaSST) models. We take PaSST models pre-trained on AudioSet and fine-tune them on the TAU Urban Acoustic Scenes 2022 Mobile development dataset. We experiment with an ensemble of teachers, different receptive fields of the student models, and mixing frequency-wise statistics of spectrograms to enhance generalization to unseen devices. Finally, the student models are quantized so that inference computations can be performed with 8-bit integers, simulating the low-complexity constraints of edge devices.
Awards: Judges’ award
System characteristics
Sampling rate | 32.0kHz |
Data augmentation | mixup, mixstyle, pitch shifting; mixstyle, pitch shifting |
Features | log-mel energies |
Classifier | RF-regularized CNNs, PaSST transformer |
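Mixing frequency-wise statistics of spectrograms (in the spirit of MixStyle) normalizes one input per frequency bin and re-scales it with statistics interpolated from another input. A numpy sketch (the fixed interpolation weight is illustrative; in training it would typically be drawn at random):

```python
import numpy as np

def freq_mixstyle(x1, x2, lam=0.7, eps=1e-5):
    """Re-style x1 with per-frequency mean/std interpolated between x1 and x2.
    Inputs are (mels, frames) spectrograms; statistics are taken over time."""
    m1, s1 = x1.mean(axis=1, keepdims=True), x1.std(axis=1, keepdims=True) + eps
    m2, s2 = x2.mean(axis=1, keepdims=True), x2.std(axis=1, keepdims=True) + eps
    m = lam * m1 + (1 - lam) * m2
    s = lam * s1 + (1 - lam) * s2
    return (x1 - m1) / s1 * s + m
```

Because per-frequency statistics capture much of the device's channel response, mixing them simulates recordings from unseen devices without changing the scene content.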
Structured Filter Pruning and Feature Selection for Low Complexity Acoustic Scene Classification
Lorenz Schmidt, Beran Kiliç and Nils Peters
International Audio Laboratories, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Schmidt_FAU_task1_1
Abstract
The DCASE challenge track 1 provides a dataset for acoustic scene classification (ASC), a popular problem in machine learning. This year's challenge shortens the provided audio clips to 1 second, adds a multiply-accumulate operations (MAC) constraint, and additionally counts all parameters of the model. We tackle the problem with three approaches: first, a linear model on global moments of the spectrogram, which comes within reach of the baseline; then feature selection to reduce the generalization gap and the MAC count; and finally, structured filter pruning to bring the number of parameters below the parameter constraint. On the evaluation split of the development dataset, our system reaches 49.1% overall accuracy, compared to 42.9% for the baseline system.
System characteristics
Sampling rate | 16kHz |
Data augmentation | mixup, rolling, SpecAugment |
Features | log-mel energies |
Classifier | CNN, SVM |
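Structured filter pruning removes whole convolution filters rather than individual weights, so the pruned model stays dense and genuinely cheaper. A common criterion is the filter's L1 norm; this sketch uses it for illustration (the report's exact criterion and keep ratio are not stated here):

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Keep the conv filters with the largest L1 norms.
    weights: array of shape (n_filters, c_in, k, k)."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # indices of surviving filters
    return weights[keep], keep
```

Dropping a filter also removes the corresponding input channel of the next layer, which is what shrinks both parameters and MACs.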
Mini-Segnet for Low-Complexity Acoustic Scene Classification
Yun-Fei Shao1, Xuan Zhang2, Ge-Ge Bing1, Ke-Meng Zhao1, Jun-Jie Xu2, Yong Ma2 and Wei-Qiang Zhang1
1Department of Electronic Engineering, Tsinghua University, Beijing, China, 2School of Lingustic Sciences and Arts, Jiangsu Normal University, Xuzhou, China
Zhang_THUEE_task1_1 Zhang_THUEE_task1_2
Abstract
This report details the architecture we used to address Task 1 of the DCASE2022 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under model complexity constraints. Our architecture is based on SegNet, adding an instance normalization layer to normalize the activations of the previous layer at each step. Log-mel spectrograms, delta features, and delta-delta features are extracted to train the acoustic scene classification model. A total of six data augmentations are applied: mixup, time and frequency domain masking, image augmentation, auto level, pix2pix, and random crop. We apply three model compression schemes, pruning, quantization, and knowledge distillation, to reduce model complexity. The proposed system achieves higher classification accuracy and lower log loss than the baseline system. After model compression, our model achieves an average accuracy of 54.11% with 127.2K parameters, 8-bit quantization, and fewer than 30 million MACs.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix |
Features | log-mel energies |
Classifier | Mini-SegNet |
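Weight quantization to int8 maps each float weight to an 8-bit integer plus a shared scale factor. A symmetric per-tensor sketch (one common scheme; the submission's exact scheme is not stated here):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Reconstruct approximate float weights for comparison or simulation.
    return q.astype(np.float32) * scale
```

Each weight then occupies 1 byte instead of 4, which is how a ~128K-parameter model fits the 128 KB size limit, at the cost of a rounding error of at most half a quantization step per weight.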
Low-Complexity CNNs for Acoustic Scene Classification
Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang and Mark D. Plumbley
CVSSP, University of Surrey, Guildford, UK
Abstract
This technical report describes the SurreyAudioTeam22's submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules: (a) the ASC framework should have at most 128K parameters, and (b) at most 30 million multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
Decision making | maximum likelihood; average |
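Both complexity rules can be checked mechanically: a convolutional layer performs one multiply-accumulate per kernel element, per output channel, per output position. A small helper (hypothetical names, illustrating the bookkeeping only):

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    # One MAC per kernel element, per output channel, per output position.
    return c_in * c_out * k * k * h_out * w_out

def within_task1_limits(n_params, n_macs,
                        max_params=128_000, max_macs=30_000_000):
    """Check a candidate model against the Task 1 complexity rules."""
    return n_params <= max_params and n_macs <= max_macs
```

Summing `conv2d_macs` over all layers gives the per-inference MAC count that the challenge compares against the 30-million limit.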
Self-Ensemble with Multi-Task Learning for Low-Complexity Acoustic Scene Classification
Reiko Sugahara, Ryo Sato, Masatoshi Osawa, Yuuki Yuno and Chiho Haruta
RION CO., LTD., Tokyo, Japan
Abstract
This technical report describes our system for Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 challenge. The proposed method adopts MobileNet-based models with log-mel energies and deltas as inputs. Accuracy was improved by self-ensemble with multi-task learning. Data augmentations, e.g., mixup, SpecAugment, and spectrum modulation, were applied to prevent overfitting. To meet the system complexity requirements, we adopted depthwise separable convolutions and quantization-aware training. The model contains 120,505 parameters and requires 26.607 million multiply-accumulate operations. Consequently, the proposed system achieved 56.5% accuracy and a log loss of 1.179 on the development data.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment, spectrum modulation |
Features | log-mel energies, deltas |
Classifier | MobileNet |
Decision making | weighted average |
Low-Complexity Acoustic Scene Classification with Mismatch-Devices Using Separable Convolutions and Coordinate Attention
Yifei Xin1, Yuexian Zou1, Fan Cui2 and Yujun Wang2
1Peking University, Shenzhen, China, 2Xiaomi Corporation, Beijing, China
Zou_PKU_task1_1
Abstract
This report details the architecture we used to address Task 1 of the DCASE2022 challenge. Our architecture is based on a 4-layer convolutional neural network that takes a log-mel spectrogram as input. The complexity of this network is controlled by using separable convolutions in the channel, time, and frequency dimensions. Moreover, we introduce a novel attention mechanism that embeds positional information into channel attention, which we call coordinate attention, to improve the accuracy of the CNN-based framework. In addition, we use SpecAugment++, time shifting, and test-time augmentation to further improve the performance of the system.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | SpecAugment++, time shifting |
Features | spectrogram |
Embeddings | CNN6 |
Classifier | CNN |
Acoustic Scene Classification Based on Feature Fusion and Dilated-Convolution
Junfei Yu, Runyu Shi, Tianrui He and Kaibin Guo
Mobile Phone, Xiaomi, Beijing, China
Yu_XIAOMI_task1_1
Abstract
This technical report describes our submission for Task 1 of the DCASE Challenge 2022. The goal of Task 1 is acoustic scene classification of recorded audio using an int8-quantized model that does not exceed 128KB in size. In our submission, a variety of time-frequency features are extracted and fused to form the input of the deep learning network. As the backbone of the network, dilated convolutions are applied to embed the various input features. Furthermore, we apply multiple time-frequency data augmentations to the original data to increase its diversity. After the network training is completed, the weights are converted to INT8. This INT8 model achieves a log loss of 1.305 and an accuracy of 51.7% on the standard test set of the TAU Urban Acoustic Scenes 2022 Mobile development dataset.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies, spectral entropy, spectral flatness |
Embeddings | dilated-CNN |
Classifier | CNN |
DCASE 2022: Comparative Analysis of CNNs for Acoustic Scene Classification Under Low-Complexity Considerations
Josep Zaragoza Paredes1, Javier Naranjo Alcázar2, Valery Naranjo Ornedo1 and Pedro Zuccarello2
1ETSIT, Universitat Politècnica de València, Valencia, Spain, 2R+D, Instituto Tecnológico de Informática, Valencia, Spain
Zaragoza-Paredes_UPV_task1_1 Zaragoza-Paredes_UPV_task1_2
Abstract
Acoustic scene classification is an automatic listening problem that aims to assign an audio recording to a predefined scene based on its audio data. Over the years (and in past editions of DCASE) this problem has often been solved with ensembles (several machine learning models whose predictions are combined in the inference phase). While such solutions can perform well in terms of accuracy, they can be very expensive in terms of computational capacity, making them impossible to deploy on IoT devices. Reflecting this shift in the field, this task imposes two limits on model complexity. There is also the added difficulty of mismatched devices (the provided audio is recorded by different sources). This technical report makes a comparative study of two network architectures: a conventional CNN and ConvMixer. Although both networks exceed the baseline required by the competition, the conventional CNN shows higher performance, exceeding the baseline by 8 percentage points. ConvMixer-based solutions show worse performance, although they are much lighter.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |