Low-Complexity Acoustic Scene Classification


Challenge results

Task description

The goal of acoustic scene classification is to classify a test recording into one of ten predefined acoustic scene classes. This task targets acoustic scene classification on devices with low computational and memory allowance, which imposes limits on model complexity, such as the model’s number of parameters and its multiply-accumulate operation (MAC) count. In addition to low complexity, the aim is generalization across a number of different devices. For this purpose, the task uses audio data recorded and simulated with a variety of devices.
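As a rough illustration of how the two complexity measures named above can be estimated, the sketch below counts parameters and MACs for 2D convolution layers. The layer shapes and the budget values passed to `within_budget` are illustrative assumptions, not taken from this page; the actual limits are given on the task description page.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, groups=1, bias=True):
    """Return (parameters, MACs) for one 2D convolution layer."""
    k_h, k_w = kernel
    # Each output channel sees c_in / groups input channels.
    weights = (c_in // groups) * c_out * k_h * k_w
    params = weights + (c_out if bias else 0)
    # One MAC per weight per output position; bias additions are ignored.
    macs = weights * out_h * out_w
    return params, macs


def within_budget(layers, max_params, max_macs):
    """Sum per-layer costs and compare them against the given limits."""
    total_params = sum(p for p, _ in layers)
    total_macs = sum(m for _, m in layers)
    ok = total_params <= max_params and total_macs <= max_macs
    return ok, total_params, total_macs


# Hypothetical three-layer network, for illustration only.
LAYERS = [
    conv2d_cost(1, 16, (3, 3), 64, 64),              # standard convolution
    conv2d_cost(16, 16, (3, 3), 32, 32, groups=16),  # depthwise convolution
    conv2d_cost(16, 32, (1, 1), 32, 32),             # pointwise convolution
]

# Assumed budget values; replace with the limits from the task description.
OK, TOTAL_PARAMS, TOTAL_MACS = within_budget(
    LAYERS, max_params=128_000, max_macs=30_000_000
)
```

Depthwise and pointwise layers are included because several submissions list decomposed or group convolutions as their complexity management technique; the `groups` argument shows why they are cheaper than a standard convolution of the same width.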

The development dataset consists of recordings from 10 European cities using 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C, and S1-S6 consists of randomly selected segments from the simultaneous recordings, so all of these segments overlap with the data from device A, but not necessarily with each other. The total amount of audio in the development set is 64 hours.

The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. There are five new devices (not available in the development set): real device D and simulated devices S7-S10. The evaluation data contains 22 hours of audio.

Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.

A more detailed task description can be found on the task description page.

Systems ranking

Columns (in row order): Submission label, Name, Technical report, Official system rank, Evaluation dataset logloss (with 95% confidence interval), Evaluation dataset accuracy in % (with 95% confidence interval), Development dataset logloss, Development dataset accuracy in %
AI4EDGE_IPL_task1_1 AI4EDGE_1 Anastcio2022 42 2.414 (2.264 - 2.564) 47.0 (46.7 - 47.3) 0.742 75.6
AI4EDGE_IPL_task1_2 AI4EDGE_2 Anastcio2022 41 2.365 (2.226 - 2.504) 46.7 (46.4 - 46.9) 0.791 73.5
AI4EDGE_IPL_task1_3 AI4EDGE_3 Anastcio2022 17 1.398 (1.343 - 1.454) 49.4 (49.1 - 49.7) 1.347 50.5
AI4EDGE_IPL_task1_4 AI4EDGE_4 Anastcio2022 11 1.330 (1.281 - 1.378) 51.6 (51.3 - 51.9) 1.103 60.5
AIT_Essex_task1_1 AIT_Essex Pham2022 34 1.636 (1.535 - 1.737) 53.0 (52.7 - 53.3) 1.719 55.6
AIT_Essex_task1_2 AIT_Essex Pham2022 36 1.787 (1.680 - 1.894) 51.9 (51.6 - 52.2) 51.4
AIT_Essex_task1_3 AIT_Essex Pham2022 37 1.808 (1.689 - 1.928) 55.2 (55.0 - 55.5) 1.306 60.1
Cai_XJTLU_task1_1 DW_S Cai2022 26 1.515 (1.454 - 1.575) 47.8 (47.5 - 48.0) 1.578 46.2
Cai_XJTLU_task1_2 DW Cai2022 30 1.580 (1.519 - 1.642) 46.4 (46.1 - 46.7) 1.551 45.6
Cai_XJTLU_task1_3 DW_AUG_S Cai2022 33 1.635 (1.566 - 1.704) 45.2 (44.9 - 45.5) 1.437 48.3
Cai_XJTLU_task1_4 DW_AUG Cai2022 27 1.564 (1.501 - 1.627) 48.0 (47.7 - 48.3) 1.327 49.3
Cao_SCUT_task1_1 KDResCG Cao2022 45 2.795 (2.623 - 2.967) 48.7 (48.4 - 48.9) 1.441 51.1
Chang_HYU_task1_1 JH_PM_HYU1 Lee2022 5 1.147 (1.081 - 1.214) 60.8 (60.6 - 61.1) 0.835 70.1
Chang_HYU_task1_2 JH_PM_HYU2 Lee2022 6 1.187 (1.125 - 1.249) 59.2 (58.9 - 59.5) 1.065 62.6
Chang_HYU_task1_3 JH_PM_HYU3 Lee2022 8 1.190 (1.130 - 1.251) 59.4 (59.1 - 59.6) 1.005 64.9
Chang_HYU_task1_4 JH_PM_HYU4 Lee2022a 7 1.187 (1.126 - 1.248) 59.3 (59.1 - 59.6) 1.072 62.2
Dong_NCUT_task1_1 Dong1_NCUT Dong2022 29 1.568 (1.512 - 1.623) 48.0 (47.7 - 48.3) 1.378 53.9
Houyb_XDU_task1_1 Houyb_XDU Hou2022 22 1.481 (1.416 - 1.547) 49.3 (49.0 - 49.5) 1.449 49.7
Liang_UESTC_task1_1 BC-ResNet1 Liang2022 38 1.934 (1.830 - 2.038) 41.3 (41.0 - 41.5) 1.263 53.8
Liang_UESTC_task1_2 BC-ResNet2 Liang2022 47 2.916 (2.751 - 3.081) 29.9 (29.6 - 30.2) 1.267 55.9
Liang_UESTC_task1_3 BC-ResNet3 Liang2022 43 2.701 (2.566 - 2.836) 28.5 (28.2 - 28.7) 1.236 56.2
Liang_UESTC_task1_4 MobileNet2 Liang2022 32 1.612 (1.560 - 1.663) 44.1 (43.8 - 44.4) 1.556 45.9
DCASE2022 baseline Baseline 1.532 (1.490 - 1.574) 44.2 (44.0 - 44.5) 1.575 42.9
Morocutti_JKU_task1_1 jku_stu_1 Morocutti2022 12 1.339 (1.278 - 1.399) 53.8 (53.5 - 54.1) 1.279 53.3
Morocutti_JKU_task1_2 jku_stu_2 Morocutti2022 13 1.355 (1.296 - 1.414) 53.0 (52.7 - 53.2) 1.280 53.4
Morocutti_JKU_task1_3 jku_stu_3 Morocutti2022 10 1.320 (1.256 - 1.383) 54.7 (54.4 - 55.0) 1.291 52.8
Morocutti_JKU_task1_4 jku_stu_4 Morocutti2022 9 1.311 (1.253 - 1.369) 54.5 (54.2 - 54.8) 1.288 52.7
Olisaemeka_ARU_task1_1 DepSepConv Olisaemeka2022 39 2.055 (1.991 - 2.119) 36.4 (36.1 - 36.6) 1.878 39.0
Park_KT_task1_1 MConvNet Kim2022 25 1.504 (1.431 - 1.576) 51.7 (51.4 - 52.0) 1.259 54.0
Park_KT_task1_2 MConvNet Kim2022 19 1.431 (1.364 - 1.498) 52.7 (52.4 - 53.0) 1.259 54.0
Schmid_CPJKU_task1_1 t10sec Schmid2022 2 1.092 (1.043 - 1.141) 59.7 (59.5 - 60.0) 1.115 58.6
Schmid_CPJKU_task1_2 mixstyleR8 Schmid2022 4 1.105 (1.057 - 1.153) 59.6 (59.3 - 59.9) 1.110 59.1
Schmid_CPJKU_task1_3 mixstyleR5 Schmid2022 1 1.091 (1.040 - 1.141) 59.6 (59.4 - 59.9) 1.139 58.0
Schmid_CPJKU_task1_4 audiosetR5 Schmid2022 3 1.102 (1.054 - 1.151) 59.4 (59.1 - 59.7) 1.163 57.6
Schmidt_FAU_task1_1 final Schmidt2022 35 1.731 (1.657 - 1.805) 47.5 (47.2 - 47.8) 1.581 49.0
Singh_Surrey_task1_1 Surrey_4M Singh2022 28 1.565 (1.508 - 1.623) 44.6 (44.3 - 44.9) 1.449 46.6
Singh_Surrey_task1_2 Surrey_5M Singh2022 31 1.606 (1.547 - 1.664) 44.3 (44.1 - 44.6) 1.475 45.9
Singh_Surrey_task1_3 Surrey_19M Singh2022 23 1.492 (1.441 - 1.544) 45.9 (45.6 - 46.2) 1.392 47.3
Singh_Surrey_task1_4 Surrey_20M Singh2022 24 1.499 (1.447 - 1.551) 45.9 (45.6 - 46.2) 1.389 47.5
Sugahara_RION_task1_1 RION1 Sugahara2022 18 1.405 (1.337 - 1.473) 51.5 (51.2 - 51.7) 1.199 56.3
Sugahara_RION_task1_2 RION2 Sugahara2022 15 1.389 (1.325 - 1.454) 51.6 (51.3 - 51.9) 1.179 56.5
Sugahara_RION_task1_3 RION3 Sugahara2022 14 1.366 (1.305 - 1.426) 51.7 (51.4 - 51.9) 1.182 56.5
Sugahara_RION_task1_4 RION4 Sugahara2022 16 1.397 (1.328 - 1.466) 52.7 (52.5 - 53.0) 1.214 57.1
Yu_XIAOMI_task1_1 YLSSD Yu2022 21 1.456 (1.409 - 1.504) 46.2 (46.0 - 46.5) 1.305 51.7
Zaragoza-Paredes_UPV_task1_1 Conv_Sep_CNN_48 Zaragoza_Paredes2022 44 2.709 (2.517 - 2.901) 43.8 (43.6 - 44.1) 1.440 50.6
Zaragoza-Paredes_UPV_task1_2 Conv_Sep_CNN_48 Zaragoza_Paredes2022 46 2.904 (2.690 - 3.118) 41.9 (41.7 - 42.2) 1.440 50.6
Zhang_THUEE_task1_1 THUEE Shao2022 40 2.096 (1.913 - 2.280) 54.9 (54.7 - 55.2) 1.360 54.1
Zhang_THUEE_task1_2 THUEE Shao2022 48 3.068 (2.775 - 3.361) 54.4 (54.1 - 54.7) 1.360 53.1
Zou_PKU_task1_1 SepCNN Xin2022 20 1.442 (1.362 - 1.521) 56.3 (56.0 - 56.6) 1.295 60.3

Teams ranking

This table includes only the best-performing system from each submitting team.
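The selection of one system per team can be sketched as follows. This is a minimal sketch that assumes systems are ranked by evaluation log loss (lower is better), which is consistent with the official system ranks shown on this page, and that the team is identified by the part of the submission label before `_task1_`. The helper names are hypothetical, and only a few rows from the systems ranking table are included.

```python
# (submission label, evaluation logloss) copied from the systems ranking table.
SYSTEMS = [
    ("Schmid_CPJKU_task1_1", 1.092),
    ("Schmid_CPJKU_task1_2", 1.105),
    ("Schmid_CPJKU_task1_3", 1.091),
    ("Schmid_CPJKU_task1_4", 1.102),
    ("Chang_HYU_task1_1", 1.147),
    ("Chang_HYU_task1_2", 1.187),
    ("Morocutti_JKU_task1_3", 1.320),
    ("Morocutti_JKU_task1_4", 1.311),
]


def team_of(label):
    # Assumption: the team id is the label prefix before "_task1_".
    return label.split("_task1_")[0]


def best_per_team(systems):
    """Keep the lowest-logloss system of each team, sorted best first."""
    best = {}
    for label, logloss in systems:
        team = team_of(label)
        if team not in best or logloss < best[team][1]:
            best[team] = (label, logloss)
    return sorted(best.values(), key=lambda item: item[1])


TEAM_RANKING = best_per_team(SYSTEMS)
```

For these rows the result lists Schmid_CPJKU_task1_3, Chang_HYU_task1_1, and Morocutti_JKU_task1_4 in that order, matching team ranks 1 to 3 in the table.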

Columns (in row order): Submission label, Name, Technical report, Official system rank, Team rank, Evaluation dataset logloss (with 95% confidence interval), Evaluation dataset accuracy in % (with 95% confidence interval), Development dataset logloss, Development dataset accuracy in %
AI4EDGE_IPL_task1_4 AI4EDGE_4 Anastcio2022 11 4 1.330 (1.281 - 1.378) 51.6 (51.3 - 51.9) 1.103 60.5
AIT_Essex_task1_1 AIT_Essex Pham2022 34 14 1.636 (1.535 - 1.737) 53.0 (52.7 - 53.3) 1.719 55.6
Cai_XJTLU_task1_1 DW_S Cai2022 26 11 1.515 (1.454 - 1.575) 47.8 (47.5 - 48.0) 1.578 46.2
Cao_SCUT_task1_1 KDResCG Cao2022 45 19 2.795 (2.623 - 2.967) 48.7 (48.4 - 48.9) 1.441 51.1
Chang_HYU_task1_1 JH_PM_HYU1 Lee2022 5 2 1.147 (1.081 - 1.214) 60.8 (60.6 - 61.1) 0.835 70.1
Dong_NCUT_task1_1 Dong1_NCUT Dong2022 29 12 1.568 (1.512 - 1.623) 48.0 (47.7 - 48.3) 1.378 53.9
Houyb_XDU_task1_1 Houyb_XDU Hou2022 22 9 1.481 (1.416 - 1.547) 49.3 (49.0 - 49.5) 1.449 49.7
Liang_UESTC_task1_4 MobileNet2 Liang2022 32 13 1.612 (1.560 - 1.663) 44.1 (43.8 - 44.4) 1.556 45.9
DCASE2022 baseline Baseline 1.532 (1.490 - 1.574) 44.2 (44.0 - 44.5) 1.575 42.9
Morocutti_JKU_task1_4 jku_stu_4 Morocutti2022 9 3 1.311 (1.253 - 1.369) 54.5 (54.2 - 54.8) 1.288 52.7
Olisaemeka_ARU_task1_1 DepSepConv Olisaemeka2022 39 16 2.055 (1.991 - 2.119) 36.4 (36.1 - 36.6) 1.878 39.0
Park_KT_task1_2 MConvNet Kim2022 19 6 1.431 (1.364 - 1.498) 52.7 (52.4 - 53.0) 1.259 54.0
Schmid_CPJKU_task1_3 mixstyleR5 Schmid2022 1 1 1.091 (1.040 - 1.141) 59.6 (59.4 - 59.9) 1.139 58.0
Schmidt_FAU_task1_1 final Schmidt2022 35 15 1.731 (1.657 - 1.805) 47.5 (47.2 - 47.8) 1.581 49.0
Singh_Surrey_task1_3 Surrey_19M Singh2022 23 10 1.492 (1.441 - 1.544) 45.9 (45.6 - 46.2) 1.392 47.3
Sugahara_RION_task1_3 RION3 Sugahara2022 14 5 1.366 (1.305 - 1.426) 51.7 (51.4 - 51.9) 1.182 56.5
Yu_XIAOMI_task1_1 YLSSD Yu2022 21 8 1.456 (1.409 - 1.504) 46.2 (46.0 - 46.5) 1.305 51.7
Zaragoza-Paredes_UPV_task1_1 Conv_Sep_CNN_48 Zaragoza_Paredes2022 44 18 2.709 (2.517 - 2.901) 43.8 (43.6 - 44.1) 1.440 50.6
Zhang_THUEE_task1_1 THUEE Shao2022 40 17 2.096 (1.913 - 2.280) 54.9 (54.7 - 55.2) 1.360 54.1
Zou_PKU_task1_1 SepCNN Xin2022 20 7 1.442 (1.362 - 1.521) 56.3 (56.0 - 56.6) 1.295 60.3

System complexity

Columns (in row order): Submission label, Technical report, Official system rank, Evaluation dataset logloss, Evaluation dataset accuracy in %, Acoustic model MACS, Memory use, Parameters, Non-zero parameters, Sparsity, Complexity management
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 47.0 21127552 70612 68918 68918 0.0 weight quantization
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 46.7 21127552 70612 68918 68918 0.0 weight quantization
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 49.4 25475456 52852 51986 51986 0.0 knowledge distillation, weight quantization
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 51.6 25475456 52852 51986 51986 0.0 knowledge distillation, weight quantization
AIT_Essex_task1_1 Pham2022 34 1.636 53.0 900000 33822 33822 32382 0.042575838211814765 channel restriction, decomposed convolution, quantization
AIT_Essex_task1_2 Pham2022 36 1.787 51.9 750000 31902 31902 30558 0.04212902012413011 channel restriction, decomposed convolution, quantization
AIT_Essex_task1_3 Pham2022 37 1.808 55.2 900000 115998 115998 113118 0.024828014276108257 channel restriction, decomposed convolution, quantization
Cai_XJTLU_task1_1 Cai2022 26 1.515 47.8 6287030 25526 25526 0.0 group convolution
Cai_XJTLU_task1_2 Cai2022 30 1.580 46.4 6287030 25526 25526 0.0 group convolution
Cai_XJTLU_task1_3 Cai2022 33 1.635 45.2 7337718 35926 35926 0.0 group convolution
Cai_XJTLU_task1_4 Cai2022 27 1.564 48.0 7337718 35926 35926 0.0 group convolution
Cao_SCUT_task1_1 Cao2022 45 2.795 48.7 8637250 125330 125330 125330 0.0 weight quantization, knowledge distillation
Chang_HYU_task1_1 Lee2022 5 1.147 60.8 26763000 126580 126580 0.0 weight quantization, knowledge distillation
Chang_HYU_task1_2 Lee2022 6 1.187 59.2 26763000 126580 126580 0.0 weight quantization
Chang_HYU_task1_3 Lee2022 8 1.190 59.4 26763000 126580 126580 0.0 weight quantization, knowledge distillation
Chang_HYU_task1_4 Lee2022a 7 1.187 59.3 26763000 126580 126580 0.0 weight quantization
Dong_NCUT_task1_1 Dong2022 29 1.568 48.0 28461216 540672 70608 70608 0.0 weight quantization
Houyb_XDU_task1_1 Hou2022 22 1.481 49.3 28513000 78140 57957 57957 0.0 weight quantization
Liang_UESTC_task1_1 Liang2022 38 1.934 41.3 20500000 85800 85800 85800 0.0 knowledge distillation, weight quantization
Liang_UESTC_task1_2 Liang2022 47 2.916 29.9 20500000 85800 85800 85800 0.0 knowledge distillation, weight quantization
Liang_UESTC_task1_3 Liang2022 43 2.701 28.5 20500000 85800 85800 85800 0.0 knowledge distillation, weight quantization
Liang_UESTC_task1_4 Liang2022 32 1.612 44.1 11186000 110452 110452 110452 0.0 weight quantization
DCASE2022 baseline 1.532 44.2 29234920 65280 46512 46512 0.0 weight quantization
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 53.8 29325000 3510000 65790 65790 0.0 weight quantization
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 53.0 29325000 3510000 65790 65790 0.0 weight quantization
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 54.7 29325000 3510000 65790 65790 0.0 weight quantization
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 54.5 29325000 3510000 65790 65790 0.0 weight quantization
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 36.4 3283692 65280 96473 96473 0.0 weight quantization
Park_KT_task1_1 Kim2022 25 1.504 51.7 29481000 262319 113378 113378 0.0 weight quantization
Park_KT_task1_2 Kim2022 19 1.431 52.7 29481000 262319 113378 113378 0.0 weight quantization
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 59.7 29056324 127046 127046 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 59.6 29056324 127046 127046 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 59.6 28240924 121610 121610 0.0 knowledge distillation, weight quantization
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 59.4 28240924 121610 121610 0.0 knowledge distillation, weight quantization
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 47.5 15163468 5288960 127943 127943 0.0 weight quantization, structured filter pruning
Singh_Surrey_task1_1 Singh2022 28 1.565 44.6 4129320 261120 13138 13138 0.0 weight quantization, pruning
Singh_Surrey_task1_2 Singh2022 31 1.606 44.3 5404520 261120 14886 14886 0.0 weight quantization, pruning
Singh_Surrey_task1_3 Singh2022 23 1.492 45.9 18585480 261120 59570 59570 0.0 weight quantization, pruning
Singh_Surrey_task1_4 Singh2022 24 1.499 45.9 19831880 261120 60958 60958 0.0 weight quantization, pruning
Sugahara_RION_task1_1 Sugahara2022 18 1.405 51.5 26607000 120229 120229 0.0 weight quantization
Sugahara_RION_task1_2 Sugahara2022 15 1.389 51.6 26607000 120229 120229 0.0 weight quantization
Sugahara_RION_task1_3 Sugahara2022 14 1.366 51.7 26607000 120229 120229 0.0 weight quantization
Sugahara_RION_task1_4 Sugahara2022 16 1.397 52.7 26610000 123346 123346 0.0 weight quantization, knowledge distillation
Yu_XIAOMI_task1_1 Yu2022 21 1.456 46.2 16081000 5934 6306 6306 0.0 weight quantization
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 43.8 28570080 1253376 28320 28320 0.0 weight quantization
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 41.9 28570080 1253376 28320 28320 0.0 weight quantization
Zhang_THUEE_task1_1 Shao2022 40 2.096 54.9 28228320 2322560 127160 127160 0.0 pruning, weight quantization, knowledge distillation
Zhang_THUEE_task1_2 Shao2022 48 3.068 54.4 28098645 2322560 126078 126078 0.0 pruning, weight quantization, knowledge distillation
Zou_PKU_task1_1 Xin2022 20 1.442 56.3 28823618 68140 75562 75562 0.0 weight quantization
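The sparsity values in this table are consistent with the fraction of parameters that are zero. A minimal check, assuming that definition (it is inferred from the numbers, not stated on this page), using the parameter counts reported for AIT_Essex_task1_1:

```python
def sparsity(parameters, non_zero_parameters):
    """Fraction of parameters that are zero (assumed definition)."""
    if parameters <= 0:
        raise ValueError("parameters must be positive")
    return 1.0 - non_zero_parameters / parameters


# Counts from the AIT_Essex_task1_1 row: 33822 parameters, 32382 non-zero.
AIT_ESSEX_1_SPARSITY = sparsity(33822, 32382)

# A system with no zeroed parameters, as in the AI4EDGE_IPL_task1_1 row.
AI4EDGE_1_SPARSITY = sparsity(68918, 68918)
```

The first value agrees with the reported sparsity of about 0.0426, and the second is 0.0, as listed for every system whose parameter and non-zero parameter counts are equal.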


Generalization performance

All results are for the evaluation dataset.

Columns (in row order): Submission label, Technical report, Official system rank, then Logloss and Accuracy (in %) pairs for: Overall (evaluation dataset), Unseen devices, Seen devices, Unseen cities, Seen cities
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 47.0 2.820 41.6 2.076 51.5 2.218 48.8 2.447 46.8
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 46.7 2.923 41.2 1.900 51.2 2.211 48.2 2.390 46.7
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 49.4 1.588 42.5 1.240 55.1 1.337 50.2 1.405 49.4
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 51.6 1.467 46.5 1.215 55.9 1.268 53.0 1.336 51.5
AIT_Essex_task1_1 Pham2022 34 1.636 53.0 1.806 50.1 1.494 55.4 1.631 51.5 1.642 53.4
AIT_Essex_task1_2 Pham2022 36 1.787 51.9 2.138 47.2 1.494 55.8 1.810 50.0 1.792 52.5
AIT_Essex_task1_3 Pham2022 37 1.808 55.2 2.258 49.9 1.434 59.7 1.837 53.6 1.805 55.7
Cai_XJTLU_task1_1 Cai2022 26 1.515 47.8 1.847 40.7 1.238 53.7 1.553 45.1 1.500 48.7
Cai_XJTLU_task1_2 Cai2022 30 1.580 46.4 1.920 40.2 1.297 51.6 1.611 44.3 1.567 47.0
Cai_XJTLU_task1_3 Cai2022 33 1.635 45.2 2.059 36.6 1.282 52.3 1.674 42.8 1.617 46.0
Cai_XJTLU_task1_4 Cai2022 27 1.564 48.0 1.916 41.9 1.270 53.1 1.631 46.0 1.546 48.7
Cao_SCUT_task1_1 Cao2022 45 2.795 48.7 3.746 44.0 2.003 52.5 2.926 47.6 2.775 48.8
Chang_HYU_task1_1 Lee2022 5 1.147 60.8 1.377 55.1 0.956 65.6 1.114 60.6 1.153 60.9
Chang_HYU_task1_2 Lee2022 6 1.187 59.2 1.426 52.0 0.987 65.2 1.210 58.1 1.175 59.9
Chang_HYU_task1_3 Lee2022 8 1.190 59.4 1.428 52.6 0.992 65.0 1.224 57.6 1.183 60.0
Chang_HYU_task1_4 Lee2022a 7 1.187 59.3 1.433 51.9 0.982 65.5 1.207 58.4 1.176 60.0
Dong_NCUT_task1_1 Dong2022 29 1.568 48.0 1.872 38.8 1.314 55.6 1.638 45.2 1.555 48.8
Houyb_XDU_task1_1 Hou2022 22 1.481 49.3 1.740 42.8 1.265 54.6 1.547 47.1 1.468 49.9
Liang_UESTC_task1_1 Liang2022 38 1.934 41.3 2.289 36.2 1.637 45.5 1.919 42.4 1.944 41.1
Liang_UESTC_task1_2 Liang2022 47 2.916 29.9 3.346 26.4 2.557 32.9 2.928 30.6 2.929 29.7
Liang_UESTC_task1_3 Liang2022 43 2.701 28.5 3.063 24.6 2.400 31.7 2.670 29.2 2.709 28.3
Liang_UESTC_task1_4 Liang2022 32 1.612 44.1 1.690 41.8 1.546 46.1 1.654 43.3 1.608 44.2
DCASE2022 baseline 1.532 44.2 1.725 38.1 1.372 49.4 1.552 43.4 1.530 44.7
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 53.8 1.509 48.6 1.197 58.1 1.328 54.3 1.338 53.9
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 53.0 1.512 47.8 1.224 57.3 1.342 53.7 1.360 53.1
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 54.7 1.508 48.8 1.162 59.5 1.310 54.7 1.321 54.9
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 54.5 1.480 48.8 1.170 59.2 1.302 55.4 1.311 54.5
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 36.4 2.515 28.4 1.671 43.0 2.180 32.8 2.021 37.3
Park_KT_task1_1 Kim2022 25 1.504 51.7 1.768 46.3 1.284 56.1 1.551 49.8 1.487 52.3
Park_KT_task1_2 Kim2022 19 1.431 52.7 1.624 48.4 1.270 56.2 1.442 51.9 1.417 53.2
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 59.7 1.236 54.5 0.972 64.1 1.122 58.3 1.085 60.3
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 59.6 1.218 55.3 1.011 63.2 1.126 58.6 1.103 59.9
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 59.6 1.231 54.8 0.974 63.7 1.113 58.9 1.087 60.1
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 59.4 1.229 54.9 0.997 63.1 1.129 58.4 1.097 59.7
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 47.5 2.139 40.7 1.390 53.2 1.775 46.7 1.720 47.9
Singh_Surrey_task1_1 Singh2022 28 1.565 44.6 1.835 37.8 1.341 50.3 1.640 42.6 1.545 45.1
Singh_Surrey_task1_2 Singh2022 31 1.606 44.3 1.898 37.8 1.362 49.8 1.672 43.1 1.585 45.0
Singh_Surrey_task1_3 Singh2022 23 1.492 45.9 1.728 39.1 1.296 51.5 1.532 44.3 1.480 46.5
Singh_Surrey_task1_4 Singh2022 24 1.499 45.9 1.754 38.8 1.287 51.9 1.540 44.5 1.486 46.6
Sugahara_RION_task1_1 Sugahara2022 18 1.405 51.5 1.576 46.9 1.261 55.3 1.444 49.4 1.401 51.9
Sugahara_RION_task1_2 Sugahara2022 15 1.389 51.6 1.534 47.8 1.269 54.8 1.431 49.6 1.388 52.1
Sugahara_RION_task1_3 Sugahara2022 14 1.366 51.7 1.496 47.9 1.257 54.8 1.405 49.6 1.364 52.1
Sugahara_RION_task1_4 Sugahara2022 16 1.397 52.7 1.691 46.8 1.152 57.7 1.463 50.0 1.379 53.6
Yu_XIAOMI_task1_1 Yu2022 21 1.456 46.2 1.619 40.6 1.321 51.0 1.473 46.0 1.454 46.5
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 43.8 3.181 40.1 2.315 47.0 2.665 43.4 2.733 44.0
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 41.9 3.254 38.8 2.613 44.6 2.824 42.1 2.930 42.0
Zhang_THUEE_task1_1 Shao2022 40 2.096 54.9 2.435 47.0 1.814 61.5 2.293 53.1 2.056 55.6
Zhang_THUEE_task1_2 Shao2022 48 3.068 54.4 4.008 45.7 2.284 61.6 3.149 51.5 2.986 55.2
Zou_PKU_task1_1 Xin2022 20 1.442 56.3 1.842 48.6 1.108 62.7 1.530 53.0 1.409 57.5
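A seen/unseen breakdown like the one above can be sketched by grouping per-recording predictions by device before computing the metrics. The device split follows the device-wise table on this page (seen: A, B, C, S1, S2, S3; unseen: D, S7-S10). The per-item averaging, the input format, and the toy predictions are assumptions made for illustration; the organizers' exact evaluation procedure is not described here.

```python
import math
from collections import defaultdict

# Devices listed as "seen" in the device-wise results table.
SEEN_DEVICES = {"A", "B", "C", "S1", "S2", "S3"}


def item_log_loss(true_idx, probs, eps=1e-15):
    """Negative log of the probability assigned to the true class."""
    p = min(max(probs[true_idx], eps), 1.0)
    return -math.log(p)


def grouped_metrics(items):
    """items: iterable of (device, true_class_index, class_probabilities)."""
    sums = defaultdict(lambda: [0.0, 0, 0])  # loss sum, correct count, item count
    for device, true_idx, probs in items:
        group = "seen" if device in SEEN_DEVICES else "unseen"
        predicted = max(range(len(probs)), key=probs.__getitem__)
        entry = sums[group]
        entry[0] += item_log_loss(true_idx, probs)
        entry[1] += int(predicted == true_idx)
        entry[2] += 1
    return {
        group: {"logloss": loss / n, "accuracy": 100.0 * correct / n}
        for group, (loss, correct, n) in sums.items()
    }


# Toy predictions over three classes, for illustration only.
ITEMS = [
    ("A", 0, [0.5, 0.25, 0.25]),  # seen device, correct prediction
    ("D", 1, [0.5, 0.25, 0.25]),  # unseen device, wrong prediction
]
METRICS = grouped_metrics(ITEMS)
```

The same grouping applies to cities by swapping the device key for a city key and the seen set for the development-set cities.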

Class-wise performance

Log loss

Columns (in row order): Submission label, Technical report, Official system rank, Logloss (overall), Airport, Bus, Metro, Metro station, Park, Public square, Shopping mall, Street pedestrian, Street traffic, Tram
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 2.559 1.332 1.923 2.808 0.996 4.097 2.567 4.302 1.554 2.003
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 2.293 1.227 1.818 2.520 1.026 4.030 3.000 4.196 1.628 1.912
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 1.587 0.806 1.519 1.992 0.814 1.662 1.537 1.879 1.213 0.975
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 1.429 0.959 1.193 1.549 0.901 1.877 1.314 1.722 1.160 1.192
AIT_Essex_task1_1 Pham2022 34 1.636 1.557 0.777 1.535 2.490 0.603 2.772 1.033 2.702 1.434 1.456
AIT_Essex_task1_2 Pham2022 36 1.787 1.810 0.544 2.138 2.715 1.037 2.748 1.077 2.948 1.564 1.288
AIT_Essex_task1_3 Pham2022 37 1.808 1.767 0.711 1.660 2.431 0.805 3.070 1.449 3.163 1.921 1.106
Cai_XJTLU_task1_1 Cai2022 26 1.515 1.590 1.258 1.274 1.845 1.217 2.132 1.300 1.986 1.414 1.131
Cai_XJTLU_task1_2 Cai2022 30 1.580 1.591 1.457 1.342 1.837 1.356 2.090 1.317 2.113 1.624 1.078
Cai_XJTLU_task1_3 Cai2022 33 1.635 1.583 1.091 1.290 1.979 1.346 2.552 1.521 2.157 1.761 1.070
Cai_XJTLU_task1_4 Cai2022 27 1.564 1.409 1.318 1.307 1.954 1.852 2.006 1.153 1.997 1.559 1.083
Cao_SCUT_task1_1 Cao2022 45 2.795 3.614 1.835 2.234 2.986 2.026 4.087 2.443 3.969 2.277 2.480
Chang_HYU_task1_1 Lee2022 5 1.147 1.647 0.539 1.055 1.100 0.555 1.867 1.291 1.837 0.820 0.763
Chang_HYU_task1_2 Lee2022 6 1.187 1.549 0.564 1.033 1.079 0.701 1.900 1.458 1.936 0.835 0.812
Chang_HYU_task1_3 Lee2022 8 1.190 1.504 0.645 1.041 1.205 0.610 1.903 1.375 1.889 0.907 0.822
Chang_HYU_task1_4 Lee2022a 7 1.187 1.645 0.542 1.049 1.084 0.698 1.849 1.510 1.889 0.827 0.777
Dong_NCUT_task1_1 Dong2022 29 1.568 1.976 0.936 1.390 1.595 0.909 2.288 1.607 2.043 1.441 1.489
Houyb_XDU_task1_1 Hou2022 22 1.481 1.827 1.100 1.508 1.937 0.824 1.669 2.015 1.801 1.166 0.963
Liang_UESTC_task1_1 Liang2022 38 1.934 3.818 1.085 1.029 1.451 1.927 2.831 2.010 2.775 1.460 0.950
Liang_UESTC_task1_2 Liang2022 47 2.916 5.660 2.051 0.736 1.248 3.257 3.867 3.809 3.728 3.736 1.064
Liang_UESTC_task1_3 Liang2022 43 2.701 3.537 2.557 0.693 0.817 4.344 3.929 2.739 3.305 3.690 1.403
Liang_UESTC_task1_4 Liang2022 32 1.612 2.008 1.598 1.933 1.602 1.009 2.357 1.261 1.772 1.158 1.417
DCASE2022 baseline 1.532 1.596 1.368 1.489 1.692 1.635 1.943 1.289 1.891 1.219 1.202
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 1.409 0.841 1.335 1.714 0.736 1.742 1.607 1.953 1.107 0.943
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 1.487 0.985 1.305 1.677 0.682 1.718 1.460 1.734 1.289 1.212
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 1.537 0.673 1.247 1.701 0.712 1.715 1.427 1.959 1.309 0.916
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 1.426 0.747 1.337 1.587 0.701 1.738 1.637 1.874 1.168 0.894
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 1.970 2.535 1.331 2.020 2.054 2.868 1.795 2.608 1.604 1.765
Park_KT_task1_1 Kim2022 25 1.504 1.615 1.048 1.761 1.982 1.153 2.345 1.229 1.631 1.554 0.721
Park_KT_task1_2 Kim2022 19 1.431 1.166 1.093 1.448 2.117 1.210 1.734 1.198 2.071 1.416 0.855
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 1.435 0.773 0.997 1.173 0.593 1.549 1.099 1.628 0.901 0.770
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 1.358 0.759 1.043 1.217 0.451 1.609 1.254 1.616 0.896 0.851
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 1.430 0.790 0.932 1.120 0.502 1.521 1.129 1.709 0.953 0.822
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 1.349 0.839 0.918 1.246 0.534 1.557 1.120 1.726 0.956 0.781
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 2.207 1.025 1.621 1.923 1.726 2.125 1.938 2.518 1.157 1.068
Singh_Surrey_task1_1 Singh2022 28 1.565 1.919 1.420 1.175 1.754 1.398 2.065 1.270 2.082 1.350 1.219
Singh_Surrey_task1_2 Singh2022 31 1.606 1.834 1.619 1.275 1.713 1.414 2.020 1.394 2.216 1.464 1.109
Singh_Surrey_task1_3 Singh2022 23 1.492 1.656 1.604 1.242 1.696 1.289 1.893 1.239 1.898 1.363 1.041
Singh_Surrey_task1_4 Singh2022 24 1.499 1.660 1.496 1.242 1.700 1.370 1.895 1.281 1.966 1.359 1.023
Sugahara_RION_task1_1 Sugahara2022 18 1.405 1.421 1.028 1.349 1.426 0.352 2.227 1.235 2.167 1.676 1.161
Sugahara_RION_task1_2 Sugahara2022 15 1.389 1.468 1.220 1.377 1.324 0.365 1.918 1.311 2.078 1.559 1.274
Sugahara_RION_task1_3 Sugahara2022 14 1.366 1.440 1.201 1.410 1.326 0.379 1.830 1.276 2.045 1.476 1.273
Sugahara_RION_task1_4 Sugahara2022 16 1.397 1.545 0.723 1.148 1.447 1.024 2.212 1.118 2.483 1.150 1.122
Yu_XIAOMI_task1_1 Yu2022 21 1.456 1.732 1.177 1.425 1.789 1.008 1.808 1.380 1.792 1.255 1.200
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 1.110 2.356 3.714 4.442 2.247 1.790 3.617 3.139 2.755 1.919
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 0.995 2.967 4.176 5.117 2.167 1.582 4.277 3.045 2.690 2.026
Zhang_THUEE_task1_1 Shao2022 40 2.096 1.557 1.105 2.007 1.513 1.942 1.812 1.438 1.618 5.762 2.208
Zhang_THUEE_task1_2 Shao2022 48 3.068 2.807 2.400 1.595 1.378 3.292 10.393 1.345 1.799 4.168 1.503
Zou_PKU_task1_1 Xin2022 20 1.442 1.971 0.724 1.329 1.486 0.639 2.221 1.269 2.379 1.247 1.150

Accuracy

Columns (in row order): Submission label, Technical report, Official system rank, Accuracy in % (overall), Airport, Bus, Metro, Metro station, Park, Public square, Shopping mall, Street pedestrian, Street traffic, Tram
AI4EDGE_IPL_task1_1 Anastcio2022 42 47.0 41.3 66.9 51.3 35.3 75.2 26.7 40.3 22.3 60.4 50.4
AI4EDGE_IPL_task1_2 Anastcio2022 41 46.7 40.8 68.5 53.2 39.4 74.9 24.8 34.7 21.0 61.2 48.1
AI4EDGE_IPL_task1_3 Anastcio2022 17 49.4 38.2 72.9 41.2 33.2 75.2 38.4 44.1 25.6 59.2 65.8
AI4EDGE_IPL_task1_4 Anastcio2022 11 51.6 42.1 68.9 56.2 45.8 73.8 25.5 52.7 34.7 61.4 54.8
AIT_Essex_task1_1 Pham2022 34 53.0 48.6 78.3 49.0 31.3 84.2 30.3 65.9 26.0 66.1 50.2
AIT_Essex_task1_2 Pham2022 36 51.9 44.8 84.7 38.5 29.1 76.0 32.6 66.9 26.6 65.3 54.0
AIT_Essex_task1_3 Pham2022 37 55.2 51.2 81.4 49.2 38.2 83.1 31.8 61.7 29.4 60.0 66.4
Cai_XJTLU_task1_1 Cai2022 26 47.8 37.7 57.4 46.8 33.9 69.8 23.4 60.2 36.3 56.5 55.5
Cai_XJTLU_task1_2 Cai2022 30 46.4 39.7 48.8 41.7 38.0 63.9 28.3 62.7 28.8 50.2 62.0
Cai_XJTLU_task1_3 Cai2022 33 45.2 40.4 61.5 44.5 28.2 68.1 17.4 53.6 32.2 48.3 57.6
Cai_XJTLU_task1_4 Cai2022 27 48.0 46.2 53.3 46.2 32.8 61.8 26.0 62.6 33.3 56.5 61.5
Cao_SCUT_task1_1 Cao2022 45 48.7 40.1 58.5 42.1 38.5 72.3 24.2 52.0 35.7 69.0 54.1
Chang_HYU_task1_1 Lee2022 5 60.8 39.7 81.8 61.4 62.3 83.5 35.1 58.0 39.5 74.4 72.5
Chang_HYU_task1_2 Lee2022 6 59.2 42.3 82.3 62.4 62.8 79.5 32.6 51.3 32.6 74.0 72.2
Chang_HYU_task1_3 Lee2022 8 59.4 43.0 79.9 63.5 59.6 81.9 34.2 52.1 35.3 72.2 71.9
Chang_HYU_task1_4 Lee2022a 7 59.3 37.6 83.2 62.3 63.2 80.0 34.6 49.4 34.8 74.5 73.9
Dong_NCUT_task1_1 Dong2022 29 48.0 30.8 70.4 50.3 48.3 72.5 24.7 45.5 35.3 56.6 45.6
Houyb_XDU_task1_1 Hou2022 22 49.3 33.2 59.9 44.8 39.1 74.3 40.1 33.7 37.7 64.1 65.7
Liang_UESTC_task1_1 Liang2022 38 41.3 8.2 59.0 58.0 43.0 49.5 21.8 35.1 22.9 56.6 58.5
Liang_UESTC_task1_2 Liang2022 47 29.9 3.3 35.8 71.6 48.8 23.8 16.2 9.0 14.8 20.0 55.9
Liang_UESTC_task1_3 Liang2022 43 28.5 8.5 26.4 72.3 64.4 13.8 12.5 13.0 14.4 18.8 40.5
Liang_UESTC_task1_4 Liang2022 32 44.1 25.2 44.2 29.7 43.1 69.3 17.1 57.1 39.9 63.7 51.7
DCASE2022 baseline 44.2 32.2 50.6 37.9 39.8 52.2 25.7 58.2 27.9 64.4 53.4
Morocutti_JKU_task1_1 Morocutti2022 12 53.8 49.5 72.6 49.6 42.7 77.7 37.3 45.9 25.6 69.2 67.8
Morocutti_JKU_task1_2 Morocutti2022 13 53.0 41.1 68.7 51.4 43.4 80.3 36.3 48.8 33.2 66.0 60.5
Morocutti_JKU_task1_3 Morocutti2022 10 54.7 42.7 77.4 52.3 42.8 78.5 39.8 54.2 26.0 63.9 69.0
Morocutti_JKU_task1_4 Morocutti2022 9 54.5 48.5 75.0 48.0 47.0 79.4 37.5 43.5 27.3 68.3 70.1
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 36.4 31.8 27.1 50.5 30.9 44.5 17.2 46.9 20.1 54.0 40.5
Park_KT_task1_1 Kim2022 25 51.7 44.0 64.2 42.5 39.8 67.1 25.5 58.0 42.2 58.5 74.9
Park_KT_task1_2 Kim2022 19 52.7 57.5 64.3 49.8 36.4 64.1 38.6 55.2 29.0 61.0 71.1
Schmid_CPJKU_task1_1 Schmid2022 2 59.7 48.0 76.8 63.8 58.3 82.7 43.2 57.1 32.0 68.9 66.6
Schmid_CPJKU_task1_2 Schmid2022 4 59.6 51.2 78.2 62.6 58.3 88.5 41.3 52.1 30.8 69.5 63.7
Schmid_CPJKU_task1_3 Schmid2022 1 59.6 47.4 75.5 66.1 60.4 85.4 43.6 55.1 31.8 66.8 64.4
Schmid_CPJKU_task1_4 Schmid2022 3 59.4 53.1 74.7 67.5 53.8 85.8 42.6 56.9 27.4 66.5 65.7
Schmidt_FAU_task1_1 Schmidt2022 35 47.5 31.6 66.4 48.1 40.9 53.5 33.3 44.4 26.2 66.1 64.2
Singh_Surrey_task1_1 Singh2022 28 44.6 23.9 43.7 54.1 36.6 61.4 32.0 62.4 24.2 59.5 48.3
Singh_Surrey_task1_2 Singh2022 31 44.3 29.3 37.3 48.8 39.5 62.9 37.2 58.3 17.9 56.8 55.1
Singh_Surrey_task1_3 Singh2022 23 45.9 29.1 33.0 49.8 38.6 63.3 36.7 62.4 24.6 60.0 61.2
Singh_Surrey_task1_4 Singh2022 24 45.9 29.8 37.3 49.2 39.0 61.5 37.1 61.7 22.7 60.0 60.8
Sugahara_RION_task1_1 Sugahara2022 18 51.5 45.4 66.0 49.8 48.8 90.8 24.2 53.4 31.6 44.5 60.1
Sugahara_RION_task1_2 Sugahara2022 15 51.6 42.5 61.0 52.3 53.0 90.4 29.0 52.5 33.6 48.8 53.0
Sugahara_RION_task1_3 Sugahara2022 14 51.7 42.5 61.1 52.1 53.0 90.5 29.1 52.7 33.5 48.9 53.1
Sugahara_RION_task1_4 Sugahara2022 16 52.7 36.3 76.5 54.1 47.8 73.7 26.7 65.4 23.9 63.5 59.8
Yu_XIAOMI_task1_1 Yu2022 21 46.2 39.5 59.9 43.8 32.6 68.7 37.0 51.8 24.7 62.7 42.0
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 43.8 64.2 48.8 29.9 30.8 64.9 54.4 23.3 14.1 58.9 49.0
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 41.9 67.5 41.5 27.5 26.0 64.2 56.9 15.2 12.1 58.7 50.0
Zhang_THUEE_task1_1 Shao2022 40 54.9 39.9 74.9 55.4 47.3 82.2 36.0 48.2 37.9 62.6 64.9
Zhang_THUEE_task1_2 Shao2022 48 54.4 42.1 80.4 50.1 47.7 81.3 34.2 45.6 39.1 64.0 59.4
Zou_PKU_task1_1 Xin2022 20 56.3 37.6 74.9 55.9 56.2 81.6 33.1 62.1 34.3 67.0 60.1
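The overall values in the two class-wise tables appear to equal the unweighted mean of the ten class-wise values, up to rounding. This is an observation from the reported numbers rather than a documented definition. The short check below uses the AI4EDGE_IPL_task1_1 and Schmid_CPJKU_task1_3 rows:

```python
def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Class-wise log loss and accuracy for AI4EDGE_IPL_task1_1 (overall 2.414, 47.0).
AI4EDGE_1_LOGLOSS = [2.559, 1.332, 1.923, 2.808, 0.996,
                     4.097, 2.567, 4.302, 1.554, 2.003]
AI4EDGE_1_ACCURACY = [41.3, 66.9, 51.3, 35.3, 75.2,
                      26.7, 40.3, 22.3, 60.4, 50.4]

# Class-wise log loss for Schmid_CPJKU_task1_3 (overall 1.091).
SCHMID_3_LOGLOSS = [1.430, 0.790, 0.932, 1.120, 0.502,
                    1.521, 1.129, 1.709, 0.953, 0.822]

AI4EDGE_1_MEAN_LOGLOSS = macro_average(AI4EDGE_1_LOGLOSS)
AI4EDGE_1_MEAN_ACCURACY = macro_average(AI4EDGE_1_ACCURACY)
SCHMID_3_MEAN_LOGLOSS = macro_average(SCHMID_3_LOGLOSS)
```

Small differences from the reported overall values are expected, because the class-wise entries are themselves rounded.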

Device-wise performance

Log loss

Columns (in row order): Submission label, Technical report, Official system rank, Log loss (overall), Log loss / Unseen, Log loss / Seen, unseen devices D, S7, S8, S9, S10, seen devices A, B, C, S1, S2, S3
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 2.820 2.076 3.553 2.482 2.701 2.691 2.672 1.734 2.025 1.789 2.325 2.385 2.200
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 2.923 1.900 3.623 2.602 2.706 2.707 2.977 1.582 1.797 1.684 2.160 2.182 1.995
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 1.588 1.240 1.782 1.511 1.592 1.556 1.500 1.168 1.220 1.154 1.255 1.359 1.286
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 1.467 1.215 1.775 1.350 1.465 1.374 1.370 1.123 1.214 1.153 1.211 1.378 1.213
AIT_Essex_task1_1 Pham2022 34 1.636 1.806 1.494 2.479 1.380 1.653 1.980 1.541 1.355 1.775 1.513 1.357 1.671 1.292
AIT_Essex_task1_2 Pham2022 36 1.787 2.138 1.494 2.958 1.699 1.937 2.136 1.961 1.323 1.580 1.331 1.532 1.713 1.485
AIT_Essex_task1_3 Pham2022 37 1.808 2.258 1.434 2.959 1.692 1.897 2.581 2.159 1.205 1.559 1.481 1.492 1.470 1.395
Cai_XJTLU_task1_1 Cai2022 26 1.515 1.847 1.238 2.264 1.361 1.539 2.193 1.876 1.025 1.324 1.125 1.288 1.355 1.313
Cai_XJTLU_task1_2 Cai2022 30 1.580 1.920 1.297 2.219 1.372 1.600 2.504 1.906 1.089 1.366 1.175 1.346 1.422 1.386
Cai_XJTLU_task1_3 Cai2022 33 1.635 2.059 1.282 2.418 1.414 1.759 2.704 1.998 1.049 1.397 1.149 1.329 1.400 1.368
Cai_XJTLU_task1_4 Cai2022 27 1.564 1.916 1.270 2.789 1.323 1.454 2.081 1.932 1.045 1.372 1.112 1.341 1.429 1.322
Cao_SCUT_task1_1 Cao2022 45 2.795 3.746 2.003 9.920 2.145 2.105 2.184 2.375 1.774 2.110 1.902 2.059 2.011 2.159
Chang_HYU_task1_1 Lee2022 5 1.147 1.377 0.956 1.744 1.065 1.185 1.418 1.475 0.839 0.977 0.879 1.008 1.015 1.016
Chang_HYU_task1_2 Lee2022 6 1.187 1.426 0.987 1.823 1.126 1.127 1.444 1.610 0.884 0.978 0.927 1.054 1.038 1.043
Chang_HYU_task1_3 Lee2022 8 1.190 1.428 0.992 1.976 1.080 1.186 1.374 1.526 0.882 0.985 0.928 1.028 1.083 1.045
Chang_HYU_task1_4 Lee2022a 7 1.187 1.433 0.982 1.846 1.132 1.137 1.440 1.611 0.856 0.965 0.936 1.051 1.042 1.043
Dong_NCUT_task1_1 Dong2022 29 1.568 1.872 1.314 2.012 1.547 1.630 2.315 1.857 1.083 1.239 1.170 1.433 1.494 1.463
Houyb_XDU_task1_1 Hou2022 22 1.481 1.740 1.265 2.361 1.557 1.451 1.724 1.609 1.028 1.262 1.132 1.317 1.476 1.375
Liang_UESTC_task1_1 Liang2022 38 1.934 2.289 1.637 3.535 1.861 1.950 1.726 2.373 1.291 1.428 1.493 2.011 1.789 1.813
Liang_UESTC_task1_2 Liang2022 47 2.916 3.346 2.557 4.393 2.989 2.820 2.730 3.798 1.890 2.140 2.204 3.056 3.097 2.956
Liang_UESTC_task1_3 Liang2022 43 2.701 3.063 2.400 3.483 2.931 2.782 2.601 3.516 1.805 2.093 1.973 2.992 2.757 2.780
Liang_UESTC_task1_4 Liang2022 32 1.612 1.690 1.546 2.074 1.588 1.600 1.527 1.660 1.448 1.663 1.509 1.569 1.524 1.566
DCASE2022 baseline 1.532 1.725 1.372 1.894 1.485 1.573 1.864 1.807 1.108 1.360 1.299 1.478 1.528 1.460
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 1.509 1.197 1.884 1.259 1.379 1.602 1.420 1.028 1.320 1.156 1.215 1.275 1.189
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 1.512 1.224 1.907 1.340 1.467 1.424 1.420 1.025 1.351 1.186 1.220 1.314 1.250
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 1.508 1.162 1.982 1.232 1.372 1.541 1.414 0.995 1.263 1.107 1.192 1.235 1.182
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 1.480 1.170 1.937 1.240 1.337 1.511 1.376 1.004 1.275 1.116 1.191 1.259 1.175
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 2.515 1.671 2.871 2.236 2.247 2.652 2.567 1.389 1.583 1.488 1.822 1.915 1.832
Park_KT_task1_1 Kim2022 25 1.504 1.768 1.284 1.942 1.504 1.562 1.860 1.970 1.020 1.326 1.140 1.331 1.405 1.480
Park_KT_task1_2 Kim2022 19 1.431 1.624 1.270 1.979 1.339 1.418 1.657 1.728 1.060 1.323 1.081 1.387 1.456 1.313
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 1.236 0.972 1.431 1.058 1.042 1.305 1.343 0.783 1.048 0.877 1.026 1.086 1.010
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 1.218 1.011 1.548 1.075 1.048 1.171 1.248 0.847 1.077 0.932 1.076 1.098 1.039
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 1.231 0.974 1.507 1.042 1.038 1.272 1.299 0.792 1.046 0.915 1.020 1.084 0.984
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 1.229 0.997 1.464 1.075 1.047 1.248 1.311 0.817 1.075 0.900 1.047 1.108 1.035
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 2.139 1.390 3.108 1.589 1.652 2.011 2.337 1.150 1.435 1.364 1.428 1.486 1.480
Singh_Surrey_task1_1 Singh2022 28 1.565 1.835 1.341 1.909 1.511 1.505 2.079 2.168 1.081 1.313 1.230 1.465 1.488 1.468
Singh_Surrey_task1_2 Singh2022 31 1.606 1.898 1.362 2.122 1.493 1.516 2.151 2.207 1.090 1.355 1.233 1.483 1.516 1.496
Singh_Surrey_task1_3 Singh2022 23 1.492 1.728 1.296 1.808 1.413 1.445 1.972 1.999 1.040 1.300 1.166 1.407 1.448 1.416
Singh_Surrey_task1_4 Singh2022 24 1.499 1.754 1.287 1.820 1.413 1.453 2.040 2.042 1.030 1.282 1.165 1.406 1.437 1.404
Sugahara_RION_task1_1 Sugahara2022 18 1.405 1.576 1.261 1.866 1.461 1.216 1.378 1.960 1.032 1.353 1.180 1.505 1.236 1.263
Sugahara_RION_task1_2 Sugahara2022 15 1.389 1.534 1.269 1.798 1.373 1.204 1.414 1.880 1.016 1.422 1.174 1.431 1.275 1.298
Sugahara_RION_task1_3 Sugahara2022 14 1.366 1.496 1.257 1.757 1.344 1.196 1.396 1.785 1.011 1.409 1.172 1.401 1.269 1.280
Sugahara_RION_task1_4 Sugahara2022 16 1.397 1.691 1.152 2.857 1.225 1.235 1.467 1.670 1.028 1.248 1.124 1.166 1.178 1.171
Yu_XIAOMI_task1_1 Yu2022 21 1.456 1.619 1.321 1.708 1.383 1.470 1.974 1.561 1.040 1.388 1.143 1.419 1.540 1.396
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 3.181 2.315 3.420 2.402 2.540 4.638 2.906 1.861 2.791 1.976 2.152 2.471 2.641
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 3.254 2.613 3.502 2.669 2.861 4.235 3.004 2.122 3.252 2.281 2.326 2.752 2.943
Zhang_THUEE_task1_1 Shao2022 40 2.096 2.435 1.814 2.891 1.896 2.103 2.839 2.444 1.589 1.909 1.585 1.927 1.942 1.934
Zhang_THUEE_task1_2 Shao2022 48 3.068 4.008 2.284 4.883 2.930 3.463 5.471 3.294 2.096 2.320 2.148 2.332 2.533 2.277
Zou_PKU_task1_1 Xin2022 20 1.442 1.842 1.108 2.401 1.314 1.401 2.214 1.882 0.899 1.053 0.932 1.194 1.288 1.281

Accuracy

Rank Submission label Technical Report Official system rank Accuracy Accuracy / Unseen Accuracy / Seen Unseen devices: D S7 S8 S9 S10 Seen devices: A B C S1 S2 S3
AI4EDGE_IPL_task1_1 Anastcio2022 42 47.0 41.6 51.5 36.2 43.3 43.1 41.8 43.4 54.4 52.1 53.8 48.8 49.0 51.0
AI4EDGE_IPL_task1_2 Anastcio2022 41 46.7 41.2 51.2 35.6 43.6 43.6 41.3 42.1 54.1 52.1 53.4 49.5 47.3 50.9
AI4EDGE_IPL_task1_3 Anastcio2022 17 49.4 42.5 55.1 34.1 44.9 44.4 43.9 45.2 58.1 55.6 58.2 54.7 50.7 53.4
AI4EDGE_IPL_task1_4 Anastcio2022 11 51.6 46.5 55.9 35.5 49.6 47.6 50.2 49.4 60.3 55.5 57.7 56.1 49.6 56.0
AIT_Essex_task1_1 Pham2022 34 53.0 50.1 55.4 42.2 56.7 52.2 45.8 53.6 62.6 49.1 56.2 56.7 50.1 57.8
AIT_Essex_task1_2 Pham2022 36 51.9 47.2 55.8 39.8 52.3 49.1 45.7 49.0 63.0 52.0 60.7 53.6 50.9 54.4
AIT_Essex_task1_3 Pham2022 37 55.2 49.9 59.7 42.9 56.4 52.6 46.0 51.5 67.3 56.0 61.1 58.4 56.3 59.2
Cai_XJTLU_task1_1 Cai2022 26 47.8 40.7 53.7 34.4 49.5 45.9 33.2 40.3 62.0 50.6 58.3 50.8 49.3 50.9
Cai_XJTLU_task1_2 Cai2022 30 46.4 40.2 51.6 37.5 49.3 45.1 29.5 39.6 60.0 49.2 56.0 49.1 46.0 49.2
Cai_XJTLU_task1_3 Cai2022 33 45.2 36.6 52.3 33.2 46.0 40.6 26.0 37.3 62.8 46.8 56.4 49.3 48.8 49.7
Cai_XJTLU_task1_4 Cai2022 27 48.0 41.9 53.1 35.2 50.3 49.4 35.9 38.9 62.2 49.3 58.0 49.7 48.2 51.1
Cao_SCUT_task1_1 Cao2022 45 48.7 44.0 52.5 29.2 49.4 47.5 47.4 46.6 61.2 48.6 55.2 50.9 50.4 48.9
Chang_HYU_task1_1 Lee2022 5 60.8 55.1 65.6 46.4 63.2 59.5 54.5 52.1 70.3 64.8 67.2 64.0 63.3 63.7
Chang_HYU_task1_2 Lee2022 6 59.2 52.0 65.2 39.7 60.3 59.3 51.8 48.9 68.8 65.1 66.8 63.4 63.6 63.5
Chang_HYU_task1_3 Lee2022 8 59.4 52.6 65.0 37.0 62.3 58.5 54.4 50.7 69.2 64.4 67.0 64.6 61.8 63.1
Chang_HYU_task1_4 Lee2022a 7 59.3 51.9 65.5 39.9 60.2 59.4 52.1 48.1 70.2 65.4 66.8 63.4 63.4 63.6
Dong_NCUT_task1_1 Dong2022 29 48.0 38.8 55.6 31.9 48.5 44.7 28.6 40.6 64.7 57.8 60.3 51.0 49.9 50.0
Houyb_XDU_task1_1 Hou2022 22 49.3 42.8 54.6 31.4 46.9 49.4 43.6 42.8 62.7 53.9 59.2 52.4 48.9 50.7
Liang_UESTC_task1_1 Liang2022 38 41.3 36.2 45.5 25.4 41.0 40.8 44.2 29.6 55.8 49.7 49.1 36.7 41.1 40.4
Liang_UESTC_task1_2 Liang2022 47 29.9 26.4 32.9 22.4 27.6 29.6 30.7 21.5 43.3 36.3 37.0 25.9 26.9 27.8
Liang_UESTC_task1_3 Liang2022 43 28.5 24.6 31.7 21.4 25.0 26.7 28.8 21.0 41.4 34.3 37.2 24.4 26.7 26.4
Liang_UESTC_task1_4 Liang2022 32 44.1 41.8 46.1 31.6 44.4 44.1 46.6 42.1 49.1 42.6 47.2 45.7 46.5 45.2
DCASE2022 baseline 44.2 38.1 49.4 33.2 45.0 42.6 34.0 35.4 59.9 50.3 53.5 44.3 43.4 44.8
Morocutti_JKU_task1_1 Morocutti2022 12 53.8 48.6 58.1 37.8 54.9 52.1 48.2 49.8 64.3 53.2 60.6 57.0 54.9 58.5
Morocutti_JKU_task1_2 Morocutti2022 13 53.0 47.8 57.3 35.9 52.4 51.1 50.3 49.3 65.9 52.1 58.3 56.2 53.5 57.6
Morocutti_JKU_task1_3 Morocutti2022 10 54.7 48.8 59.5 35.1 55.3 53.7 48.9 51.2 66.0 55.8 61.2 58.5 56.8 59.0
Morocutti_JKU_task1_4 Morocutti2022 9 54.5 48.8 59.2 36.6 55.1 53.5 48.5 50.1 65.4 55.0 61.7 57.7 56.6 59.0
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 36.4 28.4 43.0 24.9 33.7 34.7 23.6 25.3 54.1 44.4 48.0 37.7 36.1 37.5
Park_KT_task1_1 Kim2022 25 51.7 46.3 56.1 40.5 51.6 52.1 44.1 43.4 64.8 54.2 60.6 53.1 52.6 51.4
Park_KT_task1_2 Kim2022 19 52.7 48.4 56.2 40.9 54.8 53.5 46.1 46.8 62.9 53.4 62.6 52.4 51.4 54.7
Schmid_CPJKU_task1_1 Schmid2022 2 59.7 54.5 64.1 46.4 61.1 61.4 51.1 52.5 71.5 61.2 69.1 62.0 58.7 62.2
Schmid_CPJKU_task1_2 Schmid2022 4 59.6 55.3 63.2 43.8 60.7 61.7 56.0 54.5 69.5 60.3 66.7 61.0 59.7 61.9
Schmid_CPJKU_task1_3 Schmid2022 1 59.6 54.8 63.7 46.4 61.7 61.1 52.0 52.9 71.0 60.9 66.7 61.9 58.9 62.7
Schmid_CPJKU_task1_4 Schmid2022 3 59.4 54.9 63.1 47.3 60.6 61.9 52.2 52.8 70.5 60.0 67.2 61.1 58.4 61.2
Schmidt_FAU_task1_1 Schmidt2022 35 47.5 40.7 53.2 30.6 48.2 47.9 41.4 35.3 61.4 50.7 53.9 51.9 49.9 51.1
Singh_Surrey_task1_1 Singh2022 28 44.6 37.8 50.3 36.9 44.4 44.5 32.9 30.5 61.0 49.9 56.4 45.6 44.1 44.6
Singh_Surrey_task1_2 Singh2022 31 44.3 37.8 49.8 34.9 45.1 44.8 32.6 31.6 60.8 49.0 55.7 45.4 42.4 45.4
Singh_Surrey_task1_3 Singh2022 23 45.9 39.1 51.5 37.7 46.5 46.0 32.4 32.8 62.1 50.0 57.5 47.3 45.1 47.2
Singh_Surrey_task1_4 Singh2022 24 45.9 38.8 51.9 37.9 46.5 45.7 31.8 32.0 62.2 50.8 58.1 47.5 45.2 47.4
Sugahara_RION_task1_1 Sugahara2022 18 51.5 46.9 55.3 39.3 50.7 55.3 49.6 39.5 63.0 51.8 56.9 49.1 56.0 54.9
Sugahara_RION_task1_2 Sugahara2022 15 51.6 47.8 54.8 42.1 51.9 55.2 49.1 40.9 63.3 50.6 58.0 48.8 54.1 54.1
Sugahara_RION_task1_3 Sugahara2022 14 51.7 47.9 54.8 42.1 51.9 55.2 49.1 41.0 63.3 50.6 58.0 48.8 54.1 54.2
Sugahara_RION_task1_4 Sugahara2022 16 52.7 46.8 57.7 29.1 55.5 55.8 49.7 44.2 61.7 55.2 59.0 56.5 57.1 56.5
Yu_XIAOMI_task1_1 Yu2022 21 46.2 40.6 51.0 34.9 48.3 44.5 32.0 43.1 62.1 47.7 58.1 47.3 43.1 47.6
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 43.8 40.1 47.0 42.8 42.8 41.9 34.0 38.8 54.5 42.8 52.6 45.9 42.7 43.5
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 41.9 38.8 44.6 41.4 40.5 40.0 35.0 37.1 51.8 40.1 49.7 44.3 40.8 40.7
Zhang_THUEE_task1_1 Shao2022 40 54.9 47.0 61.5 40.6 51.8 53.7 37.9 51.1 69.1 60.5 65.3 58.0 58.5 57.9
Zhang_THUEE_task1_2 Shao2022 48 54.4 45.7 61.6 38.4 53.1 51.0 35.0 51.2 67.5 60.0 65.7 57.8 58.9 59.9
Zou_PKU_task1_1 Xin2022 20 56.3 48.6 62.7 39.8 58.5 55.4 42.3 47.1 69.9 62.8 68.1 59.1 58.2 57.8

System characteristics

General characteristics

Rank Submission label Technical Report Official system rank Logloss (Eval) Accuracy (Eval) Sampling rate Data augmentation Features Embeddings
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 47.0 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 46.7 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 49.4 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 51.6 8kHz pitch shifting, time stretching, mixup, time masking, frequency masking log-mel energies
AIT_Essex_task1_1 Pham2022 34 1.636 53.0 44.1kHz mixup, random cropping, SpecAugment CQT, Gammatonegram, Mel
AIT_Essex_task1_2 Pham2022 36 1.787 51.9 44.1kHz mixup, random cropping, SpecAugment CQT, Gammatonegram, Mel
AIT_Essex_task1_3 Pham2022 37 1.808 55.2 44.1kHz mixup, random cropping, SpecAugment CQT, Gammatonegram, Mel
Cai_XJTLU_task1_1 Cai2022 26 1.515 47.8 44.1kHz log-mel energies
Cai_XJTLU_task1_2 Cai2022 30 1.580 46.4 44.1kHz log-mel energies
Cai_XJTLU_task1_3 Cai2022 33 1.635 45.2 44.1kHz mixup, pitch shifting, spectrum correction log-mel energies
Cai_XJTLU_task1_4 Cai2022 27 1.564 48.0 44.1kHz mixup, pitch shifting, spectrum correction log-mel energies
Cao_SCUT_task1_1 Cao2022 45 2.795 48.7 44.1kHz mixup, time stretching, pitch shifting, spectrum correction log-mel energies
Chang_HYU_task1_1 Lee2022 5 1.147 60.8 16kHz mixup, SpecAugment, time masking, frequency masking, temporal shuffle log-mel energies
Chang_HYU_task1_2 Lee2022 6 1.187 59.2 16kHz mixup, SpecAugment, time masking, frequency masking, temporal shuffle log-mel energies
Chang_HYU_task1_3 Lee2022 8 1.190 59.4 16kHz mixup, SpecAugment, time masking, frequency masking, temporal shuffle log-mel energies
Chang_HYU_task1_4 Lee2022a 7 1.187 59.3 16kHz mixup, SpecAugment, time masking, frequency masking, temporal shuffle log-mel energies
Dong_NCUT_task1_1 Dong2022 29 1.568 48.0 44.1kHz mixup, SpecAugment log-mel energies, delta and delta-delta
Houyb_XDU_task1_1 Hou2022 22 1.481 49.3 44.1kHz SpecAugment, mixup log-mel energies
Liang_UESTC_task1_1 Liang2022 38 1.934 41.3 44.1kHz time masking, frequency masking, time warping, mixup log-mel energies
Liang_UESTC_task1_2 Liang2022 47 2.916 29.9 44.1kHz time masking, frequency masking, time warping, mixup log-mel energies
Liang_UESTC_task1_3 Liang2022 43 2.701 28.5 44.1kHz time masking, frequency masking, time warping, frequency warping, mixup log-mel energies
Liang_UESTC_task1_4 Liang2022 32 1.612 44.1 44.1kHz noise addition, pitch shifting, speed changing, time masking, mixup log-mel energies
DCASE2022 baseline 1.532 44.2 44.1kHz log-mel energies
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 53.8 22.05kHz mixup, pitch shifting, time stretching, shifting, adding gaussian noise mel-spectrogram
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 53.0 22.05kHz mixup, pitch shifting, time stretching, shifting, adding gaussian noise mel-spectrogram
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 54.7 22.05kHz mixup, pitch shifting, time stretching, shifting, adding gaussian noise mel-spectrogram
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 54.5 22.05kHz mixup, pitch shifting, time stretching, shifting, adding gaussian noise mel-spectrogram
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 36.4 44.1kHz log-mel energies
Park_KT_task1_1 Kim2022 25 1.504 51.7 22.05kHz SpecAugment log-mel energies
Park_KT_task1_2 Kim2022 19 1.431 52.7 22.05kHz SpecAugment log-mel energies
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 59.7 32.0kHz mixup, mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 59.6 32.0kHz mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 59.6 32.0kHz mixstyle, pitch shifting log-mel energies
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 59.4 32.0kHz mixup, mixstyle, pitch shifting log-mel energies
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 47.5 16kHz mixup, rolling, SpecAugment log-mel energies
Singh_Surrey_task1_1 Singh2022 28 1.565 44.6 44.1kHz log-mel energies
Singh_Surrey_task1_2 Singh2022 31 1.606 44.3 44.1kHz log-mel energies
Singh_Surrey_task1_3 Singh2022 23 1.492 45.9 44.1kHz log-mel energies
Singh_Surrey_task1_4 Singh2022 24 1.499 45.9 44.1kHz log-mel energies
Sugahara_RION_task1_1 Sugahara2022 18 1.405 51.5 44.1kHz mixup, SpecAugment, spectrum modulation log-mel energies, deltas
Sugahara_RION_task1_2 Sugahara2022 15 1.389 51.6 44.1kHz mixup, SpecAugment, spectrum modulation log-mel energies, deltas
Sugahara_RION_task1_3 Sugahara2022 14 1.366 51.7 44.1kHz mixup, SpecAugment, spectrum modulation log-mel energies, deltas
Sugahara_RION_task1_4 Sugahara2022 16 1.397 52.7 44.1kHz mixup, SpecAugment, spectrum modulation log-mel energies, deltas
Yu_XIAOMI_task1_1 Yu2022 21 1.456 46.2 44.1kHz log-mel energies, spectral entropy, spectral flatness dilated-CNN
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 43.8 44.1kHz log-mel energies
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 41.9 44.1kHz log-mel energies
Zhang_THUEE_task1_1 Shao2022 40 2.096 54.9 44.1kHz mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix log-mel energies
Zhang_THUEE_task1_2 Shao2022 48 3.068 54.4 44.1kHz mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix log-mel energies
Zou_PKU_task1_1 Xin2022 20 1.442 56.3 44.1kHz SpecAugment++, time shifting spectrogram CNN6



Machine learning characteristics

Rank Code Technical Report Official system rank Logloss (Eval) Accuracy (Eval) External data usage External data sources Model complexity Model MACS Classifier Ensemble subsystems Decision making Framework Pipeline
AI4EDGE_IPL_task1_1 Anastcio2022 42 2.414 47.0 68918 21127552 CNN, ensemble 2 keras/tensorflow pretraining, ensemble, training, weight quantization
AI4EDGE_IPL_task1_2 Anastcio2022 41 2.365 46.7 68918 21127552 CNN, ensemble 2 keras/tensorflow pretraining, ensemble, training, weight quantization
AI4EDGE_IPL_task1_3 Anastcio2022 17 1.398 49.4 51986 25475456 CNN, ensemble 10 keras/tensorflow pretraining, ensemble, training, knowledge distillation, weight quantization
AI4EDGE_IPL_task1_4 Anastcio2022 11 1.330 51.6 51986 25475456 CNN, ensemble 10 keras/tensorflow pretraining, ensemble, training, knowledge distillation, weight quantization
AIT_Essex_task1_1 Pham2022 34 1.636 53.0 33822 900000 CNN 3 late fusion of predicted probabilities tensorflow training
AIT_Essex_task1_2 Pham2022 36 1.787 51.9 31902 750000 CNN 3 late fusion of predicted probabilities tensorflow training
AIT_Essex_task1_3 Pham2022 37 1.808 55.2 115998 900000 CNN 3 late fusion of predicted probabilities tensorflow training
Cai_XJTLU_task1_1 Cai2022 26 1.515 47.8 25526 6287030 CNN pytorch
Cai_XJTLU_task1_2 Cai2022 30 1.580 46.4 25526 6287030 CNN pytorch
Cai_XJTLU_task1_3 Cai2022 33 1.635 45.2 35926 7337718 CNN pytorch
Cai_XJTLU_task1_4 Cai2022 27 1.564 48.0 35926 7337718 CNN pytorch
Cao_SCUT_task1_1 Cao2022 45 2.795 48.7 embeddings, pre-trained model 125330 8637250 BC-ResNet, CNN pytorch pretraining, training, adaptation
Chang_HYU_task1_1 Lee2022 5 1.147 60.8 directly 126580 26763000 CNN, BC-Res2Net categorical cross entropy pytorch pretraining, weight quantization, fine tuning, data-random-drop
Chang_HYU_task1_2 Lee2022 6 1.187 59.2 directly 126580 26763000 CNN, BC-Res2Net categorical cross entropy pytorch pretraining, weight quantization, fine tuning, data-random-drop
Chang_HYU_task1_3 Lee2022 8 1.190 59.4 directly 126580 26763000 CNN, BC-Res2Net categorical cross entropy pytorch pretraining, weight quantization, fine tuning, data-random-drop
Chang_HYU_task1_4 Lee2022a 7 1.187 59.3 directly 126580 26763000 CNN, BC-Res2Net categorical cross entropy pytorch pretraining, weight quantization, fine tuning, data-random-drop
Dong_NCUT_task1_1 Dong2022 29 1.568 48.0 70608 28461216 FHR_Mobilenet average keras/tensorflow training, weight quantization
Houyb_XDU_task1_1 Hou2022 22 1.481 49.3 embeddings 57957 28513000 CNN pytorch data augment, training, adaptation, weight quantization
Liang_UESTC_task1_1 Liang2022 38 1.934 41.3 85800 20500000 BC-ResNet pytorch training, knowledge distillation, weight quantization
Liang_UESTC_task1_2 Liang2022 47 2.916 29.9 85800 20500000 BC-ResNet pytorch training, knowledge distillation, weight quantization
Liang_UESTC_task1_3 Liang2022 43 2.701 28.5 85800 20500000 BC-ResNet pytorch training, knowledge distillation, weight quantization
Liang_UESTC_task1_4 Liang2022 32 1.612 44.1 110452 11186000 MobileNetV2 keras/tensorflow training, adaptation, pruning, weight quantization
DCASE2022 baseline 1.532 44.2 46512 29234920 CNN keras/tensorflow pretraining, training, adaptation, pruning, weight quantization
Morocutti_JKU_task1_1 Morocutti2022 12 1.339 53.8 65790 29325000 ensemble, CNN 3 average pytorch preprocessing, training teacher, training student, weight quantization
Morocutti_JKU_task1_2 Morocutti2022 13 1.355 53.0 65790 29325000 ensemble, CNN 3 average pytorch preprocessing, training teacher, training student, weight quantization
Morocutti_JKU_task1_3 Morocutti2022 10 1.320 54.7 65790 29325000 ensemble, CNN 3 average pytorch preprocessing, training teacher, training student, weight quantization
Morocutti_JKU_task1_4 Morocutti2022 9 1.311 54.5 65790 29325000 ensemble, CNN 3 average pytorch preprocessing, training teacher, training student, weight quantization
Olisaemeka_ARU_task1_1 Olisaemeka2022 39 2.055 36.4 96473 3283692 CNN keras/tensorflow pretraining, training, weight quantization
Park_KT_task1_1 Kim2022 25 1.504 51.7 113378 29481000 CNN pytorch training, weight quantization
Park_KT_task1_2 Kim2022 19 1.431 52.7 113378 29481000 CNN pytorch training, weight quantization
Schmid_CPJKU_task1_1 Schmid2022 2 1.092 59.7 pre-trained model PaSST 127046 29056324 RF-regularized CNNs, PaSST transformer pytorch training teacher, training student, knowledge distillation, weight quantization
Schmid_CPJKU_task1_2 Schmid2022 4 1.105 59.6 pre-trained model PaSST 127046 29056324 RF-regularized CNNs, PaSST transformer pytorch training teacher, training student, knowledge distillation, weight quantization
Schmid_CPJKU_task1_3 Schmid2022 1 1.091 59.6 pre-trained model PaSST 121610 28240924 RF-regularized CNNs, PaSST transformer pytorch training teacher, training student, knowledge distillation, weight quantization
Schmid_CPJKU_task1_4 Schmid2022 3 1.102 59.4 pre-trained model PaSST, AudioSet 121610 28240924 RF-regularized CNNs, PaSST transformer pytorch training teacher, training student, knowledge distillation, weight quantization
Schmidt_FAU_task1_1 Schmidt2022 35 1.731 47.5 127943 15163468 CNN, SVM pytorch pretraining, training, pruning, weight quantization
Singh_Surrey_task1_1 Singh2022 28 1.565 44.6 directly 13138 4129320 CNN maximum likelihood keras/tensorflow training (from scratch), pruning, weight quantization
Singh_Surrey_task1_2 Singh2022 31 1.606 44.3 directly 14886 5404520 CNN maximum likelihood keras/tensorflow training (from scratch), pruning, weight quantization
Singh_Surrey_task1_3 Singh2022 23 1.492 45.9 directly 59570 18585480 CNN 5 average keras/tensorflow training (from scratch), pruning, weight quantization
Singh_Surrey_task1_4 Singh2022 24 1.499 45.9 directly 60958 19831880 CNN 5 average keras/tensorflow training (from scratch), pruning, weight quantization
Sugahara_RION_task1_1 Sugahara2022 18 1.405 51.5 120229 26607000 MobileNet weighted average pytorch training (from scratch), weight quantization
Sugahara_RION_task1_2 Sugahara2022 15 1.389 51.6 120229 26607000 MobileNet weighted average pytorch training (from scratch), weight quantization
Sugahara_RION_task1_3 Sugahara2022 14 1.366 51.7 120229 26607000 MobileNet weighted average pytorch training (from scratch), weight quantization
Sugahara_RION_task1_4 Sugahara2022 16 1.397 52.7 123346 26610000 MobileNet pytorch training (from scratch), weight quantization
Yu_XIAOMI_task1_1 Yu2022 21 1.456 46.2 embeddings 6306 16081000 CNN keras/tensorflow pretraining, training, weight quantization
Zaragoza-Paredes_UPV_task1_1 Zaragoza_Paredes2022 44 2.709 43.8 28320 28570080 CNN keras/tensorflow training, weight quantization
Zaragoza-Paredes_UPV_task1_2 Zaragoza_Paredes2022 46 2.904 41.9 28320 28570080 CNN keras/tensorflow training, weight quantization
Zhang_THUEE_task1_1 Shao2022 40 2.096 54.9 127160 28228320 Mini-SegNet 3 keras/tensorflow training, pruning, quantization aware training, weight quantization, knowledge distillation
Zhang_THUEE_task1_2 Shao2022 48 3.068 54.4 126078 28098645 Mini-SegNet 2 keras/tensorflow training, pruning, quantization aware training, weight quantization, knowledge distillation
Zou_PKU_task1_1 Xin2022 20 1.442 56.3 embeddings 75562 28823618 CNN pytorch training, weight quantization

Technical reports

Ai4edgept Submission to DCASE 2022 Low Complexity Acoustic Scene Classification Task1

Ricardo Anastácio1, Luís Ferreira2, Figueiredo Mónica1,3 and Conde Bento Luís1,4
1Electronic Engineering, Politécnico de Leiria, Leiria, Portugal, 2University of Coimbra, Coimbra, Portugal, 3Instituto de Telecomunicações, Portugal, 4Institute of Systems and Robotics, Coimbra, Portugal

Abstract

This report details the submission to Task 1 of the DCASE2022 competition. The task aims to classify acoustic scenes using devices with low computational power and memory. We propose two ensemble models for scene classification. The first model clusters the classes into two groups, with each network of a two-network ensemble responsible for intra-group discrimination, i.e. discriminating between the classes that are most related in the confusion matrix. The second model implements a canonical one-versus-all ten-network ensemble architecture followed by knowledge distillation, i.e. the ensemble model is used as the teacher network. The student is an optimised version of the DCASE2022 baseline architecture. In both models we resort to three different data pre-processing techniques: audio downsampling, mel-spectrogram tuning, and data augmentation. We used the DCASE2022 baseline for all networks (two-network ensemble, ten-network ensemble and student network), on which we conducted an architecture hyperparameter search to identify the best-performing architecture while remaining compliant with the DCASE2022 performance metrics. Results revealed that data pre-processing and knowledge distillation techniques improve overall performance. Nevertheless, a simple two-network ensemble without knowledge distillation keeps the MACs and parameter count low while achieving similar results.
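As a rough illustration of the teacher-student step described above, the following is a minimal sketch of a standard knowledge distillation loss in NumPy. It is not the authors' implementation: the function names, the temperature of 2.0 and the weighting `alpha` are assumptions chosen for illustration, and the teacher logits stand in for the output of the ten-network ensemble.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher || student).

    temperature and alpha are hypothetical values, not taken from the report.
    """
    student_logits = np.asarray(student_logits, dtype=float)
    n = student_logits.shape[0]
    eps = 1e-12

    # Hard-label term: ordinary cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(n), labels] + eps).mean()

    # Soft-target term: KL divergence between softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1).mean()

    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft

# Toy example: 2 clips, 3 classes.
teacher = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
student = np.array([[2.0, 1.5, 0.1], [0.3, 1.0, 0.9]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

When the student reproduces the teacher logits exactly, the soft-target term vanishes and only the hard-label term remains.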

System characteristics
Sampling rate 8kHz
Data augmentation pitch shifting, time stretching, mixup, time masking, frequency masking
Features log-mel energies
Classifier CNN, ensemble

Low-Complexity Model Based on Depthwise Separable CNN for Acoustic Scene Classification

Yiqiang Cai1, He Tang1, Chenyang Zhu2, Shengchen Li1 and Xi Shao3
1School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, China, 2School of Artificial Intelligence and Computing Sciences, Jiangnan University, Wuxi, China, 3College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China

Abstract

Task 1 of DCASE 2022 put forward higher requirements for system complexity, and the new datasets also brought greater challenges. We tried to reproduce several models from previous years, but did not obtain good performance. Therefore, we introduced the depthwise separable CNN method to the baseline architecture, which successfully reduces the complexity and improves the accuracy. We also used three data augmentation methods (mixup, pitch shifting and stretching) to further improve the results.
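To make the complexity reduction concrete, here is a small sketch that compares the parameter count of a standard 2-D convolution with that of a depthwise separable one (a depthwise convolution followed by a 1 × 1 pointwise convolution). The layer sizes (16 input channels, 32 output channels, 3 × 3 kernel) are illustrative assumptions, not values from the report.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameters of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Parameters of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# Hypothetical layer: 16 -> 32 channels, 3 x 3 kernel.
standard = conv2d_params(16, 32, 3)                # 4640
separable = depthwise_separable_params(16, 32, 3)  # 704
print(f"standard={standard}, separable={separable}, "
      f"ratio={standard / separable:.1f}x")
```

For this hypothetical layer the separable variant needs roughly one sixth of the parameters; the saving grows with the kernel size and the number of output channels.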

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, pitch shifting, spectrum correction
Features log-mel energies
Classifier CNN

Low-Complexity Acoustic Scene Classification Using Broadcasted ResNet and Data Augmentation

Wenchang Cao, Yanxiong Li, Qisheng Huang and Mingle Liu
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China

Abstract

Acoustic scene classification (ASC) is a task to classify each input audio recording into one class of pre-given acoustic scenes. As an important task in Detection and Classification of Acoustic Scenes and Events (DCASE), ASC has attracted a lot of attention from researchers in the community of audio and acoustic signal processing in recent years [1]-[4]. In the work of this report, we focus on the task of low-complexity ASC with multiple devices, namely, Task 1 of the DCASE2022 challenge [5]. In this task, a low-complexity model is required to classify audio recordings recorded by multiple devices (real and simulated). In the proposed ASC method, the BC-ResNet-Mod [6] is used as the backbone of our model whose training strategy is the Cross-Gradient Training (CGT) [7]. In addition, some data augmentation techniques are adopted for further improving the performance of the proposed method. The size of our model is 125.33 KB after model compression, which is lower than the size limit of 128 KB. Evaluated on the development dataset, our system obtains classification accuracy of 51.1%.
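The model size quoted above can be reproduced with simple arithmetic. The sketch below assumes the 125,330 parameters listed for this system in the machine learning characteristics table, 8 bits per parameter after compression, and 1 KB = 1000 bytes; the helper name is hypothetical.

```python
def model_size_kb(num_params, bits_per_param=8):
    """Model size in KB, assuming 1 KB = 1000 bytes."""
    return num_params * bits_per_param / 8 / 1000

SIZE_LIMIT_KB = 128  # size limit quoted in the abstract

size_kb = model_size_kb(125_330)       # 8-bit parameters (assumption)
print(f"{size_kb:.2f} KB, within limit: {size_kb <= SIZE_LIMIT_KB}")

# The same network stored as 32-bit floats would be four times larger.
size_fp32_kb = model_size_kb(125_330, bits_per_param=32)
```

Under these assumptions the quantized model fits the limit, while an unquantized 32-bit copy of the same network would not.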

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, time stretching, pitch shifting, spectrum correction
Features log-mel energies
Classifier BC-ResNet, CNN

Acoustic Scene Classification Based on Fhr_mobilenet

Hongxia Dong1, Lin Zhang1, Xichang Cai1, Menglong Wu1, Ziling Qiao1, Yanggang Gan2 and Juan Wu2
1Electronic and Communication Engineering, North China University of Technology, Beijing, China, 2Electronic and Communication Engineering, North China University of Technology, Beijing, China

Abstract

This technical report describes our submission for Task 1 of the DCASE2022 challenge. We calculated 128 log-mel energies at the original sampling rate of 44.1kHz for each time slice, using 2048 FFT points with 50% overlap. Additionally, deltas and delta-deltas were calculated from the log-mel spectrogram and stacked along the channel axis. The resulting spectrograms have 128 frequency bins, 43 time samples and 3 channels, representing the log-mel spectrogram, its delta features and its delta-delta features respectively. The three-channel feature map is then fed into the MobileNet-based frequency high-resolution network. Finally, after a 1 × 1 convolution and global average pooling, the classification results are obtained through a softmax output. The classification accuracy of our proposed model is 53.9% with a loss value of 1.378. The model has 70.608K parameters, each represented as int8, and 28.461M MACs.
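The stacking of log-mel, delta and delta-delta features into a three-channel input can be sketched as follows. This is only an illustration of the resulting tensor layout: the deltas here are simple finite differences computed with `np.gradient`, which is an assumption and not necessarily the delta definition used in the report, and the input is random data standing in for a real log-mel spectrogram.

```python
import numpy as np

def stack_deltas(log_mel):
    """Stack a (freq, time) log-mel array with its first and second
    time differences into a (freq, time, 3) array."""
    delta = np.gradient(log_mel, axis=1)        # first-order difference over time
    delta_delta = np.gradient(delta, axis=1)    # second-order difference over time
    return np.stack([log_mel, delta, delta_delta], axis=-1)

# Placeholder input with the shape given in the abstract: 128 bins x 43 frames.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 43))

features = stack_deltas(log_mel)
print(features.shape)  # (128, 43, 3)
```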

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, SpecAugment
Features log-mel energies, delta and delta-delta
Classifier FHR_Mobilenet
Decision making average

Low-Complexity for DCASE 2022 Task 1A Challenge

YuanBo Hou
Telecommunications Engineering, Xidian University, Xi'an, China

Abstract

This technical report describes the systems for task 1/subtask A of the DCASE 2022 challenge. To reduce the number of model parameters and improve accuracy, I use a simple neural network with causal convolution and a bottleneck structure. Log-mel spectrograms are extracted to train the acoustic scene classification model. Mixup and SpecAugment are used to augment the acoustic features. My system achieves higher classification accuracy and lower log loss on the development dataset than the baseline system.
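For readers unfamiliar with mixup, the following is a minimal NumPy sketch of the technique applied to a batch of spectrograms. The Beta parameter `alpha=0.3`, the batch shape and the function name are assumptions for illustration and are not taken from the report.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.3, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    A single mixing weight lam ~ Beta(alpha, alpha) is used for the whole batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Toy batch: 4 spectrograms of 8 mel bins x 5 frames, 10 scene classes.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5))
y = np.eye(10)[[0, 3, 3, 7]]

x_mix, y_mix, lam = mixup_batch(x, y, rng=rng)
```

The mixed labels remain valid probability distributions, so they can be used directly with a cross-entropy loss.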

System characteristics
Sampling rate 44.1kHz
Data augmentation SpecAugment, mixup
Features log-mel energies
Classifier CNN

KT Submission for the DCASE 2022 Challenge: Modernized Convolutional Neural Networks for Acoustic Scene Classification

TaeSoo Kim, GaHui Lee and JaeHan Park
AI2XL, KT Corporation, Seoul, South Korea

Abstract

In this technical report, we present our team's submission for DCASE 2022 Task 1, low-complexity Acoustic Scene Classification (ASC). We gradually modernized a neural network architecture design starting from the baseline model and discovered several key components that contribute to the performance. To meet the model complexity constraints, the number of parameters and the number of MACs are considered when applying each design. As a result, our model achieves 1.2593 log-loss and 54.03% accuracy on the development set, while having fewer than 114k total parameters (including zero-valued ones) and 30 million MACs.

System characteristics
Sampling rate 22.05kHz
Data augmentation SpecAugment
Features log-mel energies
Classifier CNN

HYU Submission for the DCASE 2022: Efficient Fine-Tuning Method Using Device-Aware Data-Random-Drop for Device-Imbalanced Acoustic Scene Classification

Joo-Hyun Lee, Jeong-Hwan Choi, Pil Moo Byun and Joon-Hyuk Chang
Electronic Engineering, Hanyang University, Seoul, Republic of Korea

Abstract

This paper describes the Hanyang University team's submission for the DCASE 2022 Challenge Low-Complexity Acoustic Scene Classification task. The task aims to design a generalized audio scene classification system for various devices under low complexity and short input time conditions. We followed two strategies to achieve our goal: improving the model structure for short segmented audio and adopting transfer learning methods that are generalizable to unknown devices. Based on the BC-ResNet, which showed the best performance in the DCASE 2021 challenge, we incorporated a method proposed in the field of short-duration speaker verification to secure high accuracy. In addition, we propose a novel fine-tuning method using device-aware data-random-drop to obtain a model that generalizes across multiple devices. Most of the training dataset is data recorded with a specific device. We devised a fine-tuning method that gradually excludes data recorded with that device from the mini-batch during training, and this method improves generalization performance. Following the official cross-validation setup of the TAU Urban Acoustic Scenes 2022 Mobile development dataset, we achieve 70.1% accuracy and a multi-class cross-entropy loss of 0.835.
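The idea of gradually excluding the dominant device's data from each mini-batch could be sketched as below. This is a hypothetical reading of the description above, not the authors' procedure: the linear decay schedule, the `final_keep` value, the function names and the use of "A" as the dominant device label are all assumptions.

```python
import numpy as np

def keep_probability(epoch, num_epochs, final_keep=0.2):
    """Linearly decay the keep probability from 1.0 to final_keep (assumed schedule)."""
    frac = epoch / max(num_epochs - 1, 1)
    return 1.0 - frac * (1.0 - final_keep)

def drop_dominant_device(devices, epoch, num_epochs, dominant="A",
                         final_keep=0.2, rng=None):
    """Return indices of the mini-batch items to keep.

    Items from other devices are always kept; items from the dominant
    device are kept with a probability that shrinks as training proceeds.
    """
    rng = np.random.default_rng() if rng is None else rng
    devices = np.asarray(devices)
    keep_p = keep_probability(epoch, num_epochs, final_keep)
    is_dominant = devices == dominant
    keep = ~is_dominant | (rng.random(len(devices)) < keep_p)
    return np.flatnonzero(keep)

# Toy mini-batch of device labels.
batch_devices = ["A", "A", "B", "A", "S1", "A", "C", "A"]
rng = np.random.default_rng(0)
first_epoch = drop_dominant_device(batch_devices, epoch=0, num_epochs=10, rng=rng)
last_epoch = drop_dominant_device(batch_devices, epoch=9, num_epochs=10,
                                  final_keep=0.0, rng=rng)
```

In the first epoch every item is kept; with `final_keep=0.0` the last epoch keeps only the items from the non-dominant devices.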

System characteristics
Sampling rate 16kHz
Data augmentation mixup, SpecAugment, time masking, frequency masking, temporal shuffle
Features log-mel energies
Classifier CNN, BC-Res2Net
Decision making categorical cross entropy

Low-Complexity Acoustic Scene Classification Based on Residual Net

Jiangnan Liang, Cheng Zeng, Chuang Shi, Le Zhang, Yisen Zhou, Yuehong Li, Yanyu Zhou and Tianqi Tan
University of Electronic Science and Technology of China, Chengdu, China

Abstract

This technical report describes the submitted systems for task 1 of the DCASE 2022 challenge. The log-mel energies, delta features and delta-delta features were extracted to train the model. We adopted a total of eight data augmentation methods. BC-ResNet and MobileNetV2 were used as the models. We used knowledge distillation and quantization to compress the models. Our systems achieved lower log loss and higher accuracy on the development dataset than the baseline system.

System characteristics
Sampling rate 44.1kHz
Data augmentation time masking, frequency masking, time warping, mixup; time masking, frequency masking, time warping, frequency warping, mixup; noise addition, pitch shifting, speed changing, time masking, mixup
Features log-mel energies
Classifier BC-ResNet; MobileNetV2

Receptive Field Regularized CNNs with Traditional Audio Augmentations

Tobias Morocutti and Diaaeldin Shalaby
Johannes Kepler University, Linz, Austria

Abstract

This technical report describes our system for Task 1 (Low-Complexity Acoustic Scene Classification) of the DCASE2022 Challenge. Because the allowed complexity of the submitted model is limited, we use a teacher-student approach. The teacher is a Receptive Field (RF) regularized CNN model and the student is a simpler 5-layer CNN with batch normalization, dropout and maxpool layers. In addition, data augmentation techniques such as adding Gaussian noise, shifting, pitch shifting and time stretching are adopted to expand the diversity of the dataset. Our system achieves an accuracy of 53.4% and a multiclass cross-entropy (log loss) of 1.279 on the development dataset. The student model has 21,930 parameters and a multiply-accumulate (MAC) count of 9.775 million.
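The complexity figures listed for these systems in the results table (65,790 parameters, 29,325,000 MACs, three ensemble subsystems) are consistent with three copies of the student described above. The short check below assumes the three subsystems are identical in size; the limits of 128,000 parameters and 30 million MACs are stated here as assumptions about the task rules.

```python
STUDENT_PARAMS = 21_930      # from the abstract
STUDENT_MACS = 9_775_000     # 9.775 million, from the abstract
NUM_SUBSYSTEMS = 3           # ensemble subsystems listed in the results table

# Assumed task limits (not stated in this abstract).
MAX_PARAMS = 128_000
MAX_MACS = 30_000_000

total_params = NUM_SUBSYSTEMS * STUDENT_PARAMS
total_macs = NUM_SUBSYSTEMS * STUDENT_MACS

def within_limits(params, macs, max_params=MAX_PARAMS, max_macs=MAX_MACS):
    """True if both complexity measures are inside the assumed limits."""
    return params <= max_params and macs <= max_macs

print(total_params, total_macs, within_limits(total_params, total_macs))
```

Under these assumptions the ensemble stays below both limits, with the MAC budget being the tighter of the two.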

System characteristics
Sampling rate 22.05kHz
Data augmentation mixup, pitch shifting, time stretching, shifting, adding gaussian noise
Features mel-spectrogram
Classifier ensemble, CNN
Decision making average
PDF

Submission to DCASE 2022 Task 1: Depthwise Separable Convolutions for Low-Complexity Acoustic Scene Classification

Chukwuebuka Olisaemeka and Lakshmi Babu Saheer
Computing Sciences, Anglia Ruskin University, Cambridge, United Kingdom

Abstract

This technical report describes the details of the Task 1 submission to the DCASE2022 challenge. The aim of this task is to design an acoustic scene classification system that targets devices with low memory and computational allowance. The task also aims to build systems that can generalize across multiple devices. To achieve this objective, a model using depthwise separable convolutional layers is proposed, which reduces the number of parameters and computations required compared to standard convolutional layers. This work further proposes the use of dilated kernels, which increase the receptive field of the convolutional layer without increasing the number of parameters to be learned. Finally, quantization is applied to reduce the model complexity. The proposed system achieves an average test accuracy of 39% and a log loss of 1.878 on the TAU Urban Acoustic Scenes 2022 Mobile development dataset, with a parameter count of 96.473k and 3.284 MMACs.
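The parameter saving from depthwise separable convolutions, and the receptive-field effect of dilation, can be checked with simple arithmetic. The sketch below ignores bias terms, and the layer sizes (3x3 kernel, 64 input and 128 output channels) are assumptions chosen for illustration, not the submitted architecture.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel along one axis at a given dilation."""
    return k + (k - 1) * (dilation - 1)


standard = conv_params(3, 64, 128)             # 73,728 weights
separable = separable_conv_params(3, 64, 128)  # 576 + 8,192 = 8,768 weights
span = effective_kernel(3, 2)                  # a 3-tap kernel covers 5 positions
```

For this hypothetical layer the separable variant needs roughly one eighth of the weights, while dilation widens the kernel's span from 3 to 5 positions with no extra weights.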

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
PDF

Low-Complexity Deep Learning Frameworks for Acoustic Scene Classification

Lam Pham1, Ngo Dat2, Anahid Naghibzadeh-Jalali1 and Alexander Schindler1
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2School of computer science and electronic engineering, Essex University, UK

Abstract

In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we first transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods Random Cropping, SpecAugment, and Mixup are applied to the spectrograms. The augmented spectrograms are then fed into deep-learning-based classifiers. Finally, the probabilities obtained from three individual classifiers, each trained independently on one of the three types of spectrogram, are fused to achieve the best performance. Our experiments, conducted on the DCASE 2022 Task 1 development dataset, yield low-complexity frameworks with a best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2 percentage points.
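The late-fusion step can be sketched as follows. Averaging the per-class probabilities is one common fusion rule and is assumed here; the abstract does not state which rule the authors use, and the probability vectors below are made up.

```python
def late_fusion(prob_sets):
    """Average per-class probabilities over several classifiers."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models
            for c in range(n_classes)]


# Hypothetical outputs of three classifiers for one clip and three classes.
mel_probs = [0.7, 0.2, 0.1]
gammatone_probs = [0.5, 0.4, 0.1]
cqt_probs = [0.3, 0.3, 0.4]

fused = late_fusion([mel_probs, gammatone_probs, cqt_probs])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because each input vector sums to one, the averaged vector does too, so the fused output can still be scored with log loss.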

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, random cropping, SpecAugment
Features CQT, Gammatonegram, Mel
Classifier CNN
Decision making late fusion of predicted probabilities
PDF

CP-JKU Submission to Dcase22: Distilling Knowledge for Low-Complexity Convolutional Neural Networks From a Patchout Audio Transformer

Florian Schmid1,2, Shahed Masoudian2, Khaled Koutini2 and Gerhard Widmer1,2
1Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria, 2LIT Artificial Intelligence Lab, Johannes Kepler University (JKU) Linz, Linz, Austria

Abstract

In this technical report, we describe the CP-JKU team’s submission for Task 1 Low-Complexity Acoustic Scene Classification of the DCASE 22 challenge [1]. We use Knowledge Distillation to teach low-complexity CNN student models from Patchout Spectrogram Transformer (PaSST) models. We use PaSST models pre-trained on AudioSet and fine-tune them on the TAU Urban Acoustic Scenes 2022 Mobile development dataset. We experiment with using an ensemble of teachers, different receptive fields of the student models, and mixing frequency-wise statistics of spectrograms to enhance generalization to unseen devices. Finally, the student models are quantized in order to perform inference computations using 8-bit integers, simulating the low-complexity constraints of edge devices.
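A generic knowledge-distillation loss can be sketched as below: a weighted sum of the hard-label cross-entropy and the KL divergence between temperature-softened teacher and student distributions. The temperature, the weight `alpha` and the temperature-squared scaling are conventional assumptions, not values or details confirmed by this report.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * hard-label CE + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Toy logits for one clip and three classes (made-up numbers).
loss = distillation_loss([1.0, 0.5, -0.5], [3.0, 0.0, -1.0], label=0)
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label term remains.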

Awards: Judges’ award

System characteristics
Sampling rate 32.0kHz
Data augmentation mixup, mixstyle, pitch shifting; mixstyle, pitch shifting
Features log-mel energies
Classifier RF-regularized CNNs, PaSST transformer
PDF

Structured Filter Pruning and Feature Selection for Low Complexity Acoustic Scene Classification

Lorenz Schmidt, Beran Kiliç and Nils Peters
International Audio Laboratories, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Abstract

The DCASE challenge track 1 provides a dataset for Acoustic Scene Classification (ASC), a popular problem in machine learning. This year's challenge shortens the provided audio clips to 1 s, adds a multiply-accumulate operations (MAC) constraint and additionally counts all parameters of the model. We tackle the problem using three approaches: first, we use a linear model with global moments of the spectrogram, which comes within reach of the baseline; then we use feature selection to reduce the generalization gap and the MACs; and finally, we use structured filter pruning to bring the number of parameters below the parameter constraint. Using the evaluation split of the development dataset, our result shows an increase to 49.1% overall accuracy compared to the baseline system with 42.9% accuracy.
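One way to read "global moments of the spectrogram" is as per-band statistics pooled over the whole clip, which a linear model can then consume. The sketch below computes the mean and standard deviation of each band; the choice of these two moments is an assumption made for illustration, and the report may use others.

```python
import statistics


def global_moments(spectrogram):
    """Per-band mean and standard deviation, pooled over all frames.

    spectrogram: list of frames, each a list of per-band log energies.
    Returns a flat vector [mean_0, std_0, mean_1, std_1, ...].
    """
    n_bands = len(spectrogram[0])
    features = []
    for b in range(n_bands):
        band = [frame[b] for frame in spectrogram]
        features.append(statistics.fmean(band))
        features.append(statistics.pstdev(band))
    return features


# Toy spectrogram: two frames, two bands (made-up values).
toy = [[1.0, 10.0],
       [3.0, 10.0]]
feats = global_moments(toy)
```

The resulting vector has a fixed length regardless of clip duration, which keeps the downstream linear model small.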

System characteristics
Sampling rate 16kHz
Data augmentation mixup, rolling, SpecAugment
Features log-mel energies
Classifier CNN, SVM
PDF

Mini-Segnet for Low-Complexity Acoustic Scene Classification

Yun-Fei Shao1, Xuan Zhang2, Ge-Ge Bing1, Ke-Meng Zhao1, Jun-Jie Xu2, Yong Ma2 and Wei-Qiang Zhang1
1Department of Electronic Engineering, Tsinghua University, Beijing, China, 2School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China

Abstract

This report details the architecture we used to address Task 1 of the DCASE2022 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. Our architecture is based on SegNet, adding an instance normalization layer to normalize the activations of the previous layer at each step. Log-mel spectrograms, delta features, and delta-delta features are extracted to train the acoustic scene classification model. A total of 6 data augmentations are applied: mixup, time and frequency domain masking, image augmentation, auto level, pix2pix, and random crop. We apply three model compression schemes (pruning, quantization, and knowledge distillation) to reduce model complexity. The proposed system achieves higher classification accuracy and lower log loss than the baseline system. After model compression, our model achieves an average accuracy of 54.11% with 127.2 K parameters, 8-bit quantization, and fewer than 30 million MACs.
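The 8-bit quantization step can be illustrated with a simple symmetric per-tensor scheme. This is a generic sketch under assumed conventions (one scale per tensor, integer range [-127, 127]); the report does not specify its quantization scheme here, and deployment toolchains typically handle this step.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


weights = [-1.0, -0.2, 0.0, 0.33, 1.0]  # made-up weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step.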

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, ImageDataGenerator, temporal crop, Auto levels, pix2pix
Features log-mel energies
Classifier Mini-SegNet
PDF

Low-Complexity CNNs for Acoustic Scene Classification

Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang and Mark D. Plumbley
CVSSP, University of Surrey, Guildford, UK

Abstract

This technical report describes the SurreyAudioTeam22’s submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules: (a) the ASC framework should have at most 128K parameters, and (b) there should be at most 30 million multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.
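The two rules can be expressed as a small budget check. The sketch below reads "128K" as 128,000 parameters, which is an assumption; the example figures are the parameter and MAC counts quoted in other abstracts on this page, plus one hypothetical over-budget model.

```python
MAX_PARAMS = 128_000     # assumed reading of "128K parameters"
MAX_MACS = 30_000_000    # 30 million MACs per inference


def within_limits(n_params, n_macs):
    """True if a model satisfies both complexity rules."""
    return n_params <= MAX_PARAMS and n_macs <= MAX_MACS


# (parameters, MACs) as quoted in abstracts on this page.
reported = {
    "Morocutti and Shalaby (student CNN)": (21_930, 9_775_000),
    "Olisaemeka and Saheer": (96_473, 3_284_000),
    "Sugahara et al.": (120_505, 26_607_000),
}
checks = {name: within_limits(p, m) for name, (p, m) in reported.items()}

# A hypothetical model that breaks the parameter rule.
too_big = within_limits(150_000, 10_000_000)
```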

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
Decision making maximum likelihood; average
PDF

Self-Ensemble with Multi-Task Learning for Low-Complexity Acoustic Scene Classification

Reiko Sugahara, Ryo Sato, Masatoshi Osawa, Yuuki Yuno and Chiho Haruta
RION CO., LTD., Tokyo, Japan

Abstract

This technical report describes a procedure for Task 1 in Detection and Classification of Acoustic Scenes and Events (DCASE) 2022. The proposed method adopts MobileNet-based models with log-mel energies and deltas as inputs. The accuracy was improved by self-ensemble with multi-task learning. Data augmentations, e.g., mixup, SpecAugment, and spectrum modulation, were applied to prevent overfitting. To meet system complexity requirements, we adopted depth-separable convolution and quantization aware training. The model contains 120,505 parameters and requires 26.607 million multiply-and-accumulate operations. Consequently, the proposed system achieved a 56.5% accuracy and a log-loss of 1.179 based on the development data.
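Mixup, one of the augmentations listed above, blends two training examples and their labels with the same coefficient. The sketch below works on flat feature lists and one-hot labels; the Beta parameter `alpha=0.3` is a placeholder assumption, not a value from the report.

```python
import random


def sample_lambda(alpha=0.3, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their one-hot labels with weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Toy features and one-hot labels for two clips (made-up values).
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0],
                     [0.0, 1.0], [0.0, 1.0], lam=0.25)
lam = sample_lambda()
```

The mixed label is a soft target, so training on mixup data requires a loss that accepts probability vectors rather than class indices.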

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, SpecAugment, spectrum modulation
Features log-mel energies, deltas
Classifier MobileNet
Decision making weighted average
PDF

Low-Complexity Acoustic Scene Classification with Mismatch-Devices Using Separable Convolutions and Coordinate Attention

Yifei Xin1, Yuexian Zou1, Fan Cui2 and Yujun Wang2
1Peking University, Shenzhen, China, 2Xiaomi Corporation, Beijing, China

Abstract

This report details the architecture we used to address Task 1 of the DCASE2022 challenge. Our architecture is based on a 4-layer convolutional neural network taking a log-mel spectrogram as input. The complexity of this network is controlled by using separable convolutions in the channel, time and frequency dimensions. Moreover, we introduce a novel attention mechanism that embeds positional information into channel attention, which we call coordinate attention, to improve the accuracy of a CNN-based framework. In addition, we use SpecAugment++, time shifting and test-time augmentation to further improve the performance of the system.
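Time shifting and test-time augmentation can be sketched together: the input is shifted along the time axis by several offsets and the model's predictions are averaged. The circular shift, the offsets and the `model` callable are assumptions made for illustration; the report may shift or combine predictions differently.

```python
def time_shift(frames, shift):
    """Circularly shift a sequence of frames by `shift` positions."""
    shift %= len(frames)
    if shift == 0:
        return list(frames)
    return frames[-shift:] + frames[:-shift]


def predict_with_tta(model, frames, shifts=(0, 1, 2)):
    """Average class probabilities over time-shifted copies of the input.

    model: hypothetical callable mapping frames to a probability list.
    """
    outputs = [model(time_shift(frames, s)) for s in shifts]
    n_classes = len(outputs[0])
    return [sum(o[c] for o in outputs) / len(outputs)
            for c in range(n_classes)]


shifted = time_shift([1, 2, 3, 4], 1)  # the last frame wraps to the front


def dummy_model(frames):
    """Stand-in classifier that ignores its input."""
    return [0.2, 0.8]


averaged = predict_with_tta(dummy_model, [1, 2, 3, 4])
```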

System characteristics
Sampling rate 44.1kHz
Data augmentation SpecAugment++, time shifting
Features spectrogram
Embeddings CNN6
Classifier CNN
PDF

Acoustic Scene Classification Based on Feature Fusion and Dilated-Convolution

Junfei Yu, Runyu Shi, Tianrui He and Kaibin Guo
Mobile Phone, Xiaomi, Beijing, China

Abstract

This technical report describes our submission for Task 1 of the DCASE Challenge 2022. The goal of Task 1 is to classify recorded audio into acoustic scenes using an int8 quantized model that does not exceed 128 KB in size. In our submission, a variety of time-frequency features are extracted and fused to form the input of the deep learning network. As the backbone of the network, dilated convolution is applied to embed the various input features. Furthermore, we apply multiple time-frequency data augmentations to the original data to increase its diversity. After network training is completed, the weights are converted to INT8. This INT8 model achieves a log loss of 1.305 and an accuracy of 51.7% on the standard test set of the TAU Urban Acoustic Scenes 2022 Mobile development dataset.
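Two of the fused features listed for this system, spectral flatness and spectral entropy, have standard textbook definitions that can be sketched for a single frame as follows. The small constant `eps` and the toy power spectra are assumptions; the report's exact feature extraction settings are not given here.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum."""
    n = len(power)
    geometric = math.exp(sum(math.log(p + eps) for p in power) / n)
    arithmetic = sum(power) / n
    return geometric / (arithmetic + eps)


def spectral_entropy(power):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    total = sum(power)
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)


flat = [2.0, 2.0, 2.0, 2.0]      # noise-like frame: energy spread evenly
peaky = [1e-6, 1e-6, 1e-6, 1.0]  # tone-like frame: energy in one bin

flat_values = (spectral_flatness(flat), spectral_entropy(flat))
peaky_values = (spectral_flatness(peaky), spectral_entropy(peaky))
```

Both features are high for noise-like frames and low for tonal frames, which gives the network cues that complement the log-mel energies.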

System characteristics
Sampling rate 44.1kHz
Features log-mel energies, spectral entropy, spectral flatness
Embeddings dilated-CNN
Classifier CNN
PDF

DCASE 2022: Comparative Analysis of CNNs for Acoustic Scene Classification Under Low-Complexity Considerations

Josep Zaragoza Paredes1, Javier Naranjo Alcázar2, Valery Naranjo Ornedo1 and Pedro Zuccarello2
1ETSIT, Universitat Politècnica de València, Valencia, Spain, 2R+D, Instituto Tecnológico de Informática, Valencia, Spain

Abstract

Acoustic scene classification is an automatic listening problem that aims to assign an audio recording to a pre-defined scene based on its audio data. Over the years (and in past editions of DCASE), this problem has often been solved with ensembles, in which the predictions of several machine learning models are combined in the inference phase. While these solutions can perform well in terms of accuracy, they can be very expensive in terms of computational capacity, making it impossible to deploy them on IoT devices. Due to this drift in the field of study, this task imposes two limits on model complexity. It should be noted that there is also the added difficulty of mismatched devices (the audio provided is recorded by different devices). This technical report makes a comparative study of two different network architectures: a conventional CNN and ConvMixer. Although both networks exceed the baseline required by the competition, the conventional CNN shows higher performance, exceeding the baseline by 8 percentage points. Solutions based on ConvMixer architectures show worse performance, although they are much lighter.

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
PDF