Low-Complexity Acoustic Scene Classification with Multiple Devices


Challenge results

Task description

This subtask addresses the basic problem of acoustic scene classification: classifying a test audio recording into one of ten known acoustic scene classes. The task targets generalization across devices and uses audio data recorded and simulated with a variety of devices. It also targets low-complexity solutions to the classification problem, measured in terms of model size.

The development dataset consists of recordings from 10 European cities using 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C, and S1-S6 consists of randomly selected segments from the simultaneous recordings, so all of it overlaps with the data from device A, but not necessarily with the data from the other devices. The total amount of audio in the development set is 64 hours.

The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. Five of the devices are new (not available in the development set): real device D and simulated devices S7-S11. The evaluation data contains 22 hours of audio.

Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.

A more detailed task description can be found on the task description page.

Systems ranking

Columns: Submission label; Name; Technical Report; Official system rank; Evaluation dataset Logloss; Evaluation dataset Accuracy (%) with 95% confidence interval; Development dataset Logloss; Development dataset Accuracy (%)
Byttebier_IDLab_task1a_1 qat_8b Byttebier2021 21 0.936 68.6 (67.6 - 69.6) 0.820 71.2
Byttebier_IDLab_task1a_2 8b_calibm Byttebier2021 18 0.914 67.5 (66.5 - 68.6) 0.820 71.2
Byttebier_IDLab_task1a_3 8b_calibo Byttebier2021 23 0.944 68.5 (67.5 - 69.6) 0.820 71.2
Byttebier_IDLab_task1a_4 16b_prune Byttebier2021 17 0.905 68.8 (67.8 - 69.8) 0.840 70.2
Cao_SCUT_task1a_1 sys_1 Cao2021 49 1.136 66.7 (65.7 - 67.7) 1.038 71.6
Cao_SCUT_task1a_2 sys_2 Cao2021 56 1.200 64.6 (63.5 - 65.6) 1.108 69.6
Cao_SCUT_task1a_3 sys_3 Cao2021 50 1.137 67.2 (66.1 - 68.2) 1.058 71.7
Cao_SCUT_task1a_4 sys_4 Cao2021 53 1.147 66.1 (65.1 - 67.1) 1.047 72.4
Ding_TJU_task1a_1 Ding_TJU Ding2021 85 1.544 53.0 (51.9 - 54.1) 1.360 55.5
Ding_TJU_task1a_2 Ding_TJU Ding2021 70 1.326 51.1 (50.0 - 52.2) 1.263
Ding_TJU_task1a_3 Ding_TJU Ding2021 61 1.226 49.1 (48.0 - 50.2) 1.193 55.0
Ding_TJU_task1a_4 Ding_TJU Ding2021 67 1.296 51.4 (50.3 - 52.5) 1.268 50.0
Fan_NWPU_task1a_1 res-att Cui2021 64 1.261 68.3 (67.3 - 69.3) 0.870 69.7
Galindo-Meza_ITESO_task1a_1 e2e_CNN_INT8 Galindo-Meza2021 97 2.221 53.9 (52.8 - 55.0) 1.904 56.5
Heo_Clova_task1a_1 Clova_AMFM Hee-Soo2021 42 1.087 67.0 (66.0 - 68.0) 69.7
Heo_Clova_task1a_2 Clova_Res Hee-Soo2021 20 0.930 66.9 (65.9 - 67.9) 70.5
Heo_Clova_task1a_3 Clova_AMFM_W Hee-Soo2021 34 1.045 70.0 (69.0 - 71.0)
Heo_Clova_task1a_4 Clova_Res_W Hee-Soo2021 12 0.871 70.1 (69.1 - 71.1)
Horváth_HIT_task1a_1 R_MNv2_fl Horvth2021 86 1.597 51.4 (50.3 - 52.5) 1.258 55.3
Horváth_HIT_task1a_2 R_MNv2_af Horvth2021 92 2.031 53.3 (52.2 - 54.4) 2.021 54.3
Horváth_HIT_task1a_3 CPRes_fl Horvth2021 76 1.460 51.6 (50.5 - 52.7) 1.248 54.5
Horváth_HIT_task1a_4 CPRes_af Horvth2021 95 2.065 49.2 (48.1 - 50.3) 2.030 54.7
Jeng_CHT+NSYSU_task1a_1 SparseFCNN Jeng2021 78 1.469 55.0 (53.9 - 56.1) 1.464 54.6
Jeng_CHT+NSYSU_task1a_2 DiverseSpa Jeng2021 84 1.543 51.3 (50.2 - 52.4) 1.593 51.2
Jeng_CHT+NSYSU_task1a_3 SparseMNet Jeng2021 79 1.470 56.3 (55.2 - 57.4) 1.428 58.2
Jeong_ETRI_task1a_1 JYH_ETRI_1 Jeong2021 33 1.041 66.0 (64.9 - 67.0) 1.006 65.9
Jeong_ETRI_task1a_2 JYH_ETRI_2 Jeong2021 25 0.952 67.0 (65.9 - 68.0) 1.015 64.9
Jeong_ETRI_task1a_3 JYH_ETRI_3 Jeong2021 30 1.023 66.7 (65.7 - 67.7) 1.014 64.6
Jeong_ETRI_task1a_4 JYH_ETRI_4 Jeong2021 63 1.228 66.1 (65.1 - 67.2) 0.968 65.8
Kek_NU_task1a_1 DSSMNet1 Kek2021 72 1.355 66.8 (65.7 - 67.8) 1.410 63.0
Kek_NU_task1a_2 DSSMNet2 Kek2021 57 1.207 63.5 (62.4 - 64.6) 1.242 62.3
Kim_3M_task1a_1 CNN_pr1 Kim2021 38 1.076 61.5 (60.4 - 62.6) 1.010 63.4
Kim_3M_task1a_2 CNN_pr2 Kim2021 39 1.077 61.6 (60.5 - 62.6) 1.008 63.5
Kim_3M_task1a_3 CNN_pr3 Kim2021 37 1.076 62.0 (61.0 - 63.1) 1.009 63.3
Kim_3M_task1a_4 CNN_pr4 Kim2021 40 1.078 61.3 (60.2 - 62.3) 1.009 63.5
Kim_KNU_task1a_1 KNU-CP1 Kim2021a 46 1.115 64.7 (63.6 - 65.7) 1.068 65.0
Kim_KNU_task1a_2 KNU-CP2 Kim2021a 28 1.010 63.8 (62.8 - 64.9) 1.040 62.0
Kim_KNU_task1a_3 KNU-CP3 Kim2021a 55 1.188 61.3 (60.3 - 62.4) 1.043 65.5
Kim_KNU_task1a_4 KNU-CP4 Kim2021a 52 1.143 62.9 (61.8 - 64.0) 1.035 65.3
Kim_QTI_task1a_1 ResNorm_QTI1 Kim2021b 8 0.793 75.0 (74.0 - 76.0) 0.722 77.0
Kim_QTI_task1a_2 ResNorm_QTI2 Kim2021b 1 0.724 76.1 (75.1 - 77.0) 0.716 75.9
Kim_QTI_task1a_3 ResNorm_QTI3 Kim2021b 2 0.735 76.1 (75.2 - 77.1) 0.723 77.5
Kim_QTI_task1a_4 ResNorm_QTI4 Kim2021b 5 0.764 75.2 (74.3 - 76.2) 0.776 75.1
Koutini_CPJKU_task1a_1 DampedR7NB Koutini2021 14 0.883 70.9 (69.9 - 71.9) 0.916 68.6
Koutini_CPJKU_task1a_2 DampedR8 Koutini2021 10 0.842 71.8 (70.8 - 72.8) 0.944 66.9
Koutini_CPJKU_task1a_3 DampedR8NB Koutini2021 9 0.834 72.1 (71.1 - 73.1) 0.890 69.5
Koutini_CPJKU_task1a_4 DampedR8DA Koutini2021 11 0.847 71.8 (70.9 - 72.8) 0.880 69.5
Lim_CAU_task1a_1 CAUET-TEFF1-C45-Q Lim2021 90 1.956 67.5 (66.5 - 68.5) 1.673 65.5
Lim_CAU_task1a_2 CAUET-TEFF1-P45-Q Lim2021 91 2.010 67.9 (66.9 - 69.0) 1.801 65.7
Lim_CAU_task1a_3 CAUET-TEFF2-C70-Q Lim2021 80 1.479 68.5 (67.5 - 69.5) 1.625 65.2
Lim_CAU_task1a_4 CAUET-TEFF3-Q Lim2021 93 2.039 65.8 (64.7 - 66.8) 1.906 63.1
Liu_UESTC_task1a_1 FR_agm Liu2021 16 0.900 68.8 (67.8 - 69.8) 0.909 68.2
Liu_UESTC_task1a_2 onebit_agm Liu2021 15 0.895 68.2 (67.2 - 69.2) 0.923 68.0
Liu_UESTC_task1a_3 onebit_noagm Liu2021 13 0.878 69.6 (68.6 - 70.6) 0.990 65.0
Liu_UESTC_task1a_4 weight_qz Liu2021 87 1.626 42.0 (40.9 - 43.1) 1.434 45.4
Madhu_CET_task1a_1 DWTMSCNN Madhu2021 99 3.950 9.7 (9.0 - 10.3) 0.628 85.1
DCASE2021 baseline Baseline 1.730 45.6 (44.5 - 46.7) 1.461 46.9
Naranjo-Alcazar_ITI_task1a_1 ASC_ResSE Naranjo-Alcazar2021_t1a 51 1.140 60.2 (59.2 - 61.3) 64.2
Pham_AIT_task1a_1 Pham_AIT Pham2021 73 1.368 67.5 (66.4 - 68.5) 66.7
Pham_AIT_task1a_2 Pham_AIT Pham2021 54 1.187 68.4 (67.4 - 69.4) 66.7
Pham_AIT_task1a_3 Pham_AIT Pham2021 94 2.058 69.6 (68.6 - 70.6) 66.7
Phan_UIUC_task1a_1 ResNet Phan2021 65 1.272 63.3 (62.3 - 64.4) 1.259 64.1
Phan_UIUC_task1a_2 ResNet_t3 Phan2021 71 1.335 63.3 (62.3 - 64.4) 1.313 64.1
Phan_UIUC_task1a_3 ResNet_t2 Phan2021 60 1.223 65.3 (64.3 - 66.4) 1.259 64.1
Phan_UIUC_task1a_4 ResNet_t3 Phan2021 66 1.292 65.3 (64.3 - 66.4) 1.313 64.1
Puy_VAI_task1a_1 ce_tta Puy2021 24 0.952 66.6 (65.6 - 67.6) 0.898 66.8
Puy_VAI_task1a_2 ce_mu_tta Puy2021 27 0.974 65.4 (64.4 - 66.5) 0.927 66.2
Puy_VAI_task1a_3 fl_tta Puy2021 22 0.939 66.2 (65.1 - 67.2) 0.877 68.7
Qiao_NCUT_task1a_1 Qiao_NCUT Qiao2021 88 1.630 52.2 (51.1 - 53.3) 1.001 51.7
Seo_SGU_task1a_1 Penult Seo2021 32 1.030 70.3 (69.3 - 71.3) 1.040 69.0
Seo_SGU_task1a_2 Stride21 Seo2021 41 1.080 71.4 (70.4 - 72.4) 1.089 72.6
Seo_SGU_task1a_3 Stride22 Seo2021 35 1.065 71.3 (70.3 - 72.3) 1.092 72.1
Seo_SGU_task1a_4 Stride12 Seo2021 44 1.087 71.8 (70.8 - 72.8) 1.106 72.6
Singh_IITMandi_task1a_1 Singh_29KB Singh2021 77 1.464 47.2 (46.1 - 48.3) 1.383 47.7
Singh_IITMandi_task1a_2 Singh_53KB Singh2021 83 1.515 44.7 (43.6 - 45.8) 1.394 48.5
Singh_IITMandi_task1a_3 Singh_74KB Singh2021 82 1.509 46.1 (45.0 - 47.2) 1.395 49.0
Singh_IITMandi_task1a_4 Singh_71KB Singh2021 81 1.488 46.8 (45.7 - 47.9) 1.413 48.6
Sugahara_RION_task1a_1 RION1 Sugahara2021 43 1.087 63.8 (62.8 - 64.9) 0.958 70.1
Sugahara_RION_task1a_2 RION2 Sugahara2021 36 1.070 65.2 (64.2 - 66.3) 0.975 69.7
Sugahara_RION_task1a_3 RION3 Sugahara2021 31 1.024 65.3 (64.3 - 66.4) 0.937 66.8
Sugahara_RION_task1a_4 RION4 Sugahara2021 68 1.297 64.7 (63.7 - 65.8) 1.062 68.8
Verbitskiy_DS_task1a_1 ASC_MB32 Verbitskiy2021 48 1.127 61.4 (60.3 - 62.4) 1.042 64.4
Verbitskiy_DS_task1a_2 ASC_MB64 Verbitskiy2021 29 1.019 64.5 (63.4 - 65.5) 0.932 68.8
Verbitskiy_DS_task1a_3 ASC_MB128 Verbitskiy2021 26 0.966 67.3 (66.3 - 68.4) 0.859 70.9
Verbitskiy_DS_task1a_4 ASC_MB160 Verbitskiy2021 19 0.924 68.1 (67.1 - 69.1) 0.848 70.5
Yang_GT_task1a_1 Yang_GT_lth_a Yang2021 6 0.768 73.1 (72.1 - 74.0) 0.640 79.4
Yang_GT_task1a_2 Yang_GT_lth_b Yang2021 4 0.764 72.9 (71.9 - 73.9)
Yang_GT_task1a_3 Yang_GT_lth_c Yang2021 3 0.758 72.9 (71.9 - 73.8)
Yang_GT_task1a_4 Yang_GT_lth_d Yang2021 7 0.774 72.8 (71.8 - 73.8)
Yihao_speakin_task1a_1 Yihao_ratio07 Yihao2021 69 1.311 51.9 (50.8 - 53.0) 0.893 69.4
Yihao_speakin_task1a_2 Yihao_ratio065 Yihao2021 59 1.222 55.2 (54.1 - 56.3) 0.727 76.1
Yihao_speakin_task1a_3 Yihao_seresnet Yihao2021 96 2.105 53.5 (52.4 - 54.6) 1.990 82.8
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang_resnet_1 Zhang2021 47 1.124 63.0 (62.0 - 64.1) 78.2
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang_resnet_2 Zhang2021 45 1.113 63.2 (62.2 - 64.3) 76.4
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang_resnet_cbam Zhang2021 98 3.359 52.2 (51.1 - 53.3) 65.2
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang_resnet_senet Zhang2021 89 1.946 59.0 (57.9 - 60.1) 70.6
Zhao_Maxvision_task1a_1 maxvision1 Zhao2021 75 1.440 61.2 (60.2 - 62.3) 1.494 57.6
Zhao_Maxvision_task1a_2 maxvision2 Zhao2021 74 1.412 63.5 (62.4 - 64.6) 1.482 59.6
Zhao_Maxvision_task1a_3 maxvision3 Zhao2021 62 1.227 63.5 (62.5 - 64.6) 1.258 59.9
Zhao_Maxvision_task1a_4 maxvision4 Zhao2021 58 1.215 62.8 (61.8 - 63.9) 1.485 57.8
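
The Logloss and Accuracy columns above can be illustrated with a minimal Python sketch. It assumes each system outputs one probability per scene class for every test recording. The sample predictions are made up for illustration, and the normal-approximation interval in `normal_ci` is only one common way to obtain a 95% confidence interval; the page does not state which method the organizers used.

```python
import math

def log_loss(probs, labels, eps=1e-15):
    """Multiclass log loss: mean negative log-probability of the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) when a system assigns zero probability.
        total -= math.log(min(max(p[y], eps), 1.0))
    return total / len(labels)

def accuracy(probs, labels):
    """Fraction of recordings whose highest-probability class is correct."""
    hits = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == y
    return hits / len(labels)

def normal_ci(acc, n, z=1.96):
    """Assumed method: normal-approximation interval for a proportion."""
    half = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half, acc + half

# Hypothetical predictions for four recordings and three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 1]

print(log_loss(probs, labels))
print(accuracy(probs, labels))
print(normal_ci(accuracy(probs, labels), len(labels)))
```

A lower log loss is better; in the table, the official system rank follows the evaluation log loss rather than the accuracy.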

Teams ranking

This table includes only the best-performing system from each submitting team.

Columns: Submission label; Name; Technical Report; Official system rank; Team rank; Evaluation dataset Logloss; Evaluation dataset Accuracy (%) with 95% confidence interval; Development dataset Logloss; Development dataset Accuracy (%)
Byttebier_IDLab_task1a_4 16b_prune Byttebier2021 17 6 0.905 68.8 (67.8 - 69.8) 0.840 70.2
Cao_SCUT_task1a_1 sys_1 Cao2021 49 15 1.136 66.7 (65.7 - 67.7) 1.038 71.6
Ding_TJU_task1a_3 Ding_TJU Ding2021 61 22 1.226 49.1 (48.0 - 50.2) 1.193 55.0
Fan_NWPU_task1a_1 res-att Cui2021 64 23 1.261 68.3 (67.3 - 69.3) 0.870 69.7
Galindo-Meza_ITESO_task1a_1 e2e_CNN_INT8 Galindo-Meza2021 97 29 2.221 53.9 (52.8 - 55.0) 1.904 56.5
Heo_Clova_task1a_4 Clova_Res_W Hee-Soo2021 12 4 0.871 70.1 (69.1 - 71.1)
Horváth_HIT_task1a_3 CPRes_fl Horvth2021 76 24 1.460 51.6 (50.5 - 52.7) 1.248 54.5
Jeng_CHT+NSYSU_task1a_1 SparseFCNN Jeng2021 78 26 1.469 55.0 (53.9 - 56.1) 1.464 54.6
Jeong_ETRI_task1a_2 JYH_ETRI_2 Jeong2021 25 9 0.952 67.0 (65.9 - 68.0) 1.015 64.9
Kek_NU_task1a_2 DSSMNet2 Kek2021 57 18 1.207 63.5 (62.4 - 64.6) 1.242 62.3
Kim_3M_task1a_3 CNN_pr3 Kim2021 37 13 1.076 62.0 (61.0 - 63.1) 1.009 63.3
Kim_KNU_task1a_2 KNU-CP2 Kim2021a 28 10 1.010 63.8 (62.8 - 64.9) 1.040 62.0
Kim_QTI_task1a_2 ResNorm_QTI2 Kim2021b 1 1 0.724 76.1 (75.1 - 77.0) 0.716 75.9
Koutini_CPJKU_task1a_3 DampedR8NB Koutini2021 9 3 0.834 72.1 (71.1 - 73.1) 0.890 69.5
Lim_CAU_task1a_3 CAUET-TEFF2-C70-Q Lim2021 80 27 1.479 68.5 (67.5 - 69.5) 1.625 65.2
Liu_UESTC_task1a_3 onebit_noagm Liu2021 13 5 0.878 69.6 (68.6 - 70.6) 0.990 65.0
Madhu_CET_task1a_1 DWTMSCNN Madhu2021 99 30 3.950 9.7 (9.0 - 10.3) 0.628 85.1
DCASE2021 baseline Baseline 1.730 45.6 (44.5 - 46.7) 1.461 46.9
Naranjo-Alcazar_ITI_task1a_1 ASC_ResSE Naranjo-Alcazar2021_t1a 51 16 1.140 60.2 (59.2 - 61.3) 64.2
Pham_AIT_task1a_2 Pham_AIT Pham2021 54 17 1.187 68.4 (67.4 - 69.4) 66.7
Phan_UIUC_task1a_3 ResNet_t2 Phan2021 60 21 1.223 65.3 (64.3 - 66.4) 1.259 64.1
Puy_VAI_task1a_3 fl_tta Puy2021 22 8 0.939 66.2 (65.1 - 67.2) 0.877 68.7
Qiao_NCUT_task1a_1 Qiao_NCUT Qiao2021 88 28 1.630 52.2 (51.1 - 53.3) 1.001 51.7
Seo_SGU_task1a_1 Penult Seo2021 32 12 1.030 70.3 (69.3 - 71.3) 1.040 69.0
Singh_IITMandi_task1a_1 Singh_29KB Singh2021 77 25 1.464 47.2 (46.1 - 48.3) 1.383 47.7
Sugahara_RION_task1a_3 RION3 Sugahara2021 31 11 1.024 65.3 (64.3 - 66.4) 0.937 66.8
Verbitskiy_DS_task1a_4 ASC_MB160 Verbitskiy2021 19 7 0.924 68.1 (67.1 - 69.1) 0.848 70.5
Yang_GT_task1a_3 Yang_GT_lth_c Yang2021 3 2 0.758 72.9 (71.9 - 73.8)
Yihao_speakin_task1a_2 Yihao_ratio065 Yihao2021 59 20 1.222 55.2 (54.1 - 56.3) 0.727 76.1
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang_resnet_2 Zhang2021 45 14 1.113 63.2 (62.2 - 64.3) 76.4
Zhao_Maxvision_task1a_4 maxvision4 Zhao2021 58 19 1.215 62.8 (61.8 - 63.9) 1.485 57.8
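
The reduction from the systems table to the teams table can be sketched as follows. The sketch assumes that "best performing" means the lowest evaluation log loss, which is consistent with the rows shown but is not stated explicitly on this page. The helper names (`team_of`, `best_per_team`, `team_ranking`) are hypothetical, and only two teams are included as sample data.

```python
# (submission label, official system rank, evaluation logloss),
# copied from the systems ranking table for two teams.
rows = [
    ("Byttebier_IDLab_task1a_1", 21, 0.936),
    ("Byttebier_IDLab_task1a_2", 18, 0.914),
    ("Byttebier_IDLab_task1a_3", 23, 0.944),
    ("Byttebier_IDLab_task1a_4", 17, 0.905),
    ("Kim_QTI_task1a_1", 8, 0.793),
    ("Kim_QTI_task1a_2", 1, 0.724),
    ("Kim_QTI_task1a_3", 2, 0.735),
    ("Kim_QTI_task1a_4", 5, 0.764),
]

def team_of(label):
    """Team prefix of a submission label, e.g. 'Kim_QTI'."""
    return label.rsplit("_task1a_", 1)[0]

def best_per_team(rows):
    """Keep the lowest-logloss system for each team (assumed criterion)."""
    best = {}
    for label, rank, logloss in rows:
        team = team_of(label)
        if team not in best or logloss < best[team][2]:
            best[team] = (label, rank, logloss)
    return best

def team_ranking(rows):
    """Order the per-team winners by logloss and number them from 1."""
    winners = sorted(best_per_team(rows).values(), key=lambda r: r[2])
    return [(i + 1, label) for i, (label, _, _) in enumerate(winners)]

print(team_ranking(rows))
```

With only these two teams in the sample, the team ranks are relative to each other; in the full table Byttebier_IDLab_task1a_4 has team rank 6.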

System complexity

Columns: Submission label; Technical Report; Official system rank; Evaluation dataset Logloss; Evaluation dataset Accuracy (%); Acoustic model Parameters; Acoustic model Non-zero parameters; Acoustic model Sparsity; Acoustic model Size (KB) *; System Complexity management
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 68.6 114634 113976 0.0057400073276688834 127.6 weight quantization, grouped convolutions, Conv+BN fusion
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 67.5 114634 113976 0.0057400073276688834 127.6 weight quantization, grouped convolutions, Conv+BN fusion
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 68.5 114634 113976 0.0057400073276688834 127.6 weight quantization, grouped convolutions, Conv+BN fusion
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 68.8 82910 62390 0.24749728621396694 121.9 weight quantization, grouped convolutions, pruning
Cao_SCUT_task1a_1 Cao2021 49 1.136 66.7 36658 34970 0.04604724753123468 71.6 weight quantization
Cao_SCUT_task1a_2 Cao2021 56 1.200 64.6 36658 34970 0.04604724753123468 71.6 weight quantization
Cao_SCUT_task1a_3 Cao2021 50 1.137 67.2 36658 34970 0.04604724753123468 71.6 weight quantization
Cao_SCUT_task1a_4 Cao2021 53 1.147 66.1 51926 50238 0.03250779956091365 102.9 weight quantization
Ding_TJU_task1a_1 Ding2021 85 1.544 53.0 40230 40230 0.0 78.6 weight quantization
Ding_TJU_task1a_2 Ding2021 70 1.326 51.1 20250 20250 0.0 39.5 weight quantization
Ding_TJU_task1a_3 Ding2021 61 1.226 49.1 63816 63816 0.0 124.6 weight quantization
Ding_TJU_task1a_4 Ding2021 67 1.296 51.4 20250 20250 0.0 39.5 weight quantization
Fan_NWPU_task1a_1 Cui2021 64 1.261 68.3 93323 93323 0.0 93.3 weight quantization
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 53.9 127637 127637 0.0 124.6 pruning, int8 weight quantization
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 67.0 65424 65424 0.0 127.7 weight quantization
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 66.9 63547 63547 0.0 124.1 weight quantization
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 70.0 65424 65424 0.0 127.7 weight quantization
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 70.1 63547 63547 0.0 124.1 weight quantization
Horváth_HIT_task1a_1 Horvth2021 86 1.597 51.4 47939 47939 0.0 93.6 weight quantization
Horváth_HIT_task1a_2 Horvth2021 92 2.031 53.3 47939 47939 0.0 93.6 weight quantization
Horváth_HIT_task1a_3 Horvth2021 76 1.460 51.6 58266 58266 0.0 113.8 weight quantization
Horváth_HIT_task1a_4 Horvth2021 95 2.065 49.2 58266 58266 0.0 113.8 weight quantization
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 55.0 130457242 129320 0.9990087173543037 126.3 sparsity, weight quantization
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 51.3 130457242 127906 0.9990195561546518 124.9 sparsity, weight quantization
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 56.3 17186944 130999 0.9923779934350168 127.9 sparsity, weight quantization
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 66.0 54845 54845 0.0 113.9 weight quantization, depthwise separable convolutions
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 67.0 54845 54845 0.0 113.9 weight quantization, depthwise separable convolutions
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 66.7 60236 60236 0.0 124.4 weight quantization, depthwise separable convolutions
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 66.1 60236 60236 0.0 124.4 weight quantization, depthwise separable convolutions
Kek_NU_task1a_1 Kek2021 72 1.355 66.8 63448 59472 0.0626654898499559 123.9 weight quantization
Kek_NU_task1a_2 Kek2021 57 1.207 63.5 64850 60842 0.06180416345412487 126.6 weight quantization
Kim_3M_task1a_1 Kim2021 38 1.076 61.5 168778 116398 0.31034850513692547 113.7 weight quantization, pruning
Kim_3M_task1a_2 Kim2021 39 1.077 61.6 168778 113428 0.32794558532510165 110.8 weight quantization, pruning
Kim_3M_task1a_3 Kim2021 37 1.076 62.0 168778 120841 0.2840239841685528 118.0 weight quantization, pruning
Kim_3M_task1a_4 Kim2021 40 1.078 61.3 168778 116439 0.31010558248112907 113.7 weight quantization, pruning
Kim_KNU_task1a_1 Kim2021a 46 1.115 64.7 58472 58374 0.0016760158708442052 125.6 CP-decomposition, weight quantization
Kim_KNU_task1a_2 Kim2021a 28 1.010 63.8 64064 64064 0.0 125.1 parameter sharing, weight quantization
Kim_KNU_task1a_3 Kim2021a 55 1.188 61.3 58472 58411 0.0010432343685866652 125.7 CP-decomposition, weight quantization
Kim_KNU_task1a_4 Kim2021a 52 1.143 62.9 58472 58411 0.0010432343685866652 125.7 CP-decomposition, weight quantization
Kim_QTI_task1a_1 Kim2021b 8 0.793 75.0 630042 95472 0.8484672450408068 121.9 weight quantization, pruning, knowledge distillation
Kim_QTI_task1a_2 Kim2021b 1 0.724 76.1 630042 95472 0.8484672450408068 121.9 weight quantization, pruning, knowledge distillation
Kim_QTI_task1a_3 Kim2021b 2 0.735 76.1 630042 95472 0.8484672450408068 121.9 weight quantization, pruning, knowledge distillation
Kim_QTI_task1a_4 Kim2021b 5 0.764 75.2 314990 62721 0.800879392996603 122.5 weight quantization, pruning, knowledge distillation
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 70.9 504104 64690 0.8716733055083872 126.3 float16, sparsity
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 71.8 678184 64928 0.9042619702027768 126.8 float16, sparsity
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 72.1 635176 64625 0.8982565462171115 126.2 float16, sparsity
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 71.8 641320 63529 0.9009402482380091 124.1 float16, sparsity
Lim_CAU_task1a_1 Lim2021 90 1.956 67.5 89910 56499 0.3716049382716049 125.2 weight quantization, sparsity
Lim_CAU_task1a_2 Lim2021 91 2.010 67.9 89910 56499 0.3716049382716049 125.2 weight quantization, sparsity
Lim_CAU_task1a_3 Lim2021 80 1.479 68.5 134748 54504 0.5955116216938285 125.4 weight quantization, sparsity
Lim_CAU_task1a_4 Lim2021 93 2.039 65.8 56046 56046 0.0 118.8 weight quantization, sparsity
Liu_UESTC_task1a_1 Liu2021 16 0.900 68.8 643194 643194 0.0 106.7 1-bit quantization, FR_unit
Liu_UESTC_task1a_2 Liu2021 15 0.895 68.2 268362 268368 2.235785990567507e-05 42.5 1-bit quantization
Liu_UESTC_task1a_3 Liu2021 13 0.878 69.6 268362 268368 2.235785990567507e-05 42.5 1-bit quantization
Liu_UESTC_task1a_4 Liu2021 87 1.626 42.0 60928 60928 0.0 119.0 weight quantization
Madhu_CET_task1a_1 Madhu2021 99 3.950 9.7 42774 42774 0.0 89.5 weight quantization
DCASE2021 baseline 1.730 45.6 46246 46246 0.0 90.3 weight quantization
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 60.2 50130 50130 0.0 96.0 weight quantization, tflite, float16
Pham_AIT_task1a_1 Pham2021 73 1.368 67.5 10909 10909 0.0 128.0 channel restriction and decomposed convolution
Pham_AIT_task1a_2 Pham2021 54 1.187 68.4 10909 10909 0.0 128.0 channel restriction and decomposed convolution
Pham_AIT_task1a_3 Pham2021 94 2.058 69.6 10909 10909 0.0 128.0 channel restriction and decomposed convolution
Phan_UIUC_task1a_1 Phan2021 65 1.272 63.3 41356 36364 0.12070799883934613 75.2 weight quantization, depthwise separable convolutions
Phan_UIUC_task1a_2 Phan2021 71 1.335 63.3 41356 36364 0.12070799883934613 75.2 weight quantization, depthwise separable convolutions
Phan_UIUC_task1a_3 Phan2021 60 1.223 65.3 41356 36364 0.12070799883934613 75.2 weight quantization, depthwise separable convolutions
Phan_UIUC_task1a_4 Phan2021 66 1.292 65.3 41356 36364 0.12070799883934613 75.2 weight quantization, depthwise separable convolutions
Puy_VAI_task1a_1 Puy2021 24 0.952 66.6 62474 62474 0.0 122.0 weight quantization
Puy_VAI_task1a_2 Puy2021 27 0.974 65.4 62474 62474 0.0 122.0 weight quantization
Puy_VAI_task1a_3 Puy2021 22 0.939 66.2 62474 62474 0.0 122.0 weight quantization
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 52.2 31852 31852 0.0 124.4 weight quantization
Seo_SGU_task1a_1 Seo2021 32 1.030 70.3 101173 101173 0.0 125.0 weight quantization
Seo_SGU_task1a_2 Seo2021 41 1.080 71.4 99557 99557 0.0 126.5 weight quantization
Seo_SGU_task1a_3 Seo2021 35 1.065 71.3 99614 99614 0.0 126.6 weight quantization
Seo_SGU_task1a_4 Seo2021 44 1.087 71.8 99603 99603 0.0 126.5 weight quantization
Singh_IITMandi_task1a_1 Singh2021 77 1.464 47.2 14754 14754 0.0 28.8 Filter pruning and quantization
Singh_IITMandi_task1a_2 Singh2021 83 1.515 44.7 27166 27166 0.0 53.1 Filter pruning and quantization
Singh_IITMandi_task1a_3 Singh2021 82 1.509 46.1 38110 38110 0.0 74.4 Filter pruning and quantization
Singh_IITMandi_task1a_4 Singh2021 81 1.488 46.8 36578 36578 0.0 71.4 Filter pruning and quantization
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 63.8 339730 86577 0.7451593912813117 94.7 weight quantization, pruning
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 65.2 339730 86577 0.7451593912813117 94.7 weight quantization, pruning
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 65.3 203838 102606 0.496629676507815 108.3 weight quantization, pruning
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 64.7 255940 109804 0.570977572868641 114.6 weight quantization, pruning
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 61.4 62090 62090 0.0 121.3 weight quantization
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 64.5 62154 62154 0.0 121.4 weight quantization
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 67.3 62282 62282 0.0 121.6 weight quantization
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 68.1 62346 62346 0.0 121.8 weight quantization
Yang_GT_task1a_1 Yang2021 6 0.768 73.1 4410180 30500 0.9930841825050225 122.0 weight quantization, LTH pruning, teacher-student learning
Yang_GT_task1a_2 Yang2021 4 0.764 72.9 14640720 111000 0.9924184056521811 111.0 weight quantization, LTH pruning, teacher-student learning
Yang_GT_task1a_3 Yang2021 3 0.758 72.9 7056288 45750 0.9935164210984586 125.0 weight quantization, LTH pruning, teacher-student learning
Yang_GT_task1a_4 Yang2021 7 0.774 72.8 7056288 45750 0.9935164210984586 125.0 weight quantization, LTH pruning, teacher-student learning
Yihao_speakin_task1a_1 Yihao2021 69 1.311 51.9 48075 48075 0.0 93.8 sparsity
Yihao_speakin_task1a_2 Yihao2021 59 1.222 55.2 63244 63244 0.0 123.5 sparsity
Yihao_speakin_task1a_3 Yihao2021 96 2.105 53.5 50952 50952 0.0 99.5 sparsity
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 63.0 83572 49738 0.4048485138563155 48.6 weight quantization
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 63.2 83572 49738 0.4048485138563155 48.6 weight quantization
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 52.2 87011 53177 0.3888473871119743 51.9 weight quantization
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 59.0 86516 85706 0.009362430070738337 83.7 weight quantization
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 61.2 59421 59376 0.0007573080224163586 116.1 weight quantization
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 63.5 59421 59376 0.0007573080224163586 116.1 weight quantization
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 63.5 59421 59376 0.0007573080224163586 116.1 weight quantization
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 62.8 59421 59376 0.0007573080224163586 116.1 weight quantization


*) Model size is calculated according to the task-specific rules and will differ from the real model storage size. See model size calculation examples here.
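
As a rough illustration of how the Sparsity and Size columns relate to the parameter counts, the sketch below treats sparsity as the fraction of zero-valued parameters and size as the storage needed for the non-zero parameters. It assumes 1 KB = 1024 bytes and a single bit width for all parameters. The 16-bit width used in the example is an assumption that happens to reproduce some rows (for example the baseline); systems that mix bit widths will not match, and the official calculation is the one defined by the task rules.

```python
def sparsity(total_params, nonzero_params):
    """Fraction of parameters that are zero."""
    return 1.0 - nonzero_params / total_params

def size_kb(nonzero_params, bits_per_param):
    """Size of the non-zero parameters in KB, assuming 1 KB = 1024 bytes."""
    return nonzero_params * bits_per_param / 8 / 1024

# Byttebier_IDLab_task1a_4: 82910 parameters, 62390 of them non-zero.
print(round(sparsity(82910, 62390), 4))

# DCASE2021 baseline: 46246 non-zero parameters, assumed 16 bits each.
print(round(size_kb(46246, 16), 1))
```

Under these assumptions, the two printed values agree with the table entries for those rows (sparsity of about 0.2475 and a size of 90.3 KB).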

Generalization performance

All results are computed on the evaluation dataset.

Columns: Submission label; Technical Report; Official system rank; Overall (evaluation dataset) Logloss; Overall Accuracy (%); Unseen devices Logloss; Unseen devices Accuracy (%); Seen devices Logloss; Seen devices Accuracy (%); Unseen cities Logloss; Unseen cities Accuracy (%); Seen cities Logloss; Seen cities Accuracy (%)
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 68.6 1.065 64.5 0.829 72.0 0.972 67.5 0.926 68.6
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 67.5 1.048 63.6 0.801 70.8 1.012 65.4 0.892 68.0
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 68.5 1.094 64.7 0.820 71.7 1.007 67.3 0.931 68.7
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 68.8 1.002 65.5 0.824 71.5 0.914 69.2 0.903 68.8
Cao_SCUT_task1a_1 Cao2021 49 1.136 66.7 1.214 62.5 1.071 70.2 1.190 63.2 1.126 67.3
Cao_SCUT_task1a_2 Cao2021 56 1.200 64.6 1.318 59.0 1.102 69.2 1.249 60.6 1.188 65.4
Cao_SCUT_task1a_3 Cao2021 50 1.137 67.2 1.223 63.3 1.066 70.4 1.196 63.4 1.123 68.1
Cao_SCUT_task1a_4 Cao2021 53 1.147 66.1 1.250 60.8 1.061 70.5 1.206 61.3 1.135 67.2
Ding_TJU_task1a_1 Ding2021 85 1.544 53.0 1.878 46.8 1.265 58.2 1.547 49.0 1.530 54.1
Ding_TJU_task1a_2 Ding2021 70 1.326 51.1 1.488 45.9 1.191 55.4 1.362 48.7 1.310 51.5
Ding_TJU_task1a_3 Ding2021 61 1.226 49.1 1.356 43.9 1.118 53.4 1.274 48.9 1.209 49.2
Ding_TJU_task1a_4 Ding2021 67 1.296 51.4 1.426 46.6 1.188 55.4 1.305 50.7 1.293 51.5
Fan_NWPU_task1a_1 Cui2021 64 1.261 68.3 1.458 65.6 1.098 70.6 1.628 65.6 1.187 69.0
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 53.9 2.488 50.8 1.999 56.5 2.165 54.3 2.191 54.6
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 67.0 1.180 62.6 1.009 70.7 1.099 67.2 1.082 67.1
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 66.9 0.993 64.1 0.878 69.2 0.982 66.1 0.911 67.4
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 70.0 1.110 67.1 0.991 72.5 1.059 68.3 1.039 71.1
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 70.1 0.929 68.1 0.823 71.8 0.864 71.8 0.868 70.3
Horváth_HIT_task1a_1 Horvth2021 86 1.597 51.4 2.039 44.1 1.228 57.5 1.561 50.3 1.570 51.5
Horváth_HIT_task1a_2 Horvth2021 92 2.031 53.3 2.072 47.1 1.997 58.6 2.040 51.9 2.030 53.8
Horváth_HIT_task1a_3 Horvth2021 76 1.460 51.6 1.780 44.6 1.193 57.5 1.463 49.7 1.461 51.8
Horváth_HIT_task1a_4 Horvth2021 95 2.065 49.2 2.111 40.4 2.027 56.5 2.063 49.3 2.065 49.9
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 55.0 1.557 50.9 1.396 58.4 1.479 52.2 1.473 55.1
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 51.3 1.619 47.3 1.480 54.6 1.562 47.5 1.542 51.9
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 56.3 1.613 50.9 1.351 60.8 1.516 53.9 1.464 56.8
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 66.0 1.219 60.6 0.893 70.5 1.045 64.6 1.045 66.2
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 67.0 1.094 62.6 0.834 70.6 0.986 64.2 0.940 67.3
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 66.7 1.187 61.4 0.886 71.1 0.971 66.9 1.056 66.7
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 66.1 1.724 59.6 0.816 71.6 1.131 65.8 1.254 66.7
Kek_NU_task1a_1 Kek2021 72 1.355 66.8 1.461 61.3 1.266 71.3 1.358 66.3 1.354 66.6
Kek_NU_task1a_2 Kek2021 57 1.207 63.5 1.416 56.6 1.034 69.3 1.230 62.4 1.201 63.6
Kim_3M_task1a_1 Kim2021 38 1.076 61.5 1.185 57.7 0.986 64.6 1.062 60.7 1.079 62.1
Kim_3M_task1a_2 Kim2021 39 1.077 61.6 1.185 58.1 0.987 64.5 1.067 61.3 1.080 61.9
Kim_3M_task1a_3 Kim2021 37 1.076 62.0 1.183 58.6 0.986 64.9 1.060 60.7 1.079 62.3
Kim_3M_task1a_4 Kim2021 40 1.078 61.3 1.190 57.6 0.986 64.3 1.068 60.4 1.081 61.6
Kim_KNU_task1a_1 Kim2021a 46 1.115 64.7 1.317 59.4 0.946 69.1 1.074 67.4 1.125 64.3
Kim_KNU_task1a_2 Kim2021a 28 1.010 63.8 1.215 57.2 0.839 69.4 0.991 62.6 1.003 64.1
Kim_KNU_task1a_3 Kim2021a 55 1.188 61.3 1.371 56.2 1.036 65.6 1.188 59.9 1.187 61.6
Kim_KNU_task1a_4 Kim2021a 52 1.143 62.9 1.315 57.7 1.000 67.3 1.141 63.2 1.143 62.7
Kim_QTI_task1a_1 Kim2021b 8 0.793 75.0 0.851 73.6 0.744 76.2 0.745 74.7 0.791 75.3
Kim_QTI_task1a_2 Kim2021b 1 0.724 76.1 0.766 74.5 0.689 77.4 0.657 76.2 0.727 76.2
Kim_QTI_task1a_3 Kim2021b 2 0.735 76.1 0.792 75.2 0.687 76.9 0.647 78.0 0.746 75.9
Kim_QTI_task1a_4 Kim2021b 5 0.764 75.2 0.832 73.3 0.708 76.8 0.713 74.6 0.771 75.3
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 70.9 1.051 66.4 0.743 74.6 0.776 74.1 0.898 70.1
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 71.8 0.976 68.2 0.730 74.8 0.805 71.3 0.848 71.8
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 72.1 0.947 69.6 0.740 74.2 0.742 73.6 0.844 72.0
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 71.8 0.970 69.3 0.744 74.0 0.737 74.2 0.864 71.5
Lim_CAU_task1a_1 Lim2021 90 1.956 67.5 2.767 62.2 1.280 71.9 1.910 65.0 1.913 68.2
Lim_CAU_task1a_2 Lim2021 91 2.010 67.9 2.892 62.3 1.275 72.6 1.945 65.5 1.996 68.5
Lim_CAU_task1a_3 Lim2021 80 1.479 68.5 1.892 64.1 1.134 72.2 1.374 66.5 1.500 69.1
Lim_CAU_task1a_4 Lim2021 93 2.039 65.8 2.998 60.1 1.240 70.5 1.996 64.3 2.025 65.7
Liu_UESTC_task1a_1 Liu2021 16 0.900 68.8 0.974 66.1 0.838 71.1 0.884 70.9 0.904 68.5
Liu_UESTC_task1a_2 Liu2021 15 0.895 68.2 0.955 66.4 0.844 69.7 0.859 69.8 0.902 67.8
Liu_UESTC_task1a_3 Liu2021 13 0.878 69.6 0.966 66.8 0.804 71.9 0.866 70.8 0.880 69.5
Liu_UESTC_task1a_4 Liu2021 87 1.626 42.0 1.756 38.3 1.519 45.0 1.622 42.0 1.632 41.9
Madhu_CET_task1a_1 Madhu2021 99 3.950 9.7 3.952 9.2 3.948 10.1 4.011 10.1 3.924 10.0
DCASE2021 baseline 1.730 45.6 2.222 38.0 1.320 51.9 1.802 43.6 1.702 45.5
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 60.2 1.348 53.4 0.967 65.9 1.091 61.0 1.139 60.6
Pham_AIT_task1a_1 Pham2021 73 1.368 67.5 1.653 64.3 1.130 70.1 1.302 67.1 1.341 67.9
Pham_AIT_task1a_2 Pham2021 54 1.187 68.4 1.398 64.8 1.011 71.3 1.069 71.5 1.180 68.5
Pham_AIT_task1a_3 Pham2021 94 2.058 69.6 2.497 66.1 1.693 72.6 1.843 71.7 2.034 69.8
Phan_UIUC_task1a_1 Phan2021 65 1.272 63.3 1.369 59.2 1.191 66.7 1.250 62.8 1.271 63.6
Phan_UIUC_task1a_2 Phan2021 71 1.335 63.3 1.419 59.2 1.265 66.7 1.316 62.8 1.334 63.6
Phan_UIUC_task1a_3 Phan2021 60 1.223 65.3 1.294 62.8 1.164 67.5 1.190 65.0 1.220 65.7
Phan_UIUC_task1a_4 Phan2021 66 1.292 65.3 1.351 62.8 1.242 67.5 1.265 65.0 1.289 65.7
Puy_VAI_task1a_1 Puy2021 24 0.952 66.6 1.159 59.7 0.779 72.4 0.948 66.1 0.947 66.8
Puy_VAI_task1a_2 Puy2021 27 0.974 65.4 1.152 59.4 0.825 70.5 0.999 64.0 0.971 65.8
Puy_VAI_task1a_3 Puy2021 22 0.939 66.2 1.116 60.1 0.791 71.2 0.932 65.2 0.934 66.1
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 52.2 1.651 50.7 1.612 53.5 1.598 53.9 1.636 52.1
Seo_SGU_task1a_1 Seo2021 32 1.030 70.3 1.107 67.4 0.965 72.8 1.087 67.9 1.018 70.7
Seo_SGU_task1a_2 Seo2021 41 1.080 71.4 1.164 67.7 1.010 74.4 1.108 71.6 1.073 71.2
Seo_SGU_task1a_3 Seo2021 35 1.065 71.3 1.149 67.6 0.995 74.4 1.086 72.1 1.057 71.5
Seo_SGU_task1a_4 Seo2021 44 1.087 71.8 1.175 67.6 1.014 75.3 1.094 71.8 1.083 71.9
Singh_IITMandi_task1a_1 Singh2021 77 1.464 47.2 1.687 41.5 1.277 51.9 1.444 45.8 1.470 47.1
Singh_IITMandi_task1a_2 Singh2021 83 1.515 44.7 1.730 40.0 1.337 48.5 1.531 41.9 1.506 44.5
Singh_IITMandi_task1a_3 Singh2021 82 1.509 46.1 1.761 40.9 1.299 50.4 1.490 45.4 1.517 46.1
Singh_IITMandi_task1a_4 Singh2021 81 1.488 46.8 1.738 41.3 1.279 51.5 1.485 45.0 1.501 46.7
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 63.8 1.247 57.8 0.953 68.8 1.110 65.4 1.078 63.9
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 65.2 1.231 58.2 0.936 71.0 1.091 66.7 1.061 65.3
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 65.3 1.159 60.8 0.912 69.1 1.022 66.2 1.021 65.5
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 64.7 1.610 57.9 1.036 70.4 1.228 65.5 1.294 64.8
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 61.4 1.305 55.5 0.978 66.2 1.204 57.9 1.107 62.2
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 64.5 1.144 60.4 0.915 67.8 1.112 60.6 0.998 65.2
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 67.3 1.102 63.1 0.852 70.8 1.059 64.6 0.946 67.8
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 68.1 1.040 64.2 0.827 71.4 1.009 63.5 0.905 69.2
Yang_GT_task1a_1 Yang2021 6 0.768 73.1 0.846 70.8 0.703 74.9 0.825 73.5 0.753 72.5
Yang_GT_task1a_2 Yang2021 4 0.764 72.9 0.840 70.0 0.700 75.4 0.806 73.3 0.754 72.5
Yang_GT_task1a_3 Yang2021 3 0.758 72.9 0.832 70.1 0.696 75.1 0.805 73.2 0.748 72.5
Yang_GT_task1a_4 Yang2021 7 0.774 72.8 0.850 70.2 0.710 74.9 0.819 73.3 0.762 72.3
Yihao_speakin_task1a_1 Yihao2021 69 1.311 51.9 1.376 49.7 1.257 53.6 1.293 49.9 1.305 52.4
Yihao_speakin_task1a_2 Yihao2021 59 1.222 55.2 1.284 53.5 1.171 56.6 1.233 54.3 1.214 55.7
Yihao_speakin_task1a_3 Yihao2021 96 2.105 53.5 2.114 50.7 2.097 55.8 2.100 52.5 2.106 53.3
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 63.0 1.243 58.9 1.024 66.4 1.161 59.5 1.112 63.3
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 63.2 1.242 57.4 1.006 68.1 1.102 60.4 1.100 64.0
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 52.2 3.840 47.3 2.958 56.3 3.654 51.2 3.265 52.8
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 59.0 2.451 53.2 1.525 63.8 1.963 57.0 1.971 59.9
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 61.2 1.598 54.2 1.308 67.1 1.475 58.0 1.429 62.4
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 63.5 1.551 55.6 1.297 70.0 1.436 62.3 1.408 63.5
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 63.5 1.430 55.9 1.057 70.0 1.339 62.4 1.196 63.7
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 62.8 1.406 56.2 1.056 68.3 1.253 61.0 1.213 63.5

Class-wise performance

Log loss

Submission label | Technical Report | Official system rank | Logloss | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 1.393 0.431 0.937 0.977 0.355 1.681 0.840 1.695 0.312 0.740
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 1.224 0.526 0.790 1.201 0.434 1.256 0.949 1.586 0.426 0.743
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 1.287 0.473 0.850 1.191 0.348 1.480 0.891 1.687 0.393 0.844
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 1.245 0.540 0.840 0.985 0.392 1.682 0.852 1.581 0.259 0.673
Cao_SCUT_task1a_1 Cao2021 49 1.136 1.461 1.007 1.169 1.343 0.753 1.485 1.006 1.576 0.618 0.937
Cao_SCUT_task1a_2 Cao2021 56 1.200 1.430 0.808 1.228 1.294 0.812 1.702 1.272 1.863 0.670 0.924
Cao_SCUT_task1a_3 Cao2021 50 1.137 1.456 0.919 1.180 1.191 0.829 1.583 1.041 1.603 0.735 0.839
Cao_SCUT_task1a_4 Cao2021 53 1.147 1.519 0.997 1.150 1.195 0.713 1.545 1.100 1.610 0.775 0.867
Ding_TJU_task1a_1 Ding2021 85 1.544 1.955 1.598 1.454 1.693 1.021 2.443 1.179 1.872 1.335 0.886
Ding_TJU_task1a_2 Ding2021 70 1.326 1.503 1.280 1.362 1.761 0.947 1.563 1.171 1.778 1.142 0.753
Ding_TJU_task1a_3 Ding2021 61 1.226 1.763 1.135 1.285 1.329 0.827 1.591 0.802 1.749 1.041 0.741
Ding_TJU_task1a_4 Ding2021 67 1.296 1.806 1.231 1.184 1.479 0.809 1.753 0.943 1.764 1.167 0.828
Fan_NWPU_task1a_1 Cui2021 64 1.261 1.754 0.695 1.316 1.439 0.936 1.926 1.254 2.392 0.478 0.423
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 2.875 1.782 1.970 3.164 1.838 3.269 1.712 3.128 1.088 1.388
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 1.372 0.949 1.127 1.202 0.829 1.288 1.065 1.463 0.615 0.956
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 1.270 0.616 0.929 1.105 0.720 1.322 0.712 1.487 0.468 0.670
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 1.309 0.913 1.003 1.240 0.806 1.220 1.124 1.390 0.634 0.812
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 1.205 0.583 0.868 1.003 0.492 1.284 0.862 1.342 0.452 0.622
Horváth_HIT_task1a_1 Horvth2021 86 1.597 1.615 0.865 1.424 1.637 1.438 2.358 1.861 2.608 1.062 1.102
Horváth_HIT_task1a_2 Horvth2021 92 2.031 2.103 1.884 2.065 2.072 1.991 2.205 2.039 2.172 1.866 1.913
Horváth_HIT_task1a_3 Horvth2021 76 1.460 1.589 0.764 1.800 1.833 1.101 1.651 1.907 1.910 0.892 1.148
Horváth_HIT_task1a_4 Horvth2021 95 2.065 2.131 1.857 2.106 2.135 2.041 2.191 2.069 2.192 1.945 1.988
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 1.695 1.508 1.709 1.454 0.849 1.839 1.540 1.746 1.061 1.289
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 1.382 1.506 1.907 1.536 0.887 2.451 1.403 1.991 1.016 1.348
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 1.645 1.297 1.599 1.426 0.996 2.077 1.567 1.724 0.996 1.370
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 1.574 0.455 1.086 1.276 0.487 1.666 1.139 1.473 0.666 0.590
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 1.390 0.455 1.047 1.272 0.451 1.379 1.111 1.287 0.566 0.561
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 1.457 0.548 1.132 1.344 0.359 1.095 1.357 1.581 0.622 0.733
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 1.116 0.528 1.008 1.301 0.746 1.164 1.407 3.708 0.467 0.840
Kek_NU_task1a_1 Kek2021 72 1.355 1.619 1.049 1.385 1.587 0.948 1.775 1.364 1.740 0.916 1.164
Kek_NU_task1a_2 Kek2021 57 1.207 1.683 0.572 1.162 1.303 0.808 1.749 1.429 1.694 0.809 0.864
Kim_3M_task1a_1 Kim2021 38 1.076 1.241 0.851 0.958 1.540 0.705 1.488 1.000 1.385 0.800 0.796
Kim_3M_task1a_2 Kim2021 39 1.077 1.274 0.848 0.954 1.438 0.715 1.536 1.002 1.398 0.787 0.819
Kim_3M_task1a_3 Kim2021 37 1.076 1.243 0.848 0.942 1.514 0.744 1.473 1.028 1.391 0.797 0.777
Kim_3M_task1a_4 Kim2021 40 1.078 1.273 0.921 0.923 1.506 0.678 1.380 1.040 1.577 0.805 0.679
Kim_KNU_task1a_1 Kim2021a 46 1.115 1.511 0.694 1.220 1.114 0.773 1.322 1.491 1.662 0.566 0.791
Kim_KNU_task1a_2 Kim2021a 28 1.010 1.228 0.547 0.962 1.003 0.564 1.259 1.327 1.564 1.011 0.636
Kim_KNU_task1a_3 Kim2021a 55 1.188 1.537 0.924 1.486 1.311 0.914 1.483 1.097 1.844 0.600 0.685
Kim_KNU_task1a_4 Kim2021a 52 1.143 1.456 0.856 1.302 1.332 0.786 1.348 1.183 1.700 0.571 0.898
Kim_QTI_task1a_1 Kim2021b 8 0.793 1.242 0.397 0.723 0.890 0.363 1.419 0.721 1.397 0.426 0.351
Kim_QTI_task1a_2 Kim2021b 1 0.724 1.050 0.351 0.550 0.810 0.400 1.261 0.671 1.298 0.436 0.411
Kim_QTI_task1a_3 Kim2021b 2 0.735 0.976 0.398 0.557 0.876 0.378 1.356 0.722 1.310 0.381 0.393
Kim_QTI_task1a_4 Kim2021b 5 0.764 1.232 0.332 0.542 0.744 0.273 1.468 0.826 1.350 0.417 0.460
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 1.097 0.369 0.742 0.853 0.309 1.419 1.151 1.905 0.499 0.489
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 1.036 0.378 0.696 0.858 0.334 1.386 1.059 1.628 0.483 0.562
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 0.989 0.364 0.738 0.939 0.322 1.418 0.974 1.682 0.439 0.477
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 1.070 0.374 0.722 0.824 0.307 1.462 1.038 1.685 0.466 0.520
Lim_CAU_task1a_1 Lim2021 90 1.956 2.394 0.393 1.310 2.305 0.596 4.185 2.131 4.457 0.666 1.123
Lim_CAU_task1a_2 Lim2021 91 2.010 2.454 0.364 2.061 2.557 0.805 3.464 2.529 4.386 0.520 0.956
Lim_CAU_task1a_3 Lim2021 80 1.479 1.898 0.509 1.270 2.146 0.472 2.372 2.108 2.408 0.789 0.815
Lim_CAU_task1a_4 Lim2021 93 2.039 1.350 0.411 1.910 3.807 0.749 2.299 3.124 4.841 0.828 1.073
Liu_UESTC_task1a_1 Liu2021 16 0.900 1.209 0.543 0.708 1.073 0.597 1.192 1.101 1.438 0.363 0.775
Liu_UESTC_task1a_2 Liu2021 15 0.895 1.024 0.522 0.987 0.901 0.463 1.299 0.992 1.524 0.465 0.768
Liu_UESTC_task1a_3 Liu2021 13 0.878 1.498 0.600 0.867 0.918 0.468 1.138 0.907 1.496 0.409 0.475
Liu_UESTC_task1a_4 Liu2021 87 1.626 1.626 2.583 1.539 1.375 1.413 2.058 1.112 1.931 1.040 1.587
Madhu_CET_task1a_1 Madhu2021 99 3.950 4.120 3.971 4.412 3.673 4.147 3.229 4.169 3.351 4.580 3.845
DCASE2021 baseline 1.730 2.077 1.615 1.159 1.955 2.173 2.455 1.227 1.744 1.825 1.073
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 1.346 1.046 1.057 0.809 0.875 1.569 1.491 1.352 1.040 0.817
Pham_AIT_task1a_1 Pham2021 73 1.368 1.380 0.624 1.154 1.791 0.608 2.558 1.565 2.133 0.921 0.942
Pham_AIT_task1a_2 Pham2021 54 1.187 1.403 1.045 1.093 1.035 0.510 1.658 1.212 2.524 1.001 0.385
Pham_AIT_task1a_3 Pham2021 94 2.058 2.187 1.302 1.749 2.298 0.853 3.526 2.204 3.980 1.616 0.870
Phan_UIUC_task1a_1 Phan2021 65 1.272 1.429 1.095 1.198 1.457 0.902 1.693 1.201 1.756 0.853 1.136
Phan_UIUC_task1a_2 Phan2021 71 1.335 1.499 1.195 1.301 1.496 0.991 1.707 1.278 1.764 0.907 1.211
Phan_UIUC_task1a_3 Phan2021 60 1.223 1.325 0.947 1.358 1.518 0.933 1.475 1.199 1.607 0.812 1.052
Phan_UIUC_task1a_4 Phan2021 66 1.292 1.423 1.061 1.414 1.560 1.014 1.515 1.265 1.654 0.865 1.147
Puy_VAI_task1a_1 Puy2021 24 0.952 1.536 0.404 1.053 1.072 0.480 1.468 1.038 1.485 0.437 0.546
Puy_VAI_task1a_2 Puy2021 27 0.974 1.353 0.638 1.010 1.175 0.448 1.394 1.232 1.395 0.556 0.536
Puy_VAI_task1a_3 Puy2021 22 0.939 1.499 0.486 0.959 1.049 0.501 1.339 1.045 1.322 0.601 0.588
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 1.665 1.313 2.005 2.381 1.075 1.782 1.616 1.844 1.150 1.468
Seo_SGU_task1a_1 Seo2021 32 1.030 1.502 0.735 1.013 1.042 0.634 1.515 0.965 1.606 0.530 0.755
Seo_SGU_task1a_2 Seo2021 41 1.080 1.478 0.849 1.064 1.143 0.710 1.438 1.096 1.580 0.630 0.814
Seo_SGU_task1a_3 Seo2021 35 1.065 1.312 0.853 1.116 1.139 0.666 1.507 1.104 1.558 0.572 0.821
Seo_SGU_task1a_4 Seo2021 44 1.087 1.530 0.911 1.206 1.070 0.742 1.448 1.025 1.485 0.649 0.807
Singh_IITMandi_task1a_1 Singh2021 77 1.464 1.564 1.549 1.114 1.661 1.341 2.025 1.363 1.564 1.399 1.056
Singh_IITMandi_task1a_2 Singh2021 83 1.515 1.647 1.437 1.459 1.598 1.629 1.774 1.206 1.516 1.764 1.122
Singh_IITMandi_task1a_3 Singh2021 82 1.509 1.612 1.466 1.418 1.606 1.450 2.018 1.300 1.596 1.680 0.945
Singh_IITMandi_task1a_4 Singh2021 81 1.488 1.811 1.506 1.398 1.489 1.262 2.018 1.254 1.739 1.437 0.963
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 1.687 0.924 0.892 1.182 0.544 1.433 1.316 1.382 0.608 0.902
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 1.636 0.841 0.940 1.166 0.500 1.472 1.287 1.393 0.596 0.873
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 1.318 0.613 1.139 1.103 0.446 1.698 1.124 1.539 0.543 0.720
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 1.969 0.643 1.118 1.681 0.307 2.254 1.677 1.718 0.739 0.862
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 1.466 0.827 0.778 1.045 0.799 1.855 1.136 1.985 0.607 0.771
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 1.255 0.784 0.763 0.886 0.615 1.698 1.038 2.023 0.496 0.635
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 1.148 0.558 0.838 0.985 0.512 1.572 1.005 1.935 0.499 0.603
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 1.037 0.484 0.789 0.984 0.475 1.609 0.844 1.970 0.478 0.568
Yang_GT_task1a_1 Yang2021 6 0.768 1.092 0.316 0.728 0.885 0.431 1.080 0.722 1.559 0.430 0.438
Yang_GT_task1a_2 Yang2021 4 0.764 0.935 0.277 0.752 0.907 0.414 1.064 0.780 1.626 0.392 0.491
Yang_GT_task1a_3 Yang2021 3 0.758 0.975 0.282 0.737 0.907 0.406 1.065 0.762 1.588 0.387 0.473
Yang_GT_task1a_4 Yang2021 7 0.774 1.001 0.305 0.733 0.896 0.435 1.086 0.752 1.631 0.428 0.469
Yihao_speakin_task1a_1 Yihao2021 69 1.311 1.460 1.325 1.242 1.215 1.132 2.075 0.976 1.814 0.655 1.218
Yihao_speakin_task1a_2 Yihao2021 59 1.222 1.282 1.399 1.119 1.422 1.081 1.635 1.074 1.594 0.579 1.040
Yihao_speakin_task1a_3 Yihao2021 96 2.105 2.106 2.136 2.098 2.111 2.063 2.174 2.104 2.162 2.019 2.074
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 1.296 1.315 0.976 1.177 0.798 1.431 0.998 1.639 0.704 0.902
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 1.210 1.270 0.900 1.182 0.659 1.478 1.197 1.601 0.740 0.893
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 4.982 4.172 3.030 3.876 2.235 3.320 3.571 4.635 1.760 2.007
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 2.475 2.964 1.691 2.480 1.237 1.717 1.816 2.932 1.210 0.938
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 1.560 1.115 1.549 1.602 1.040 1.724 1.513 1.974 1.043 1.281
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 1.598 1.157 1.463 1.682 1.199 1.620 1.360 1.797 1.047 1.199
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 1.443 0.804 1.121 1.520 0.985 1.466 1.196 2.055 0.835 0.840
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 1.337 0.760 1.072 1.399 1.096 1.615 0.989 2.056 0.908 0.919
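
For the rows checked, the overall log loss agrees with the unweighted mean of the ten class-wise values, up to rounding. This is an observation from the table, not a rule stated on this page. The sketch below checks it for one row, using the values of the Kim_QTI_task1a_2 row; the variable names are illustrative only.

```python
# Class-wise log loss for Kim_QTI_task1a_2, copied from the table above.
class_logloss = {
    "Airport": 1.050, "Bus": 0.351, "Metro": 0.550, "Metro station": 0.810,
    "Park": 0.400, "Public square": 1.261, "Shopping mall": 0.671,
    "Street pedestrian": 1.298, "Street traffic": 0.436, "Tram": 0.411,
}
reported_overall = 0.724  # "Logloss" column of the same row

# Assumption: overall value = unweighted (macro) average over the ten classes.
macro_avg = sum(class_logloss.values()) / len(class_logloss)

# Table entries are rounded to three decimals, so compare with a tolerance.
matches = abs(macro_avg - reported_overall) < 0.001

# Class with the highest log loss for this system.
hardest = max(class_logloss, key=class_logloss.get)

print(f"macro average = {macro_avg:.4f}, matches reported value: {matches}")
print(f"hardest class: {hardest}")
```

The same check can be repeated for any other row by swapping in its ten class-wise values and its reported overall value.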

Accuracy

Submission label | Technical Report | Official system rank | Accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
Byttebier_IDLab_task1a_1 Byttebier2021 21 68.6 49.0 88.1 66.8 66.8 90.0 40.7 73.7 41.5 93.6 75.9
Byttebier_IDLab_task1a_2 Byttebier2021 18 67.5 53.8 81.8 73.2 63.3 83.6 51.8 63.0 42.9 88.5 73.5
Byttebier_IDLab_task1a_3 Byttebier2021 23 68.5 57.2 83.7 72.5 63.5 88.4 50.5 67.6 41.7 89.9 70.5
Byttebier_IDLab_task1a_4 Byttebier2021 17 68.8 54.2 81.7 69.9 67.3 87.4 38.4 72.6 45.3 93.7 77.4
Cao_SCUT_task1a_1 Cao2021 49 66.7 43.7 77.0 67.2 59.3 84.1 49.9 77.3 41.9 85.0 81.6
Cao_SCUT_task1a_2 Cao2021 56 64.6 53.2 84.7 62.1 64.3 81.9 42.4 63.1 30.2 87.4 76.4
Cao_SCUT_task1a_3 Cao2021 50 67.2 46.0 80.4 67.0 66.0 83.2 44.4 72.7 41.2 84.5 86.1
Cao_SCUT_task1a_4 Cao2021 53 66.1 41.7 75.3 64.5 68.7 86.2 45.1 70.8 42.7 80.1 86.0
Ding_TJU_task1a_1 Ding2021 85 53.0 32.4 50.5 53.8 47.2 72.5 32.3 66.4 36.2 67.3 71.6
Ding_TJU_task1a_2 Ding2021 70 51.1 33.7 46.3 39.8 38.5 73.9 42.8 65.2 29.8 65.3 75.4
Ding_TJU_task1a_3 Ding2021 61 49.1 19.8 39.0 28.7 45.1 68.8 37.1 86.7 17.3 68.8 79.3
Ding_TJU_task1a_4 Ding2021 67 51.4 23.6 47.9 44.2 43.7 76.5 37.1 73.7 28.3 65.8 73.2
Fan_NWPU_task1a_1 Cui2021 64 68.3 53.2 81.9 61.6 63.9 81.3 49.1 64.4 54.5 88.0 84.8
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 53.9 38.1 60.0 51.9 43.1 62.8 38.1 64.5 36.9 78.4 65.2
Heo_Clova_task1a_1 Hee-Soo2021 42 67.0 46.6 74.9 65.4 65.8 76.9 58.1 70.1 49.2 88.1 75.0
Heo_Clova_task1a_2 Hee-Soo2021 20 66.9 50.0 78.5 66.7 60.6 77.3 51.3 77.3 43.8 87.1 76.4
Heo_Clova_task1a_3 Hee-Soo2021 34 70.0 52.9 77.1 72.1 63.3 80.7 61.4 68.8 55.7 88.6 79.8
Heo_Clova_task1a_4 Hee-Soo2021 12 70.1 54.3 82.6 68.3 63.8 86.0 54.7 70.8 51.3 88.6 80.8
Horváth_HIT_task1a_1 Horvth2021 86 51.4 38.5 72.1 45.8 46.7 65.7 32.8 53.4 30.2 67.0 61.9
Horváth_HIT_task1a_2 Horvth2021 92 53.3 41.8 73.9 48.6 48.9 57.3 28.5 51.9 35.1 77.8 69.7
Horváth_HIT_task1a_3 Horvth2021 76 51.6 39.1 79.4 34.3 39.1 63.5 34.2 57.3 39.9 71.8 57.3
Horváth_HIT_task1a_4 Horvth2021 95 49.2 37.9 81.1 39.4 37.5 54.7 32.7 52.7 30.4 66.7 58.7
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 55.0 38.4 54.5 40.3 59.5 85.4 33.0 51.4 49.0 74.2 64.0
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 51.3 59.7 57.1 30.4 53.3 84.3 0.0 57.1 29.8 76.4 64.8
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 56.3 43.6 67.8 49.9 60.2 76.4 24.0 51.4 47.5 79.4 62.5
Jeong_ETRI_task1a_1 Jeong2021 33 66.0 45.6 88.8 57.8 59.7 88.3 43.2 64.0 51.3 81.8 79.3
Jeong_ETRI_task1a_2 Jeong2021 25 67.0 48.0 86.2 61.4 55.4 88.4 46.3 63.6 54.7 84.8 80.8
Jeong_ETRI_task1a_3 Jeong2021 30 66.7 47.6 85.0 60.0 59.1 88.6 60.7 61.7 51.8 80.3 72.2
Jeong_ETRI_task1a_4 Jeong2021 63 66.1 56.8 83.8 60.0 60.0 82.8 59.1 55.1 48.0 84.3 71.3
Kek_NU_task1a_1 Kek2021 72 66.8 48.1 89.8 62.6 58.1 89.9 43.4 68.2 43.7 85.9 77.9
Kek_NU_task1a_2 Kek2021 57 63.5 45.5 88.9 64.3 60.0 78.8 41.2 59.0 44.7 77.0 75.9
Kim_3M_task1a_1 Kim2021 38 61.5 53.5 71.0 66.2 48.5 77.9 46.0 61.9 44.9 76.1 69.1
Kim_3M_task1a_2 Kim2021 39 61.6 52.8 71.0 64.9 51.8 78.2 44.3 63.8 45.2 76.4 67.6
Kim_3M_task1a_3 Kim2021 37 62.0 53.8 70.7 66.8 49.9 77.3 46.8 62.6 45.6 76.3 70.7
Kim_3M_task1a_4 Kim2021 40 61.3 52.5 68.3 66.5 48.5 77.5 49.2 62.1 39.4 74.9 73.7
Kim_KNU_task1a_1 Kim2021a 46 64.7 52.8 79.2 57.4 63.4 76.8 53.7 53.4 48.5 86.7 75.1
Kim_KNU_task1a_2 Kim2021a 28 63.8 54.5 82.1 62.6 65.5 81.9 53.4 52.1 41.4 68.8 75.6
Kim_KNU_task1a_3 Kim2021a 55 61.3 48.4 68.2 46.6 55.3 75.3 48.6 65.3 38.9 85.5 81.4
Kim_KNU_task1a_4 Kim2021a 52 62.9 46.8 74.0 53.9 56.3 78.4 52.4 64.3 45.2 87.1 70.6
Kim_QTI_task1a_1 Kim2021b 8 75.0 60.0 89.3 75.9 74.4 89.5 53.3 78.3 53.8 87.4 88.3
Kim_QTI_task1a_2 Kim2021b 1 76.1 62.4 90.7 80.7 74.4 87.8 56.2 77.9 57.6 86.7 86.6
Kim_QTI_task1a_3 Kim2021b 2 76.1 67.4 89.1 80.2 74.2 88.9 53.5 74.7 57.7 88.6 86.7
Kim_QTI_task1a_4 Kim2021b 5 75.2 59.7 91.4 80.2 76.4 92.7 50.3 73.4 55.9 86.9 85.2
Koutini_CPJKU_task1a_1 Koutini2021 14 70.9 61.4 87.2 72.7 73.1 90.3 53.0 59.8 43.1 84.8 83.1
Koutini_CPJKU_task1a_2 Koutini2021 10 71.8 61.6 87.8 74.2 72.0 90.5 53.5 66.2 46.8 85.7 79.7
Koutini_CPJKU_task1a_3 Koutini2021 9 72.1 63.3 89.1 72.7 69.4 90.0 53.0 66.5 46.7 87.0 83.2
Koutini_CPJKU_task1a_4 Koutini2021 11 71.8 62.4 88.3 74.6 71.8 90.8 52.4 65.0 44.1 87.2 81.8
Lim_CAU_task1a_1 Lim2021 90 67.5 54.7 88.3 68.3 63.5 85.1 42.4 63.8 52.9 84.8 71.2
Lim_CAU_task1a_2 Lim2021 91 67.9 55.7 91.0 61.7 62.8 81.3 44.7 63.8 54.2 86.0 78.2
Lim_CAU_task1a_3 Lim2021 80 68.5 53.4 88.0 66.9 64.1 85.6 47.6 64.8 54.7 84.7 75.4
Lim_CAU_task1a_4 Lim2021 93 65.8 60.5 89.4 57.6 56.3 83.3 49.5 58.2 52.0 79.5 71.2
Liu_UESTC_task1a_1 Liu2021 16 68.8 54.2 83.7 78.3 61.2 79.5 57.7 63.1 49.0 90.3 71.2
Liu_UESTC_task1a_2 Liu2021 15 68.2 64.8 84.2 61.1 68.3 84.7 51.4 64.8 41.7 86.9 74.4
Liu_UESTC_task1a_3 Liu2021 13 69.6 43.2 80.6 65.7 69.6 85.4 59.2 72.9 45.3 89.1 85.1
Liu_UESTC_task1a_4 Liu2021 87 42.0 29.4 11.7 41.8 53.5 54.4 29.7 64.9 24.1 68.6 41.5
Madhu_CET_task1a_1 Madhu2021 99 9.7 5.7 13.8 6.6 9.6 12.4 9.5 10.7 11.5 7.8 9.3
DCASE2021 baseline 45.6 24.0 44.6 54.4 37.8 52.7 24.4 63.8 39.9 56.4 58.1
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 60.2 45.1 64.1 57.6 74.6 77.7 38.0 50.4 53.2 70.5 71.3
Pham_AIT_task1a_1 Pham2021 73 67.5 59.7 86.2 67.4 60.5 86.2 43.9 69.4 50.0 79.9 71.3
Pham_AIT_task1a_2 Pham2021 54 68.4 58.8 75.1 63.3 72.7 87.5 57.1 67.4 38.1 77.4 86.4
Pham_AIT_task1a_3 Pham2021 94 69.6 63.0 82.7 67.7 68.9 88.4 52.0 70.3 44.4 79.3 79.5
Phan_UIUC_task1a_1 Phan2021 65 63.3 48.2 75.9 59.3 57.8 85.0 42.2 70.6 39.6 85.6 69.1
Phan_UIUC_task1a_2 Phan2021 71 63.3 48.2 75.9 59.3 57.8 85.0 42.2 70.6 39.6 85.6 69.1
Phan_UIUC_task1a_3 Phan2021 60 65.3 55.7 83.5 48.5 54.0 83.5 48.6 70.2 45.7 86.6 77.1
Phan_UIUC_task1a_4 Phan2021 66 65.3 55.7 83.5 48.5 54.0 83.5 48.6 70.2 45.7 86.6 77.1
Puy_VAI_task1a_1 Puy2021 24 66.6 41.5 86.5 59.1 66.3 85.0 48.4 64.3 49.7 86.0 79.3
Puy_VAI_task1a_2 Puy2021 27 65.4 45.5 76.1 61.9 63.1 87.0 49.9 55.9 49.7 83.3 81.9
Puy_VAI_task1a_3 Puy2021 22 66.2 40.0 84.8 60.9 65.0 85.9 48.2 66.0 52.1 80.2 78.4
Qiao_NCUT_task1a_1 Qiao2021 88 52.2 41.5 67.8 37.8 14.9 87.9 40.5 60.2 36.7 81.3 53.5
Seo_SGU_task1a_1 Seo2021 32 70.3 50.9 83.5 71.7 69.7 86.0 50.5 73.4 48.2 88.4 81.2
Seo_SGU_task1a_2 Seo2021 41 71.4 52.3 83.5 74.4 71.0 85.7 54.9 71.0 50.3 88.1 82.6
Seo_SGU_task1a_3 Seo2021 35 71.3 59.8 82.7 71.2 69.9 88.1 50.1 70.2 48.9 88.8 83.5
Seo_SGU_task1a_4 Seo2021 44 71.8 47.6 83.1 67.7 75.6 86.1 53.0 74.1 55.9 87.8 87.4
Singh_IITMandi_task1a_1 Singh2021 77 47.2 25.6 38.5 57.2 43.1 62.9 31.3 62.4 37.8 60.7 52.3
Singh_IITMandi_task1a_2 Singh2021 83 44.7 25.5 48.1 40.3 42.6 53.3 30.7 66.8 38.4 49.5 51.5
Singh_IITMandi_task1a_3 Singh2021 82 46.1 29.0 44.9 44.4 40.3 61.0 24.4 63.8 38.5 50.5 64.0
Singh_IITMandi_task1a_4 Singh2021 81 46.8 21.6 43.3 42.8 45.3 62.9 26.8 68.9 33.3 60.4 63.1
Sugahara_RION_task1a_1 Sugahara2021 43 63.8 29.3 69.7 75.1 64.3 89.5 44.4 55.9 55.3 85.6 69.2
Sugahara_RION_task1a_2 Sugahara2021 36 65.2 33.2 76.9 71.2 66.5 91.5 41.8 56.3 56.8 84.8 73.0
Sugahara_RION_task1a_3 Sugahara2021 31 65.3 51.4 85.7 57.1 66.0 88.1 34.6 63.4 47.5 83.2 76.0
Sugahara_RION_task1a_4 Sugahara2021 68 64.7 39.3 80.9 66.5 52.8 91.8 46.5 54.8 58.5 80.6 75.6
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 61.4 45.3 69.9 71.8 64.4 77.9 38.9 63.1 29.7 82.7 69.8
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 64.5 50.4 73.9 70.3 68.8 82.7 43.3 65.9 26.8 86.0 76.6
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 67.3 56.6 81.6 68.9 67.9 87.1 46.7 67.2 33.3 86.1 77.8
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 68.1 61.1 82.8 70.7 67.6 88.6 46.2 72.0 27.0 86.6 78.3
Yang_GT_task1a_1 Yang2021 6 73.1 57.7 91.0 71.7 66.9 86.6 56.3 76.0 48.7 89.1 86.4
Yang_GT_task1a_2 Yang2021 4 72.9 64.6 91.9 70.8 67.3 87.0 58.2 72.0 44.7 89.9 82.7
Yang_GT_task1a_3 Yang2021 3 72.9 62.9 91.4 71.2 66.9 87.0 57.1 72.9 46.2 89.8 83.3
Yang_GT_task1a_4 Yang2021 7 72.8 61.7 90.9 71.8 67.0 85.6 56.7 74.1 47.0 89.0 84.1
Yihao_speakin_task1a_1 Yihao2021 69 51.9 41.2 54.3 53.4 57.6 63.0 25.0 67.7 28.0 79.8 48.6
Yihao_speakin_task1a_2 Yihao2021 59 55.2 49.9 49.0 59.6 51.1 61.9 38.0 64.4 36.4 81.6 60.5
Yihao_speakin_task1a_3 Yihao2021 96 53.5 48.6 48.6 56.7 53.4 67.0 36.9 49.6 33.7 77.5 62.9
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 63.0 52.5 63.0 66.2 64.3 78.3 49.1 67.0 39.3 81.1 69.4
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 63.2 55.9 63.9 70.3 60.9 81.3 49.0 59.2 40.4 81.7 69.6
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 52.2 40.7 47.5 53.7 51.9 64.8 44.2 48.6 39.8 68.8 62.5
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 59.0 44.8 56.4 57.1 60.0 68.3 50.8 60.5 46.3 75.6 70.5
Zhao_Maxvision_task1a_1 Zhao2021 75 61.2 51.8 83.0 56.7 54.3 79.3 45.5 58.3 35.4 80.2 68.1
Zhao_Maxvision_task1a_2 Zhao2021 74 63.5 47.9 78.4 63.9 50.0 74.6 52.7 67.7 42.4 81.3 76.1
Zhao_Maxvision_task1a_3 Zhao2021 62 63.5 50.0 79.4 65.4 49.5 73.6 52.3 68.4 38.6 80.4 77.8
Zhao_Maxvision_task1a_4 Zhao2021 58 62.8 48.5 82.8 67.0 60.1 72.1 42.0 68.8 40.5 77.7 68.6

Device-wise performance

Log loss

Unseen devices: D, S7, S8, S9, S10. Seen devices: A, B, C, S1, S2, S3.
Submission label | Technical Report | Official system rank | Log loss | Log loss / Unseen | Log loss / Seen | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 1.065 0.829 1.762 0.861 0.845 0.870 0.984 0.713 0.949 0.821 0.873 0.789 0.827
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 1.048 0.801 1.777 0.843 0.794 0.875 0.954 0.683 0.923 0.809 0.847 0.766 0.779
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 1.094 0.820 1.931 0.871 0.809 0.874 0.987 0.692 0.943 0.820 0.862 0.801 0.800
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 1.002 0.824 1.570 0.808 0.823 0.857 0.953 0.708 0.957 0.818 0.849 0.790 0.824
Cao_SCUT_task1a_1 Cao2021 49 1.136 1.214 1.071 1.318 1.081 1.053 1.290 1.326 0.897 1.084 1.011 1.170 1.187 1.075
Cao_SCUT_task1a_2 Cao2021 56 1.200 1.318 1.102 1.507 1.122 1.072 1.405 1.485 0.878 1.141 1.057 1.222 1.230 1.084
Cao_SCUT_task1a_3 Cao2021 50 1.137 1.223 1.066 1.327 1.064 1.027 1.331 1.364 0.874 1.087 1.011 1.162 1.194 1.070
Cao_SCUT_task1a_4 Cao2021 53 1.147 1.250 1.061 1.403 1.076 1.008 1.315 1.448 0.885 1.060 1.019 1.152 1.185 1.068
Ding_TJU_task1a_1 Ding2021 85 1.544 1.878 1.265 2.188 1.404 1.304 2.228 2.264 1.070 1.293 1.106 1.413 1.374 1.336
Ding_TJU_task1a_2 Ding2021 70 1.326 1.488 1.191 1.879 1.316 1.181 1.593 1.473 0.983 1.187 1.022 1.286 1.374 1.291
Ding_TJU_task1a_3 Ding2021 61 1.226 1.356 1.118 1.388 1.211 1.119 1.447 1.612 0.941 1.099 1.003 1.196 1.285 1.187
Ding_TJU_task1a_4 Ding2021 67 1.296 1.426 1.188 1.566 1.338 1.200 1.366 1.662 0.966 1.203 1.029 1.317 1.360 1.253
Fan_NWPU_task1a_1 Cui2021 64 1.261 1.458 1.098 2.084 1.351 1.093 1.409 1.351 0.977 1.185 0.873 1.113 1.412 1.026
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 2.488 1.999 2.869 2.020 1.888 2.683 2.979 1.434 2.178 2.152 2.084 2.420 1.729
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 1.180 1.009 1.627 1.024 1.010 1.058 1.180 0.926 1.048 0.991 1.065 1.001 1.022
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 0.993 0.878 1.278 0.879 0.884 0.955 0.967 0.785 0.902 0.843 0.928 0.911 0.896
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 1.110 0.991 1.390 1.007 0.995 1.049 1.109 0.916 1.040 0.992 1.029 0.998 0.971
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 0.929 0.823 1.205 0.822 0.802 0.905 0.912 0.754 0.838 0.843 0.881 0.821 0.802
Horváth_HIT_task1a_1 Horvth2021 86 1.597 2.039 1.228 2.242 1.388 1.311 3.143 2.111 1.093 1.133 1.066 1.322 1.491 1.265
Horváth_HIT_task1a_2 Horvth2021 92 2.031 2.072 1.997 2.149 2.012 2.002 2.099 2.096 1.926 2.027 1.996 2.023 2.032 1.978
Horváth_HIT_task1a_3 Horvth2021 76 1.460 1.780 1.193 2.223 1.223 1.442 2.377 1.634 1.117 1.215 1.165 1.264 1.289 1.108
Horváth_HIT_task1a_4 Horvth2021 95 2.065 2.111 2.027 2.215 2.037 2.052 2.127 2.123 1.963 2.037 2.022 2.045 2.071 2.027
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 1.557 1.396 1.640 1.445 1.347 1.638 1.713 1.077 1.404 1.326 1.521 1.612 1.436
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 1.619 1.480 1.703 1.510 1.457 1.671 1.752 1.185 1.496 1.386 1.593 1.676 1.542
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 1.613 1.351 1.855 1.448 1.454 1.613 1.694 1.107 1.410 1.345 1.413 1.477 1.352
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 1.219 0.893 1.502 1.106 0.979 1.386 1.125 0.729 0.854 0.780 0.979 1.061 0.954
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 1.094 0.834 1.241 0.927 0.920 1.309 1.071 0.712 0.823 0.719 0.962 0.952 0.835
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 1.187 0.886 1.491 0.950 0.879 1.474 1.143 0.681 0.845 0.726 1.103 1.054 0.908
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 1.724 0.816 4.295 0.911 0.991 1.279 1.144 0.708 0.831 0.786 0.895 0.915 0.758
Kek_NU_task1a_1 Kek2021 72 1.355 1.461 1.266 1.685 1.330 1.331 1.482 1.479 1.095 1.340 1.232 1.316 1.344 1.268
Kek_NU_task1a_2 Kek2021 57 1.207 1.416 1.034 2.204 1.061 1.087 1.419 1.309 0.883 1.076 1.036 1.086 1.116 1.002
Kim_3M_task1a_1 Kim2021 38 1.076 1.185 0.986 1.420 1.051 1.073 1.169 1.212 0.792 1.090 0.912 0.971 1.191 0.959
Kim_3M_task1a_2 Kim2021 39 1.077 1.185 0.987 1.430 1.038 1.068 1.168 1.222 0.795 1.093 0.907 0.976 1.190 0.961
Kim_3M_task1a_3 Kim2021 37 1.076 1.183 0.986 1.419 1.039 1.068 1.168 1.220 0.795 1.085 0.909 0.975 1.192 0.963
Kim_3M_task1a_4 Kim2021 40 1.078 1.190 0.986 1.430 1.043 1.076 1.151 1.249 0.803 1.097 0.917 0.969 1.164 0.963
Kim_KNU_task1a_1 Kim2021a 46 1.115 1.317 0.946 2.529 0.953 1.046 1.003 1.056 0.866 1.030 0.925 0.985 0.936 0.931
Kim_KNU_task1a_2 Kim2021a 28 1.010 1.215 0.839 1.412 0.988 0.854 1.612 1.212 0.734 0.842 0.757 0.931 0.930 0.839
Kim_KNU_task1a_3 Kim2021a 55 1.188 1.371 1.036 2.379 1.047 1.130 1.102 1.198 0.909 1.150 0.983 1.106 1.050 1.016
Kim_KNU_task1a_4 Kim2021a 52 1.143 1.315 1.000 2.113 1.063 1.146 1.073 1.182 0.883 1.081 0.958 1.045 1.034 0.997
Kim_QTI_task1a_1 Kim2021b 8 0.793 0.851 0.744 1.162 0.756 0.720 0.784 0.832 0.631 0.780 0.749 0.784 0.773 0.749
Kim_QTI_task1a_2 Kim2021b 1 0.724 0.766 0.689 1.059 0.665 0.631 0.720 0.754 0.561 0.754 0.719 0.721 0.704 0.675
Kim_QTI_task1a_3 Kim2021b 2 0.735 0.792 0.687 1.195 0.680 0.643 0.719 0.724 0.585 0.730 0.724 0.739 0.685 0.659
Kim_QTI_task1a_4 Kim2021b 5 0.764 0.832 0.708 1.169 0.733 0.720 0.768 0.772 0.598 0.751 0.725 0.746 0.747 0.680
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 1.051 0.743 1.704 0.784 0.808 0.990 0.968 0.612 0.746 0.720 0.815 0.841 0.727
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 0.976 0.730 1.592 0.741 0.778 0.868 0.904 0.581 0.783 0.723 0.812 0.795 0.686
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 0.947 0.740 1.477 0.739 0.759 0.865 0.896 0.600 0.783 0.748 0.821 0.791 0.695
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 0.970 0.744 1.624 0.761 0.752 0.856 0.859 0.625 0.807 0.786 0.776 0.755 0.716
Lim_CAU_task1a_1 Lim2021 90 1.956 2.767 1.280 5.170 1.776 1.894 3.051 1.944 1.100 1.152 1.168 1.272 1.546 1.443
Lim_CAU_task1a_2 Lim2021 91 2.010 2.892 1.275 6.246 1.406 1.768 3.284 1.754 1.167 1.131 1.201 1.168 1.576 1.406
Lim_CAU_task1a_3 Lim2021 80 1.479 1.892 1.134 2.711 1.624 1.320 1.849 1.955 0.837 1.282 1.061 1.241 1.280 1.106
Lim_CAU_task1a_4 Lim2021 93 2.039 2.998 1.240 7.522 1.699 1.348 2.471 1.952 1.069 1.334 0.977 1.412 1.229 1.417
Liu_UESTC_task1a_1 Liu2021 16 0.900 0.974 0.838 1.367 0.873 0.845 0.866 0.920 0.749 0.879 0.796 0.912 0.880 0.813
Liu_UESTC_task1a_2 Liu2021 15 0.895 0.955 0.844 1.334 0.848 0.834 0.851 0.907 0.759 0.902 0.796 0.909 0.890 0.810
Liu_UESTC_task1a_3 Liu2021 13 0.878 0.966 0.804 1.398 0.833 0.837 0.853 0.908 0.680 0.871 0.823 0.877 0.810 0.766
Liu_UESTC_task1a_4 Liu2021 87 1.626 1.756 1.519 1.893 1.622 1.643 1.826 1.796 1.222 1.539 1.276 1.648 1.848 1.580
Madhu_CET_task1a_1 Madhu2021 99 3.950 3.952 3.948 3.813 3.947 4.018 3.974 4.008 3.925 3.999 4.019 3.923 3.962 3.858
DCASE2021 baseline 1.730 2.222 1.320 3.255 1.609 1.610 2.142 2.494 1.085 1.361 1.174 1.361 1.468 1.473
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 1.348 0.967 1.821 1.043 1.048 1.543 1.285 0.814 0.999 0.944 1.035 1.061 0.949
Pham_AIT_task1a_1 Pham2021 73 1.368 1.653 1.130 2.525 1.185 1.230 1.882 1.443 0.889 1.399 1.037 1.103 1.355 0.994
Pham_AIT_task1a_2 Pham2021 54 1.187 1.398 1.011 1.877 1.105 1.090 1.438 1.479 0.798 1.200 1.089 1.105 1.042 0.830
Pham_AIT_task1a_3 Pham2021 94 2.058 2.497 1.693 3.776 1.829 1.871 2.705 2.306 1.429 2.085 1.691 1.686 1.852 1.412
Phan_UIUC_task1a_1 Phan2021 65 1.272 1.369 1.191 1.748 1.193 1.226 1.290 1.387 1.069 1.256 1.211 1.197 1.261 1.152
Phan_UIUC_task1a_2 Phan2021 71 1.335 1.419 1.265 1.766 1.270 1.278 1.350 1.431 1.145 1.318 1.271 1.284 1.330 1.240
Phan_UIUC_task1a_3 Phan2021 60 1.223 1.294 1.164 1.618 1.156 1.179 1.249 1.266 1.074 1.224 1.177 1.173 1.214 1.120
Phan_UIUC_task1a_4 Phan2021 66 1.292 1.351 1.242 1.643 1.228 1.240 1.303 1.342 1.146 1.304 1.251 1.248 1.301 1.201
Puy_VAI_task1a_1 Puy2021 24 0.952 1.159 0.779 1.621 0.937 0.887 1.286 1.066 0.666 0.823 0.647 0.822 0.911 0.804
Puy_VAI_task1a_2 Puy2021 27 0.974 1.152 0.825 1.404 0.953 0.939 1.265 1.199 0.658 0.880 0.701 0.874 0.990 0.848
Puy_VAI_task1a_3 Puy2021 22 0.939 1.116 0.791 1.331 0.920 0.915 1.310 1.107 0.672 0.806 0.688 0.838 0.912 0.829
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 1.651 1.612 1.768 1.609 1.534 1.622 1.724 1.581 1.631 1.592 1.640 1.625 1.603
Seo_SGU_task1a_1 Seo2021 32 1.030 1.107 0.965 1.502 1.006 0.959 1.002 1.068 0.917 0.988 1.007 1.005 0.957 0.916
Seo_SGU_task1a_2 Seo2021 41 1.080 1.164 1.010 1.592 1.033 1.019 1.056 1.123 0.931 1.080 1.016 1.044 1.009 0.977
Seo_SGU_task1a_3 Seo2021 35 1.065 1.149 0.995 1.592 1.008 1.002 1.035 1.106 0.927 1.066 1.014 1.028 0.981 0.953
Seo_SGU_task1a_4 Seo2021 44 1.087 1.175 1.014 1.572 1.045 1.026 1.078 1.155 0.938 1.092 1.001 1.064 1.002 0.986
Singh_IITMandi_task1a_1 Singh2021 77 1.464 1.687 1.277 1.984 1.445 1.251 1.873 1.883 1.041 1.245 1.112 1.406 1.512 1.349
Singh_IITMandi_task1a_2 Singh2021 83 1.515 1.730 1.337 1.873 1.425 1.329 2.012 2.010 1.082 1.319 1.195 1.343 1.598 1.482
Singh_IITMandi_task1a_3 Singh2021 82 1.509 1.761 1.299 1.878 1.436 1.330 2.107 2.055 1.024 1.315 1.164 1.358 1.504 1.430
Singh_IITMandi_task1a_4 Singh2021 81 1.488 1.738 1.279 1.736 1.473 1.325 2.078 2.080 1.041 1.291 1.159 1.345 1.451 1.386
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 1.247 0.953 1.601 1.038 1.102 1.336 1.159 0.863 1.010 0.878 1.023 1.007 0.939
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 1.231 0.936 1.614 1.017 1.099 1.307 1.118 0.871 0.995 0.875 0.994 0.970 0.915
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 1.159 0.912 1.608 0.945 1.017 1.185 1.043 0.933 0.947 0.864 0.954 0.898 0.875
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 1.610 1.036 2.081 1.254 1.580 2.014 1.124 1.068 1.194 0.905 1.129 0.976 0.941
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 1.305 0.978 1.410 1.114 1.027 1.532 1.444 0.856 0.935 0.902 1.009 1.163 1.005
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 1.144 0.915 1.243 0.990 0.941 1.257 1.290 0.782 0.867 0.831 0.987 1.072 0.954
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 1.102 0.852 1.304 0.888 0.919 1.198 1.203 0.726 0.805 0.771 0.926 0.996 0.886
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 1.040 0.827 1.197 0.886 0.869 1.050 1.200 0.697 0.831 0.766 0.908 0.930 0.829
Yang_GT_task1a_1 Yang2021 6 0.768 0.846 0.703 1.075 0.721 0.737 0.902 0.792 0.611 0.738 0.688 0.787 0.724 0.673
Yang_GT_task1a_2 Yang2021 4 0.764 0.840 0.700 1.091 0.707 0.724 0.882 0.797 0.611 0.741 0.671 0.784 0.722 0.670
Yang_GT_task1a_3 Yang2021 3 0.758 0.832 0.696 1.058 0.711 0.723 0.875 0.795 0.608 0.738 0.667 0.785 0.711 0.667
Yang_GT_task1a_4 Yang2021 7 0.774 0.850 0.710 1.087 0.724 0.735 0.898 0.805 0.621 0.737 0.692 0.796 0.737 0.679
Yihao_speakin_task1a_1 Yihao2021 69 1.311 1.376 1.257 1.374 1.255 1.156 1.516 1.578 1.036 1.260 1.171 1.372 1.408 1.297
Yihao_speakin_task1a_2 Yihao2021 59 1.222 1.284 1.171 1.295 1.174 1.102 1.361 1.487 0.949 1.177 1.126 1.260 1.307 1.211
Yihao_speakin_task1a_3 Yihao2021 96 2.105 2.114 2.097 2.109 2.098 2.082 2.136 2.145 2.047 2.099 2.079 2.127 2.118 2.112
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 1.243 1.024 1.448 1.135 0.974 1.301 1.358 0.812 0.988 0.967 1.158 1.172 1.048
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 1.242 1.006 1.460 1.044 0.947 1.381 1.377 0.791 0.999 0.911 1.116 1.161 1.056
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 3.840 2.958 4.402 3.559 3.079 4.339 3.819 2.726 2.674 3.054 2.923 2.935 3.438
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 2.451 1.525 3.028 2.127 1.496 2.811 2.796 1.233 1.093 1.223 1.765 2.384 1.451
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 1.598 1.308 1.810 1.375 1.379 1.705 1.722 1.084 1.342 1.271 1.398 1.450 1.305
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 1.551 1.297 1.782 1.355 1.364 1.656 1.596 1.134 1.328 1.288 1.355 1.396 1.279
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 1.430 1.057 1.570 1.170 1.107 1.685 1.620 0.862 1.104 1.037 1.147 1.166 1.024
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 1.406 1.056 1.605 1.193 1.116 1.548 1.569 0.831 1.070 1.087 1.162 1.163 1.024
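
For the rows checked, the two summary columns agree with the plain average over the five unseen devices (D, S7 to S10) and over the six seen devices (A, B, C, S1 to S3), up to rounding. This would be expected if every device contributes the same amount of evaluation data, but that is an inference from the numbers, not something this page states. A minimal sketch, using the Kim_QTI_task1a_2 row and illustrative variable names, computes both averages and the gap between them:

```python
from statistics import mean

# Per-device log loss for Kim_QTI_task1a_2, copied from the table above.
unseen = {"D": 1.059, "S7": 0.665, "S8": 0.631, "S9": 0.720, "S10": 0.754}
seen = {"A": 0.561, "B": 0.754, "C": 0.719, "S1": 0.721, "S2": 0.704, "S3": 0.675}
reported = {"unseen": 0.766, "seen": 0.689}  # summary columns of the same row

# Assumption: each summary column is the unweighted mean over its device group.
unseen_avg = mean(unseen.values())
seen_avg = mean(seen.values())

# A positive gap means the system does worse on devices absent from training.
gap = unseen_avg - seen_avg

# Table entries are rounded to three decimals, so compare with a tolerance.
consistent = (
    abs(unseen_avg - reported["unseen"]) < 0.001
    and abs(seen_avg - reported["seen"]) < 0.001
)

print(f"unseen = {unseen_avg:.4f}, seen = {seen_avg:.4f}, gap = {gap:.4f}")
print(f"consistent with reported summary columns: {consistent}")
```

The gap is one simple way to compare how well different submissions generalize to new recording devices; a smaller value indicates less degradation on unseen hardware.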

Accuracy

Unseen devices: D, S7, S8, S9, S10. Seen devices: A, B, C, S1, S2, S3.
Submission label | Technical Report | Official system rank | Accuracy | Accuracy / Unseen | Accuracy / Seen | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3
Byttebier_IDLab_task1a_1 Byttebier2021 21 68.6 64.5 72.0 44.0 69.9 71.1 70.3 67.4 77.4 68.3 71.5 70.0 72.2 72.6
Byttebier_IDLab_task1a_2 Byttebier2021 18 67.5 63.6 70.8 42.9 68.9 70.7 68.3 67.2 75.4 66.5 71.1 70.1 70.7 71.0
Byttebier_IDLab_task1a_3 Byttebier2021 23 68.5 64.7 71.7 44.2 70.6 72.4 68.8 67.8 76.1 68.1 71.5 71.5 70.8 72.2
Byttebier_IDLab_task1a_4 Byttebier2021 17 68.8 65.5 71.5 48.3 71.4 70.8 70.6 66.5 76.5 68.2 72.1 71.4 70.3 70.6
Cao_SCUT_task1a_1 Cao2021 49 66.7 62.5 70.2 58.3 68.3 67.9 58.5 59.3 75.4 70.6 72.6 65.8 67.9 68.9
Cao_SCUT_task1a_2 Cao2021 56 64.6 59.0 69.2 53.1 68.9 71.0 53.9 48.2 78.3 68.1 73.1 64.3 62.2 69.3
Cao_SCUT_task1a_3 Cao2021 50 67.2 63.3 70.4 58.8 71.4 70.3 58.6 57.2 76.8 70.1 72.5 66.5 65.7 70.8
Cao_SCUT_task1a_4 Cao2021 53 66.1 60.8 70.5 54.3 70.3 71.7 55.4 52.5 76.5 70.6 71.9 66.9 66.3 70.7
Ding_TJU_task1a_1 Ding2021 85 53.0 46.8 58.2 46.5 52.8 55.7 39.6 39.3 68.3 59.2 62.6 50.8 54.7 53.8
Ding_TJU_task1a_2 Ding2021 70 51.1 45.9 55.4 41.4 49.9 55.0 40.8 42.2 64.2 53.3 62.1 52.4 47.5 52.9
Ding_TJU_task1a_3 Ding2021 61 49.1 43.9 53.4 46.4 47.2 46.4 43.1 36.2 61.1 51.9 58.9 50.3 47.6 50.6
Ding_TJU_task1a_4 Ding2021 67 51.4 46.6 55.4 45.8 46.9 52.9 48.8 38.8 64.7 55.0 61.4 49.4 48.5 53.2
Fan_NWPU_task1a_1 Cui2021 64 68.3 65.6 70.6 54.7 71.8 69.4 65.8 66.0 74.7 69.0 74.0 68.6 67.5 69.4
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 53.9 50.8 56.5 50.3 56.2 57.8 47.1 42.6 70.8 51.8 54.6 53.6 49.4 58.5
Heo_Clova_task1a_1 Hee-Soo2021 42 67.0 62.6 70.7 42.5 69.3 70.7 67.2 63.2 74.7 70.0 72.6 67.6 70.7 68.5
Heo_Clova_task1a_2 Hee-Soo2021 20 66.9 64.1 69.2 55.8 69.4 66.2 64.9 64.0 74.6 69.9 69.2 67.5 66.4 67.9
Heo_Clova_task1a_3 Hee-Soo2021 34 70.0 67.1 72.5 56.0 71.4 72.6 69.2 66.4 76.7 69.6 72.1 71.3 72.6 72.6
Heo_Clova_task1a_4 Hee-Soo2021 12 70.1 68.1 71.8 57.2 71.2 72.9 69.7 69.3 75.1 72.4 70.0 71.0 70.3 72.1
Horváth_HIT_task1a_1 Horvth2021 86 51.4 44.1 57.5 44.4 52.4 51.5 32.6 39.3 65.1 57.6 60.0 53.2 53.3 56.0
Horváth_HIT_task1a_2 Horvth2021 92 53.3 47.1 58.6 36.4 56.9 57.9 40.8 43.2 69.0 56.1 58.5 53.5 53.3 61.1
Horváth_HIT_task1a_3 Horvth2021 76 51.6 44.6 57.5 34.6 56.1 48.5 41.0 42.8 61.9 54.9 55.3 55.3 56.1 61.4
Horváth_HIT_task1a_4 Horvth2021 95 49.2 40.4 56.5 21.0 53.3 52.4 36.1 39.3 66.4 56.3 58.3 52.2 48.8 56.8
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 55.0 50.9 58.4 47.6 54.0 64.3 47.1 41.4 70.1 57.9 63.5 52.6 49.2 56.8
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 51.3 47.3 54.6 43.1 53.2 56.7 43.3 40.4 65.8 54.4 58.9 49.6 47.5 51.2
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 56.3 50.9 60.8 40.7 59.6 57.6 50.8 45.6 68.8 57.4 60.1 60.8 55.4 62.1
Jeong_ETRI_task1a_1 Jeong2021 33 66.0 60.6 70.5 51.0 66.0 66.0 56.9 62.9 75.6 69.9 73.8 68.2 65.3 70.3
Jeong_ETRI_task1a_2 Jeong2021 25 67.0 62.6 70.6 57.4 66.8 66.2 59.2 63.6 75.3 69.4 75.4 65.1 68.6 69.6
Jeong_ETRI_task1a_3 Jeong2021 30 66.7 61.4 71.1 52.8 67.1 66.2 59.6 61.3 76.7 69.7 75.6 67.5 67.1 70.3
Jeong_ETRI_task1a_4 Jeong2021 63 66.1 59.6 71.6 43.5 67.8 66.3 59.6 61.0 77.4 70.1 72.8 66.7 68.8 73.6
Kek_NU_task1a_1 Kek2021 72 66.8 61.3 71.3 45.0 69.3 68.9 61.9 61.2 78.5 65.4 72.4 71.0 69.2 71.5
Kek_NU_task1a_2 Kek2021 57 63.5 56.6 69.3 35.3 67.2 67.6 52.8 60.0 74.2 67.6 69.2 68.2 65.8 70.7
Kim_3M_task1a_1 Kim2021 38 61.5 57.7 64.6 51.0 62.6 60.7 59.6 54.9 70.0 62.9 66.8 63.6 58.6 65.8
Kim_3M_task1a_2 Kim2021 39 61.6 58.1 64.5 52.1 63.1 60.8 59.9 54.6 69.9 63.5 66.9 63.7 58.1 64.9
Kim_3M_task1a_3 Kim2021 37 62.0 58.6 64.9 52.1 63.5 61.4 60.7 55.4 71.1 63.1 66.4 63.5 58.3 67.1
Kim_3M_task1a_4 Kim2021 40 61.3 57.6 64.3 51.2 62.2 60.6 59.9 54.2 70.1 62.2 65.8 63.5 58.5 65.8
Kim_KNU_task1a_1 Kim2021a 46 64.7 59.4 69.1 32.8 68.5 63.3 67.1 65.6 72.4 66.0 69.9 68.3 68.3 69.6
Kim_KNU_task1a_2 Kim2021a 28 63.8 57.2 69.4 49.9 63.5 67.8 48.9 55.8 75.1 69.3 71.9 63.3 66.4 70.0
Kim_KNU_task1a_3 Kim2021a 55 61.3 56.2 65.6 31.3 65.1 60.8 64.3 59.6 69.7 60.4 68.1 64.9 64.9 65.7
Kim_KNU_task1a_4 Kim2021a 52 62.9 57.7 67.3 35.3 65.0 59.7 66.7 61.7 70.1 64.2 67.8 66.7 66.9 67.9
Kim_QTI_task1a_1 Kim2021b 8 75.0 73.6 76.2 66.0 76.8 76.8 74.7 73.6 81.1 73.3 77.1 74.3 75.3 76.0
Kim_QTI_task1a_2 Kim2021b 1 76.1 74.5 77.4 68.9 76.8 76.7 75.8 74.4 82.6 74.2 76.7 76.0 76.8 78.1
Kim_QTI_task1a_3 Kim2021b 2 76.1 75.2 76.9 66.0 78.2 77.8 77.2 76.8 81.1 74.7 76.5 75.6 77.1 76.4
Kim_QTI_task1a_4 Kim2021b 5 75.2 73.3 76.8 66.1 76.4 75.7 75.0 73.5 81.2 74.4 75.3 77.1 75.0 77.5
Koutini_CPJKU_task1a_1 Koutini2021 14 70.9 66.4 74.6 51.0 75.0 72.1 67.5 66.5 80.1 74.7 74.4 73.1 70.4 74.6
Koutini_CPJKU_task1a_2 Koutini2021 10 71.8 68.2 74.8 52.1 75.6 74.0 71.0 68.3 81.4 72.5 74.3 71.1 71.9 77.6
Koutini_CPJKU_task1a_3 Koutini2021 9 72.1 69.6 74.2 57.6 75.0 73.1 71.7 70.6 80.6 73.8 73.6 69.6 72.4 75.4
Koutini_CPJKU_task1a_4 Koutini2021 11 71.8 69.3 74.0 54.7 74.6 74.7 70.7 71.8 79.4 71.8 72.4 73.5 72.8 73.9
Lim_CAU_task1a_1 Lim2021 90 67.5 62.2 71.9 50.6 70.0 68.3 59.7 62.2 77.5 71.4 72.2 68.1 71.4 71.1
Lim_CAU_task1a_2 Lim2021 91 67.9 62.3 72.6 49.0 71.2 68.5 58.9 64.0 77.9 71.4 74.6 69.7 71.0 71.0
Lim_CAU_task1a_3 Lim2021 80 68.5 64.1 72.2 56.2 70.0 68.3 62.5 63.3 78.3 68.7 74.2 69.7 69.7 72.6
Lim_CAU_task1a_4 Lim2021 93 65.8 60.1 70.5 41.4 70.4 68.8 58.6 61.1 74.3 68.1 73.8 66.8 65.4 74.7
Liu_UESTC_task1a_1 Liu2021 16 68.8 66.1 71.1 53.5 68.9 69.6 69.9 68.6 75.0 68.1 72.9 70.8 68.1 71.8
Liu_UESTC_task1a_2 Liu2021 15 68.2 66.4 69.7 50.7 70.6 70.6 70.8 69.3 73.2 66.9 70.6 68.9 66.1 72.8
Liu_UESTC_task1a_3 Liu2021 13 69.6 66.8 71.9 54.4 72.5 69.4 70.4 67.4 78.7 69.4 71.4 67.9 70.4 73.5
Liu_UESTC_task1a_4 Liu2021 87 42.0 38.3 45.0 39.4 39.9 42.5 35.7 34.0 55.1 42.4 52.8 41.2 37.9 40.7
Madhu_CET_task1a_1 Madhu2021 99 9.7 9.2 10.1 9.2 9.9 9.4 8.3 9.3 10.4 10.4 7.8 9.6 10.6 11.7
DCASE2021 baseline 45.6 38.0 51.9 29.2 46.5 49.7 34.0 30.6 62.5 51.7 57.6 49.6 44.3 45.8
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 60.2 53.4 65.9 45.3 60.1 61.4 47.6 52.8 71.3 66.0 66.0 63.9 62.1 66.2
Pham_AIT_task1a_1 Pham2021 73 67.5 64.3 70.1 55.4 71.0 69.4 61.0 64.9 77.8 65.3 71.9 71.4 62.8 71.4
Pham_AIT_task1a_2 Pham2021 54 68.4 64.8 71.3 57.8 70.6 70.3 62.8 62.8 78.6 67.4 68.9 69.3 68.5 75.4
Pham_AIT_task1a_3 Pham2021 94 69.6 66.1 72.6 57.8 72.9 70.3 63.9 65.4 78.8 68.8 71.0 73.3 67.9 76.0
Phan_UIUC_task1a_1 Phan2021 65 63.3 59.2 66.7 43.6 64.4 65.3 63.9 59.0 73.5 61.5 67.4 66.4 63.1 68.6
Phan_UIUC_task1a_2 Phan2021 71 63.3 59.2 66.7 43.6 64.4 65.3 63.9 59.0 73.5 61.5 67.4 66.4 63.1 68.6
Phan_UIUC_task1a_3 Phan2021 60 65.3 62.8 67.5 49.9 66.7 67.2 66.4 63.9 72.8 63.9 69.0 66.5 63.7 68.8
Phan_UIUC_task1a_4 Phan2021 66 65.3 62.8 67.5 49.9 66.7 67.2 66.4 63.9 72.8 63.9 69.0 66.5 63.7 68.8
Puy_VAI_task1a_1 Puy2021 24 66.6 59.7 72.4 46.8 66.0 67.9 57.4 60.3 77.2 72.8 77.5 69.0 67.2 70.6
Puy_VAI_task1a_2 Puy2021 27 65.4 59.4 70.5 49.0 68.6 69.6 53.9 56.0 76.9 69.7 74.9 67.5 63.1 70.7
Puy_VAI_task1a_3 Puy2021 22 66.2 60.1 71.2 51.9 67.9 66.9 53.9 59.9 77.1 71.2 74.2 68.9 67.5 68.3
Qiao_NCUT_task1a_1 Qiao2021 88 52.2 50.7 53.5 42.8 54.4 55.7 53.3 47.1 55.1 49.4 54.2 52.1 54.9 55.4
Seo_SGU_task1a_1 Seo2021 32 70.3 67.4 72.8 53.2 71.1 73.5 71.0 68.3 75.1 73.3 70.3 71.0 72.6 74.3
Seo_SGU_task1a_2 Seo2021 41 71.4 67.7 74.4 46.8 73.3 77.2 71.9 69.2 78.5 73.6 74.9 72.2 72.8 74.6
Seo_SGU_task1a_3 Seo2021 35 71.3 67.6 74.4 49.6 72.5 74.3 72.4 69.2 77.2 73.2 73.8 72.6 74.0 75.8
Seo_SGU_task1a_4 Seo2021 44 71.8 67.6 75.3 47.4 73.6 75.1 73.5 68.5 79.2 72.4 75.6 73.1 75.3 76.7
Singh_IITMandi_task1a_1 Singh2021 77 47.2 41.5 51.9 40.8 45.0 51.4 34.7 35.7 63.6 50.7 57.9 47.6 42.9 48.5
Singh_IITMandi_task1a_2 Singh2021 83 44.7 40.0 48.5 41.0 47.9 49.2 30.1 31.9 60.6 47.6 54.3 46.0 41.0 41.7
Singh_IITMandi_task1a_3 Singh2021 82 46.1 40.9 50.4 40.6 47.6 50.8 33.8 31.9 63.7 48.5 55.1 47.1 42.6 45.1
Singh_IITMandi_task1a_4 Singh2021 81 46.8 41.3 51.5 45.1 47.2 50.7 32.5 30.8 61.0 48.6 58.6 48.8 44.7 47.2
Sugahara_RION_task1a_1 Sugahara2021 43 63.8 57.8 68.8 43.6 66.4 61.5 53.9 63.7 72.6 66.1 71.2 65.1 68.1 69.9
Sugahara_RION_task1a_2 Sugahara2021 36 65.2 58.2 71.0 42.2 67.2 62.1 55.0 64.7 72.8 69.4 72.4 67.1 71.8 72.6
Sugahara_RION_task1a_3 Sugahara2021 31 65.3 60.8 69.1 43.2 68.9 65.8 58.5 67.5 63.3 67.1 71.0 68.9 71.4 72.8
Sugahara_RION_task1a_4 Sugahara2021 68 64.7 57.9 70.4 43.1 65.0 61.8 54.4 65.3 72.4 68.3 71.9 67.2 70.7 71.8
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 61.4 55.5 66.2 50.4 62.1 66.8 46.5 51.7 71.8 66.7 67.8 65.8 58.2 67.2
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 64.5 60.4 67.8 55.6 66.2 68.6 55.6 56.1 73.1 68.8 71.2 65.6 62.1 66.4
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 67.3 63.1 70.8 59.9 69.6 69.2 57.8 59.2 74.7 72.6 73.3 69.9 66.0 68.5
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 68.1 64.2 71.4 61.4 69.3 68.5 64.4 57.2 76.4 71.2 74.0 68.5 68.1 70.0
Yang_GT_task1a_1 Yang2021 6 73.1 70.8 74.9 63.6 74.7 75.0 67.9 72.8 78.8 74.9 76.1 71.4 72.5 76.0
Yang_GT_task1a_2 Yang2021 4 72.9 70.0 75.4 61.7 74.6 73.9 67.8 71.9 78.5 74.7 77.8 71.4 73.3 76.5
Yang_GT_task1a_3 Yang2021 3 72.9 70.1 75.1 62.1 74.6 74.3 67.8 71.9 78.3 74.0 77.9 71.1 73.3 76.1
Yang_GT_task1a_4 Yang2021 7 72.8 70.2 74.9 63.1 75.1 74.3 67.4 71.4 76.7 74.7 76.9 72.4 72.9 76.0
Yihao_speakin_task1a_1 Yihao2021 69 51.9 49.7 53.6 49.6 52.6 55.6 48.2 42.8 64.2 53.3 56.2 50.4 45.6 51.9
Yihao_speakin_task1a_2 Yihao2021 59 55.2 53.5 56.6 53.3 56.5 59.9 51.0 46.9 66.0 57.6 57.2 54.7 49.4 54.9
Yihao_speakin_task1a_3 Yihao2021 96 53.5 50.7 55.8 49.3 55.1 59.4 45.7 44.0 63.1 57.1 58.5 50.0 52.2 54.0
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 63.0 58.9 66.4 53.5 63.1 70.3 54.7 53.1 74.2 67.6 70.7 62.4 59.3 64.4
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 63.2 57.4 68.1 49.9 66.1 68.5 49.9 52.6 75.8 70.3 71.2 62.1 61.1 67.9
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 52.2 47.3 56.3 42.1 52.9 57.1 39.0 45.6 64.3 59.9 60.8 51.4 50.7 50.8
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 59.0 53.2 63.8 54.4 57.6 60.7 48.1 45.4 70.6 68.8 68.5 58.8 55.8 60.7
Zhao_Maxvision_task1a_1 Zhao2021 75 61.2 54.2 67.1 46.9 64.9 66.1 47.6 45.6 76.4 66.4 70.4 61.9 58.9 68.5
Zhao_Maxvision_task1a_2 Zhao2021 74 63.5 55.6 70.0 45.3 67.5 67.1 48.2 50.1 77.4 67.4 70.4 66.2 65.8 73.1
Zhao_Maxvision_task1a_3 Zhao2021 62 63.5 55.9 70.0 46.2 66.9 67.5 48.3 50.3 76.8 67.9 70.3 66.0 66.0 72.8
Zhao_Maxvision_task1a_4 Zhao2021 58 62.8 56.2 68.3 44.2 68.5 66.2 51.7 50.4 77.4 67.1 70.4 64.6 61.7 68.9
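
The aggregate columns in the accuracy table appear to be plain means of the per-device accuracies. The sketch below checks this for one row (Byttebier_IDLab_task1a_1); it is an observation about the published numbers, not a statement of the official scoring procedure, and the function name is hypothetical.

```python
from statistics import mean

# Per-device accuracies (%) for Byttebier_IDLab_task1a_1, copied from the table.
unseen = {"D": 44.0, "S7": 69.9, "S8": 71.1, "S9": 70.3, "S10": 67.4}
seen = {"A": 77.4, "B": 68.3, "C": 71.5, "S1": 70.0, "S2": 72.2, "S3": 72.6}

def aggregate(unseen_acc, seen_acc):
    """Return (overall, unseen, seen) as unweighted means over devices."""
    all_values = list(unseen_acc.values()) + list(seen_acc.values())
    return (
        round(mean(all_values), 1),
        round(mean(unseen_acc.values()), 1),
        round(mean(seen_acc.values()), 1),
    )

overall, unseen_mean, seen_mean = aggregate(unseen, seen)
```

For this row the three means reproduce the reported 68.6, 64.5 and 72.0, which suggests that every device carries equal weight in the overall figure.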

System characteristics

General characteristics

Columns: Submission label, Technical Report, Official system rank, Logloss (Eval), Accuracy (Eval), Sampling rate, Data augmentation, Features, Embeddings
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 68.6 44.1kHz mixup, temporal cropping, speed augmentation log-mel energies
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 67.5 44.1kHz mixup, temporal cropping, speed augmentation log-mel energies
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 68.5 44.1kHz mixup, temporal cropping, speed augmentation log-mel energies
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 68.8 44.1kHz mixup, temporal cropping, speed augmentation log-mel energies
Cao_SCUT_task1a_1 Cao2021 49 1.136 66.7 44.1kHz mixup, time stretching, pitch shifting, spectrum correction log-mel energies
Cao_SCUT_task1a_2 Cao2021 56 1.200 64.6 44.1kHz mixup, time stretching, pitch shifting, spectrum correction log-mel energies
Cao_SCUT_task1a_3 Cao2021 50 1.137 67.2 44.1kHz mixup, time stretching, pitch shifting, spectrum correction log-mel energies
Cao_SCUT_task1a_4 Cao2021 53 1.147 66.1 44.1kHz mixup, time stretching, pitch shifting, spectrum correction log-mel energies
Ding_TJU_task1a_1 Ding2021 85 1.544 53.0 44.1kHz log-mel energies
Ding_TJU_task1a_2 Ding2021 70 1.326 51.1 44.1kHz log-mel energies
Ding_TJU_task1a_3 Ding2021 61 1.226 49.1 44.1kHz log-mel energies
Ding_TJU_task1a_4 Ding2021 67 1.296 51.4 44.1kHz log-mel energies
Fan_NWPU_task1a_1 Cui2021 64 1.261 68.3 44.1kHz reverb, filtering, random gain adjust, SpecAugment log-mel energies
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 53.9 16kHz random noise, random gain, random cropping, mixup raw waveform AemNet
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 67.0 44.1kHz mixup, spectrum augmentation, device augmentation log-mel energies
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 66.9 44.1kHz mixup, tempo, channel corruption log-mel energies
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 70.0 44.1kHz mixup, spectrum augmentation, device augmentation log-mel energies
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 70.1 44.1kHz mixup, tempo, channel corruption log-mel energies
Horváth_HIT_task1a_1 Horvth2021 86 1.597 51.4 44.1kHz mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change log-mel energies, HPSS
Horváth_HIT_task1a_2 Horvth2021 92 2.031 53.3 44.1kHz mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change log-mel energies, HPSS
Horváth_HIT_task1a_3 Horvth2021 76 1.460 51.6 44.1kHz mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change log-mel energies, HPSS
Horváth_HIT_task1a_4 Horvth2021 95 2.065 49.2 44.1kHz mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change log-mel energies, HPSS
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 55.0 44.1kHz log-mel energies
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 51.3 44.1kHz log-mel energies
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 56.3 44.1kHz log-mel energies
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 66.0 44.1kHz temporal cropping log-mel energies, deltas, delta-deltas
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 67.0 44.1kHz temporal cropping, SpecAugment log-mel energies, deltas, delta-deltas
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 66.7 44.1kHz temporal cropping log-mel energies, deltas, delta-deltas
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 66.1 44.1kHz temporal cropping, SpecAugment log-mel energies, deltas, delta-deltas
Kek_NU_task1a_1 Kek2021 72 1.355 66.8 44.1kHz Wavelet Scattering
Kek_NU_task1a_2 Kek2021 57 1.207 63.5 44.1kHz Wavelet Scattering
Kim_3M_task1a_1 Kim2021 38 1.076 61.5 22.05kHz mixup, SpecAugment Perceptually-weighted log-mel energies VGGish
Kim_3M_task1a_2 Kim2021 39 1.077 61.6 22.05kHz mixup, SpecAugment Perceptually-weighted log-mel energies VGGish
Kim_3M_task1a_3 Kim2021 37 1.076 62.0 22.05kHz mixup, SpecAugment Perceptually-weighted log-mel energies VGGish
Kim_3M_task1a_4 Kim2021 40 1.078 61.3 22.05kHz mixup, SpecAugment Perceptually-weighted log-mel energies VGGish
Kim_KNU_task1a_1 Kim2021a 46 1.115 64.7 44.1kHz mixup log-mel energies, delta-log-mel energies, delta-delta-log-mel energies
Kim_KNU_task1a_2 Kim2021a 28 1.010 63.8 44.1kHz mixup log-mel energies, delta-log-mel energies, delta-delta-log-mel energies
Kim_KNU_task1a_3 Kim2021a 55 1.188 61.3 44.1kHz mixup log-mel energies, delta-log-mel energies, delta-delta-log-mel energies
Kim_KNU_task1a_4 Kim2021a 52 1.143 62.9 44.1kHz mixup log-mel energies, delta-log-mel energies, delta-delta-log-mel energies
Kim_QTI_task1a_1 Kim2021b 8 0.793 75.0 16kHz mixup, specaugment, time rolling log-mel energies
Kim_QTI_task1a_2 Kim2021b 1 0.724 76.1 16kHz mixup, specaugment, time rolling log-mel energies
Kim_QTI_task1a_3 Kim2021b 2 0.735 76.1 16kHz mixup, specaugment, time rolling log-mel energies
Kim_QTI_task1a_4 Kim2021b 5 0.764 75.2 16kHz mixup, specaugment, time rolling log-mel energies
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 70.9 22.05kHz mixup, pitch shifting Perceptually-weighted log-mel energies
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 71.8 22.05kHz mixup, pitch shifting Perceptually-weighted log-mel energies
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 72.1 22.05kHz mixup, pitch shifting Perceptually-weighted log-mel energies
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 71.8 22.05kHz mixup, pitch shifting Perceptually-weighted log-mel energies
Lim_CAU_task1a_1 Lim2021 90 1.956 67.5 44.1kHz spectrogram
Lim_CAU_task1a_2 Lim2021 91 2.010 67.9 44.1kHz spectrogram
Lim_CAU_task1a_3 Lim2021 80 1.479 68.5 44.1kHz spectrogram
Lim_CAU_task1a_4 Lim2021 93 2.039 65.8 44.1kHz spectrogram
Liu_UESTC_task1a_1 Liu2021 16 0.900 68.8 44.1kHz HRTF, mixup, temporal cropping, spectrum correction log-mel energies, deltas, delta-deltas
Liu_UESTC_task1a_2 Liu2021 15 0.895 68.2 44.1kHz HRTF, mixup, temporal cropping, spectrum correction log-mel energies, deltas, delta-deltas
Liu_UESTC_task1a_3 Liu2021 13 0.878 69.6 44.1kHz mixup, temporal cropping log-mel energies, deltas, delta-deltas
Liu_UESTC_task1a_4 Liu2021 87 1.626 42.0 44.1kHz mixup log-mel energies
Madhu_CET_task1a_1 Madhu2021 99 3.950 9.7 44.1kHz time stretching, pitch shifting, dynamic range compression, background noise, mixup wavelet based log-mel energies
DCASE2021 baseline 1.730 45.6 44.1kHz log-mel energies
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 60.2 44.1kHz mixup gammatone spectrogram
Pham_AIT_task1a_1 Pham2021 73 1.368 67.5 44.1kHz mixup CQT, Gammatonegram, log-mel energies
Pham_AIT_task1a_2 Pham2021 54 1.187 68.4 44.1kHz mixup CQT, Gammatonegram, log-mel energies
Pham_AIT_task1a_3 Pham2021 94 2.058 69.6 44.1kHz mixup CQT, Gammatonegram, log-mel energies
Phan_UIUC_task1a_1 Phan2021 65 1.272 63.3 44.1kHz mixup log-mel energies, deltas, delta-deltas
Phan_UIUC_task1a_2 Phan2021 71 1.335 63.3 44.1kHz mixup log-mel energies, deltas, delta-deltas
Phan_UIUC_task1a_3 Phan2021 60 1.223 65.3 44.1kHz mixup log-mel energies, deltas, delta-deltas
Phan_UIUC_task1a_4 Phan2021 66 1.292 65.3 44.1kHz mixup log-mel energies, deltas, delta-deltas
Puy_VAI_task1a_1 Puy2021 24 0.952 66.6 44.1kHz SpecAugment log-mel energies
Puy_VAI_task1a_2 Puy2021 27 0.974 65.4 44.1kHz SpecAugment, mixup log-mel energies
Puy_VAI_task1a_3 Puy2021 22 0.939 66.2 44.1kHz SpecAugment log-mel energies
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 52.2 44.1kHz mixup log-mel energies, deltas, delta-deltas
Seo_SGU_task1a_1 Seo2021 32 1.030 70.3 44.1kHz mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios log-mel energies, deltas, delta-deltas
Seo_SGU_task1a_2 Seo2021 41 1.080 71.4 44.1kHz mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios log-mel energies, deltas, delta-deltas
Seo_SGU_task1a_3 Seo2021 35 1.065 71.3 44.1kHz mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios log-mel energies, deltas, delta-deltas
Seo_SGU_task1a_4 Seo2021 44 1.087 71.8 44.1kHz mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios log-mel energies, deltas, delta-deltas
Singh_IITMandi_task1a_1 Singh2021 77 1.464 47.2 44.1kHz log-mel energies
Singh_IITMandi_task1a_2 Singh2021 83 1.515 44.7 44.1kHz log-mel energies
Singh_IITMandi_task1a_3 Singh2021 82 1.509 46.1 44.1kHz log-mel energies
Singh_IITMandi_task1a_4 Singh2021 81 1.488 46.8 44.1kHz log-mel energies
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 63.8 44.1kHz mixup, SpecAugment, time-shifting, spectrum modulation log-mel powers
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 65.2 44.1kHz mixup, SpecAugment, time-shifting, spectrum modulation log-mel powers
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 65.3 44.1kHz mixup, SpecAugment, time-shifting, spectrum modulation log-mel powers
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 64.7 44.1kHz mixup, SpecAugment, time-shifting, spectrum modulation log-mel powers
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 61.4 44.1kHz mixup, temporal cropping, SpecAugment log-mel energies
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 64.5 44.1kHz mixup, temporal cropping, SpecAugment log-mel energies
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 67.3 44.1kHz mixup, temporal cropping, SpecAugment log-mel energies
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 68.1 44.1kHz mixup, temporal cropping, SpecAugment log-mel energies
Yang_GT_task1a_1 Yang2021 6 0.768 73.1 44.1kHz mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios log-mel energies
Yang_GT_task1a_2 Yang2021 4 0.764 72.9 44.1kHz mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios log-mel energies
Yang_GT_task1a_3 Yang2021 3 0.758 72.9 44.1kHz mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios log-mel energies
Yang_GT_task1a_4 Yang2021 7 0.774 72.8 44.1kHz mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios log-mel energies
Yihao_speakin_task1a_1 Yihao2021 69 1.311 51.9 16kHz SpecAugment log-mel energies
Yihao_speakin_task1a_2 Yihao2021 59 1.222 55.2 16kHz SpecAugment log-mel energies
Yihao_speakin_task1a_3 Yihao2021 96 2.105 53.5 16kHz SpecAugment log-mel energies
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 63.0 44.1kHz log-mel energies
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 63.2 44.1kHz log-mel energies
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 52.2 44.1kHz log-mel energies
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 59.0 44.1kHz log-mel energies
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 61.2 44.1kHz mixup, random cropping log-mel energies, deltas, delta-deltas
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 63.5 44.1kHz mixup, random cropping log-mel energies, deltas, delta-deltas
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 63.5 44.1kHz mixup, random cropping log-mel energies, deltas, delta-deltas
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 62.8 44.1kHz mixup, random cropping log-mel energies, deltas, delta-deltas



Machine learning characteristics

Columns: Code, Technical Report, Official system rank, Logloss (Eval), Accuracy (Eval), External data usage, External data sources, Model complexity, Classifier, Ensemble subsystems, Decision making
Byttebier_IDLab_task1a_1 Byttebier2021 21 0.936 68.6 114634 SE-ResNet maximum logit
Byttebier_IDLab_task1a_2 Byttebier2021 18 0.914 67.5 114634 SE-ResNet multinomial logistic regression
Byttebier_IDLab_task1a_3 Byttebier2021 23 0.944 68.5 114634 SE-ResNet ovr logistic regression
Byttebier_IDLab_task1a_4 Byttebier2021 17 0.905 68.8 82910 SE-ResNet maximum logit
Cao_SCUT_task1a_1 Cao2021 49 1.136 66.7 embeddings 36658 CNN
Cao_SCUT_task1a_2 Cao2021 56 1.200 64.6 embeddings 36658 CNN
Cao_SCUT_task1a_3 Cao2021 50 1.137 67.2 embeddings 36658 CNN
Cao_SCUT_task1a_4 Cao2021 53 1.147 66.1 embeddings 51926 CNN
Ding_TJU_task1a_1 Ding2021 85 1.544 53.0 40230 CNN
Ding_TJU_task1a_2 Ding2021 70 1.326 51.1 20250 CNN
Ding_TJU_task1a_3 Ding2021 61 1.226 49.1 63816 CNN majority vote
Ding_TJU_task1a_4 Ding2021 67 1.296 51.4 20250 CNN
Fan_NWPU_task1a_1 Cui2021 64 1.261 68.3 embeddings 93323 ResNet, Attention
Galindo-Meza_ITESO_task1a_1 Galindo-Meza2021 97 2.221 53.9 pre-trained model Audioset 127637 CNN Maximum softmax
Heo_Clova_task1a_1 Hee-Soo2021 42 1.087 67.0 65424 CNN
Heo_Clova_task1a_2 Hee-Soo2021 20 0.930 66.9 63547 CNN
Heo_Clova_task1a_3 Hee-Soo2021 34 1.045 70.0 65424 CNN
Heo_Clova_task1a_4 Hee-Soo2021 12 0.871 70.1 63547 CNN
Horváth_HIT_task1a_1 Horvth2021 86 1.597 51.4 47939 MobileNetV2
Horváth_HIT_task1a_2 Horvth2021 92 2.031 53.3 47939 MobileNetV2, ArcFace
Horváth_HIT_task1a_3 Horvth2021 76 1.460 51.6 58266 ResNet
Horváth_HIT_task1a_4 Horvth2021 95 2.065 49.2 58266 ResNet, ArcFace
Jeng_CHT+NSYSU_task1a_1 Jeng2021 78 1.469 55.0 130457242 CNN logistic regression
Jeng_CHT+NSYSU_task1a_2 Jeng2021 84 1.543 51.3 130457242 CNN logistic regression
Jeng_CHT+NSYSU_task1a_3 Jeng2021 79 1.470 56.3 17186944 CNN logistic regression
Jeong_ETRI_task1a_1 Jeong2021 33 1.041 66.0 54845 ResNet
Jeong_ETRI_task1a_2 Jeong2021 25 0.952 67.0 54845 ResNet
Jeong_ETRI_task1a_3 Jeong2021 30 1.023 66.7 60236 ResNet
Jeong_ETRI_task1a_4 Jeong2021 63 1.228 66.1 60236 ResNet
Kek_NU_task1a_1 Kek2021 72 1.355 66.8 63448 CNN, MobileNetV2
Kek_NU_task1a_2 Kek2021 57 1.207 63.5 64850 CNN, MobileNetV2, Group convolution, Channel attention
Kim_3M_task1a_1 Kim2021 38 1.076 61.5 pre-trained weights of Vggish 168778 CNN
Kim_3M_task1a_2 Kim2021 39 1.077 61.6 pre-trained weights of Vggish 168778 CNN
Kim_3M_task1a_3 Kim2021 37 1.076 62.0 pre-trained weights of Vggish 168778 CNN
Kim_3M_task1a_4 Kim2021 40 1.078 61.3 pre-trained weights of Vggish 168778 CNN
Kim_KNU_task1a_1 Kim2021a 46 1.115 64.7 58472 ResNet
Kim_KNU_task1a_2 Kim2021a 28 1.010 63.8 64064 CNN (Inception)
Kim_KNU_task1a_3 Kim2021a 55 1.188 61.3 58472 ResNet
Kim_KNU_task1a_4 Kim2021a 52 1.143 62.9 58472 ResNet
Kim_QTI_task1a_1 Kim2021b 8 0.793 75.0 630042 CNN, BC-ResNet 2 maximum likelihood
Kim_QTI_task1a_2 Kim2021b 1 0.724 76.1 630042 CNN, BC-ResNet 2 maximum likelihood
Kim_QTI_task1a_3 Kim2021b 2 0.735 76.1 630042 CNN, BC-ResNet 2 maximum likelihood
Kim_QTI_task1a_4 Kim2021b 5 0.764 75.2 314990 CNN, BC-ResNet maximum likelihood
Koutini_CPJKU_task1a_1 Koutini2021 14 0.883 70.9 504104 RF-regularized CNNs
Koutini_CPJKU_task1a_2 Koutini2021 10 0.842 71.8 678184 RF-regularized CNNs
Koutini_CPJKU_task1a_3 Koutini2021 9 0.834 72.1 635176 RF-regularized CNNs
Koutini_CPJKU_task1a_4 Koutini2021 11 0.847 71.8 641320 RF-regularized CNNs
Lim_CAU_task1a_1 Lim2021 90 1.956 67.5 89910 CNN
Lim_CAU_task1a_2 Lim2021 91 2.010 67.9 89910 CNN
Lim_CAU_task1a_3 Lim2021 80 1.479 68.5 134748 CNN
Lim_CAU_task1a_4 Lim2021 93 2.039 65.8 56046 CNN
Liu_UESTC_task1a_1 Liu2021 16 0.900 68.8 643194 ResNet
Liu_UESTC_task1a_2 Liu2021 15 0.895 68.2 268362 ResNet
Liu_UESTC_task1a_3 Liu2021 13 0.878 69.6 268362 ResNet
Liu_UESTC_task1a_4 Liu2021 87 1.626 42.0 60928 CNN
Madhu_CET_task1a_1 Madhu2021 99 3.950 9.7 42774 CNN
DCASE2021 baseline 1.730 45.6 embeddings 46246 CNN
Naranjo-Alcazar_ITI_task1a_1 Naranjo-Alcazar2021_t1a 51 1.140 60.2 50130 CNN
Pham_AIT_task1a_1 Pham2021 73 1.368 67.5 10909 CNN 3 PROD late fusion
Pham_AIT_task1a_2 Pham2021 54 1.187 68.4 10909 CNN 3 PROD late fusion
Pham_AIT_task1a_3 Pham2021 94 2.058 69.6 10909 CNN 3 PROD late fusion
Phan_UIUC_task1a_1 Phan2021 65 1.272 63.3 41356 CNN
Phan_UIUC_task1a_2 Phan2021 71 1.335 63.3 41356 CNN
Phan_UIUC_task1a_3 Phan2021 60 1.223 65.3 41356 CNN
Phan_UIUC_task1a_4 Phan2021 66 1.292 65.3 41356 CNN
Puy_VAI_task1a_1 Puy2021 24 0.952 66.6 62474 CNN 30 average
Puy_VAI_task1a_2 Puy2021 27 0.974 65.4 62474 CNN 30 average
Puy_VAI_task1a_3 Puy2021 22 0.939 66.2 62474 CNN 30 average
Qiao_NCUT_task1a_1 Qiao2021 88 1.630 52.2 31852 ResNet ensemble 2 average
Seo_SGU_task1a_1 Seo2021 32 1.030 70.3 101173 MobileNet
Seo_SGU_task1a_2 Seo2021 41 1.080 71.4 99557 MobileNet
Seo_SGU_task1a_3 Seo2021 35 1.065 71.3 99614 MobileNet
Seo_SGU_task1a_4 Seo2021 44 1.087 71.8 99603 MobileNet
Singh_IITMandi_task1a_1 Singh2021 77 1.464 47.2 embeddings 14754 CNN
Singh_IITMandi_task1a_2 Singh2021 83 1.515 44.7 embeddings 27166 CNN
Singh_IITMandi_task1a_3 Singh2021 82 1.509 46.1 embeddings 38110 CNN
Singh_IITMandi_task1a_4 Singh2021 81 1.488 46.8 embeddings 36578 CNN
Sugahara_RION_task1a_1 Sugahara2021 43 1.087 63.8 339730 ResNet, ensemble 5 weighted score average
Sugahara_RION_task1a_2 Sugahara2021 36 1.070 65.2 339730 ResNet, ensemble 5 score average
Sugahara_RION_task1a_3 Sugahara2021 31 1.024 65.3 203838 ResNet, ensemble 3 score average
Sugahara_RION_task1a_4 Sugahara2021 68 1.297 64.7 255940 ResNet, ensemble 3 weighted score average
Verbitskiy_DS_task1a_1 Verbitskiy2021 48 1.127 61.4 62090 CNN, EfficientNetV2
Verbitskiy_DS_task1a_2 Verbitskiy2021 29 1.019 64.5 62154 CNN, EfficientNetV2
Verbitskiy_DS_task1a_3 Verbitskiy2021 26 0.966 67.3 62282 CNN, EfficientNetV2
Verbitskiy_DS_task1a_4 Verbitskiy2021 19 0.924 68.1 62346 CNN, EfficientNetV2
Yang_GT_task1a_1 Yang2021 6 0.768 73.1 4410180 Inception 5 average
Yang_GT_task1a_2 Yang2021 4 0.764 72.9 14640720 Inception 20 average
Yang_GT_task1a_3 Yang2021 3 0.758 72.9 7056288 Inception 8 average
Yang_GT_task1a_4 Yang2021 7 0.774 72.8 7056288 Inception 8 average
Yihao_speakin_task1a_1 Yihao2021 69 1.311 51.9 48075 CNN
Yihao_speakin_task1a_2 Yihao2021 59 1.222 55.2 63244 CNN
Yihao_speakin_task1a_3 Yihao2021 96 2.105 53.5 50952 CNN
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang2021 47 1.124 63.0 83572 ResNet
Zhang_BUPT&BYTEDANCE_task1a_2 Zhang2021 45 1.113 63.2 83572 ResNet
Zhang_BUPT&BYTEDANCE_task1a_3 Zhang2021 98 3.359 52.2 87011 ResNet
Zhang_BUPT&BYTEDANCE_task1a_4 Zhang2021 89 1.946 59.0 86516 ResNet
Zhao_Maxvision_task1a_1 Zhao2021 75 1.440 61.2 59421 MobileNet 2 model weights average
Zhao_Maxvision_task1a_2 Zhao2021 74 1.412 63.5 59421 MobileNet 2 model weights average
Zhao_Maxvision_task1a_3 Zhao2021 62 1.227 63.5 59421 MobileNet 2 model weights average
Zhao_Maxvision_task1a_4 Zhao2021 58 1.215 62.8 59421 MobileNet 2 model weights average

Technical reports

Small-Footprint Acoustic Scene Classification Through 8-Bit Quantization-Aware Training and Pruning of ResNet Models

Laurens Byttebier, Brecht Desplanques, Jenthe Thienpondt, Siyuan Song, Kris Demuynck and Nilesh Madhu
ELIS, Ghent University - imec, Ghent, Belgium

Abstract

This report describes the IDLab submissions for Task 1a of the DCASE Challenge 2021. The challenge consists of constructing an acoustic scene classification model with a size of less than 128 KB. All submitted systems consist of a ResNet-based model enhanced with Squeeze-and-Excitation (SE) blocks, trained with temporal cropping, time-domain mixup and speed-change augmentation strategies. Grouped convolutions are incorporated in all models to reduce the model complexity. Three submissions are based on 8-bit quantization-aware training with a fusion of batch norm and convolutional layers to reduce the parameter count even further. Two of these three systems also explore multi-class score calibration by means of multinomial or one-vs-rest logistic regression. The calibration is then fused with the final linear output layer of the network to avoid an increase in model size. The fourth submission explores parameter pruning on a model with 16-bit weights as an alternative to the 8-bit weight quantization. The uncalibrated 8-bit model slightly outperforms the pruned 16-bit model and achieves a log loss of 0.82 and an accuracy of 71.2% on the standard test set of the TAU Urban Acoustic Scenes 2020 Mobile development dataset.
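
Fusing a calibration step into the final linear layer, as mentioned in the abstract, can be sketched as follows. The sketch assumes the calibration is an affine transform of the logits (as in multinomial logistic regression applied to network scores); all names and toy values are hypothetical and not taken from the report.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in left]

def fuse_calibration(weight, bias, calib_weight, calib_bias):
    """Fold z' = A(Wh + c) + b into one layer z' = (AW)h + (Ac + b)."""
    fused_weight = matmul(calib_weight, weight)
    fused_bias = [x + y for x, y in zip(matvec(calib_weight, bias), calib_bias)]
    return fused_weight, fused_bias

# Toy values (assumed for illustration): 3 classes, 2 input features.
W = [[1.0, 2.0], [0.5, -1.0], [0.0, 1.5]]
c = [0.1, -0.2, 0.3]
A = [[1.2, 0.1, 0.0], [0.0, 0.9, 0.1], [0.2, 0.0, 1.1]]
b = [0.05, 0.0, -0.05]
h = [0.4, -0.7]

# Two-step path: output layer, then calibration.
logits = [z + ci for z, ci in zip(matvec(W, h), c)]
calibrated = [z + bi for z, bi in zip(matvec(A, logits), b)]

# Single fused layer with the same shape as the original output layer.
W_fused, c_fused = fuse_calibration(W, c, A, b)
fused_out = [z + ci for z, ci in zip(matvec(W_fused, h), c_fused)]
```

Because the fused layer has the same shape as the original output layer, the calibration adds no parameters, which is the property the abstract relies on.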

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, temporal cropping, speed augmentation
Features log-mel energies
Classifier SE-ResNet
Decision making maximum logit; multinomial logistic regression; ovr logistic regression
Complexity management weight quantization, grouped convolutions, Conv+BN fusion; weight quantization, grouped convolutions, pruning
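
The Conv+BN fusion listed under complexity management can be illustrated with the usual inference-time folding of batch normalization into the preceding layer. This is a generic sketch with one weight row per output channel, assuming fixed running statistics; the function name and values are hypothetical, and the report's own procedure may differ.

```python
import math

def fuse_conv_bn(weights, biases, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold per-channel batch norm into the preceding layer's weights and biases."""
    fused_w, fused_b = [], []
    for w_row, b, g, bt, mu, var in zip(
        weights, biases, gamma, beta, running_mean, running_var
    ):
        scale = g / math.sqrt(var + eps)
        fused_w.append([w * scale for w in w_row])   # w' = w * gamma / sqrt(var + eps)
        fused_b.append((b - mu) * scale + bt)        # b' = (b - mean) * scale + beta
    return fused_w, fused_b

# Toy layer with two output channels and two inputs (made-up values).
weights = [[0.5, -0.2], [1.0, 0.4]]
biases = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
running_mean, running_var = [0.05, -0.1], [0.9, 1.1]

fused_w, fused_b = fuse_conv_bn(weights, biases, gamma, beta, running_mean, running_var)
```

After folding, the batch norm scale, shift, mean and variance no longer need to be stored separately.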

Acoustic Scene Classification Using Lightweight ResNet with Attention

Wenchang Cao, Yanxiong Li and Qisheng Huang
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China

Abstract

This technical report describes our system for Subtask A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of Task 1 (Acoustic Scene Classification) of the DCASE2021 Challenge. Because the space complexity of the model is limited, we choose a ResNet with depthwise separable convolutions as our backbone network and add an attention mechanism to it. In addition, data augmentation techniques such as mixup and spectrum correction are adopted to increase the diversity of the dataset. Our system achieves an accuracy of 71.6% on the development dataset, and the model size meets the requirement of Subtask A.
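
Mixup, one of the augmentations named in the abstract, blends two training examples and their labels with a random weight. The sketch below is a minimal generic version; the Beta parameter `alpha=0.4` and the function signature are assumptions, not values from the report.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, lam=None, rng=random):
    """Blend two feature vectors and their one-hot labels with weight lam."""
    if lam is None:
        # Mixing weight drawn from Beta(alpha, alpha), as in the usual formulation.
        lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Fixed weight for a reproducible illustration (toy two-class example).
mixed_x, mixed_y, used_lam = mixup([1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0], lam=0.25)
```

With `lam=0.25` the mixed label is 25% the first class and 75% the second, so the classifier is trained on soft targets.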

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, time stretching, pitch shifting, spectrum correction
Features log-mel energies
Classifier CNN
Complexity management weight quantization

Consistency Learning Based Acoustic Scene Classification with Res-Attention

MengFan Cui, Fan Kui and Liyong Guo
Northwestern Polytechnic University, China

Abstract

In this report, we propose a consistency-learning-based method with different data augmentation methods to tackle acoustic scene classification Task 1a of the DCASE2021 Challenge. The task concerns classification of data from multiple devices (real and simulated), targets the generalization of systems across a number of different devices, and focuses on low-complexity solutions. Consistency learning is used to reduce the embedding distance between the augmented sample and the original sample. With consistency learning, the algorithm is robust to device variation. For low complexity and high accuracy, we propose a Res-Attention structure that combines a residual structure with separable convolution layers and an attention layer. On the Task 1a development dataset, the presented method achieves 69.71% accuracy (0.87 cross-entropy log loss) with a model size of 93.3 KB using int8 quantization.
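
The consistency idea in the abstract, pulling the embedding of an augmented sample toward that of the original, can be sketched as a classification loss plus a distance penalty. The squared Euclidean distance, the `weight` parameter and all names below are assumptions for illustration; the report may use a different distance or weighting.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability of the correct class."""
    return -math.log(probs[label])

def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def consistency_loss(probs, label, emb_original, emb_augmented, weight=1.0):
    """Classification loss plus a penalty on the original/augmented embedding gap."""
    return cross_entropy(probs, label) + weight * squared_distance(emb_original, emb_augmented)

# Toy values: same prediction, embeddings that agree or drift apart.
probs = [0.6, 0.3, 0.1]
loss_same = consistency_loss(probs, 0, [0.2, 0.5], [0.2, 0.5])
loss_drift = consistency_loss(probs, 0, [0.2, 0.5], [0.6, 0.1])
```

The penalty vanishes when both views map to the same embedding, so minimizing it encourages representations that ignore the augmentation, for example simulated device differences.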

System characteristics
Sampling rate 44.1kHz
Data augmentation reverb, filtering, random gain adjust, SpecAugment
Features log-mel energies
Classifier ResNet, Attention
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Using Simple CNN

Biyun Ding
School of Electrical and Information Engineering, Tianjin University, Tianjin, China

Abstract

This technical report describes our acoustic scene classification systems for DCASE2021 Challenge Task 1A: Low-Complexity Acoustic Scene Classification with Multiple Devices. Many factors affect performance in this task. To improve performance while keeping the model complexity within the limit, we try different methods in terms of features, sampling rate, channel, classifier type, CNN network architecture, and post-processing of predictions. In experiments on the TAU Urban Acoustic Scenes 2020 Mobile development dataset, the best accuracy of a single system we implemented is 55.89%, an improvement of 7% over the baseline CNN. The accuracy of the late fusion is 59.80%, an improvement of 11.35% over the baseline CNN.

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
Decision making majority vote
Complexity management weight quantization
PDF

End-To-End CNN Optimization for Low-Complexity Acoustic Scene Classification in the DCASE 2021 Challenge

Carlos Alberto Galindo-Meza1, Juan Antonio Del Hoyo Ontiveros2, Jose Torres Ortega3 and Paulo Lopez-Meyer2
1Departamento de Electronica, Sistemas e Informatica, Instituto Tecnologico de Estudios Superiores de Occidente, Jalisco, Mexico, 2Intel Labs, Intel Corporation, Jalisco, Mexico, 3Intel Labs, Intel Corporation, California, USA

Abstract

For the DCASE 2021 challenge we implemented an optimization pipeline to comply with the low-complexity restrictions specified with the Task 1a constraints. Initially, we trained and validated an end-to-end convolutional neural networks-based audio classification model following a typical deep learning training strategy. We then applied an efficient pruning procedure based on the lottery ticket hypothesis, and finally we executed a training-aware quantization to convert the model’s weights from FP32 to INT8 format. Experimentation proved the feasibility of this approach by obtaining accuracy results above the baseline models reported in the challenge guidelines.

System characteristics
Sampling rate 16kHz
Data augmentation random noise, random gain, random cropping, mixup
Features raw waveform
Embeddings AemNet
Classifier CNN
Decision making Maximum softmax
Complexity management pruning, int8 weight quantization
PDF

Clova Submission for the DCASE 2021 Challenge: Acoustic Scene Classification Using Light Architectures and Device Augmentation

Heo Hee-Soo1, Jung Jee-weon1, Shim Hye-jin2 and Lee Bong-Jin1
1Naver Corporation, Seongnam, South Korea, 2Computer Science, University of Seoul, Seoul, South Korea

Abstract

This technical report addresses the submitted system of Naver Clova for DCASE 2021 Challenge Task 1a. The aim is to develop an acoustic scene classification system that can generalize to unknown devices using a DNN with a limited number of parameters. We propose two lightweight architectures using residual networks, a method referred to as attentive max feature map, and multitask learning. After the initial training, the model is further fine-tuned using knowledge distillation. Two augmentation methods are also explored to simulate various recording devices. The two proposed architectures have 63,547 and 65,424 non-zero parameters at 16-bit resolution, both less than 128 KB. Following the official train and test split of the TAU Urban Acoustic Scenes 2020 Mobile development dataset, the models achieve 70.48% and 69.68% accuracy, respectively.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, spectrum augmentation, device augmentation; mixup, tempo, channel corruption
Features log-mel energies
Classifier CNN
Complexity management weight quantization
PDF

Using Arcface Metric Learning for Low-Complexity Acoustic Scene Classification

Kristóf Horváth, Harsh Purohit, Yohei Kawaguchi, Ryo Tanabe, Kota Dohi, Takashi Endo, Masaaki Yamamoto and Tomoya Nishida
Hitachi Ltd., Tokyo, Japan

Abstract

In this technical report we present our submissions for DCASE 2021 Challenge Task 1A. For the low-complexity model, we used both a MobileNetV2-based model and a ResNet-based model with a reduced number of layers and trained them using ArcFace metric learning. To increase the accuracy, we used test-time augmentation (TTA) during inference. On the development dataset, our models attain an ASC accuracy of around 54–55%, while having less than 128 kB of total parameters.
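
For readers unfamiliar with ArcFace, the sketch below shows the additive angular margin it applies to the target-class logit before the softmax. It is an illustration only: the `margin` and `scale` values and the function name are assumptions, not settings from the report, and the input cosines stand in for similarities between a normalized embedding and normalized class weights.

```python
import math

def arcface_logits(cosines, target_index, margin=0.5, scale=30.0):
    """Apply an additive angular margin to the target-class cosine, then scale."""
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding
        if i == target_index:
            c = math.cos(math.acos(c) + margin)  # cos(theta + m)
        logits.append(scale * c)
    return logits

# The target-class logit is pushed down, so training must enlarge the angular gap.
logits = arcface_logits([0.8, 0.1, -0.3], target_index=0)
```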

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change
Features log-mel energies, HPSS
Classifier MobileNetV2; MobileNetV2, ArcFace; ResNet; ResNet, ArcFace
Complexity management weight quantization
PDF

Diverse Sparsity System Using Convolution Neural Network

Hui Hsin Jeng1, Chia-Ping Chen1, Chung Li Lu2 and Bo-Cheng Chan2
1Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, 2Chunghwa Telecom, Taoyuan, Taiwan

Abstract

In this technical report, we present our work on pruning convolutional neural networks and using quantization to reduce the number of parameters. DCASE2021 subtask 1A limits the classifier size to only 128 KB, smaller than the limit in DCASE2020 subtask 1B. We therefore propose three pruning and quantization methods for convolutional neural networks. Two of them prune the bigger network (FCNN) with either single sparsity or diverse sparsity, followed by quantization. The third simply prunes a smaller network (MobNet) with single sparsity, followed by quantization. Our best system achieves a validation log loss of 1.428.

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
Decision making logistic regression
Complexity management sparsity, weight quantization
PDF

Trident Resnets with Low Complexity for Acoustic Scene Classification

Youngho Jeong, Sooyoung Park and Taejin Lee
Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea

Abstract

This technical report describes our acoustic scene classification systems for DCASE2021 Challenge Task 1 subtask A. We designed two Trident ResNets with three parallel paths, targeting low complexity. The trident structure with respect to the frequency domain is beneficial when analyzing samples collected from minority or unseen devices. To satisfy the model size requirement, we replaced standard convolutions with depthwise separable convolutions and applied weight quantization to the trained model. In our performance evaluation, Trident ResNet B trained with data augmentation achieved a log loss of 0.968 and a classification accuracy of 65.8% on the test split.
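
To see why swapping a standard convolution for a depthwise separable one helps with the size limit, the weight counts can be compared directly. The sketch below ignores biases, and the layer dimensions (64 input channels, 64 output channels, 3x3 kernel) are hypothetical, not taken from the report.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

std = standard_conv_params(64, 64, 3)
sep = depthwise_separable_params(64, 64, 3)
```

For this hypothetical layer the separable variant needs roughly one eighth of the weights.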

System characteristics
Sampling rate 44.1kHz
Data augmentation temporal cropping; temporal cropping, SpecAugment
Features log-mel energies, deltas, delta-deltas
Classifier ResNet
Complexity management weight quantization, depthwise separable convolutions
PDF

Technical Paper: Deep Scattering Spectrum with Mobile Network for Low Complexity Acoustic Scene Classification

Xing Yong Kek1, Cheng Siong Chin1 and Li Ye2
1Faculty of Science, Agriculture & Engineering, Newcastle University, Singapore, 2Xylem Inc, Singapore

Abstract

We present a technical paper that provides details of the classification model we submitted to DCASE 2021 Challenge Task 1a. In this paper, we propose the use of a deep scattering spectrum (DSS) with a mobile network to tackle low-complexity computation.

System characteristics
Sampling rate 44.1kHz
Features Wavelet Scattering
Classifier CNN, MobileNetV2; CNN, MobileNetV2, Group convolution, Channel attention
Complexity management weight quantization
PDF

Building Light-Weight Convolutional Neural Networks for Acoustic Scene Classification Using Audio Embeddings

Bongjun Kim
3M, Saint Paul, United States

Abstract

This technical report describes the acoustic scene classification models from our submissions for DCASE 2021 Challenge Task 1A. The task is to build a system to perform classification on acoustic scene data. The dataset has 10 acoustic scene labels. Our submissions are Convolutional Neural Network (CNN)-based models which consist of 3 convolutional layers and 1 fully-connected layer. We utilize a small subset of a deep audio embedding that has been pre-trained on a large-scale dataset. We also perform quantization and pruning to reduce the complexity of the models to meet the size limit of 128 KB for the challenge. We compare the performance of our models with the baseline approach on the provided test dataset. The results show that our models outperform the baseline system.

System characteristics
Sampling rate 22.05kHz
Data augmentation mixup, SpecAugment
Features Perceptually-weighted log-mel energies
Embeddings VGGish
Classifier CNN
Complexity management weight quantization, pruning
PDF

Acoustic Scene Classification with Decomposed Convolution Neural Networks

Minhan Kim1, SeungHyeon Shin1, Seungjae Baek1, Seokjin Lee2, Sooyoung Park3 and Youngho Jeong3
1School of Electronic and Electrical Engineering, Kyungpook National University, Daegu, Republic of Korea, 2School of Electronics Engineering, School of Electronic and Electrical Engineering, Kyungpook National University, Daegu, Republic of Korea, 3Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea

Abstract

This report describes a model submitted to DCASE2021 Task 1 subtask A. Our model is developed by applying canonical polyadic (CP) decomposition to conventional convolutional-neural-network-based models to reduce the model size and meet the goal of Task 1A. More specifically, we apply the decomposition method to a dual ResNet, which divides the features into two parts along the frequency axis and processes them independently, and to a shallow Inception model. To evaluate our model, a simulation for acoustic scene classification was performed with the development dataset of DCASE 2021 Task 1A; our model achieved a log loss of about 1.03-1.06 and a macro accuracy of 62%-66%, far better than the baseline model. The model size of our system is also smaller than 128 kbytes, which is the limit for DCASE2021 Task 1A.
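
As a rough sense of how much a canonical polyadic factorization can shrink a convolution kernel, the sketch below counts the weights of a rank-R factorization of a 4-way kernel tensor. The layer dimensions and the rank are hypothetical; the report's actual ranks and layer shapes are not given here.

```python
def full_conv_params(c_in, c_out, k_h, k_w):
    """Weights in an undecomposed (c_out, c_in, k_h, k_w) kernel."""
    return c_in * c_out * k_h * k_w

def cp_conv_params(c_in, c_out, k_h, k_w, rank):
    """Weights in a rank-R CP factorization: one factor matrix per tensor mode."""
    return rank * (c_in + c_out + k_h + k_w)

full = full_conv_params(64, 64, 3, 3)
cp = cp_conv_params(64, 64, 3, 3, rank=16)
```

A lower rank gives a smaller model at the cost of a coarser approximation of the original kernel.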

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features log-mel energies, delta-log-mel energies, delta-delta-log-mel energies
Classifier ResNet; CNN (Inception)
Complexity management CP-decomposition, weight quantization; parameter sharing, weight quantization
PDF

QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design

Byeonggeun Kim, Seunghan Yang, Jangho Kim and Simyung Chang
Qualcomm AI Research, Qualcomm Korea YH, Seoul, Korea

Abstract

This technical report describes the details of our Task 1A submission to the DCASE2021 Challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under model complexity constraints. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one to multiple devices to augment training data. Finally, we utilize three model compression schemes: pruning, quantization, and knowledge distillation to reduce model complexity. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 62 kB of non-zero parameters.
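
The combination of instance normalization with a shortcut path can be sketched on a single-channel spectrogram as follows. This is an assumption-laden illustration: the normalization axes (each frequency row over time), the shortcut weight `lam`, and `eps` are choices made for the example and may differ from the report's definition.

```python
import math

def freq_instance_norm(spec, eps=1e-5):
    """Normalize each frequency row of one example to zero mean and unit variance."""
    out = []
    for row in spec:  # spec[f][t]
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def residual_norm(spec, lam=0.1):
    """Add a scaled copy of the input (shortcut path) to the normalized features."""
    normed = freq_instance_norm(spec)
    return [[lam * v + n for v, n in zip(row, nrow)]
            for row, nrow in zip(spec, normed)]

spec = [[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]]
out = residual_norm(spec)
```

The shortcut keeps a fraction of the original statistics, so normalization does not erase everything that distinguishes one recording from another.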

System characteristics
Sampling rate 16kHz
Data augmentation mixup, SpecAugment, time rolling
Features log-mel energies
Classifier CNN, BC-ResNet
Decision making maximum likelihood
Complexity management weight quantization, pruning, knowledge distillation
PDF

CP-JKU Submission to DCASE21: Cross-Device Audio Scene Classification with Wide Sparse Frequency-Damped CNNs

Khaled Koutini1, Schlüter Jan2 and Gerhard Widmer2
1Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria, 2Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria

Abstract

We describe the CP-JKU team's submission for Task 1A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE2021 Challenge. We use a Receptive Field (RF) regularized Convolutional Neural Network (CNN) with Frequency Damping as a baseline. We investigate widening the convolutional layers without increasing the number of parameters by grouping and pruning. We apply iterative magnitude pruning to sparsify the weights of the models. Additionally, we investigate an adversarial domain adaptation approach.
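
Iterative magnitude pruning repeatedly removes the smallest-magnitude weights that are still active. The sketch below shows only the masking step on a flat weight list; the retraining that normally happens between rounds is omitted, and the pruning fraction and number of rounds are hypothetical.

```python
def magnitude_prune(weights, mask, fraction):
    """Zero out the smallest-magnitude `fraction` of the weights still active."""
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    smallest = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in smallest:
        new_mask[i] = 0
    return new_mask

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08]
mask = [1] * len(weights)
for _ in range(2):  # two rounds; a real pipeline would retrain between rounds
    mask = magnitude_prune(weights, mask, fraction=0.5)
sparse = [w * m for w, m in zip(weights, mask)]
```

Only the weights whose mask entry is 1 count towards the non-zero parameter budget.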

System characteristics
Sampling rate 22.05kHz
Data augmentation mixup, pitch shifting
Features Perceptually-weighted log-mel energies
Classifier RF-regularized CNNs
Complexity management float16, sparsity
PDF

CAU-ET Submission to DCASE 2021: Light-Efficientnet for Acoustic Scene Classification

Soyoung Lim1, Yerin Lee1 and Il-Youp Kwak2
1Statistics Dept., Chung-Ang University, Seoul, South Korea, 2Department of Applied Statistics, Chung-Ang University, Seoul, South Korea

Abstract

Acoustic scene classification (ASC) categorizes an audio file based on the environment in which it has been recorded. This has long been studied in the Detection and Classification of Acoustic Scenes and Events (DCASE). We present the solution to Task 1A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 Challenge submitted by the Chung-Ang University team. We propose a light-EfficientNet model with 3 scaling factors: width, depth, and resolution. Additionally, we use lightweight deep learning techniques such as pruning and quantization.

System characteristics
Sampling rate 44.1kHz
Features spectrogram
Classifier CNN
Complexity management weight quantization, sparsity
PDF

DCASE 2021 Task 1 Subtask A: Low-Complexity Acoustic Scene Classification

Yingzi Liu1, Jiangnan Liang1, Luojun Zhao2, Jia Liu2, Kexin Zhao2, Weiyu Liu2, Long Zhang2, Tanyue Xu2 and Chuang Shi1
1School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China, 2University of Electronic Science and Technology of China, Chengdu, China

Abstract

This technical report describes the systems for Task 1, subtask A of the DCASE 2021 Challenge. To reduce the number of model parameters, we add feature reuse units to the deep residual network. One-bit-per-weight convolution layers are also used. Log-mel spectrograms, delta features and delta-delta features are extracted to train the acoustic scene classification model. HRTF and spectrum correction are used to augment the acoustic features. Our system achieves higher classification accuracy and lower log loss on the development dataset than the baseline system.

System characteristics
Sampling rate 44.1kHz
Data augmentation HRTF, mixup, temporal cropping, spectrum correction; mixup, temporal cropping; mixup
Features log-mel energies, deltas, delta-deltas; log-mel energies
Classifier ResNet; CNN
Complexity management 1-bit quantization, FR_unit; 1-bit quantization; weight quantization
PDF

Wavelet Based Mel Scaled Representation for Low Complexity ASC with Multiple Devices

Aswathy Madhu1 and Suresh K2
1Electronics & Communication, College of Engineering Trivandrum, Thiruvananthapuram, Kerala, India, 2Electronics & Communication, Govt. Engineering College, Barton Hill, Thiruvananthapuram, Kerala, India

Abstract

This technical report presents our submission to the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 for Task 1 (Acoustic Scene Classification), subtask A (Low-Complexity Acoustic Scene Classification with Multiple Devices). The proposed system is a simple state-of-the-art approach employing a wavelet-based mel-scaled representation for acoustic signals and a CNN classifier. We use data augmentation to handle device mismatch and post-training quantization of network weights to enforce low complexity in terms of model size. The submitted system surpasses the CNN-based baseline system developed for this subtask.

System characteristics
Sampling rate 44.1kHz
Data augmentation time stretching, pitch shifting, dynamic range compression, background noise, mixup
Features wavelet based log-mel energies
Classifier CNN
Complexity management weight quantization
PDF

Task 1A DCASE 2021: Acoustic Scene Classification with Mismatch-Devices Using Squeeze-Excitation Technique and Low-Complexity Constraint

Javier Naranjo-Alcazar1,2, Sergi Perez-Castanos1, Maximo Cobos1, Francesc J. Ferri1 and Pedro Zuccarello2
1Computer Science, Universitat de Valencia, Burjassot, Spain, 2Instituto Tecnológico de Informática, Valencia, Spain

Abstract

Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening. The objective is to classify an audio clip into one of a set of predefined scenes using only the audio data. The field has progressed considerably over the years across the different editions of DCASE, which usually include several subtasks that allow the problem to be tackled with different approaches. The subtask addressed in this report is an ASC problem constrained by model complexity, with audio recorded from different devices, known as mismatched devices (real and simulated). The work presented here follows the research line carried out by the team in previous years. Specifically, a two-step system is proposed: a two-dimensional representation of the audio using the Gammatone filter bank, followed by a convolutional neural network using squeeze-excitation techniques. The presented system outperforms the baseline by about 17 percentage points.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features gammatone spectrogram
Classifier CNN
Complexity management weight quantization, tflite, float16
PDF

DCASE 2021 Task 1A: Technique Report

Lam Pham1, Alexander Schindler1, Hieu Tang2 and Truong Hoang3
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2Department of Electronic and Electrical Engineering, Hongik University, Korea, 3FPT company, Ho Chi Minh, Vietnam

Abstract

In this report, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. In the first step, we use a Mel filter, a Gammatone filter and the Constant Q Transform (CQT) to transform the raw audio signal into spectrograms. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs) for classification. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of the proposed CNN architecture, we apply two model compression techniques: model restriction and decomposed convolution. Our experiments, conducted on the DCASE 2021 Task 1A development dataset, yield a low-complexity CNN-based framework with 128 KB of trainable parameters and a best classification accuracy of 66.7%, improving on the DCASE baseline by 19.0%.
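
A product-rule ("PROD") late fusion of per-model class probabilities can be sketched as below. This is one common reading of the term, with a renormalization step added for readability; the report may define the fusion differently, and the probability values are made up for the example.

```python
def prod_fusion(prob_lists):
    """Multiply class probabilities across models and renormalize to sum to one."""
    n_classes = len(prob_lists[0])
    fused = [1.0] * n_classes
    for probs in prob_lists:
        fused = [f * p for f, p in zip(fused, probs)]
    total = sum(fused)
    return [f / total for f in fused]

# Hypothetical outputs of three CNNs fed with mel, Gammatone and CQT spectrograms.
mel = [0.6, 0.3, 0.1]
gam = [0.5, 0.4, 0.1]
cqt = [0.2, 0.7, 0.1]
fused = prod_fusion([mel, gam, cqt])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case two of the three models favour class 0, but the product rule selects class 1 because the third model is strongly against class 0.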

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features CQT, Gammatonegram, log-mel energies
Classifier CNN
Decision making PROD late fusion
Complexity management channel restriction and decomposed convolution
PDF

DCASE 2021 Task 1 Subtask A: Low-Complexity Acoustic Scene Classification

Duc Phan and Douglas Jones
ECE, University of Illinois, Urbana-Champaign, Illinois, US

Abstract

Decomposing 2D convolution into time and frequency separable 1D convolutions produces a low-complexity neural network with good performance for acoustic scene classification. The final proposed network has roughly 41K parameters with a size of 75KB. It significantly outperforms the DCASE 2021 baseline network [1], with an accuracy of 64 percent on the development dataset[2].

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features log-mel energies, deltas, delta-deltas
Classifier CNN
Complexity management weight quantization, depthwise separable convolutions
PDF

Separable Convolutions and Test-Time Augmentations for Low-Complexity and Calibrated Acoustic Scene Classification

Gilles Puy, Himalaya Jain and Andrei Bursuc
valeo.ai, Paris, France

Abstract

This report details the architecture we used to address Task 1a of the DCASE2021 Challenge. Our architecture is based on a 4-layer convolutional neural network taking a log-mel spectrogram as input. The complexity of this network is controlled by using separable convolutions in the channel, time and frequency dimensions. We train different models to investigate the benefit of mixup, focal loss and test-time augmentations in improving the performance of the system.

System characteristics
Sampling rate 44.1kHz
Data augmentation SpecAugment; SpecAugment, mixup
Features log-mel energies
Classifier CNN
Decision making average
Complexity management weight quantization
PDF

Acoustic Scene Classification Model Based on Two Parallel Residual Networks

Ziling Qiao, Hongxia Dong, Xichang Cai and Menglong Wu
Electronic and Communication Engineering, North China University of Technology, Beijing, China

Abstract

This technical report describes our submission for Task 1a of the DCASE2021 Challenge. We calculated 128 log-mel energies for each time slice at the original sampling rate of 44.1 kHz, using 2048 FFT points with 50% overlap. Additionally, deltas and delta-deltas were calculated from the log-mel spectrogram and stacked along the channel axis. The resulting spectrograms had 128 frequency bins, 423 time samples and 3 channels, representing the log-mel spectrogram, its delta features and its delta-delta features, respectively. The three-channel feature map is then divided into mel bins 0-64 and 64-128 on the frequency axis, and the low- and high-frequency features are fed into two parallel residual networks with identical residual blocks and convolutional residual blocks for training; the outputs of the two networks are then concatenated along the channel axis. Finally, after a 1×1 convolution and global average pooling, the classification results are obtained through a softmax output.
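
The frequency split described above can be illustrated with a placeholder array that has the stated dimensions (128 mel bins, 423 time samples, 3 channels). The nested-list layout and the helper names are assumptions made for this sketch; a real pipeline would use an array library.

```python
def split_frequency(features, boundary=64):
    """Split a [freq][time][channel] array into low and high mel-bin halves."""
    return features[:boundary], features[boundary:]

def shape(arr):
    """Return the (freq, time, channel) dimensions of a nested list."""
    return (len(arr), len(arr[0]), len(arr[0][0]))

# Placeholder tensor with the dimensions stated in the abstract.
features = [[[0.0] * 3 for _ in range(423)] for _ in range(128)]
low, high = split_frequency(features)
```

Each half keeps all three channels (log-mel, deltas, delta-deltas) and is passed to its own residual branch.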

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup
Features log-mel energies, deltas, delta-deltas
Classifier ResNet ensemble
Decision making average
Complexity management weight quantization
PDF

Mobilenet Using Coordinate Attention and Fusions for Low-Complexity Acoustic Scene Classification with Multiple Devices

Soonshin Seo and Ji-Hwan Kim
Dept. of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea

Abstract

In this technical report, we describe our acoustic scene classification methods submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 1a. We extracted log-mel filter bank features with deltas and delta-deltas from the acoustic signals and applied normalization. A total of 6 data augmentations were applied: mixup, spectrum augmentation, spectrum correction, pitch shift, speed change, and mix audios. In addition, we designed a MobileNet using coordinate attention and fusions. Inspired by MobileNetV2, inverted residuals and linear bottlenecks are adapted for the mobile blocks of the proposed MobileNet. We applied coordinate attention and early/late fusion methods after the mobile blocks. We also reduced the model size by applying weight quantization to the trained model. Experiments were conducted on the cross-validation setup of the official development set. We confirmed that our model achieved a log loss of 1.040 and an accuracy of 72.6% within the 128 KB model size.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios
Features log-mel energies, deltas, delta-deltas
Classifier MobileNet
Complexity management weight quantization
PDF

Pruning and Quantization for Low-Complexity Acoustic Scene Classification

Arshdeep Singh1, Dhanunjaya Varma Devalraju2 and Padmanabhan Rajan2
1SCEE, Indian Institute of Technology, Mandi, Mandi, India, 2School of Computing and Electrical Engineering, Indian Institute of Technology, Mandi, Mandi, India

Abstract

This technical report describes the IITMandi AudioTeam’s submission for DCASE 2021 ASC Task 1, subtask Low-Complexity Acoustic Scene Classification with Multiple Devices. The report aims to design low-complexity systems for acoustic scene classification by eliminating filters in a pre-trained convolutional neural network. We adopt a filter pruning strategy that consists of three steps. Step 1 identifies the redundant filters, which have a low norm. Step 2 explicitly removes the redundant filters and their connecting feature maps from the unpruned network to give a pruned network. Step 3 fine-tunes the pruned network to regain performance. Further, the trained parameters are quantized to 16 bits. On the DCASE 2021 Task 1A development dataset, the proposed framework removes 68% of the parameters while maintaining competitive performance.
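
Steps 1 and 2 of such a filter pruning strategy can be sketched as ranking filters by norm and dropping the weakest. The L1 norm, the `keep_ratio` parameter and the toy filter values are assumptions for illustration; the fine-tuning of step 3 is not shown.

```python
def l1_norm(filt):
    """Sum of absolute weights of one filter (assumed importance score)."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norm, preserving their original order."""
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]

# Four toy filters with two weights each; the two low-norm filters are removed.
filters = [[0.5, -0.4], [0.01, 0.02], [-0.9, 0.3], [0.05, -0.01]]
kept, pruned = prune_filters(filters, keep_ratio=0.5)
```

Unlike masking individual weights, removing whole filters also removes the feature maps they produce, so the next layer shrinks as well.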

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier CNN
Complexity management Filter pruning and quantization
PDF

Ensemble of Simple Resnets with Various Mel-Spectrum Time-Frequency Resolutions for Acoustic Scene Classifications

Reiko Sugahara, Masatoshi Osawa and Ryo Sato
RION CO., LTD., Tokyo, Japan

Abstract

This technical report describes our procedure for Task 1A of DCASE 2021 [1][2]. Our method adopts ResNet-based models with a mel spectrogram as input. Accuracy was improved by ensembling simple ResNet-based models with various mel-spectrum time-frequency resolutions. Data augmentations such as mixup, SpecAugment, time-shifting, and spectrum modulation were applied to prevent overfitting. The size of the model was reduced by quantization and pruning. As a result, our system achieved an accuracy of 70.1% on the development set with a model size of 95 KB.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, SpecAugment, time-shifting, spectrum modulation
Features log-mel powers
Classifier ResNet, ensemble
Decision making weighted score average; score average
Complexity management weight quantization, pruning
PDF

Low-Complexity Acoustic Scene Classification Using Mobile Inverted Bottleneck Blocks

Sergey Verbitskiy and Viacheslav Vyshegorodtsev
Deepsound, Novosibirsk, Russia

Abstract

This technical report describes our approaches for Task 1A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 Challenge. We propose a new architecture with mobile inverted bottleneck blocks (Fused-MBConv and MBConv) for acoustic scene classification tasks. This architecture is based on EfficientNetV2. Our models have a very small number of parameters. We also use several data augmentation techniques during the training of models. Our best model has 62,346 non-zero parameters and achieves a classification macro-average accuracy of 70.5% and an average multiclass cross-entropy (log loss) of 0.848 on the development dataset. The resulting model size is 121.8 KB (the model parameters are quantized to float16 after the training).
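
The stated model size follows from the parameter count and the bit width. The short check below assumes that 1 KB means 1024 bytes and that only non-zero parameters are counted; the helper name is hypothetical.

```python
def model_size_kb(n_params, bits_per_param):
    """Size of the non-zero parameters in KB, taking 1 KB as 1024 bytes."""
    return n_params * bits_per_param / 8 / 1024

# 62,346 non-zero parameters stored as float16 (16 bits each).
size = model_size_kb(62346, 16)
```

Under these assumptions the result rounds to the 121.8 KB quoted in the abstract and stays below the 128 KB limit.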

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, temporal cropping, SpecAugment
Features log-mel energies
Classifier CNN, EfficientNetV2
Complexity management weight quantization
PDF

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

Chao-Han Huck Yang1, Hu Hu1, Sabato Marco Siniscalchi2, Qing Wang3, Wang Yuyang3, Xianjun Xia4, Yuanjun Zhao4, Yuzhong Wu4, Yannan Wang4, Jun Du3 and Chin-Hui Lee1
1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, 2Kore University of Enna, Italy, 3University of Science and Technology of China, HeFei, China, 4Tencent Media Lab, Shenzhen, China

Abstract

We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC). Specifically, we tackle the ASC task in a low-resource environment by leveraging a recently proposed advanced neural network pruning mechanism, namely the Lottery Ticket Hypothesis (LTH), to find a sub-network neural model associated with a small number of non-zero model parameters. The effectiveness of LTH for low-complexity acoustic modeling is assessed by investigating various data augmentation and compression schemes, and we report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery. Acoustic Lottery could compress an ASC model up to 1/10^4 and attain a superior performance (validation accuracy of 74.01% and log loss of 0.76) compared to its uncompressed seed model. All results reported in this work are based on a joint effort of four groups, namely GT-USTC-UKE-Tencent, aiming to address the 'Low-Complexity Acoustic Scene Classification (ASC) with Multiple Devices' in the DCASE 2021 Challenge Task 1a.

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios
Features log-mel energies
Classifier Inception, ensemble
Decision making average
Complexity management weight quantization, LTH pruning, teacher-student learning
PDF

Low-Complexity Acoustic Scene Classification with Multiple Devices

Chen Yihao, Liu Min and Xu Minqiang
SpeakIn Technology, Shanghai, China

Abstract

This report describes our submission to Task 1 (Acoustic Scene Classification) of the DCASE 2021 Challenge. The final submission includes 4 results based on ResNet and SEResNet architectures. We analyze several different backbones and run experiments to confirm whether the pooling layer is needed. Due to the lack of training data, we try a variety of data augmentation methods, including SpecAugment [1], cutout [2], and audio acceleration and deceleration. To meet the model size requirement, we also prune the models.

System characteristics
Sampling rate 16kHz
Data augmentation SpecAugment
Features log-mel energies
Classifier CNN
Complexity management sparsity
PDF

DCASE 2021 Challenge Task1a Technical Report

Jiawang Zhang1, Shengchen Li2 and Bilei Zhu3
1AI-Lab Speech & Audio Team, Beijing University of Posts and Telecommunications & ByteDance, Shanghai, China, 2Xi’an Jiaotong-Liverpool University, Suzhou, China, 3AI-Lab Speech & Audio Team, ByteDance, Shanghai, China

Abstract

This report describes our method for Task 1a (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 Challenge. The task targets low-complexity solutions for the classification problem. We use a Residual Network (ResNet) model with log-mel spectrogram features. To reduce system complexity, we apply post-training static quantization to 8 bits, which reduces the model size by a factor of four. The accuracy of the proposed method on the development dataset is 73%, which is 25% higher than the baseline.
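
The fourfold size reduction comes from storing each weight in 8 bits instead of 32. The sketch below shows a simple symmetric per-tensor weight quantization as an illustration only: it is not the report's procedure, and static quantization in real frameworks additionally calibrates activation ranges, which is not covered here.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 codes."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]

weights = [0.50, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
size_ratio = 32 / 8  # float32 bits per weight over int8 bits per weight
```

Each restored weight differs from the original by at most half a quantization step, which is the price paid for the smaller model.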

System characteristics
Sampling rate 44.1kHz
Features log-mel energies
Classifier ResNet
Complexity management weight quantization
PDF

Low-Complexity Acoustic Scene Classification Using Knowledge Distillation and Multiple Classifiers

Na Zhao
Algorithm, Maxvision, Wuhan, China

Abstract

This technical report describes our submission for Task 1a of the DCASE2021 Challenge. Based on the small-size MobNet [1] of the Tencent team in DCASE2020 Task 1b, we build our baseline model with only one frequency branch and two classifiers. The two classifiers are a ten-class classifier and a three-class classifier, and they jointly optimize the baseline model. Due to the model size limit, we first train a high-accuracy large-size model and then use a distillation method to transfer knowledge from the large-size model to our baseline model. The final system is quantized from 32-bit floating point to 16-bit floating point. We achieved an accuracy of 59.9% with a model size smaller than 128 KB.
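
Transferring knowledge from a large model to a small one is commonly done by matching temperature-softened output distributions. The sketch below shows that common formulation, which is an assumption here since the report's exact distillation loss is not stated; the temperature and logit values are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher to the softened student distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]
loss_match = distillation_loss([4.0, 1.0, 0.5], teacher)  # student agrees with teacher
loss_off = distillation_loss([0.5, 1.0, 4.0], teacher)    # student disagrees
```

In practice this term is usually combined with the ordinary cross-entropy on the ground-truth labels.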

System characteristics
Sampling rate 44.1kHz
Data augmentation mixup, random cropping
Features log-mel energies, deltas, delta-deltas
Classifier MobileNet
Decision making model weights average
Complexity management weight quantization
PDF