Task description
This subtask addresses the basic problem of acoustic scene classification: classifying a test audio recording into one of ten known acoustic scene classes. The task targets generalization across devices and uses audio data recorded and simulated with a variety of devices. It also targets low-complexity solutions, where complexity is measured in terms of model size.
The development dataset consists of recordings from 10 European cities using 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). Data from devices B, C, and S1-S6 consists of randomly selected segments from the simultaneous recordings, so all of it overlaps with the data from device A, but not necessarily with the data from the other devices. The total amount of audio in the development set is 64 hours.
The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. Five of the devices are new (not available in the development set): real device D and simulated devices S7-S11. The evaluation data contains 22 hours of audio.
Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.
A more detailed task description can be found on the task description page.
Systems ranking
Submission label | Name | Technical Report | Official system rank | Logloss (Evaluation dataset) | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Development dataset) | Accuracy (Development dataset) | |
---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | qat_8b | Byttebier2021 | 21 | 0.936 | 68.6 (67.6 - 69.6) | 0.820 | 71.2 | |
Byttebier_IDLab_task1a_2 | 8b_calibm | Byttebier2021 | 18 | 0.914 | 67.5 (66.5 - 68.6) | 0.820 | 71.2 | |
Byttebier_IDLab_task1a_3 | 8b_calibo | Byttebier2021 | 23 | 0.944 | 68.5 (67.5 - 69.6) | 0.820 | 71.2 | |
Byttebier_IDLab_task1a_4 | 16b_prune | Byttebier2021 | 17 | 0.905 | 68.8 (67.8 - 69.8) | 0.840 | 70.2 | |
Cao_SCUT_task1a_1 | sys_1 | Cao2021 | 49 | 1.136 | 66.7 (65.7 - 67.7) | 1.038 | 71.6 | |
Cao_SCUT_task1a_2 | sys_2 | Cao2021 | 56 | 1.200 | 64.6 (63.5 - 65.6) | 1.108 | 69.6 | |
Cao_SCUT_task1a_3 | sys_3 | Cao2021 | 50 | 1.137 | 67.2 (66.1 - 68.2) | 1.058 | 71.7 | |
Cao_SCUT_task1a_4 | sys_4 | Cao2021 | 53 | 1.147 | 66.1 (65.1 - 67.1) | 1.047 | 72.4 | |
Ding_TJU_task1a_1 | Ding_TJU | Ding2021 | 85 | 1.544 | 53.0 (51.9 - 54.1) | 1.360 | 55.5 | |
Ding_TJU_task1a_2 | Ding_TJU | Ding2021 | 70 | 1.326 | 51.1 (50.0 - 52.2) | 1.263 | ||
Ding_TJU_task1a_3 | Ding_TJU | Ding2021 | 61 | 1.226 | 49.1 (48.0 - 50.2) | 1.193 | 55.0 | |
Ding_TJU_task1a_4 | Ding_TJU | Ding2021 | 67 | 1.296 | 51.4 (50.3 - 52.5) | 1.268 | 50.0 | |
Fan_NWPU_task1a_1 | res-att | Cui2021 | 64 | 1.261 | 68.3 (67.3 - 69.3) | 0.870 | 69.7 | |
Galindo-Meza_ITESO_task1a_1 | e2e_CNN_INT8 | Galindo-Meza2021 | 97 | 2.221 | 53.9 (52.8 - 55.0) | 1.904 | 56.5 | |
Heo_Clova_task1a_1 | Clova_AMFM | Hee-Soo2021 | 42 | 1.087 | 67.0 (66.0 - 68.0) | | 69.7 | |
Heo_Clova_task1a_2 | Clova_Res | Hee-Soo2021 | 20 | 0.930 | 66.9 (65.9 - 67.9) | | 70.5 | |
Heo_Clova_task1a_3 | Clova_AMFM_W | Hee-Soo2021 | 34 | 1.045 | 70.0 (69.0 - 71.0) | |||
Heo_Clova_task1a_4 | Clova_Res_W | Hee-Soo2021 | 12 | 0.871 | 70.1 (69.1 - 71.1) | |||
Horváth_HIT_task1a_1 | R_MNv2_fl | Horvth2021 | 86 | 1.597 | 51.4 (50.3 - 52.5) | 1.258 | 55.3 | |
Horváth_HIT_task1a_2 | R_MNv2_af | Horvth2021 | 92 | 2.031 | 53.3 (52.2 - 54.4) | 2.021 | 54.3 | |
Horváth_HIT_task1a_3 | CPRes_fl | Horvth2021 | 76 | 1.460 | 51.6 (50.5 - 52.7) | 1.248 | 54.5 | |
Horváth_HIT_task1a_4 | CPRes_af | Horvth2021 | 95 | 2.065 | 49.2 (48.1 - 50.3) | 2.030 | 54.7 | |
Jeng_CHT+NSYSU_task1a_1 | SparseFCNN | Jeng2021 | 78 | 1.469 | 55.0 (53.9 - 56.1) | 1.464 | 54.6 | |
Jeng_CHT+NSYSU_task1a_2 | DiverseSpa | Jeng2021 | 84 | 1.543 | 51.3 (50.2 - 52.4) | 1.593 | 51.2 | |
Jeng_CHT+NSYSU_task1a_3 | SparseMNet | Jeng2021 | 79 | 1.470 | 56.3 (55.2 - 57.4) | 1.428 | 58.2 | |
Jeong_ETRI_task1a_1 | JYH_ETRI_1 | Jeong2021 | 33 | 1.041 | 66.0 (64.9 - 67.0) | 1.006 | 65.9 | |
Jeong_ETRI_task1a_2 | JYH_ETRI_2 | Jeong2021 | 25 | 0.952 | 67.0 (65.9 - 68.0) | 1.015 | 64.9 | |
Jeong_ETRI_task1a_3 | JYH_ETRI_3 | Jeong2021 | 30 | 1.023 | 66.7 (65.7 - 67.7) | 1.014 | 64.6 | |
Jeong_ETRI_task1a_4 | JYH_ETRI_4 | Jeong2021 | 63 | 1.228 | 66.1 (65.1 - 67.2) | 0.968 | 65.8 | |
Kek_NU_task1a_1 | DSSMNet1 | Kek2021 | 72 | 1.355 | 66.8 (65.7 - 67.8) | 1.410 | 63.0 | |
Kek_NU_task1a_2 | DSSMNet2 | Kek2021 | 57 | 1.207 | 63.5 (62.4 - 64.6) | 1.242 | 62.3 | |
Kim_3M_task1a_1 | CNN_pr1 | Kim2021 | 38 | 1.076 | 61.5 (60.4 - 62.6) | 1.010 | 63.4 | |
Kim_3M_task1a_2 | CNN_pr2 | Kim2021 | 39 | 1.077 | 61.6 (60.5 - 62.6) | 1.008 | 63.5 | |
Kim_3M_task1a_3 | CNN_pr3 | Kim2021 | 37 | 1.076 | 62.0 (61.0 - 63.1) | 1.009 | 63.3 | |
Kim_3M_task1a_4 | CNN_pr4 | Kim2021 | 40 | 1.078 | 61.3 (60.2 - 62.3) | 1.009 | 63.5 | |
Kim_KNU_task1a_1 | KNU-CP1 | Kim2021a | 46 | 1.115 | 64.7 (63.6 - 65.7) | 1.068 | 65.0 | |
Kim_KNU_task1a_2 | KNU-CP2 | Kim2021a | 28 | 1.010 | 63.8 (62.8 - 64.9) | 1.040 | 62.0 | |
Kim_KNU_task1a_3 | KNU-CP3 | Kim2021a | 55 | 1.188 | 61.3 (60.3 - 62.4) | 1.043 | 65.5 | |
Kim_KNU_task1a_4 | KNU-CP4 | Kim2021a | 52 | 1.143 | 62.9 (61.8 - 64.0) | 1.035 | 65.3 | |
Kim_QTI_task1a_1 | ResNorm_QTI1 | Kim2021b | 8 | 0.793 | 75.0 (74.0 - 76.0) | 0.722 | 77.0 | |
Kim_QTI_task1a_2 | ResNorm_QTI2 | Kim2021b | 1 | 0.724 | 76.1 (75.1 - 77.0) | 0.716 | 75.9 | |
Kim_QTI_task1a_3 | ResNorm_QTI3 | Kim2021b | 2 | 0.735 | 76.1 (75.2 - 77.1) | 0.723 | 77.5 | |
Kim_QTI_task1a_4 | ResNorm_QTI4 | Kim2021b | 5 | 0.764 | 75.2 (74.3 - 76.2) | 0.776 | 75.1 | |
Koutini_CPJKU_task1a_1 | DampedR7NB | Koutini2021 | 14 | 0.883 | 70.9 (69.9 - 71.9) | 0.916 | 68.6 | |
Koutini_CPJKU_task1a_2 | DampedR8 | Koutini2021 | 10 | 0.842 | 71.8 (70.8 - 72.8) | 0.944 | 66.9 | |
Koutini_CPJKU_task1a_3 | DampedR8NB | Koutini2021 | 9 | 0.834 | 72.1 (71.1 - 73.1) | 0.890 | 69.5 | |
Koutini_CPJKU_task1a_4 | DampedR8DA | Koutini2021 | 11 | 0.847 | 71.8 (70.9 - 72.8) | 0.880 | 69.5 | |
Lim_CAU_task1a_1 | CAUET-TEFF1-C45-Q | Lim2021 | 90 | 1.956 | 67.5 (66.5 - 68.5) | 1.673 | 65.5 | |
Lim_CAU_task1a_2 | CAUET-TEFF1-P45-Q | Lim2021 | 91 | 2.010 | 67.9 (66.9 - 69.0) | 1.801 | 65.7 | |
Lim_CAU_task1a_3 | CAUET-TEFF2-C70-Q | Lim2021 | 80 | 1.479 | 68.5 (67.5 - 69.5) | 1.625 | 65.2 | |
Lim_CAU_task1a_4 | CAUET-TEFF3-Q | Lim2021 | 93 | 2.039 | 65.8 (64.7 - 66.8) | 1.906 | 63.1 | |
Liu_UESTC_task1a_1 | FR_agm | Liu2021 | 16 | 0.900 | 68.8 (67.8 - 69.8) | 0.909 | 68.2 | |
Liu_UESTC_task1a_2 | onebit_agm | Liu2021 | 15 | 0.895 | 68.2 (67.2 - 69.2) | 0.923 | 68.0 | |
Liu_UESTC_task1a_3 | onebit_noagm | Liu2021 | 13 | 0.878 | 69.6 (68.6 - 70.6) | 0.990 | 65.0 | |
Liu_UESTC_task1a_4 | weight_qz | Liu2021 | 87 | 1.626 | 42.0 (40.9 - 43.1) | 1.434 | 45.4 | |
Madhu_CET_task1a_1 | DWTMSCNN | Madhu2021 | 99 | 3.950 | 9.7 (9.0 - 10.3) | 0.628 | 85.1 | |
DCASE2021 baseline | Baseline | | | 1.730 | 45.6 (44.5 - 46.7) | 1.461 | 46.9 | |
Naranjo-Alcazar_ITI_task1a_1 | ASC_ResSE | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 60.2 (59.2 - 61.3) | | 64.2 | |
Pham_AIT_task1a_1 | Pham_AIT | Pham2021 | 73 | 1.368 | 67.5 (66.4 - 68.5) | | 66.7 | |
Pham_AIT_task1a_2 | Pham_AIT | Pham2021 | 54 | 1.187 | 68.4 (67.4 - 69.4) | | 66.7 | |
Pham_AIT_task1a_3 | Pham_AIT | Pham2021 | 94 | 2.058 | 69.6 (68.6 - 70.6) | | 66.7 | |
Phan_UIUC_task1a_1 | ResNet | Phan2021 | 65 | 1.272 | 63.3 (62.3 - 64.4) | 1.259 | 64.1 | |
Phan_UIUC_task1a_2 | ResNet_t3 | Phan2021 | 71 | 1.335 | 63.3 (62.3 - 64.4) | 1.313 | 64.1 | |
Phan_UIUC_task1a_3 | ResNet_t2 | Phan2021 | 60 | 1.223 | 65.3 (64.3 - 66.4) | 1.259 | 64.1 | |
Phan_UIUC_task1a_4 | ResNet_t3 | Phan2021 | 66 | 1.292 | 65.3 (64.3 - 66.4) | 1.313 | 64.1 | |
Puy_VAI_task1a_1 | ce_tta | Puy2021 | 24 | 0.952 | 66.6 (65.6 - 67.6) | 0.898 | 66.8 | |
Puy_VAI_task1a_2 | ce_mu_tta | Puy2021 | 27 | 0.974 | 65.4 (64.4 - 66.5) | 0.927 | 66.2 | |
Puy_VAI_task1a_3 | fl_tta | Puy2021 | 22 | 0.939 | 66.2 (65.1 - 67.2) | 0.877 | 68.7 | |
Qiao_NCUT_task1a_1 | Qiao_NCUT | Qiao2021 | 88 | 1.630 | 52.2 (51.1 - 53.3) | 1.001 | 51.7 | |
Seo_SGU_task1a_1 | Penult | Seo2021 | 32 | 1.030 | 70.3 (69.3 - 71.3) | 1.040 | 69.0 | |
Seo_SGU_task1a_2 | Stride21 | Seo2021 | 41 | 1.080 | 71.4 (70.4 - 72.4) | 1.089 | 72.6 | |
Seo_SGU_task1a_3 | Stride22 | Seo2021 | 35 | 1.065 | 71.3 (70.3 - 72.3) | 1.092 | 72.1 | |
Seo_SGU_task1a_4 | Stride12 | Seo2021 | 44 | 1.087 | 71.8 (70.8 - 72.8) | 1.106 | 72.6 | |
Singh_IITMandi_task1a_1 | Singh_29KB | Singh2021 | 77 | 1.464 | 47.2 (46.1 - 48.3) | 1.383 | 47.7 | |
Singh_IITMandi_task1a_2 | Singh_53KB | Singh2021 | 83 | 1.515 | 44.7 (43.6 - 45.8) | 1.394 | 48.5 | |
Singh_IITMandi_task1a_3 | Singh_74KB | Singh2021 | 82 | 1.509 | 46.1 (45.0 - 47.2) | 1.395 | 49.0 | |
Singh_IITMandi_task1a_4 | Singh_71KB | Singh2021 | 81 | 1.488 | 46.8 (45.7 - 47.9) | 1.413 | 48.6 | |
Sugahara_RION_task1a_1 | RION1 | Sugahara2021 | 43 | 1.087 | 63.8 (62.8 - 64.9) | 0.958 | 70.1 | |
Sugahara_RION_task1a_2 | RION2 | Sugahara2021 | 36 | 1.070 | 65.2 (64.2 - 66.3) | 0.975 | 69.7 | |
Sugahara_RION_task1a_3 | RION3 | Sugahara2021 | 31 | 1.024 | 65.3 (64.3 - 66.4) | 0.937 | 66.8 | |
Sugahara_RION_task1a_4 | RION4 | Sugahara2021 | 68 | 1.297 | 64.7 (63.7 - 65.8) | 1.062 | 68.8 | |
Verbitskiy_DS_task1a_1 | ASC_MB32 | Verbitskiy2021 | 48 | 1.127 | 61.4 (60.3 - 62.4) | 1.042 | 64.4 | |
Verbitskiy_DS_task1a_2 | ASC_MB64 | Verbitskiy2021 | 29 | 1.019 | 64.5 (63.4 - 65.5) | 0.932 | 68.8 | |
Verbitskiy_DS_task1a_3 | ASC_MB128 | Verbitskiy2021 | 26 | 0.966 | 67.3 (66.3 - 68.4) | 0.859 | 70.9 | |
Verbitskiy_DS_task1a_4 | ASC_MB160 | Verbitskiy2021 | 19 | 0.924 | 68.1 (67.1 - 69.1) | 0.848 | 70.5 | |
Yang_GT_task1a_1 | Yang_GT_lth_a | Yang2021 | 6 | 0.768 | 73.1 (72.1 - 74.0) | 0.640 | 79.4 | |
Yang_GT_task1a_2 | Yang_GT_lth_b | Yang2021 | 4 | 0.764 | 72.9 (71.9 - 73.9) | |||
Yang_GT_task1a_3 | Yang_GT_lth_c | Yang2021 | 3 | 0.758 | 72.9 (71.9 - 73.8) | |||
Yang_GT_task1a_4 | Yang_GT_lth_d | Yang2021 | 7 | 0.774 | 72.8 (71.8 - 73.8) | |||
Yihao_speakin_task1a_1 | Yihao_ratio07 | Yihao2021 | 69 | 1.311 | 51.9 (50.8 - 53.0) | 0.893 | 69.4 | |
Yihao_speakin_task1a_2 | Yihao_ratio065 | Yihao2021 | 59 | 1.222 | 55.2 (54.1 - 56.3) | 0.727 | 76.1 | |
Yihao_speakin_task1a_3 | Yihao_seresnet | Yihao2021 | 96 | 2.105 | 53.5 (52.4 - 54.6) | 1.990 | 82.8 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang_resnet_1 | Zhang2021 | 47 | 1.124 | 63.0 (62.0 - 64.1) | | 78.2 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang_resnet_2 | Zhang2021 | 45 | 1.113 | 63.2 (62.2 - 64.3) | | 76.4 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang_resnet_cbam | Zhang2021 | 98 | 3.359 | 52.2 (51.1 - 53.3) | | 65.2 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang_resnet_senet | Zhang2021 | 89 | 1.946 | 59.0 (57.9 - 60.1) | | 70.6 | |
Zhao_Maxvision_task1a_1 | maxvision1 | Zhao2021 | 75 | 1.440 | 61.2 (60.2 - 62.3) | 1.494 | 57.6 | |
Zhao_Maxvision_task1a_2 | maxvision2 | Zhao2021 | 74 | 1.412 | 63.5 (62.4 - 64.6) | 1.482 | 59.6 | |
Zhao_Maxvision_task1a_3 | maxvision3 | Zhao2021 | 62 | 1.227 | 63.5 (62.5 - 64.6) | 1.258 | 59.9 | |
Zhao_Maxvision_task1a_4 | maxvision4 | Zhao2021 | 58 | 1.215 | 62.8 (61.8 - 63.9) | 1.485 | 57.8 |
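
The table above reports multiclass log loss and accuracy with a 95% confidence interval. The following is a minimal sketch of how such numbers could be computed, not the organizers' evaluation code. The source does not state how the interval is derived, so the normal-approximation (Wald) interval used here is an assumption, as are the function names, the toy predictions, and the clip count of 8000.

```python
import math


def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for p_row, y in zip(probs, labels):
        # Clip to avoid log(0) when a system outputs a hard zero.
        p = min(max(p_row[y], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(labels)


def accuracy_with_ci(n_correct, n_total, z=1.96):
    """Accuracy with a Wald interval (assumed method, z=1.96 for 95%)."""
    acc = n_correct / n_total
    half_width = z * math.sqrt(acc * (1.0 - acc) / n_total)
    return acc, acc - half_width, acc + half_width


# Toy example: three clips, three classes (illustrative values only).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 0]
loss = multiclass_log_loss(probs, labels)
acc, lo, hi = accuracy_with_ci(n_correct=2, n_total=3)

# Hypothetical evaluation set of 8000 clips at 68.6% accuracy.
acc_big, lo_big, hi_big = accuracy_with_ci(n_correct=5488, n_total=8000)
print(f"log loss {loss:.3f}, accuracy {100 * acc_big:.1f} "
      f"({100 * lo_big:.1f} - {100 * hi_big:.1f})")
```

For context, a uniform prediction over the ten scene classes gives a log loss of ln(10), about 2.303, so systems with a log loss above that value are worse than an uninformed guess under this metric.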
Teams ranking
Table including only the best-performing system per submitting team.
Submission label | Name | Technical Report | Official system rank | Team rank | Logloss (Evaluation dataset) | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Development dataset) | Accuracy (Development dataset) | |
---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_4 | 16b_prune | Byttebier2021 | 17 | 6 | 0.905 | 68.8 (67.8 - 69.8) | 0.840 | 70.2 | |
Cao_SCUT_task1a_1 | sys_1 | Cao2021 | 49 | 15 | 1.136 | 66.7 (65.7 - 67.7) | 1.038 | 71.6 | |
Ding_TJU_task1a_3 | Ding_TJU | Ding2021 | 61 | 22 | 1.226 | 49.1 (48.0 - 50.2) | 1.193 | 55.0 | |
Fan_NWPU_task1a_1 | res-att | Cui2021 | 64 | 23 | 1.261 | 68.3 (67.3 - 69.3) | 0.870 | 69.7 | |
Galindo-Meza_ITESO_task1a_1 | e2e_CNN_INT8 | Galindo-Meza2021 | 97 | 29 | 2.221 | 53.9 (52.8 - 55.0) | 1.904 | 56.5 | |
Heo_Clova_task1a_4 | Clova_Res_W | Hee-Soo2021 | 12 | 4 | 0.871 | 70.1 (69.1 - 71.1) | |||
Horváth_HIT_task1a_3 | CPRes_fl | Horvth2021 | 76 | 24 | 1.460 | 51.6 (50.5 - 52.7) | 1.248 | 54.5 | |
Jeng_CHT+NSYSU_task1a_1 | SparseFCNN | Jeng2021 | 78 | 26 | 1.469 | 55.0 (53.9 - 56.1) | 1.464 | 54.6 | |
Jeong_ETRI_task1a_2 | JYH_ETRI_2 | Jeong2021 | 25 | 9 | 0.952 | 67.0 (65.9 - 68.0) | 1.015 | 64.9 | |
Kek_NU_task1a_2 | DSSMNet2 | Kek2021 | 57 | 18 | 1.207 | 63.5 (62.4 - 64.6) | 1.242 | 62.3 | |
Kim_3M_task1a_3 | CNN_pr3 | Kim2021 | 37 | 13 | 1.076 | 62.0 (61.0 - 63.1) | 1.009 | 63.3 | |
Kim_KNU_task1a_2 | KNU-CP2 | Kim2021a | 28 | 10 | 1.010 | 63.8 (62.8 - 64.9) | 1.040 | 62.0 | |
Kim_QTI_task1a_2 | ResNorm_QTI2 | Kim2021b | 1 | 1 | 0.724 | 76.1 (75.1 - 77.0) | 0.716 | 75.9 | |
Koutini_CPJKU_task1a_3 | DampedR8NB | Koutini2021 | 9 | 3 | 0.834 | 72.1 (71.1 - 73.1) | 0.890 | 69.5 | |
Lim_CAU_task1a_3 | CAUET-TEFF2-C70-Q | Lim2021 | 80 | 27 | 1.479 | 68.5 (67.5 - 69.5) | 1.625 | 65.2 | |
Liu_UESTC_task1a_3 | onebit_noagm | Liu2021 | 13 | 5 | 0.878 | 69.6 (68.6 - 70.6) | 0.990 | 65.0 | |
Madhu_CET_task1a_1 | DWTMSCNN | Madhu2021 | 99 | 30 | 3.950 | 9.7 (9.0 - 10.3) | 0.628 | 85.1 | |
DCASE2021 baseline | Baseline | | | | 1.730 | 45.6 (44.5 - 46.7) | 1.461 | 46.9 | |
Naranjo-Alcazar_ITI_task1a_1 | ASC_ResSE | Naranjo-Alcazar2021_t1a | 51 | 16 | 1.140 | 60.2 (59.2 - 61.3) | | 64.2 | |
Pham_AIT_task1a_2 | Pham_AIT | Pham2021 | 54 | 17 | 1.187 | 68.4 (67.4 - 69.4) | | 66.7 | |
Phan_UIUC_task1a_3 | ResNet_t2 | Phan2021 | 60 | 21 | 1.223 | 65.3 (64.3 - 66.4) | 1.259 | 64.1 | |
Puy_VAI_task1a_3 | fl_tta | Puy2021 | 22 | 8 | 0.939 | 66.2 (65.1 - 67.2) | 0.877 | 68.7 | |
Qiao_NCUT_task1a_1 | Qiao_NCUT | Qiao2021 | 88 | 28 | 1.630 | 52.2 (51.1 - 53.3) | 1.001 | 51.7 | |
Seo_SGU_task1a_1 | Penult | Seo2021 | 32 | 12 | 1.030 | 70.3 (69.3 - 71.3) | 1.040 | 69.0 | |
Singh_IITMandi_task1a_1 | Singh_29KB | Singh2021 | 77 | 25 | 1.464 | 47.2 (46.1 - 48.3) | 1.383 | 47.7 | |
Sugahara_RION_task1a_3 | RION3 | Sugahara2021 | 31 | 11 | 1.024 | 65.3 (64.3 - 66.4) | 0.937 | 66.8 | |
Verbitskiy_DS_task1a_4 | ASC_MB160 | Verbitskiy2021 | 19 | 7 | 0.924 | 68.1 (67.1 - 69.1) | 0.848 | 70.5 | |
Yang_GT_task1a_3 | Yang_GT_lth_c | Yang2021 | 3 | 2 | 0.758 | 72.9 (71.9 - 73.8) | |||
Yihao_speakin_task1a_2 | Yihao_ratio065 | Yihao2021 | 59 | 20 | 1.222 | 55.2 (54.1 - 56.3) | 0.727 | 76.1 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang_resnet_2 | Zhang2021 | 45 | 14 | 1.113 | 63.2 (62.2 - 64.3) | | 76.4 | |
Zhao_Maxvision_task1a_4 | maxvision4 | Zhao2021 | 58 | 19 | 1.215 | 62.8 (61.8 - 63.9) | 1.485 | 57.8 |
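
The per-team selection behind the table above can be sketched as follows. This is an illustration under stated assumptions, not the organizers' ranking script: it assumes the best system of a team is the one with the lowest evaluation log loss (which is consistent with the rows shown), and that the team name is everything before `_task1a_` in the submission label. The helper names `team_of` and `best_per_team` are hypothetical; the values are copied from the systems ranking table for three teams.

```python
# (submission label, evaluation log loss) for three teams.
systems = [
    ("Kim_QTI_task1a_1", 0.793),
    ("Kim_QTI_task1a_2", 0.724),
    ("Kim_QTI_task1a_3", 0.735),
    ("Kim_QTI_task1a_4", 0.764),
    ("Koutini_CPJKU_task1a_1", 0.883),
    ("Koutini_CPJKU_task1a_2", 0.842),
    ("Koutini_CPJKU_task1a_3", 0.834),
    ("Koutini_CPJKU_task1a_4", 0.847),
    ("Yang_GT_task1a_1", 0.768),
    ("Yang_GT_task1a_2", 0.764),
    ("Yang_GT_task1a_3", 0.758),
    ("Yang_GT_task1a_4", 0.774),
]


def team_of(label):
    # Assumed label layout: <Author>_<Affiliation>_task1a_<index>
    return label.rsplit("_task1a_", 1)[0]


def best_per_team(entries):
    best = {}
    for label, logloss in entries:
        team = team_of(label)
        # Keep the system with the lowest log loss for each team.
        if team not in best or logloss < best[team][1]:
            best[team] = (label, logloss)
    # Team rank follows ascending log loss of each team's best system.
    return sorted(best.values(), key=lambda item: item[1])


ranking = best_per_team(systems)
for team_rank, (label, logloss) in enumerate(ranking, start=1):
    print(team_rank, label, logloss)
```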
System complexity
Submission label | Technical Report | Official system rank | Logloss (Eval) | Accuracy (Eval) | Parameters | Non-zero parameters | Sparsity | Size (KB) * | Complexity management | |
---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 68.6 | 114634 | 113976 | 0.0057400073276688834 | 127.6 | weight quantization, grouped convolutions, Conv+BN fusion | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 67.5 | 114634 | 113976 | 0.0057400073276688834 | 127.6 | weight quantization, grouped convolutions, Conv+BN fusion | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 68.5 | 114634 | 113976 | 0.0057400073276688834 | 127.6 | weight quantization, grouped convolutions, Conv+BN fusion | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 68.8 | 82910 | 62390 | 0.24749728621396694 | 121.9 | weight quantization, grouped convolutions, pruning | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 66.7 | 36658 | 34970 | 0.04604724753123468 | 71.6 | weight quantization | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 64.6 | 36658 | 34970 | 0.04604724753123468 | 71.6 | weight quantization | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 67.2 | 36658 | 34970 | 0.04604724753123468 | 71.6 | weight quantization | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 66.1 | 51926 | 50238 | 0.03250779956091365 | 102.9 | weight quantization | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 53.0 | 40230 | 40230 | 0.0 | 78.6 | weight quantization | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 51.1 | 20250 | 20250 | 0.0 | 39.5 | weight quantization | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 49.1 | 63816 | 63816 | 0.0 | 124.6 | weight quantization | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 51.4 | 20250 | 20250 | 0.0 | 39.5 | weight quantization | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 68.3 | 93323 | 93323 | 0.0 | 93.3 | weight quantization | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 53.9 | 127637 | 127637 | 0.0 | 124.6 | pruning, int8 weight quantization | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 67.0 | 65424 | 65424 | 0.0 | 127.7 | weight quantization | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 66.9 | 63547 | 63547 | 0.0 | 124.1 | weight quantization | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 70.0 | 65424 | 65424 | 0.0 | 127.7 | weight quantization | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 70.1 | 63547 | 63547 | 0.0 | 124.1 | weight quantization | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 51.4 | 47939 | 47939 | 0.0 | 93.6 | weight quantization | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 53.3 | 47939 | 47939 | 0.0 | 93.6 | weight quantization | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 51.6 | 58266 | 58266 | 0.0 | 113.8 | weight quantization | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 49.2 | 58266 | 58266 | 0.0 | 113.8 | weight quantization | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 55.0 | 130457242 | 129320 | 0.9990087173543037 | 126.3 | sparsity, weight quantization | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 51.3 | 130457242 | 127906 | 0.9990195561546518 | 124.9 | sparsity, weight quantization | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 56.3 | 17186944 | 130999 | 0.9923779934350168 | 127.9 | sparsity, weight quantization | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 66.0 | 54845 | 54845 | 0.0 | 113.9 | weight quantization, depthwise separable convolutions | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 67.0 | 54845 | 54845 | 0.0 | 113.9 | weight quantization, depthwise separable convolutions | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 66.7 | 60236 | 60236 | 0.0 | 124.4 | weight quantization, depthwise separable convolutions | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 66.1 | 60236 | 60236 | 0.0 | 124.4 | weight quantization, depthwise separable convolutions | |
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 66.8 | 63448 | 59472 | 0.0626654898499559 | 123.9 | weight quantization | |
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 63.5 | 64850 | 60842 | 0.06180416345412487 | 126.6 | weight quantization | |
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 61.5 | 168778 | 116398 | 0.31034850513692547 | 113.7 | weight quantization, pruning | |
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 61.6 | 168778 | 113428 | 0.32794558532510165 | 110.8 | weight quantization, pruning | |
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 62.0 | 168778 | 120841 | 0.2840239841685528 | 118.0 | weight quantization, pruning | |
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 61.3 | 168778 | 116439 | 0.31010558248112907 | 113.7 | weight quantization, pruning | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 64.7 | 58472 | 58374 | 0.0016760158708442052 | 125.6 | CP-decomposition, weight quantization | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 63.8 | 64064 | 64064 | 0.0 | 125.1 | parameter sharing, weight quantization | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 61.3 | 58472 | 58411 | 0.0010432343685866652 | 125.7 | CP-decomposition, weight quantization | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 62.9 | 58472 | 58411 | 0.0010432343685866652 | 125.7 | CP-decomposition, weight quantization | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 75.0 | 630042 | 95472 | 0.8484672450408068 | 121.9 | weight quantization, pruning, knowledge distillation | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 76.1 | 630042 | 95472 | 0.8484672450408068 | 121.9 | weight quantization, pruning, knowledge distillation | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 76.1 | 630042 | 95472 | 0.8484672450408068 | 121.9 | weight quantization, pruning, knowledge distillation | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 75.2 | 314990 | 62721 | 0.800879392996603 | 122.5 | weight quantization, pruning, knowledge distillation | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 70.9 | 504104 | 64690 | 0.8716733055083872 | 126.3 | float16, sparsity | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 71.8 | 678184 | 64928 | 0.9042619702027768 | 126.8 | float16, sparsity | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 72.1 | 635176 | 64625 | 0.8982565462171115 | 126.2 | float16, sparsity | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 71.8 | 641320 | 63529 | 0.9009402482380091 | 124.1 | float16, sparsity | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 67.5 | 89910 | 56499 | 0.3716049382716049 | 125.2 | weight quantization, sparsity | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 67.9 | 89910 | 56499 | 0.3716049382716049 | 125.2 | weight quantization, sparsity | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 68.5 | 134748 | 54504 | 0.5955116216938285 | 125.4 | weight quantization, sparsity | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 65.8 | 56046 | 56046 | 0.0 | 118.8 | weight quantization, sparsity | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 68.8 | 643194 | 643194 | 0.0 | 106.7 | 1-bit quantization,FR_unit | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 68.2 | 268362 | 268368 | 2.235785990567507e-05 | 42.5 | 1-bit quantization | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 69.6 | 268362 | 268368 | 2.235785990567507e-05 | 42.5 | 1-bit quantization | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 42.0 | 60928 | 60928 | 0.0 | 119.0 | weight quantization | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 9.7 | 42774 | 42774 | 0.0 | 89.5 | weight quantization | |
DCASE2021 baseline | | | 1.730 | 45.6 | 46246 | 46246 | 0.0 | 90.3 | weight quantization | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 60.2 | 50130 | 50130 | 0.0 | 96.0 | weight quantization, tflite, float16 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 67.5 | 10909 | 10909 | 0.0 | 128.0 | channel restriction and decomposed convolution | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 68.4 | 10909 | 10909 | 0.0 | 128.0 | channel restriction and decomposed convolution | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 69.6 | 10909 | 10909 | 0.0 | 128.0 | channel restriction and decomposed convolution | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 63.3 | 41356 | 36364 | 0.12070799883934613 | 75.2 | weight quantization, depthwise separable convolutions | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 63.3 | 41356 | 36364 | 0.12070799883934613 | 75.2 | weight quantization, depthwise separable convolutions | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 65.3 | 41356 | 36364 | 0.12070799883934613 | 75.2 | weight quantization, depthwise separable convolutions | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 65.3 | 41356 | 36364 | 0.12070799883934613 | 75.2 | weight quantization, depthwise separable convolutions | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 66.6 | 62474 | 62474 | 0.0 | 122.0 | weight quantization | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 65.4 | 62474 | 62474 | 0.0 | 122.0 | weight quantization | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 66.2 | 62474 | 62474 | 0.0 | 122.0 | weight quantization | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 52.2 | 31852 | 31852 | 0.0 | 124.4 | weight quantization | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 70.3 | 101173 | 101173 | 0.0 | 125.0 | weight quantization | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 71.4 | 99557 | 99557 | 0.0 | 126.5 | weight quantization | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 71.3 | 99614 | 99614 | 0.0 | 126.6 | weight quantization | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 71.8 | 99603 | 99603 | 0.0 | 126.5 | weight quantization | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 47.2 | 14754 | 14754 | 0.0 | 28.8 | Filter pruning and quantization | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 44.7 | 27166 | 27166 | 0.0 | 53.1 | Filter pruning and quantization | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 46.1 | 38110 | 38110 | 0.0 | 74.4 | Filter pruning and quantization | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 46.8 | 36578 | 36578 | 0.0 | 71.4 | Filter pruning and quantization | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 63.8 | 339730 | 86577 | 0.7451593912813117 | 94.7 | weight quantization, pruning | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 65.2 | 339730 | 86577 | 0.7451593912813117 | 94.7 | weight quantization, pruning | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 65.3 | 203838 | 102606 | 0.496629676507815 | 108.3 | weight quantization, pruning | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 64.7 | 255940 | 109804 | 0.570977572868641 | 114.6 | weight quantization, pruning | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 61.4 | 62090 | 62090 | 0.0 | 121.3 | weight quantization | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 64.5 | 62154 | 62154 | 0.0 | 121.4 | weight quantization | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 67.3 | 62282 | 62282 | 0.0 | 121.6 | weight quantization | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 68.1 | 62346 | 62346 | 0.0 | 121.8 | weight quantization | |
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 73.1 | 4410180 | 30500 | 0.9930841825050225 | 122.0 | weight quantization, LTH pruning, teacher-student learning | |
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 72.9 | 14640720 | 111000 | 0.9924184056521811 | 111.0 | weight quantization, LTH pruning, teacher-student learning | |
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 72.9 | 7056288 | 45750 | 0.9935164210984586 | 125.0 | weight quantization, LTH pruning, teacher-student learning | |
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 72.8 | 7056288 | 45750 | 0.9935164210984586 | 125.0 | weight quantization, LTH pruning, teacher-student learning | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 51.9 | 48075 | 48075 | 0.0 | 93.8 | sparsity | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 55.2 | 63244 | 63244 | 0.0 | 123.5 | sparsity | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 53.5 | 50952 | 50952 | 0.0 | 99.5 | sparsity | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 63.0 | 83572 | 49738 | 0.4048485138563155 | 48.6 | weight quantization | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 63.2 | 83572 | 49738 | 0.4048485138563155 | 48.6 | weight quantization | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 52.2 | 87011 | 53177 | 0.3888473871119743 | 51.9 | weight quantization | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 59.0 | 86516 | 85706 | 0.009362430070738337 | 83.7 | weight quantization | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 61.2 | 59421 | 59376 | 0.0007573080224163586 | 116.1 | weight quantization | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 63.5 | 59421 | 59376 | 0.0007573080224163586 | 116.1 | weight quantization | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 63.5 | 59421 | 59376 | 0.0007573080224163586 | 116.1 | weight quantization | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 62.8 | 59421 | 59376 | 0.0007573080224163586 | 116.1 | weight quantization |
*) Model size is calculated according to the task-specific rules and will differ from the real storage size of the model. See model size calculation examples here.
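
The Sparsity and Size (KB) columns can be related to the parameter counts with a short calculation. The sketch below is an assumption-laden illustration rather than the official rule: it takes sparsity as the fraction of zero-valued parameters, counts only non-zero parameters toward the size, and uses 1 KB = 1024 bytes. The 16 bits per parameter used for the baseline is also an assumption, chosen because it reproduces the 90.3 KB listed in the table; the function names are hypothetical.

```python
def sparsity(total_params, nonzero_params):
    """Fraction of parameters that are zero."""
    return 1.0 - nonzero_params / total_params


def model_size_kb(nonzero_params, bits_per_param):
    """Size of the non-zero parameters in KB (assuming 1 KB = 1024 bytes)."""
    return nonzero_params * bits_per_param / 8 / 1024


# DCASE2021 baseline row: 46246 parameters, all non-zero.
baseline_kb = model_size_kb(nonzero_params=46246, bits_per_param=16)

# Kim_QTI_task1a_2 row: 630042 parameters, 95472 non-zero.
kim_sparsity = sparsity(total_params=630042, nonzero_params=95472)

print(f"baseline size: {baseline_kb:.1f} KB")
print(f"Kim_QTI_task1a_2 sparsity: {kim_sparsity:.4f}")
```

Submissions that mix bit widths across layers or use other storage schemes will not follow this single-bit-width formula, so the sketch should not be expected to reproduce every Size (KB) entry.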
Generalization performance
All results are computed on the evaluation dataset.
Submission label | Technical Report | Official system rank | Logloss (Evaluation dataset) | Accuracy (Evaluation dataset) | Logloss / unseen devices (Evaluation dataset) | Accuracy / unseen devices (Evaluation dataset) | Logloss / seen devices (Evaluation dataset) | Accuracy / seen devices (Evaluation dataset) | Logloss / unseen cities (Evaluation dataset) | Accuracy / unseen cities (Evaluation dataset) | Logloss / seen cities (Evaluation dataset) | Accuracy / seen cities (Evaluation dataset) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 68.6 | 1.065 | 64.5 | 0.829 | 72.0 | 0.972 | 67.5 | 0.926 | 68.6 | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 67.5 | 1.048 | 63.6 | 0.801 | 70.8 | 1.012 | 65.4 | 0.892 | 68.0 | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 68.5 | 1.094 | 64.7 | 0.820 | 71.7 | 1.007 | 67.3 | 0.931 | 68.7 | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 68.8 | 1.002 | 65.5 | 0.824 | 71.5 | 0.914 | 69.2 | 0.903 | 68.8 | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 66.7 | 1.214 | 62.5 | 1.071 | 70.2 | 1.190 | 63.2 | 1.126 | 67.3 | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 64.6 | 1.318 | 59.0 | 1.102 | 69.2 | 1.249 | 60.6 | 1.188 | 65.4 | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 67.2 | 1.223 | 63.3 | 1.066 | 70.4 | 1.196 | 63.4 | 1.123 | 68.1 | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 66.1 | 1.250 | 60.8 | 1.061 | 70.5 | 1.206 | 61.3 | 1.135 | 67.2 | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 53.0 | 1.878 | 46.8 | 1.265 | 58.2 | 1.547 | 49.0 | 1.530 | 54.1 | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 51.1 | 1.488 | 45.9 | 1.191 | 55.4 | 1.362 | 48.7 | 1.310 | 51.5 | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 49.1 | 1.356 | 43.9 | 1.118 | 53.4 | 1.274 | 48.9 | 1.209 | 49.2 | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 51.4 | 1.426 | 46.6 | 1.188 | 55.4 | 1.305 | 50.7 | 1.293 | 51.5 | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 68.3 | 1.458 | 65.6 | 1.098 | 70.6 | 1.628 | 65.6 | 1.187 | 69.0 | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 53.9 | 2.488 | 50.8 | 1.999 | 56.5 | 2.165 | 54.3 | 2.191 | 54.6 | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 67.0 | 1.180 | 62.6 | 1.009 | 70.7 | 1.099 | 67.2 | 1.082 | 67.1 | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 66.9 | 0.993 | 64.1 | 0.878 | 69.2 | 0.982 | 66.1 | 0.911 | 67.4 | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 70.0 | 1.110 | 67.1 | 0.991 | 72.5 | 1.059 | 68.3 | 1.039 | 71.1 | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 70.1 | 0.929 | 68.1 | 0.823 | 71.8 | 0.864 | 71.8 | 0.868 | 70.3 | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 51.4 | 2.039 | 44.1 | 1.228 | 57.5 | 1.561 | 50.3 | 1.570 | 51.5 | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 53.3 | 2.072 | 47.1 | 1.997 | 58.6 | 2.040 | 51.9 | 2.030 | 53.8 | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 51.6 | 1.780 | 44.6 | 1.193 | 57.5 | 1.463 | 49.7 | 1.461 | 51.8 | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 49.2 | 2.111 | 40.4 | 2.027 | 56.5 | 2.063 | 49.3 | 2.065 | 49.9 | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 55.0 | 1.557 | 50.9 | 1.396 | 58.4 | 1.479 | 52.2 | 1.473 | 55.1 | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 51.3 | 1.619 | 47.3 | 1.480 | 54.6 | 1.562 | 47.5 | 1.542 | 51.9 | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 56.3 | 1.613 | 50.9 | 1.351 | 60.8 | 1.516 | 53.9 | 1.464 | 56.8 | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 66.0 | 1.219 | 60.6 | 0.893 | 70.5 | 1.045 | 64.6 | 1.045 | 66.2 | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 67.0 | 1.094 | 62.6 | 0.834 | 70.6 | 0.986 | 64.2 | 0.940 | 67.3 | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 66.7 | 1.187 | 61.4 | 0.886 | 71.1 | 0.971 | 66.9 | 1.056 | 66.7 | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 66.1 | 1.724 | 59.6 | 0.816 | 71.6 | 1.131 | 65.8 | 1.254 | 66.7 | |
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 66.8 | 1.461 | 61.3 | 1.266 | 71.3 | 1.358 | 66.3 | 1.354 | 66.6 | |
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 63.5 | 1.416 | 56.6 | 1.034 | 69.3 | 1.230 | 62.4 | 1.201 | 63.6 | |
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 61.5 | 1.185 | 57.7 | 0.986 | 64.6 | 1.062 | 60.7 | 1.079 | 62.1 | |
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 61.6 | 1.185 | 58.1 | 0.987 | 64.5 | 1.067 | 61.3 | 1.080 | 61.9 | |
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 62.0 | 1.183 | 58.6 | 0.986 | 64.9 | 1.060 | 60.7 | 1.079 | 62.3 | |
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 61.3 | 1.190 | 57.6 | 0.986 | 64.3 | 1.068 | 60.4 | 1.081 | 61.6 | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 64.7 | 1.317 | 59.4 | 0.946 | 69.1 | 1.074 | 67.4 | 1.125 | 64.3 | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 63.8 | 1.215 | 57.2 | 0.839 | 69.4 | 0.991 | 62.6 | 1.003 | 64.1 | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 61.3 | 1.371 | 56.2 | 1.036 | 65.6 | 1.188 | 59.9 | 1.187 | 61.6 | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 62.9 | 1.315 | 57.7 | 1.000 | 67.3 | 1.141 | 63.2 | 1.143 | 62.7 | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 75.0 | 0.851 | 73.6 | 0.744 | 76.2 | 0.745 | 74.7 | 0.791 | 75.3 | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 76.1 | 0.766 | 74.5 | 0.689 | 77.4 | 0.657 | 76.2 | 0.727 | 76.2 | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 76.1 | 0.792 | 75.2 | 0.687 | 76.9 | 0.647 | 78.0 | 0.746 | 75.9 | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 75.2 | 0.832 | 73.3 | 0.708 | 76.8 | 0.713 | 74.6 | 0.771 | 75.3 | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 70.9 | 1.051 | 66.4 | 0.743 | 74.6 | 0.776 | 74.1 | 0.898 | 70.1 | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 71.8 | 0.976 | 68.2 | 0.730 | 74.8 | 0.805 | 71.3 | 0.848 | 71.8 | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 72.1 | 0.947 | 69.6 | 0.740 | 74.2 | 0.742 | 73.6 | 0.844 | 72.0 | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 71.8 | 0.970 | 69.3 | 0.744 | 74.0 | 0.737 | 74.2 | 0.864 | 71.5 | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 67.5 | 2.767 | 62.2 | 1.280 | 71.9 | 1.910 | 65.0 | 1.913 | 68.2 | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 67.9 | 2.892 | 62.3 | 1.275 | 72.6 | 1.945 | 65.5 | 1.996 | 68.5 | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 68.5 | 1.892 | 64.1 | 1.134 | 72.2 | 1.374 | 66.5 | 1.500 | 69.1 | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 65.8 | 2.998 | 60.1 | 1.240 | 70.5 | 1.996 | 64.3 | 2.025 | 65.7 | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 68.8 | 0.974 | 66.1 | 0.838 | 71.1 | 0.884 | 70.9 | 0.904 | 68.5 | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 68.2 | 0.955 | 66.4 | 0.844 | 69.7 | 0.859 | 69.8 | 0.902 | 67.8 | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 69.6 | 0.966 | 66.8 | 0.804 | 71.9 | 0.866 | 70.8 | 0.880 | 69.5 | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 42.0 | 1.756 | 38.3 | 1.519 | 45.0 | 1.622 | 42.0 | 1.632 | 41.9 | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 9.7 | 3.952 | 9.2 | 3.948 | 10.1 | 4.011 | 10.1 | 3.924 | 10.0 | |
DCASE2021 baseline | | | 1.730 | 45.6 | 2.222 | 38.0 | 1.320 | 51.9 | 1.802 | 43.6 | 1.702 | 45.5 | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 60.2 | 1.348 | 53.4 | 0.967 | 65.9 | 1.091 | 61.0 | 1.139 | 60.6 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 67.5 | 1.653 | 64.3 | 1.130 | 70.1 | 1.302 | 67.1 | 1.341 | 67.9 | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 68.4 | 1.398 | 64.8 | 1.011 | 71.3 | 1.069 | 71.5 | 1.180 | 68.5 | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 69.6 | 2.497 | 66.1 | 1.693 | 72.6 | 1.843 | 71.7 | 2.034 | 69.8 | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 63.3 | 1.369 | 59.2 | 1.191 | 66.7 | 1.250 | 62.8 | 1.271 | 63.6 | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 63.3 | 1.419 | 59.2 | 1.265 | 66.7 | 1.316 | 62.8 | 1.334 | 63.6 | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 65.3 | 1.294 | 62.8 | 1.164 | 67.5 | 1.190 | 65.0 | 1.220 | 65.7 | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 65.3 | 1.351 | 62.8 | 1.242 | 67.5 | 1.265 | 65.0 | 1.289 | 65.7 | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 66.6 | 1.159 | 59.7 | 0.779 | 72.4 | 0.948 | 66.1 | 0.947 | 66.8 | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 65.4 | 1.152 | 59.4 | 0.825 | 70.5 | 0.999 | 64.0 | 0.971 | 65.8 | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 66.2 | 1.116 | 60.1 | 0.791 | 71.2 | 0.932 | 65.2 | 0.934 | 66.1 | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 52.2 | 1.651 | 50.7 | 1.612 | 53.5 | 1.598 | 53.9 | 1.636 | 52.1 | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 70.3 | 1.107 | 67.4 | 0.965 | 72.8 | 1.087 | 67.9 | 1.018 | 70.7 | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 71.4 | 1.164 | 67.7 | 1.010 | 74.4 | 1.108 | 71.6 | 1.073 | 71.2 | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 71.3 | 1.149 | 67.6 | 0.995 | 74.4 | 1.086 | 72.1 | 1.057 | 71.5 | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 71.8 | 1.175 | 67.6 | 1.014 | 75.3 | 1.094 | 71.8 | 1.083 | 71.9 | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 47.2 | 1.687 | 41.5 | 1.277 | 51.9 | 1.444 | 45.8 | 1.470 | 47.1 | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 44.7 | 1.730 | 40.0 | 1.337 | 48.5 | 1.531 | 41.9 | 1.506 | 44.5 | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 46.1 | 1.761 | 40.9 | 1.299 | 50.4 | 1.490 | 45.4 | 1.517 | 46.1 | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 46.8 | 1.738 | 41.3 | 1.279 | 51.5 | 1.485 | 45.0 | 1.501 | 46.7 | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 63.8 | 1.247 | 57.8 | 0.953 | 68.8 | 1.110 | 65.4 | 1.078 | 63.9 | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 65.2 | 1.231 | 58.2 | 0.936 | 71.0 | 1.091 | 66.7 | 1.061 | 65.3 | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 65.3 | 1.159 | 60.8 | 0.912 | 69.1 | 1.022 | 66.2 | 1.021 | 65.5 | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 64.7 | 1.610 | 57.9 | 1.036 | 70.4 | 1.228 | 65.5 | 1.294 | 64.8 | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 61.4 | 1.305 | 55.5 | 0.978 | 66.2 | 1.204 | 57.9 | 1.107 | 62.2 | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 64.5 | 1.144 | 60.4 | 0.915 | 67.8 | 1.112 | 60.6 | 0.998 | 65.2 | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 67.3 | 1.102 | 63.1 | 0.852 | 70.8 | 1.059 | 64.6 | 0.946 | 67.8 | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 68.1 | 1.040 | 64.2 | 0.827 | 71.4 | 1.009 | 63.5 | 0.905 | 69.2 | |
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 73.1 | 0.846 | 70.8 | 0.703 | 74.9 | 0.825 | 73.5 | 0.753 | 72.5 | |
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 72.9 | 0.840 | 70.0 | 0.700 | 75.4 | 0.806 | 73.3 | 0.754 | 72.5 | |
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 72.9 | 0.832 | 70.1 | 0.696 | 75.1 | 0.805 | 73.2 | 0.748 | 72.5 | |
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 72.8 | 0.850 | 70.2 | 0.710 | 74.9 | 0.819 | 73.3 | 0.762 | 72.3 | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 51.9 | 1.376 | 49.7 | 1.257 | 53.6 | 1.293 | 49.9 | 1.305 | 52.4 | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 55.2 | 1.284 | 53.5 | 1.171 | 56.6 | 1.233 | 54.3 | 1.214 | 55.7 | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 53.5 | 2.114 | 50.7 | 2.097 | 55.8 | 2.100 | 52.5 | 2.106 | 53.3 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 63.0 | 1.243 | 58.9 | 1.024 | 66.4 | 1.161 | 59.5 | 1.112 | 63.3 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 63.2 | 1.242 | 57.4 | 1.006 | 68.1 | 1.102 | 60.4 | 1.100 | 64.0 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 52.2 | 3.840 | 47.3 | 2.958 | 56.3 | 3.654 | 51.2 | 3.265 | 52.8 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 59.0 | 2.451 | 53.2 | 1.525 | 63.8 | 1.963 | 57.0 | 1.971 | 59.9 | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 61.2 | 1.598 | 54.2 | 1.308 | 67.1 | 1.475 | 58.0 | 1.429 | 62.4 | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 63.5 | 1.551 | 55.6 | 1.297 | 70.0 | 1.436 | 62.3 | 1.408 | 63.5 | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 63.5 | 1.430 | 55.9 | 1.057 | 70.0 | 1.339 | 62.4 | 1.196 | 63.7 | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 62.8 | 1.406 | 56.2 | 1.056 | 68.3 | 1.253 | 61.0 | 1.213 | 63.5 |
Class-wise performance
Log loss
Rank | Submission label | Technical Report | Official system rank | Logloss | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 1.393 | 0.431 | 0.937 | 0.977 | 0.355 | 1.681 | 0.840 | 1.695 | 0.312 | 0.740 | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 1.224 | 0.526 | 0.790 | 1.201 | 0.434 | 1.256 | 0.949 | 1.586 | 0.426 | 0.743 | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 1.287 | 0.473 | 0.850 | 1.191 | 0.348 | 1.480 | 0.891 | 1.687 | 0.393 | 0.844 | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 1.245 | 0.540 | 0.840 | 0.985 | 0.392 | 1.682 | 0.852 | 1.581 | 0.259 | 0.673 | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 1.461 | 1.007 | 1.169 | 1.343 | 0.753 | 1.485 | 1.006 | 1.576 | 0.618 | 0.937 | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 1.430 | 0.808 | 1.228 | 1.294 | 0.812 | 1.702 | 1.272 | 1.863 | 0.670 | 0.924 | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 1.456 | 0.919 | 1.180 | 1.191 | 0.829 | 1.583 | 1.041 | 1.603 | 0.735 | 0.839 | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 1.519 | 0.997 | 1.150 | 1.195 | 0.713 | 1.545 | 1.100 | 1.610 | 0.775 | 0.867 | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 1.955 | 1.598 | 1.454 | 1.693 | 1.021 | 2.443 | 1.179 | 1.872 | 1.335 | 0.886 | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 1.503 | 1.280 | 1.362 | 1.761 | 0.947 | 1.563 | 1.171 | 1.778 | 1.142 | 0.753 | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 1.763 | 1.135 | 1.285 | 1.329 | 0.827 | 1.591 | 0.802 | 1.749 | 1.041 | 0.741 | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 1.806 | 1.231 | 1.184 | 1.479 | 0.809 | 1.753 | 0.943 | 1.764 | 1.167 | 0.828 | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 1.754 | 0.695 | 1.316 | 1.439 | 0.936 | 1.926 | 1.254 | 2.392 | 0.478 | 0.423 | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 2.875 | 1.782 | 1.970 | 3.164 | 1.838 | 3.269 | 1.712 | 3.128 | 1.088 | 1.388 | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 1.372 | 0.949 | 1.127 | 1.202 | 0.829 | 1.288 | 1.065 | 1.463 | 0.615 | 0.956 | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 1.270 | 0.616 | 0.929 | 1.105 | 0.720 | 1.322 | 0.712 | 1.487 | 0.468 | 0.670 | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 1.309 | 0.913 | 1.003 | 1.240 | 0.806 | 1.220 | 1.124 | 1.390 | 0.634 | 0.812 | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 1.205 | 0.583 | 0.868 | 1.003 | 0.492 | 1.284 | 0.862 | 1.342 | 0.452 | 0.622 | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 1.615 | 0.865 | 1.424 | 1.637 | 1.438 | 2.358 | 1.861 | 2.608 | 1.062 | 1.102 | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 2.103 | 1.884 | 2.065 | 2.072 | 1.991 | 2.205 | 2.039 | 2.172 | 1.866 | 1.913 | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 1.589 | 0.764 | 1.800 | 1.833 | 1.101 | 1.651 | 1.907 | 1.910 | 0.892 | 1.148 | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 2.131 | 1.857 | 2.106 | 2.135 | 2.041 | 2.191 | 2.069 | 2.192 | 1.945 | 1.988 | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 1.695 | 1.508 | 1.709 | 1.454 | 0.849 | 1.839 | 1.540 | 1.746 | 1.061 | 1.289 | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 1.382 | 1.506 | 1.907 | 1.536 | 0.887 | 2.451 | 1.403 | 1.991 | 1.016 | 1.348 | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 1.645 | 1.297 | 1.599 | 1.426 | 0.996 | 2.077 | 1.567 | 1.724 | 0.996 | 1.370 | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 1.574 | 0.455 | 1.086 | 1.276 | 0.487 | 1.666 | 1.139 | 1.473 | 0.666 | 0.590 | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 1.390 | 0.455 | 1.047 | 1.272 | 0.451 | 1.379 | 1.111 | 1.287 | 0.566 | 0.561 | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 1.457 | 0.548 | 1.132 | 1.344 | 0.359 | 1.095 | 1.357 | 1.581 | 0.622 | 0.733 | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 1.116 | 0.528 | 1.008 | 1.301 | 0.746 | 1.164 | 1.407 | 3.708 | 0.467 | 0.840 | |
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 1.619 | 1.049 | 1.385 | 1.587 | 0.948 | 1.775 | 1.364 | 1.740 | 0.916 | 1.164 | |
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 1.683 | 0.572 | 1.162 | 1.303 | 0.808 | 1.749 | 1.429 | 1.694 | 0.809 | 0.864 | |
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 1.241 | 0.851 | 0.958 | 1.540 | 0.705 | 1.488 | 1.000 | 1.385 | 0.800 | 0.796 | |
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 1.274 | 0.848 | 0.954 | 1.438 | 0.715 | 1.536 | 1.002 | 1.398 | 0.787 | 0.819 | |
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 1.243 | 0.848 | 0.942 | 1.514 | 0.744 | 1.473 | 1.028 | 1.391 | 0.797 | 0.777 | |
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 1.273 | 0.921 | 0.923 | 1.506 | 0.678 | 1.380 | 1.040 | 1.577 | 0.805 | 0.679 | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 1.511 | 0.694 | 1.220 | 1.114 | 0.773 | 1.322 | 1.491 | 1.662 | 0.566 | 0.791 | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 1.228 | 0.547 | 0.962 | 1.003 | 0.564 | 1.259 | 1.327 | 1.564 | 1.011 | 0.636 | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 1.537 | 0.924 | 1.486 | 1.311 | 0.914 | 1.483 | 1.097 | 1.844 | 0.600 | 0.685 | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 1.456 | 0.856 | 1.302 | 1.332 | 0.786 | 1.348 | 1.183 | 1.700 | 0.571 | 0.898 | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 1.242 | 0.397 | 0.723 | 0.890 | 0.363 | 1.419 | 0.721 | 1.397 | 0.426 | 0.351 | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 1.050 | 0.351 | 0.550 | 0.810 | 0.400 | 1.261 | 0.671 | 1.298 | 0.436 | 0.411 | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 0.976 | 0.398 | 0.557 | 0.876 | 0.378 | 1.356 | 0.722 | 1.310 | 0.381 | 0.393 | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 1.232 | 0.332 | 0.542 | 0.744 | 0.273 | 1.468 | 0.826 | 1.350 | 0.417 | 0.460 | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 1.097 | 0.369 | 0.742 | 0.853 | 0.309 | 1.419 | 1.151 | 1.905 | 0.499 | 0.489 | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 1.036 | 0.378 | 0.696 | 0.858 | 0.334 | 1.386 | 1.059 | 1.628 | 0.483 | 0.562 | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 0.989 | 0.364 | 0.738 | 0.939 | 0.322 | 1.418 | 0.974 | 1.682 | 0.439 | 0.477 | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 1.070 | 0.374 | 0.722 | 0.824 | 0.307 | 1.462 | 1.038 | 1.685 | 0.466 | 0.520 | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 2.394 | 0.393 | 1.310 | 2.305 | 0.596 | 4.185 | 2.131 | 4.457 | 0.666 | 1.123 | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 2.454 | 0.364 | 2.061 | 2.557 | 0.805 | 3.464 | 2.529 | 4.386 | 0.520 | 0.956 | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 1.898 | 0.509 | 1.270 | 2.146 | 0.472 | 2.372 | 2.108 | 2.408 | 0.789 | 0.815 | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 1.350 | 0.411 | 1.910 | 3.807 | 0.749 | 2.299 | 3.124 | 4.841 | 0.828 | 1.073 | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 1.209 | 0.543 | 0.708 | 1.073 | 0.597 | 1.192 | 1.101 | 1.438 | 0.363 | 0.775 | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 1.024 | 0.522 | 0.987 | 0.901 | 0.463 | 1.299 | 0.992 | 1.524 | 0.465 | 0.768 | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 1.498 | 0.600 | 0.867 | 0.918 | 0.468 | 1.138 | 0.907 | 1.496 | 0.409 | 0.475 | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 1.626 | 2.583 | 1.539 | 1.375 | 1.413 | 2.058 | 1.112 | 1.931 | 1.040 | 1.587 | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 4.120 | 3.971 | 4.412 | 3.673 | 4.147 | 3.229 | 4.169 | 3.351 | 4.580 | 3.845 | |
DCASE2021 baseline | | | 1.730 | 2.077 | 1.615 | 1.159 | 1.955 | 2.173 | 2.455 | 1.227 | 1.744 | 1.825 | 1.073 | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 1.346 | 1.046 | 1.057 | 0.809 | 0.875 | 1.569 | 1.491 | 1.352 | 1.040 | 0.817 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 1.380 | 0.624 | 1.154 | 1.791 | 0.608 | 2.558 | 1.565 | 2.133 | 0.921 | 0.942 | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 1.403 | 1.045 | 1.093 | 1.035 | 0.510 | 1.658 | 1.212 | 2.524 | 1.001 | 0.385 | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 2.187 | 1.302 | 1.749 | 2.298 | 0.853 | 3.526 | 2.204 | 3.980 | 1.616 | 0.870 | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 1.429 | 1.095 | 1.198 | 1.457 | 0.902 | 1.693 | 1.201 | 1.756 | 0.853 | 1.136 | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 1.499 | 1.195 | 1.301 | 1.496 | 0.991 | 1.707 | 1.278 | 1.764 | 0.907 | 1.211 | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 1.325 | 0.947 | 1.358 | 1.518 | 0.933 | 1.475 | 1.199 | 1.607 | 0.812 | 1.052 | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 1.423 | 1.061 | 1.414 | 1.560 | 1.014 | 1.515 | 1.265 | 1.654 | 0.865 | 1.147 | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 1.536 | 0.404 | 1.053 | 1.072 | 0.480 | 1.468 | 1.038 | 1.485 | 0.437 | 0.546 | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 1.353 | 0.638 | 1.010 | 1.175 | 0.448 | 1.394 | 1.232 | 1.395 | 0.556 | 0.536 | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 1.499 | 0.486 | 0.959 | 1.049 | 0.501 | 1.339 | 1.045 | 1.322 | 0.601 | 0.588 | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 1.665 | 1.313 | 2.005 | 2.381 | 1.075 | 1.782 | 1.616 | 1.844 | 1.150 | 1.468 | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 1.502 | 0.735 | 1.013 | 1.042 | 0.634 | 1.515 | 0.965 | 1.606 | 0.530 | 0.755 | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 1.478 | 0.849 | 1.064 | 1.143 | 0.710 | 1.438 | 1.096 | 1.580 | 0.630 | 0.814 | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 1.312 | 0.853 | 1.116 | 1.139 | 0.666 | 1.507 | 1.104 | 1.558 | 0.572 | 0.821 | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 1.530 | 0.911 | 1.206 | 1.070 | 0.742 | 1.448 | 1.025 | 1.485 | 0.649 | 0.807 | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 1.564 | 1.549 | 1.114 | 1.661 | 1.341 | 2.025 | 1.363 | 1.564 | 1.399 | 1.056 | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 1.647 | 1.437 | 1.459 | 1.598 | 1.629 | 1.774 | 1.206 | 1.516 | 1.764 | 1.122 | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 1.612 | 1.466 | 1.418 | 1.606 | 1.450 | 2.018 | 1.300 | 1.596 | 1.680 | 0.945 | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 1.811 | 1.506 | 1.398 | 1.489 | 1.262 | 2.018 | 1.254 | 1.739 | 1.437 | 0.963 | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 1.687 | 0.924 | 0.892 | 1.182 | 0.544 | 1.433 | 1.316 | 1.382 | 0.608 | 0.902 | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 1.636 | 0.841 | 0.940 | 1.166 | 0.500 | 1.472 | 1.287 | 1.393 | 0.596 | 0.873 | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 1.318 | 0.613 | 1.139 | 1.103 | 0.446 | 1.698 | 1.124 | 1.539 | 0.543 | 0.720 | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 1.969 | 0.643 | 1.118 | 1.681 | 0.307 | 2.254 | 1.677 | 1.718 | 0.739 | 0.862 | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 1.466 | 0.827 | 0.778 | 1.045 | 0.799 | 1.855 | 1.136 | 1.985 | 0.607 | 0.771 | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 1.255 | 0.784 | 0.763 | 0.886 | 0.615 | 1.698 | 1.038 | 2.023 | 0.496 | 0.635 | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 1.148 | 0.558 | 0.838 | 0.985 | 0.512 | 1.572 | 1.005 | 1.935 | 0.499 | 0.603 | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 1.037 | 0.484 | 0.789 | 0.984 | 0.475 | 1.609 | 0.844 | 1.970 | 0.478 | 0.568 | |
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 1.092 | 0.316 | 0.728 | 0.885 | 0.431 | 1.080 | 0.722 | 1.559 | 0.430 | 0.438 | |
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 0.935 | 0.277 | 0.752 | 0.907 | 0.414 | 1.064 | 0.780 | 1.626 | 0.392 | 0.491 | |
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 0.975 | 0.282 | 0.737 | 0.907 | 0.406 | 1.065 | 0.762 | 1.588 | 0.387 | 0.473 | |
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 1.001 | 0.305 | 0.733 | 0.896 | 0.435 | 1.086 | 0.752 | 1.631 | 0.428 | 0.469 | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 1.460 | 1.325 | 1.242 | 1.215 | 1.132 | 2.075 | 0.976 | 1.814 | 0.655 | 1.218 | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 1.282 | 1.399 | 1.119 | 1.422 | 1.081 | 1.635 | 1.074 | 1.594 | 0.579 | 1.040 | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 2.106 | 2.136 | 2.098 | 2.111 | 2.063 | 2.174 | 2.104 | 2.162 | 2.019 | 2.074 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 1.296 | 1.315 | 0.976 | 1.177 | 0.798 | 1.431 | 0.998 | 1.639 | 0.704 | 0.902 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 1.210 | 1.270 | 0.900 | 1.182 | 0.659 | 1.478 | 1.197 | 1.601 | 0.740 | 0.893 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 4.982 | 4.172 | 3.030 | 3.876 | 2.235 | 3.320 | 3.571 | 4.635 | 1.760 | 2.007 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 2.475 | 2.964 | 1.691 | 2.480 | 1.237 | 1.717 | 1.816 | 2.932 | 1.210 | 0.938 | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 1.560 | 1.115 | 1.549 | 1.602 | 1.040 | 1.724 | 1.513 | 1.974 | 1.043 | 1.281 | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 1.598 | 1.157 | 1.463 | 1.682 | 1.199 | 1.620 | 1.360 | 1.797 | 1.047 | 1.199 | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 1.443 | 0.804 | 1.121 | 1.520 | 0.985 | 1.466 | 1.196 | 2.055 | 0.835 | 0.840 | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 1.337 | 0.760 | 1.072 | 1.399 | 1.096 | 1.615 | 0.989 | 2.056 | 0.908 | 0.919 |
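
In the rows above, the overall Logloss value appears to equal the unweighted mean of the ten class-wise values. This is an observation from the tabulated numbers, not a rule stated on this page. The following minimal Python sketch checks it for one row (Byttebier_IDLab_task1a_1); the helper name `macro_average` is hypothetical, and the tolerance only accounts for the three-decimal rounding used in the table.

```python
# Class-wise log loss for Byttebier_IDLab_task1a_1, copied from the table above.
class_logloss = {
    "Airport": 1.393,
    "Bus": 0.431,
    "Metro": 0.937,
    "Metro station": 0.977,
    "Park": 0.355,
    "Public square": 1.681,
    "Shopping mall": 0.840,
    "Street pedestrian": 1.695,
    "Street traffic": 0.312,
    "Tram": 0.740,
}

# Overall log loss reported for the same submission in the "Logloss" column.
reported_overall = 0.936


def macro_average(per_class):
    """Unweighted mean over classes (hypothetical helper, assumes equal class weight)."""
    return sum(per_class.values()) / len(per_class)


mean_logloss = macro_average(class_logloss)

# Allow for rounding of the published values to three decimals.
matches = abs(mean_logloss - reported_overall) < 0.001

print(f"mean of class-wise log loss: {mean_logloss:.4f}")
print(f"consistent with reported overall value: {matches}")
```

The same check can be applied to any other row by replacing the dictionary values and `reported_overall`.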
Accuracy
Rank | Submission label | Technical Report | Official system rank | Accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 68.6 | 49.0 | 88.1 | 66.8 | 66.8 | 90.0 | 40.7 | 73.7 | 41.5 | 93.6 | 75.9 | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 67.5 | 53.8 | 81.8 | 73.2 | 63.3 | 83.6 | 51.8 | 63.0 | 42.9 | 88.5 | 73.5 | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 68.5 | 57.2 | 83.7 | 72.5 | 63.5 | 88.4 | 50.5 | 67.6 | 41.7 | 89.9 | 70.5 | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 68.8 | 54.2 | 81.7 | 69.9 | 67.3 | 87.4 | 38.4 | 72.6 | 45.3 | 93.7 | 77.4 | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 66.7 | 43.7 | 77.0 | 67.2 | 59.3 | 84.1 | 49.9 | 77.3 | 41.9 | 85.0 | 81.6 | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 64.6 | 53.2 | 84.7 | 62.1 | 64.3 | 81.9 | 42.4 | 63.1 | 30.2 | 87.4 | 76.4 | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 67.2 | 46.0 | 80.4 | 67.0 | 66.0 | 83.2 | 44.4 | 72.7 | 41.2 | 84.5 | 86.1 | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 66.1 | 41.7 | 75.3 | 64.5 | 68.7 | 86.2 | 45.1 | 70.8 | 42.7 | 80.1 | 86.0 | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 53.0 | 32.4 | 50.5 | 53.8 | 47.2 | 72.5 | 32.3 | 66.4 | 36.2 | 67.3 | 71.6 | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 51.1 | 33.7 | 46.3 | 39.8 | 38.5 | 73.9 | 42.8 | 65.2 | 29.8 | 65.3 | 75.4 | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 49.1 | 19.8 | 39.0 | 28.7 | 45.1 | 68.8 | 37.1 | 86.7 | 17.3 | 68.8 | 79.3 | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 51.4 | 23.6 | 47.9 | 44.2 | 43.7 | 76.5 | 37.1 | 73.7 | 28.3 | 65.8 | 73.2 | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 68.3 | 53.2 | 81.9 | 61.6 | 63.9 | 81.3 | 49.1 | 64.4 | 54.5 | 88.0 | 84.8 | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 53.9 | 38.1 | 60.0 | 51.9 | 43.1 | 62.8 | 38.1 | 64.5 | 36.9 | 78.4 | 65.2 | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 67.0 | 46.6 | 74.9 | 65.4 | 65.8 | 76.9 | 58.1 | 70.1 | 49.2 | 88.1 | 75.0 | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 66.9 | 50.0 | 78.5 | 66.7 | 60.6 | 77.3 | 51.3 | 77.3 | 43.8 | 87.1 | 76.4 | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 70.0 | 52.9 | 77.1 | 72.1 | 63.3 | 80.7 | 61.4 | 68.8 | 55.7 | 88.6 | 79.8 | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 70.1 | 54.3 | 82.6 | 68.3 | 63.8 | 86.0 | 54.7 | 70.8 | 51.3 | 88.6 | 80.8 | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 51.4 | 38.5 | 72.1 | 45.8 | 46.7 | 65.7 | 32.8 | 53.4 | 30.2 | 67.0 | 61.9 | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 53.3 | 41.8 | 73.9 | 48.6 | 48.9 | 57.3 | 28.5 | 51.9 | 35.1 | 77.8 | 69.7 | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 51.6 | 39.1 | 79.4 | 34.3 | 39.1 | 63.5 | 34.2 | 57.3 | 39.9 | 71.8 | 57.3 | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 49.2 | 37.9 | 81.1 | 39.4 | 37.5 | 54.7 | 32.7 | 52.7 | 30.4 | 66.7 | 58.7 | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 55.0 | 38.4 | 54.5 | 40.3 | 59.5 | 85.4 | 33.0 | 51.4 | 49.0 | 74.2 | 64.0 | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 51.3 | 59.7 | 57.1 | 30.4 | 53.3 | 84.3 | 0.0 | 57.1 | 29.8 | 76.4 | 64.8 | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 56.3 | 43.6 | 67.8 | 49.9 | 60.2 | 76.4 | 24.0 | 51.4 | 47.5 | 79.4 | 62.5 | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 66.0 | 45.6 | 88.8 | 57.8 | 59.7 | 88.3 | 43.2 | 64.0 | 51.3 | 81.8 | 79.3 | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 67.0 | 48.0 | 86.2 | 61.4 | 55.4 | 88.4 | 46.3 | 63.6 | 54.7 | 84.8 | 80.8 | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 66.7 | 47.6 | 85.0 | 60.0 | 59.1 | 88.6 | 60.7 | 61.7 | 51.8 | 80.3 | 72.2 | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 66.1 | 56.8 | 83.8 | 60.0 | 60.0 | 82.8 | 59.1 | 55.1 | 48.0 | 84.3 | 71.3 | |
Kek_NU_task1a_1 | Kek2021 | 72 | 66.8 | 48.1 | 89.8 | 62.6 | 58.1 | 89.9 | 43.4 | 68.2 | 43.7 | 85.9 | 77.9 | |
Kek_NU_task1a_2 | Kek2021 | 57 | 63.5 | 45.5 | 88.9 | 64.3 | 60.0 | 78.8 | 41.2 | 59.0 | 44.7 | 77.0 | 75.9 | |
Kim_3M_task1a_1 | Kim2021 | 38 | 61.5 | 53.5 | 71.0 | 66.2 | 48.5 | 77.9 | 46.0 | 61.9 | 44.9 | 76.1 | 69.1 | |
Kim_3M_task1a_2 | Kim2021 | 39 | 61.6 | 52.8 | 71.0 | 64.9 | 51.8 | 78.2 | 44.3 | 63.8 | 45.2 | 76.4 | 67.6 | |
Kim_3M_task1a_3 | Kim2021 | 37 | 62.0 | 53.8 | 70.7 | 66.8 | 49.9 | 77.3 | 46.8 | 62.6 | 45.6 | 76.3 | 70.7 | |
Kim_3M_task1a_4 | Kim2021 | 40 | 61.3 | 52.5 | 68.3 | 66.5 | 48.5 | 77.5 | 49.2 | 62.1 | 39.4 | 74.9 | 73.7 | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 64.7 | 52.8 | 79.2 | 57.4 | 63.4 | 76.8 | 53.7 | 53.4 | 48.5 | 86.7 | 75.1 | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 63.8 | 54.5 | 82.1 | 62.6 | 65.5 | 81.9 | 53.4 | 52.1 | 41.4 | 68.8 | 75.6 | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 61.3 | 48.4 | 68.2 | 46.6 | 55.3 | 75.3 | 48.6 | 65.3 | 38.9 | 85.5 | 81.4 | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 62.9 | 46.8 | 74.0 | 53.9 | 56.3 | 78.4 | 52.4 | 64.3 | 45.2 | 87.1 | 70.6 | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 75.0 | 60.0 | 89.3 | 75.9 | 74.4 | 89.5 | 53.3 | 78.3 | 53.8 | 87.4 | 88.3 | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 76.1 | 62.4 | 90.7 | 80.7 | 74.4 | 87.8 | 56.2 | 77.9 | 57.6 | 86.7 | 86.6 | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 76.1 | 67.4 | 89.1 | 80.2 | 74.2 | 88.9 | 53.5 | 74.7 | 57.7 | 88.6 | 86.7 | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 75.2 | 59.7 | 91.4 | 80.2 | 76.4 | 92.7 | 50.3 | 73.4 | 55.9 | 86.9 | 85.2 | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 70.9 | 61.4 | 87.2 | 72.7 | 73.1 | 90.3 | 53.0 | 59.8 | 43.1 | 84.8 | 83.1 | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 71.8 | 61.6 | 87.8 | 74.2 | 72.0 | 90.5 | 53.5 | 66.2 | 46.8 | 85.7 | 79.7 | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 72.1 | 63.3 | 89.1 | 72.7 | 69.4 | 90.0 | 53.0 | 66.5 | 46.7 | 87.0 | 83.2 | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 71.8 | 62.4 | 88.3 | 74.6 | 71.8 | 90.8 | 52.4 | 65.0 | 44.1 | 87.2 | 81.8 | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 67.5 | 54.7 | 88.3 | 68.3 | 63.5 | 85.1 | 42.4 | 63.8 | 52.9 | 84.8 | 71.2 | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 67.9 | 55.7 | 91.0 | 61.7 | 62.8 | 81.3 | 44.7 | 63.8 | 54.2 | 86.0 | 78.2 | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 68.5 | 53.4 | 88.0 | 66.9 | 64.1 | 85.6 | 47.6 | 64.8 | 54.7 | 84.7 | 75.4 | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 65.8 | 60.5 | 89.4 | 57.6 | 56.3 | 83.3 | 49.5 | 58.2 | 52.0 | 79.5 | 71.2 | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 68.8 | 54.2 | 83.7 | 78.3 | 61.2 | 79.5 | 57.7 | 63.1 | 49.0 | 90.3 | 71.2 | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 68.2 | 64.8 | 84.2 | 61.1 | 68.3 | 84.7 | 51.4 | 64.8 | 41.7 | 86.9 | 74.4 | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 69.6 | 43.2 | 80.6 | 65.7 | 69.6 | 85.4 | 59.2 | 72.9 | 45.3 | 89.1 | 85.1 | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 42.0 | 29.4 | 11.7 | 41.8 | 53.5 | 54.4 | 29.7 | 64.9 | 24.1 | 68.6 | 41.5 | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 9.7 | 5.7 | 13.8 | 6.6 | 9.6 | 12.4 | 9.5 | 10.7 | 11.5 | 7.8 | 9.3 | |
DCASE2021 baseline | | | 45.6 | 24.0 | 44.6 | 54.4 | 37.8 | 52.7 | 24.4 | 63.8 | 39.9 | 56.4 | 58.1 | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 60.2 | 45.1 | 64.1 | 57.6 | 74.6 | 77.7 | 38.0 | 50.4 | 53.2 | 70.5 | 71.3 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 67.5 | 59.7 | 86.2 | 67.4 | 60.5 | 86.2 | 43.9 | 69.4 | 50.0 | 79.9 | 71.3 | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 68.4 | 58.8 | 75.1 | 63.3 | 72.7 | 87.5 | 57.1 | 67.4 | 38.1 | 77.4 | 86.4 | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 69.6 | 63.0 | 82.7 | 67.7 | 68.9 | 88.4 | 52.0 | 70.3 | 44.4 | 79.3 | 79.5 | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 63.3 | 48.2 | 75.9 | 59.3 | 57.8 | 85.0 | 42.2 | 70.6 | 39.6 | 85.6 | 69.1 | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 63.3 | 48.2 | 75.9 | 59.3 | 57.8 | 85.0 | 42.2 | 70.6 | 39.6 | 85.6 | 69.1 | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 65.3 | 55.7 | 83.5 | 48.5 | 54.0 | 83.5 | 48.6 | 70.2 | 45.7 | 86.6 | 77.1 | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 65.3 | 55.7 | 83.5 | 48.5 | 54.0 | 83.5 | 48.6 | 70.2 | 45.7 | 86.6 | 77.1 | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 66.6 | 41.5 | 86.5 | 59.1 | 66.3 | 85.0 | 48.4 | 64.3 | 49.7 | 86.0 | 79.3 | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 65.4 | 45.5 | 76.1 | 61.9 | 63.1 | 87.0 | 49.9 | 55.9 | 49.7 | 83.3 | 81.9 | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 66.2 | 40.0 | 84.8 | 60.9 | 65.0 | 85.9 | 48.2 | 66.0 | 52.1 | 80.2 | 78.4 | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 52.2 | 41.5 | 67.8 | 37.8 | 14.9 | 87.9 | 40.5 | 60.2 | 36.7 | 81.3 | 53.5 | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 70.3 | 50.9 | 83.5 | 71.7 | 69.7 | 86.0 | 50.5 | 73.4 | 48.2 | 88.4 | 81.2 | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 71.4 | 52.3 | 83.5 | 74.4 | 71.0 | 85.7 | 54.9 | 71.0 | 50.3 | 88.1 | 82.6 | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 71.3 | 59.8 | 82.7 | 71.2 | 69.9 | 88.1 | 50.1 | 70.2 | 48.9 | 88.8 | 83.5 | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 71.8 | 47.6 | 83.1 | 67.7 | 75.6 | 86.1 | 53.0 | 74.1 | 55.9 | 87.8 | 87.4 | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 47.2 | 25.6 | 38.5 | 57.2 | 43.1 | 62.9 | 31.3 | 62.4 | 37.8 | 60.7 | 52.3 | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 44.7 | 25.5 | 48.1 | 40.3 | 42.6 | 53.3 | 30.7 | 66.8 | 38.4 | 49.5 | 51.5 | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 46.1 | 29.0 | 44.9 | 44.4 | 40.3 | 61.0 | 24.4 | 63.8 | 38.5 | 50.5 | 64.0 | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 46.8 | 21.6 | 43.3 | 42.8 | 45.3 | 62.9 | 26.8 | 68.9 | 33.3 | 60.4 | 63.1 | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 63.8 | 29.3 | 69.7 | 75.1 | 64.3 | 89.5 | 44.4 | 55.9 | 55.3 | 85.6 | 69.2 | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 65.2 | 33.2 | 76.9 | 71.2 | 66.5 | 91.5 | 41.8 | 56.3 | 56.8 | 84.8 | 73.0 | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 65.3 | 51.4 | 85.7 | 57.1 | 66.0 | 88.1 | 34.6 | 63.4 | 47.5 | 83.2 | 76.0 | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 64.7 | 39.3 | 80.9 | 66.5 | 52.8 | 91.8 | 46.5 | 54.8 | 58.5 | 80.6 | 75.6 | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 61.4 | 45.3 | 69.9 | 71.8 | 64.4 | 77.9 | 38.9 | 63.1 | 29.7 | 82.7 | 69.8 | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 64.5 | 50.4 | 73.9 | 70.3 | 68.8 | 82.7 | 43.3 | 65.9 | 26.8 | 86.0 | 76.6 | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 67.3 | 56.6 | 81.6 | 68.9 | 67.9 | 87.1 | 46.7 | 67.2 | 33.3 | 86.1 | 77.8 | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 68.1 | 61.1 | 82.8 | 70.7 | 67.6 | 88.6 | 46.2 | 72.0 | 27.0 | 86.6 | 78.3 | |
Yang_GT_task1a_1 | Yang2021 | 6 | 73.1 | 57.7 | 91.0 | 71.7 | 66.9 | 86.6 | 56.3 | 76.0 | 48.7 | 89.1 | 86.4 | |
Yang_GT_task1a_2 | Yang2021 | 4 | 72.9 | 64.6 | 91.9 | 70.8 | 67.3 | 87.0 | 58.2 | 72.0 | 44.7 | 89.9 | 82.7 | |
Yang_GT_task1a_3 | Yang2021 | 3 | 72.9 | 62.9 | 91.4 | 71.2 | 66.9 | 87.0 | 57.1 | 72.9 | 46.2 | 89.8 | 83.3 | |
Yang_GT_task1a_4 | Yang2021 | 7 | 72.8 | 61.7 | 90.9 | 71.8 | 67.0 | 85.6 | 56.7 | 74.1 | 47.0 | 89.0 | 84.1 | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 51.9 | 41.2 | 54.3 | 53.4 | 57.6 | 63.0 | 25.0 | 67.7 | 28.0 | 79.8 | 48.6 | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 55.2 | 49.9 | 49.0 | 59.6 | 51.1 | 61.9 | 38.0 | 64.4 | 36.4 | 81.6 | 60.5 | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 53.5 | 48.6 | 48.6 | 56.7 | 53.4 | 67.0 | 36.9 | 49.6 | 33.7 | 77.5 | 62.9 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 63.0 | 52.5 | 63.0 | 66.2 | 64.3 | 78.3 | 49.1 | 67.0 | 39.3 | 81.1 | 69.4 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 63.2 | 55.9 | 63.9 | 70.3 | 60.9 | 81.3 | 49.0 | 59.2 | 40.4 | 81.7 | 69.6 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 52.2 | 40.7 | 47.5 | 53.7 | 51.9 | 64.8 | 44.2 | 48.6 | 39.8 | 68.8 | 62.5 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 59.0 | 44.8 | 56.4 | 57.1 | 60.0 | 68.3 | 50.8 | 60.5 | 46.3 | 75.6 | 70.5 | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 61.2 | 51.8 | 83.0 | 56.7 | 54.3 | 79.3 | 45.5 | 58.3 | 35.4 | 80.2 | 68.1 | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 63.5 | 47.9 | 78.4 | 63.9 | 50.0 | 74.6 | 52.7 | 67.7 | 42.4 | 81.3 | 76.1 | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 63.5 | 50.0 | 79.4 | 65.4 | 49.5 | 73.6 | 52.3 | 68.4 | 38.6 | 80.4 | 77.8 | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 62.8 | 48.5 | 82.8 | 67.0 | 60.1 | 72.1 | 42.0 | 68.8 | 40.5 | 77.7 | 68.6 |
Device-wise performance
Log loss
Submission label | Technical Report | Official system rank | Log loss | Log loss / Unseen | Log loss / Seen | Unseen: D | Unseen: S7 | Unseen: S8 | Unseen: S9 | Unseen: S10 | Seen: A | Seen: B | Seen: C | Seen: S1 | Seen: S2 | Seen: S3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 1.065 | 0.829 | 1.762 | 0.861 | 0.845 | 0.870 | 0.984 | 0.713 | 0.949 | 0.821 | 0.873 | 0.789 | 0.827 | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 1.048 | 0.801 | 1.777 | 0.843 | 0.794 | 0.875 | 0.954 | 0.683 | 0.923 | 0.809 | 0.847 | 0.766 | 0.779 | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 1.094 | 0.820 | 1.931 | 0.871 | 0.809 | 0.874 | 0.987 | 0.692 | 0.943 | 0.820 | 0.862 | 0.801 | 0.800 | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 1.002 | 0.824 | 1.570 | 0.808 | 0.823 | 0.857 | 0.953 | 0.708 | 0.957 | 0.818 | 0.849 | 0.790 | 0.824 | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 1.214 | 1.071 | 1.318 | 1.081 | 1.053 | 1.290 | 1.326 | 0.897 | 1.084 | 1.011 | 1.170 | 1.187 | 1.075 | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 1.318 | 1.102 | 1.507 | 1.122 | 1.072 | 1.405 | 1.485 | 0.878 | 1.141 | 1.057 | 1.222 | 1.230 | 1.084 | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 1.223 | 1.066 | 1.327 | 1.064 | 1.027 | 1.331 | 1.364 | 0.874 | 1.087 | 1.011 | 1.162 | 1.194 | 1.070 | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 1.250 | 1.061 | 1.403 | 1.076 | 1.008 | 1.315 | 1.448 | 0.885 | 1.060 | 1.019 | 1.152 | 1.185 | 1.068 | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 1.878 | 1.265 | 2.188 | 1.404 | 1.304 | 2.228 | 2.264 | 1.070 | 1.293 | 1.106 | 1.413 | 1.374 | 1.336 | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 1.488 | 1.191 | 1.879 | 1.316 | 1.181 | 1.593 | 1.473 | 0.983 | 1.187 | 1.022 | 1.286 | 1.374 | 1.291 | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 1.356 | 1.118 | 1.388 | 1.211 | 1.119 | 1.447 | 1.612 | 0.941 | 1.099 | 1.003 | 1.196 | 1.285 | 1.187 | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 1.426 | 1.188 | 1.566 | 1.338 | 1.200 | 1.366 | 1.662 | 0.966 | 1.203 | 1.029 | 1.317 | 1.360 | 1.253 | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 1.458 | 1.098 | 2.084 | 1.351 | 1.093 | 1.409 | 1.351 | 0.977 | 1.185 | 0.873 | 1.113 | 1.412 | 1.026 | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 2.488 | 1.999 | 2.869 | 2.020 | 1.888 | 2.683 | 2.979 | 1.434 | 2.178 | 2.152 | 2.084 | 2.420 | 1.729 | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 1.180 | 1.009 | 1.627 | 1.024 | 1.010 | 1.058 | 1.180 | 0.926 | 1.048 | 0.991 | 1.065 | 1.001 | 1.022 | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 0.993 | 0.878 | 1.278 | 0.879 | 0.884 | 0.955 | 0.967 | 0.785 | 0.902 | 0.843 | 0.928 | 0.911 | 0.896 | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 1.110 | 0.991 | 1.390 | 1.007 | 0.995 | 1.049 | 1.109 | 0.916 | 1.040 | 0.992 | 1.029 | 0.998 | 0.971 | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 0.929 | 0.823 | 1.205 | 0.822 | 0.802 | 0.905 | 0.912 | 0.754 | 0.838 | 0.843 | 0.881 | 0.821 | 0.802 | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 2.039 | 1.228 | 2.242 | 1.388 | 1.311 | 3.143 | 2.111 | 1.093 | 1.133 | 1.066 | 1.322 | 1.491 | 1.265 | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 2.072 | 1.997 | 2.149 | 2.012 | 2.002 | 2.099 | 2.096 | 1.926 | 2.027 | 1.996 | 2.023 | 2.032 | 1.978 | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 1.780 | 1.193 | 2.223 | 1.223 | 1.442 | 2.377 | 1.634 | 1.117 | 1.215 | 1.165 | 1.264 | 1.289 | 1.108 | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 2.111 | 2.027 | 2.215 | 2.037 | 2.052 | 2.127 | 2.123 | 1.963 | 2.037 | 2.022 | 2.045 | 2.071 | 2.027 | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 1.557 | 1.396 | 1.640 | 1.445 | 1.347 | 1.638 | 1.713 | 1.077 | 1.404 | 1.326 | 1.521 | 1.612 | 1.436 | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 1.619 | 1.480 | 1.703 | 1.510 | 1.457 | 1.671 | 1.752 | 1.185 | 1.496 | 1.386 | 1.593 | 1.676 | 1.542 | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 1.613 | 1.351 | 1.855 | 1.448 | 1.454 | 1.613 | 1.694 | 1.107 | 1.410 | 1.345 | 1.413 | 1.477 | 1.352 | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 1.219 | 0.893 | 1.502 | 1.106 | 0.979 | 1.386 | 1.125 | 0.729 | 0.854 | 0.780 | 0.979 | 1.061 | 0.954 | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 1.094 | 0.834 | 1.241 | 0.927 | 0.920 | 1.309 | 1.071 | 0.712 | 0.823 | 0.719 | 0.962 | 0.952 | 0.835 | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 1.187 | 0.886 | 1.491 | 0.950 | 0.879 | 1.474 | 1.143 | 0.681 | 0.845 | 0.726 | 1.103 | 1.054 | 0.908 | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 1.724 | 0.816 | 4.295 | 0.911 | 0.991 | 1.279 | 1.144 | 0.708 | 0.831 | 0.786 | 0.895 | 0.915 | 0.758 | |
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 1.461 | 1.266 | 1.685 | 1.330 | 1.331 | 1.482 | 1.479 | 1.095 | 1.340 | 1.232 | 1.316 | 1.344 | 1.268 | |
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 1.416 | 1.034 | 2.204 | 1.061 | 1.087 | 1.419 | 1.309 | 0.883 | 1.076 | 1.036 | 1.086 | 1.116 | 1.002 | |
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 1.185 | 0.986 | 1.420 | 1.051 | 1.073 | 1.169 | 1.212 | 0.792 | 1.090 | 0.912 | 0.971 | 1.191 | 0.959 | |
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 1.185 | 0.987 | 1.430 | 1.038 | 1.068 | 1.168 | 1.222 | 0.795 | 1.093 | 0.907 | 0.976 | 1.190 | 0.961 | |
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 1.183 | 0.986 | 1.419 | 1.039 | 1.068 | 1.168 | 1.220 | 0.795 | 1.085 | 0.909 | 0.975 | 1.192 | 0.963 | |
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 1.190 | 0.986 | 1.430 | 1.043 | 1.076 | 1.151 | 1.249 | 0.803 | 1.097 | 0.917 | 0.969 | 1.164 | 0.963 | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 1.317 | 0.946 | 2.529 | 0.953 | 1.046 | 1.003 | 1.056 | 0.866 | 1.030 | 0.925 | 0.985 | 0.936 | 0.931 | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 1.215 | 0.839 | 1.412 | 0.988 | 0.854 | 1.612 | 1.212 | 0.734 | 0.842 | 0.757 | 0.931 | 0.930 | 0.839 | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 1.371 | 1.036 | 2.379 | 1.047 | 1.130 | 1.102 | 1.198 | 0.909 | 1.150 | 0.983 | 1.106 | 1.050 | 1.016 | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 1.315 | 1.000 | 2.113 | 1.063 | 1.146 | 1.073 | 1.182 | 0.883 | 1.081 | 0.958 | 1.045 | 1.034 | 0.997 | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 0.851 | 0.744 | 1.162 | 0.756 | 0.720 | 0.784 | 0.832 | 0.631 | 0.780 | 0.749 | 0.784 | 0.773 | 0.749 | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 0.766 | 0.689 | 1.059 | 0.665 | 0.631 | 0.720 | 0.754 | 0.561 | 0.754 | 0.719 | 0.721 | 0.704 | 0.675 | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 0.792 | 0.687 | 1.195 | 0.680 | 0.643 | 0.719 | 0.724 | 0.585 | 0.730 | 0.724 | 0.739 | 0.685 | 0.659 | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 0.832 | 0.708 | 1.169 | 0.733 | 0.720 | 0.768 | 0.772 | 0.598 | 0.751 | 0.725 | 0.746 | 0.747 | 0.680 | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 1.051 | 0.743 | 1.704 | 0.784 | 0.808 | 0.990 | 0.968 | 0.612 | 0.746 | 0.720 | 0.815 | 0.841 | 0.727 | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 0.976 | 0.730 | 1.592 | 0.741 | 0.778 | 0.868 | 0.904 | 0.581 | 0.783 | 0.723 | 0.812 | 0.795 | 0.686 | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 0.947 | 0.740 | 1.477 | 0.739 | 0.759 | 0.865 | 0.896 | 0.600 | 0.783 | 0.748 | 0.821 | 0.791 | 0.695 | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 0.970 | 0.744 | 1.624 | 0.761 | 0.752 | 0.856 | 0.859 | 0.625 | 0.807 | 0.786 | 0.776 | 0.755 | 0.716 | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 2.767 | 1.280 | 5.170 | 1.776 | 1.894 | 3.051 | 1.944 | 1.100 | 1.152 | 1.168 | 1.272 | 1.546 | 1.443 | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 2.892 | 1.275 | 6.246 | 1.406 | 1.768 | 3.284 | 1.754 | 1.167 | 1.131 | 1.201 | 1.168 | 1.576 | 1.406 | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 1.892 | 1.134 | 2.711 | 1.624 | 1.320 | 1.849 | 1.955 | 0.837 | 1.282 | 1.061 | 1.241 | 1.280 | 1.106 | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 2.998 | 1.240 | 7.522 | 1.699 | 1.348 | 2.471 | 1.952 | 1.069 | 1.334 | 0.977 | 1.412 | 1.229 | 1.417 | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 0.974 | 0.838 | 1.367 | 0.873 | 0.845 | 0.866 | 0.920 | 0.749 | 0.879 | 0.796 | 0.912 | 0.880 | 0.813 | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 0.955 | 0.844 | 1.334 | 0.848 | 0.834 | 0.851 | 0.907 | 0.759 | 0.902 | 0.796 | 0.909 | 0.890 | 0.810 | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 0.966 | 0.804 | 1.398 | 0.833 | 0.837 | 0.853 | 0.908 | 0.680 | 0.871 | 0.823 | 0.877 | 0.810 | 0.766 | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 1.756 | 1.519 | 1.893 | 1.622 | 1.643 | 1.826 | 1.796 | 1.222 | 1.539 | 1.276 | 1.648 | 1.848 | 1.580 | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 3.952 | 3.948 | 3.813 | 3.947 | 4.018 | 3.974 | 4.008 | 3.925 | 3.999 | 4.019 | 3.923 | 3.962 | 3.858 | |
DCASE2021 baseline | | | 1.730 | 2.222 | 1.320 | 3.255 | 1.609 | 1.610 | 2.142 | 2.494 | 1.085 | 1.361 | 1.174 | 1.361 | 1.468 | 1.473 | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 1.348 | 0.967 | 1.821 | 1.043 | 1.048 | 1.543 | 1.285 | 0.814 | 0.999 | 0.944 | 1.035 | 1.061 | 0.949 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 1.653 | 1.130 | 2.525 | 1.185 | 1.230 | 1.882 | 1.443 | 0.889 | 1.399 | 1.037 | 1.103 | 1.355 | 0.994 | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 1.398 | 1.011 | 1.877 | 1.105 | 1.090 | 1.438 | 1.479 | 0.798 | 1.200 | 1.089 | 1.105 | 1.042 | 0.830 | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 2.497 | 1.693 | 3.776 | 1.829 | 1.871 | 2.705 | 2.306 | 1.429 | 2.085 | 1.691 | 1.686 | 1.852 | 1.412 | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 1.369 | 1.191 | 1.748 | 1.193 | 1.226 | 1.290 | 1.387 | 1.069 | 1.256 | 1.211 | 1.197 | 1.261 | 1.152 | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 1.419 | 1.265 | 1.766 | 1.270 | 1.278 | 1.350 | 1.431 | 1.145 | 1.318 | 1.271 | 1.284 | 1.330 | 1.240 | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 1.294 | 1.164 | 1.618 | 1.156 | 1.179 | 1.249 | 1.266 | 1.074 | 1.224 | 1.177 | 1.173 | 1.214 | 1.120 | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 1.351 | 1.242 | 1.643 | 1.228 | 1.240 | 1.303 | 1.342 | 1.146 | 1.304 | 1.251 | 1.248 | 1.301 | 1.201 | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 1.159 | 0.779 | 1.621 | 0.937 | 0.887 | 1.286 | 1.066 | 0.666 | 0.823 | 0.647 | 0.822 | 0.911 | 0.804 | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 1.152 | 0.825 | 1.404 | 0.953 | 0.939 | 1.265 | 1.199 | 0.658 | 0.880 | 0.701 | 0.874 | 0.990 | 0.848 | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 1.116 | 0.791 | 1.331 | 0.920 | 0.915 | 1.310 | 1.107 | 0.672 | 0.806 | 0.688 | 0.838 | 0.912 | 0.829 | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 1.651 | 1.612 | 1.768 | 1.609 | 1.534 | 1.622 | 1.724 | 1.581 | 1.631 | 1.592 | 1.640 | 1.625 | 1.603 | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 1.107 | 0.965 | 1.502 | 1.006 | 0.959 | 1.002 | 1.068 | 0.917 | 0.988 | 1.007 | 1.005 | 0.957 | 0.916 | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 1.164 | 1.010 | 1.592 | 1.033 | 1.019 | 1.056 | 1.123 | 0.931 | 1.080 | 1.016 | 1.044 | 1.009 | 0.977 | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 1.149 | 0.995 | 1.592 | 1.008 | 1.002 | 1.035 | 1.106 | 0.927 | 1.066 | 1.014 | 1.028 | 0.981 | 0.953 | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 1.175 | 1.014 | 1.572 | 1.045 | 1.026 | 1.078 | 1.155 | 0.938 | 1.092 | 1.001 | 1.064 | 1.002 | 0.986 | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 1.687 | 1.277 | 1.984 | 1.445 | 1.251 | 1.873 | 1.883 | 1.041 | 1.245 | 1.112 | 1.406 | 1.512 | 1.349 | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 1.730 | 1.337 | 1.873 | 1.425 | 1.329 | 2.012 | 2.010 | 1.082 | 1.319 | 1.195 | 1.343 | 1.598 | 1.482 | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 1.761 | 1.299 | 1.878 | 1.436 | 1.330 | 2.107 | 2.055 | 1.024 | 1.315 | 1.164 | 1.358 | 1.504 | 1.430 | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 1.738 | 1.279 | 1.736 | 1.473 | 1.325 | 2.078 | 2.080 | 1.041 | 1.291 | 1.159 | 1.345 | 1.451 | 1.386 | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 1.247 | 0.953 | 1.601 | 1.038 | 1.102 | 1.336 | 1.159 | 0.863 | 1.010 | 0.878 | 1.023 | 1.007 | 0.939 | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 1.231 | 0.936 | 1.614 | 1.017 | 1.099 | 1.307 | 1.118 | 0.871 | 0.995 | 0.875 | 0.994 | 0.970 | 0.915 | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 1.159 | 0.912 | 1.608 | 0.945 | 1.017 | 1.185 | 1.043 | 0.933 | 0.947 | 0.864 | 0.954 | 0.898 | 0.875 | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 1.610 | 1.036 | 2.081 | 1.254 | 1.580 | 2.014 | 1.124 | 1.068 | 1.194 | 0.905 | 1.129 | 0.976 | 0.941 | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 1.305 | 0.978 | 1.410 | 1.114 | 1.027 | 1.532 | 1.444 | 0.856 | 0.935 | 0.902 | 1.009 | 1.163 | 1.005 | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 1.144 | 0.915 | 1.243 | 0.990 | 0.941 | 1.257 | 1.290 | 0.782 | 0.867 | 0.831 | 0.987 | 1.072 | 0.954 | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 1.102 | 0.852 | 1.304 | 0.888 | 0.919 | 1.198 | 1.203 | 0.726 | 0.805 | 0.771 | 0.926 | 0.996 | 0.886 | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 1.040 | 0.827 | 1.197 | 0.886 | 0.869 | 1.050 | 1.200 | 0.697 | 0.831 | 0.766 | 0.908 | 0.930 | 0.829 | |
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 0.846 | 0.703 | 1.075 | 0.721 | 0.737 | 0.902 | 0.792 | 0.611 | 0.738 | 0.688 | 0.787 | 0.724 | 0.673 | |
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 0.840 | 0.700 | 1.091 | 0.707 | 0.724 | 0.882 | 0.797 | 0.611 | 0.741 | 0.671 | 0.784 | 0.722 | 0.670 | |
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 0.832 | 0.696 | 1.058 | 0.711 | 0.723 | 0.875 | 0.795 | 0.608 | 0.738 | 0.667 | 0.785 | 0.711 | 0.667 | |
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 0.850 | 0.710 | 1.087 | 0.724 | 0.735 | 0.898 | 0.805 | 0.621 | 0.737 | 0.692 | 0.796 | 0.737 | 0.679 | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 1.376 | 1.257 | 1.374 | 1.255 | 1.156 | 1.516 | 1.578 | 1.036 | 1.260 | 1.171 | 1.372 | 1.408 | 1.297 | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 1.284 | 1.171 | 1.295 | 1.174 | 1.102 | 1.361 | 1.487 | 0.949 | 1.177 | 1.126 | 1.260 | 1.307 | 1.211 | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 2.114 | 2.097 | 2.109 | 2.098 | 2.082 | 2.136 | 2.145 | 2.047 | 2.099 | 2.079 | 2.127 | 2.118 | 2.112 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 1.243 | 1.024 | 1.448 | 1.135 | 0.974 | 1.301 | 1.358 | 0.812 | 0.988 | 0.967 | 1.158 | 1.172 | 1.048 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 1.242 | 1.006 | 1.460 | 1.044 | 0.947 | 1.381 | 1.377 | 0.791 | 0.999 | 0.911 | 1.116 | 1.161 | 1.056 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 3.840 | 2.958 | 4.402 | 3.559 | 3.079 | 4.339 | 3.819 | 2.726 | 2.674 | 3.054 | 2.923 | 2.935 | 3.438 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 2.451 | 1.525 | 3.028 | 2.127 | 1.496 | 2.811 | 2.796 | 1.233 | 1.093 | 1.223 | 1.765 | 2.384 | 1.451 | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 1.598 | 1.308 | 1.810 | 1.375 | 1.379 | 1.705 | 1.722 | 1.084 | 1.342 | 1.271 | 1.398 | 1.450 | 1.305 | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 1.551 | 1.297 | 1.782 | 1.355 | 1.364 | 1.656 | 1.596 | 1.134 | 1.328 | 1.288 | 1.355 | 1.396 | 1.279 | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 1.430 | 1.057 | 1.570 | 1.170 | 1.107 | 1.685 | 1.620 | 0.862 | 1.104 | 1.037 | 1.147 | 1.166 | 1.024 | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 1.406 | 1.056 | 1.605 | 1.193 | 1.116 | 1.548 | 1.569 | 0.831 | 1.070 | 1.087 | 1.162 | 1.163 | 1.024 |
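
The "Unseen" and "Seen" aggregate columns in the table above appear to be unweighted means of the per-device columns, and the overall log loss appears to be the mean over all eleven devices. The source does not state this, so treat it as an assumption. The following sketch checks it for one row (Byttebier_IDLab_task1a_1), using only values copied from the table; the variable and function names are illustrative.

```python
# Per-device log loss for Byttebier_IDLab_task1a_1, copied from the table above.
unseen = {"D": 1.762, "S7": 0.861, "S8": 0.845, "S9": 0.870, "S10": 0.984}
seen = {"A": 0.713, "B": 0.949, "C": 0.821, "S1": 0.873, "S2": 0.789, "S3": 0.827}

# Aggregates reported in the same row of the table.
reported = {"overall": 0.936, "unseen": 1.065, "seen": 0.829}


def mean(values):
    """Unweighted arithmetic mean (assumes every device counts equally)."""
    values = list(values)
    return sum(values) / len(values)


computed = {
    "overall": mean(list(unseen.values()) + list(seen.values())),
    "unseen": mean(unseen.values()),
    "seen": mean(seen.values()),
}

for key in reported:
    # Differences of about 0.001 are expected because the inputs are rounded.
    diff = abs(computed[key] - reported[key])
    print(f"{key:8s} computed={computed[key]:.4f} reported={reported[key]:.3f} diff={diff:.4f}")
```

Small residual differences are expected, since the per-device values in the table are already rounded to three decimals.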
Accuracy
Submission label | Technical Report | Official system rank | Accuracy | Accuracy / Unseen | Accuracy / Seen | Unseen: D | Unseen: S7 | Unseen: S8 | Unseen: S9 | Unseen: S10 | Seen: A | Seen: B | Seen: C | Seen: S1 | Seen: S2 | Seen: S3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 68.6 | 64.5 | 72.0 | 44.0 | 69.9 | 71.1 | 70.3 | 67.4 | 77.4 | 68.3 | 71.5 | 70.0 | 72.2 | 72.6 | |
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 67.5 | 63.6 | 70.8 | 42.9 | 68.9 | 70.7 | 68.3 | 67.2 | 75.4 | 66.5 | 71.1 | 70.1 | 70.7 | 71.0 | |
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 68.5 | 64.7 | 71.7 | 44.2 | 70.6 | 72.4 | 68.8 | 67.8 | 76.1 | 68.1 | 71.5 | 71.5 | 70.8 | 72.2 | |
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 68.8 | 65.5 | 71.5 | 48.3 | 71.4 | 70.8 | 70.6 | 66.5 | 76.5 | 68.2 | 72.1 | 71.4 | 70.3 | 70.6 | |
Cao_SCUT_task1a_1 | Cao2021 | 49 | 66.7 | 62.5 | 70.2 | 58.3 | 68.3 | 67.9 | 58.5 | 59.3 | 75.4 | 70.6 | 72.6 | 65.8 | 67.9 | 68.9 | |
Cao_SCUT_task1a_2 | Cao2021 | 56 | 64.6 | 59.0 | 69.2 | 53.1 | 68.9 | 71.0 | 53.9 | 48.2 | 78.3 | 68.1 | 73.1 | 64.3 | 62.2 | 69.3 | |
Cao_SCUT_task1a_3 | Cao2021 | 50 | 67.2 | 63.3 | 70.4 | 58.8 | 71.4 | 70.3 | 58.6 | 57.2 | 76.8 | 70.1 | 72.5 | 66.5 | 65.7 | 70.8 | |
Cao_SCUT_task1a_4 | Cao2021 | 53 | 66.1 | 60.8 | 70.5 | 54.3 | 70.3 | 71.7 | 55.4 | 52.5 | 76.5 | 70.6 | 71.9 | 66.9 | 66.3 | 70.7 | |
Ding_TJU_task1a_1 | Ding2021 | 85 | 53.0 | 46.8 | 58.2 | 46.5 | 52.8 | 55.7 | 39.6 | 39.3 | 68.3 | 59.2 | 62.6 | 50.8 | 54.7 | 53.8 | |
Ding_TJU_task1a_2 | Ding2021 | 70 | 51.1 | 45.9 | 55.4 | 41.4 | 49.9 | 55.0 | 40.8 | 42.2 | 64.2 | 53.3 | 62.1 | 52.4 | 47.5 | 52.9 | |
Ding_TJU_task1a_3 | Ding2021 | 61 | 49.1 | 43.9 | 53.4 | 46.4 | 47.2 | 46.4 | 43.1 | 36.2 | 61.1 | 51.9 | 58.9 | 50.3 | 47.6 | 50.6 | |
Ding_TJU_task1a_4 | Ding2021 | 67 | 51.4 | 46.6 | 55.4 | 45.8 | 46.9 | 52.9 | 48.8 | 38.8 | 64.7 | 55.0 | 61.4 | 49.4 | 48.5 | 53.2 | |
Fan_NWPU_task1a_1 | Cui2021 | 64 | 68.3 | 65.6 | 70.6 | 54.7 | 71.8 | 69.4 | 65.8 | 66.0 | 74.7 | 69.0 | 74.0 | 68.6 | 67.5 | 69.4 | |
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 53.9 | 50.8 | 56.5 | 50.3 | 56.2 | 57.8 | 47.1 | 42.6 | 70.8 | 51.8 | 54.6 | 53.6 | 49.4 | 58.5 | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 67.0 | 62.6 | 70.7 | 42.5 | 69.3 | 70.7 | 67.2 | 63.2 | 74.7 | 70.0 | 72.6 | 67.6 | 70.7 | 68.5 | |
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 66.9 | 64.1 | 69.2 | 55.8 | 69.4 | 66.2 | 64.9 | 64.0 | 74.6 | 69.9 | 69.2 | 67.5 | 66.4 | 67.9 | |
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 70.0 | 67.1 | 72.5 | 56.0 | 71.4 | 72.6 | 69.2 | 66.4 | 76.7 | 69.6 | 72.1 | 71.3 | 72.6 | 72.6 | |
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 70.1 | 68.1 | 71.8 | 57.2 | 71.2 | 72.9 | 69.7 | 69.3 | 75.1 | 72.4 | 70.0 | 71.0 | 70.3 | 72.1 | |
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 51.4 | 44.1 | 57.5 | 44.4 | 52.4 | 51.5 | 32.6 | 39.3 | 65.1 | 57.6 | 60.0 | 53.2 | 53.3 | 56.0 | |
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 53.3 | 47.1 | 58.6 | 36.4 | 56.9 | 57.9 | 40.8 | 43.2 | 69.0 | 56.1 | 58.5 | 53.5 | 53.3 | 61.1 | |
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 51.6 | 44.6 | 57.5 | 34.6 | 56.1 | 48.5 | 41.0 | 42.8 | 61.9 | 54.9 | 55.3 | 55.3 | 56.1 | 61.4 | |
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 49.2 | 40.4 | 56.5 | 21.0 | 53.3 | 52.4 | 36.1 | 39.3 | 66.4 | 56.3 | 58.3 | 52.2 | 48.8 | 56.8 | |
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 55.0 | 50.9 | 58.4 | 47.6 | 54.0 | 64.3 | 47.1 | 41.4 | 70.1 | 57.9 | 63.5 | 52.6 | 49.2 | 56.8 | |
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 51.3 | 47.3 | 54.6 | 43.1 | 53.2 | 56.7 | 43.3 | 40.4 | 65.8 | 54.4 | 58.9 | 49.6 | 47.5 | 51.2 | |
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 56.3 | 50.9 | 60.8 | 40.7 | 59.6 | 57.6 | 50.8 | 45.6 | 68.8 | 57.4 | 60.1 | 60.8 | 55.4 | 62.1 | |
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 66.0 | 60.6 | 70.5 | 51.0 | 66.0 | 66.0 | 56.9 | 62.9 | 75.6 | 69.9 | 73.8 | 68.2 | 65.3 | 70.3 | |
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 67.0 | 62.6 | 70.6 | 57.4 | 66.8 | 66.2 | 59.2 | 63.6 | 75.3 | 69.4 | 75.4 | 65.1 | 68.6 | 69.6 | |
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 66.7 | 61.4 | 71.1 | 52.8 | 67.1 | 66.2 | 59.6 | 61.3 | 76.7 | 69.7 | 75.6 | 67.5 | 67.1 | 70.3 | |
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 66.1 | 59.6 | 71.6 | 43.5 | 67.8 | 66.3 | 59.6 | 61.0 | 77.4 | 70.1 | 72.8 | 66.7 | 68.8 | 73.6 | |
Kek_NU_task1a_1 | Kek2021 | 72 | 66.8 | 61.3 | 71.3 | 45.0 | 69.3 | 68.9 | 61.9 | 61.2 | 78.5 | 65.4 | 72.4 | 71.0 | 69.2 | 71.5 | |
Kek_NU_task1a_2 | Kek2021 | 57 | 63.5 | 56.6 | 69.3 | 35.3 | 67.2 | 67.6 | 52.8 | 60.0 | 74.2 | 67.6 | 69.2 | 68.2 | 65.8 | 70.7 | |
Kim_3M_task1a_1 | Kim2021 | 38 | 61.5 | 57.7 | 64.6 | 51.0 | 62.6 | 60.7 | 59.6 | 54.9 | 70.0 | 62.9 | 66.8 | 63.6 | 58.6 | 65.8 | |
Kim_3M_task1a_2 | Kim2021 | 39 | 61.6 | 58.1 | 64.5 | 52.1 | 63.1 | 60.8 | 59.9 | 54.6 | 69.9 | 63.5 | 66.9 | 63.7 | 58.1 | 64.9 | |
Kim_3M_task1a_3 | Kim2021 | 37 | 62.0 | 58.6 | 64.9 | 52.1 | 63.5 | 61.4 | 60.7 | 55.4 | 71.1 | 63.1 | 66.4 | 63.5 | 58.3 | 67.1 | |
Kim_3M_task1a_4 | Kim2021 | 40 | 61.3 | 57.6 | 64.3 | 51.2 | 62.2 | 60.6 | 59.9 | 54.2 | 70.1 | 62.2 | 65.8 | 63.5 | 58.5 | 65.8 | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 64.7 | 59.4 | 69.1 | 32.8 | 68.5 | 63.3 | 67.1 | 65.6 | 72.4 | 66.0 | 69.9 | 68.3 | 68.3 | 69.6 | |
Kim_KNU_task1a_2 | Kim2021a | 28 | 63.8 | 57.2 | 69.4 | 49.9 | 63.5 | 67.8 | 48.9 | 55.8 | 75.1 | 69.3 | 71.9 | 63.3 | 66.4 | 70.0 | |
Kim_KNU_task1a_3 | Kim2021a | 55 | 61.3 | 56.2 | 65.6 | 31.3 | 65.1 | 60.8 | 64.3 | 59.6 | 69.7 | 60.4 | 68.1 | 64.9 | 64.9 | 65.7 | |
Kim_KNU_task1a_4 | Kim2021a | 52 | 62.9 | 57.7 | 67.3 | 35.3 | 65.0 | 59.7 | 66.7 | 61.7 | 70.1 | 64.2 | 67.8 | 66.7 | 66.9 | 67.9 | |
Kim_QTI_task1a_1 | Kim2021b | 8 | 75.0 | 73.6 | 76.2 | 66.0 | 76.8 | 76.8 | 74.7 | 73.6 | 81.1 | 73.3 | 77.1 | 74.3 | 75.3 | 76.0 | |
Kim_QTI_task1a_2 | Kim2021b | 1 | 76.1 | 74.5 | 77.4 | 68.9 | 76.8 | 76.7 | 75.8 | 74.4 | 82.6 | 74.2 | 76.7 | 76.0 | 76.8 | 78.1 | |
Kim_QTI_task1a_3 | Kim2021b | 2 | 76.1 | 75.2 | 76.9 | 66.0 | 78.2 | 77.8 | 77.2 | 76.8 | 81.1 | 74.7 | 76.5 | 75.6 | 77.1 | 76.4 | |
Kim_QTI_task1a_4 | Kim2021b | 5 | 75.2 | 73.3 | 76.8 | 66.1 | 76.4 | 75.7 | 75.0 | 73.5 | 81.2 | 74.4 | 75.3 | 77.1 | 75.0 | 77.5 | |
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 70.9 | 66.4 | 74.6 | 51.0 | 75.0 | 72.1 | 67.5 | 66.5 | 80.1 | 74.7 | 74.4 | 73.1 | 70.4 | 74.6 | |
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 71.8 | 68.2 | 74.8 | 52.1 | 75.6 | 74.0 | 71.0 | 68.3 | 81.4 | 72.5 | 74.3 | 71.1 | 71.9 | 77.6 | |
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 72.1 | 69.6 | 74.2 | 57.6 | 75.0 | 73.1 | 71.7 | 70.6 | 80.6 | 73.8 | 73.6 | 69.6 | 72.4 | 75.4 | |
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 71.8 | 69.3 | 74.0 | 54.7 | 74.6 | 74.7 | 70.7 | 71.8 | 79.4 | 71.8 | 72.4 | 73.5 | 72.8 | 73.9 | |
Lim_CAU_task1a_1 | Lim2021 | 90 | 67.5 | 62.2 | 71.9 | 50.6 | 70.0 | 68.3 | 59.7 | 62.2 | 77.5 | 71.4 | 72.2 | 68.1 | 71.4 | 71.1 | |
Lim_CAU_task1a_2 | Lim2021 | 91 | 67.9 | 62.3 | 72.6 | 49.0 | 71.2 | 68.5 | 58.9 | 64.0 | 77.9 | 71.4 | 74.6 | 69.7 | 71.0 | 71.0 | |
Lim_CAU_task1a_3 | Lim2021 | 80 | 68.5 | 64.1 | 72.2 | 56.2 | 70.0 | 68.3 | 62.5 | 63.3 | 78.3 | 68.7 | 74.2 | 69.7 | 69.7 | 72.6 | |
Lim_CAU_task1a_4 | Lim2021 | 93 | 65.8 | 60.1 | 70.5 | 41.4 | 70.4 | 68.8 | 58.6 | 61.1 | 74.3 | 68.1 | 73.8 | 66.8 | 65.4 | 74.7 | |
Liu_UESTC_task1a_1 | Liu2021 | 16 | 68.8 | 66.1 | 71.1 | 53.5 | 68.9 | 69.6 | 69.9 | 68.6 | 75.0 | 68.1 | 72.9 | 70.8 | 68.1 | 71.8 | |
Liu_UESTC_task1a_2 | Liu2021 | 15 | 68.2 | 66.4 | 69.7 | 50.7 | 70.6 | 70.6 | 70.8 | 69.3 | 73.2 | 66.9 | 70.6 | 68.9 | 66.1 | 72.8 | |
Liu_UESTC_task1a_3 | Liu2021 | 13 | 69.6 | 66.8 | 71.9 | 54.4 | 72.5 | 69.4 | 70.4 | 67.4 | 78.7 | 69.4 | 71.4 | 67.9 | 70.4 | 73.5 | |
Liu_UESTC_task1a_4 | Liu2021 | 87 | 42.0 | 38.3 | 45.0 | 39.4 | 39.9 | 42.5 | 35.7 | 34.0 | 55.1 | 42.4 | 52.8 | 41.2 | 37.9 | 40.7 | |
Madhu_CET_task1a_1 | Madhu2021 | 99 | 9.7 | 9.2 | 10.1 | 9.2 | 9.9 | 9.4 | 8.3 | 9.3 | 10.4 | 10.4 | 7.8 | 9.6 | 10.6 | 11.7 | |
DCASE2021 baseline | | | 45.6 | 38.0 | 51.9 | 29.2 | 46.5 | 49.7 | 34.0 | 30.6 | 62.5 | 51.7 | 57.6 | 49.6 | 44.3 | 45.8 | |
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 60.2 | 53.4 | 65.9 | 45.3 | 60.1 | 61.4 | 47.6 | 52.8 | 71.3 | 66.0 | 66.0 | 63.9 | 62.1 | 66.2 | |
Pham_AIT_task1a_1 | Pham2021 | 73 | 67.5 | 64.3 | 70.1 | 55.4 | 71.0 | 69.4 | 61.0 | 64.9 | 77.8 | 65.3 | 71.9 | 71.4 | 62.8 | 71.4 | |
Pham_AIT_task1a_2 | Pham2021 | 54 | 68.4 | 64.8 | 71.3 | 57.8 | 70.6 | 70.3 | 62.8 | 62.8 | 78.6 | 67.4 | 68.9 | 69.3 | 68.5 | 75.4 | |
Pham_AIT_task1a_3 | Pham2021 | 94 | 69.6 | 66.1 | 72.6 | 57.8 | 72.9 | 70.3 | 63.9 | 65.4 | 78.8 | 68.8 | 71.0 | 73.3 | 67.9 | 76.0 | |
Phan_UIUC_task1a_1 | Phan2021 | 65 | 63.3 | 59.2 | 66.7 | 43.6 | 64.4 | 65.3 | 63.9 | 59.0 | 73.5 | 61.5 | 67.4 | 66.4 | 63.1 | 68.6 | |
Phan_UIUC_task1a_2 | Phan2021 | 71 | 63.3 | 59.2 | 66.7 | 43.6 | 64.4 | 65.3 | 63.9 | 59.0 | 73.5 | 61.5 | 67.4 | 66.4 | 63.1 | 68.6 | |
Phan_UIUC_task1a_3 | Phan2021 | 60 | 65.3 | 62.8 | 67.5 | 49.9 | 66.7 | 67.2 | 66.4 | 63.9 | 72.8 | 63.9 | 69.0 | 66.5 | 63.7 | 68.8 | |
Phan_UIUC_task1a_4 | Phan2021 | 66 | 65.3 | 62.8 | 67.5 | 49.9 | 66.7 | 67.2 | 66.4 | 63.9 | 72.8 | 63.9 | 69.0 | 66.5 | 63.7 | 68.8 | |
Puy_VAI_task1a_1 | Puy2021 | 24 | 66.6 | 59.7 | 72.4 | 46.8 | 66.0 | 67.9 | 57.4 | 60.3 | 77.2 | 72.8 | 77.5 | 69.0 | 67.2 | 70.6 | |
Puy_VAI_task1a_2 | Puy2021 | 27 | 65.4 | 59.4 | 70.5 | 49.0 | 68.6 | 69.6 | 53.9 | 56.0 | 76.9 | 69.7 | 74.9 | 67.5 | 63.1 | 70.7 | |
Puy_VAI_task1a_3 | Puy2021 | 22 | 66.2 | 60.1 | 71.2 | 51.9 | 67.9 | 66.9 | 53.9 | 59.9 | 77.1 | 71.2 | 74.2 | 68.9 | 67.5 | 68.3 | |
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 52.2 | 50.7 | 53.5 | 42.8 | 54.4 | 55.7 | 53.3 | 47.1 | 55.1 | 49.4 | 54.2 | 52.1 | 54.9 | 55.4 | |
Seo_SGU_task1a_1 | Seo2021 | 32 | 70.3 | 67.4 | 72.8 | 53.2 | 71.1 | 73.5 | 71.0 | 68.3 | 75.1 | 73.3 | 70.3 | 71.0 | 72.6 | 74.3 | |
Seo_SGU_task1a_2 | Seo2021 | 41 | 71.4 | 67.7 | 74.4 | 46.8 | 73.3 | 77.2 | 71.9 | 69.2 | 78.5 | 73.6 | 74.9 | 72.2 | 72.8 | 74.6 | |
Seo_SGU_task1a_3 | Seo2021 | 35 | 71.3 | 67.6 | 74.4 | 49.6 | 72.5 | 74.3 | 72.4 | 69.2 | 77.2 | 73.2 | 73.8 | 72.6 | 74.0 | 75.8 | |
Seo_SGU_task1a_4 | Seo2021 | 44 | 71.8 | 67.6 | 75.3 | 47.4 | 73.6 | 75.1 | 73.5 | 68.5 | 79.2 | 72.4 | 75.6 | 73.1 | 75.3 | 76.7 | |
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 47.2 | 41.5 | 51.9 | 40.8 | 45.0 | 51.4 | 34.7 | 35.7 | 63.6 | 50.7 | 57.9 | 47.6 | 42.9 | 48.5 | |
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 44.7 | 40.0 | 48.5 | 41.0 | 47.9 | 49.2 | 30.1 | 31.9 | 60.6 | 47.6 | 54.3 | 46.0 | 41.0 | 41.7 | |
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 46.1 | 40.9 | 50.4 | 40.6 | 47.6 | 50.8 | 33.8 | 31.9 | 63.7 | 48.5 | 55.1 | 47.1 | 42.6 | 45.1 | |
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 46.8 | 41.3 | 51.5 | 45.1 | 47.2 | 50.7 | 32.5 | 30.8 | 61.0 | 48.6 | 58.6 | 48.8 | 44.7 | 47.2 | |
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 63.8 | 57.8 | 68.8 | 43.6 | 66.4 | 61.5 | 53.9 | 63.7 | 72.6 | 66.1 | 71.2 | 65.1 | 68.1 | 69.9 | |
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 65.2 | 58.2 | 71.0 | 42.2 | 67.2 | 62.1 | 55.0 | 64.7 | 72.8 | 69.4 | 72.4 | 67.1 | 71.8 | 72.6 | |
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 65.3 | 60.8 | 69.1 | 43.2 | 68.9 | 65.8 | 58.5 | 67.5 | 63.3 | 67.1 | 71.0 | 68.9 | 71.4 | 72.8 | |
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 64.7 | 57.9 | 70.4 | 43.1 | 65.0 | 61.8 | 54.4 | 65.3 | 72.4 | 68.3 | 71.9 | 67.2 | 70.7 | 71.8 | |
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 61.4 | 55.5 | 66.2 | 50.4 | 62.1 | 66.8 | 46.5 | 51.7 | 71.8 | 66.7 | 67.8 | 65.8 | 58.2 | 67.2 | |
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 64.5 | 60.4 | 67.8 | 55.6 | 66.2 | 68.6 | 55.6 | 56.1 | 73.1 | 68.8 | 71.2 | 65.6 | 62.1 | 66.4 | |
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 67.3 | 63.1 | 70.8 | 59.9 | 69.6 | 69.2 | 57.8 | 59.2 | 74.7 | 72.6 | 73.3 | 69.9 | 66.0 | 68.5 | |
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 68.1 | 64.2 | 71.4 | 61.4 | 69.3 | 68.5 | 64.4 | 57.2 | 76.4 | 71.2 | 74.0 | 68.5 | 68.1 | 70.0 | |
Yang_GT_task1a_1 | Yang2021 | 6 | 73.1 | 70.8 | 74.9 | 63.6 | 74.7 | 75.0 | 67.9 | 72.8 | 78.8 | 74.9 | 76.1 | 71.4 | 72.5 | 76.0 | |
Yang_GT_task1a_2 | Yang2021 | 4 | 72.9 | 70.0 | 75.4 | 61.7 | 74.6 | 73.9 | 67.8 | 71.9 | 78.5 | 74.7 | 77.8 | 71.4 | 73.3 | 76.5 | |
Yang_GT_task1a_3 | Yang2021 | 3 | 72.9 | 70.1 | 75.1 | 62.1 | 74.6 | 74.3 | 67.8 | 71.9 | 78.3 | 74.0 | 77.9 | 71.1 | 73.3 | 76.1 | |
Yang_GT_task1a_4 | Yang2021 | 7 | 72.8 | 70.2 | 74.9 | 63.1 | 75.1 | 74.3 | 67.4 | 71.4 | 76.7 | 74.7 | 76.9 | 72.4 | 72.9 | 76.0 | |
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 51.9 | 49.7 | 53.6 | 49.6 | 52.6 | 55.6 | 48.2 | 42.8 | 64.2 | 53.3 | 56.2 | 50.4 | 45.6 | 51.9 | |
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 55.2 | 53.5 | 56.6 | 53.3 | 56.5 | 59.9 | 51.0 | 46.9 | 66.0 | 57.6 | 57.2 | 54.7 | 49.4 | 54.9 | |
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 53.5 | 50.7 | 55.8 | 49.3 | 55.1 | 59.4 | 45.7 | 44.0 | 63.1 | 57.1 | 58.5 | 50.0 | 52.2 | 54.0 | |
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 63.0 | 58.9 | 66.4 | 53.5 | 63.1 | 70.3 | 54.7 | 53.1 | 74.2 | 67.6 | 70.7 | 62.4 | 59.3 | 64.4 | |
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 63.2 | 57.4 | 68.1 | 49.9 | 66.1 | 68.5 | 49.9 | 52.6 | 75.8 | 70.3 | 71.2 | 62.1 | 61.1 | 67.9 | |
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 52.2 | 47.3 | 56.3 | 42.1 | 52.9 | 57.1 | 39.0 | 45.6 | 64.3 | 59.9 | 60.8 | 51.4 | 50.7 | 50.8 | |
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 59.0 | 53.2 | 63.8 | 54.4 | 57.6 | 60.7 | 48.1 | 45.4 | 70.6 | 68.8 | 68.5 | 58.8 | 55.8 | 60.7 | |
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 61.2 | 54.2 | 67.1 | 46.9 | 64.9 | 66.1 | 47.6 | 45.6 | 76.4 | 66.4 | 70.4 | 61.9 | 58.9 | 68.5 | |
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 63.5 | 55.6 | 70.0 | 45.3 | 67.5 | 67.1 | 48.2 | 50.1 | 77.4 | 67.4 | 70.4 | 66.2 | 65.8 | 73.1 | |
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 63.5 | 55.9 | 70.0 | 46.2 | 66.9 | 67.5 | 48.3 | 50.3 | 76.8 | 67.9 | 70.3 | 66.0 | 66.0 | 72.8 | |
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 62.8 | 56.2 | 68.3 | 44.2 | 68.5 | 66.2 | 51.7 | 50.4 | 77.4 | 67.1 | 70.4 | 64.6 | 61.7 | 68.9 |
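
The official system rank in these tables tracks log loss rather than accuracy, so the two orderings can disagree. For example, Pham_AIT_task1a_3 reaches 69.6% accuracy with a log loss of 2.058 (rank 94), while Byttebier_IDLab_task1a_2 has lower accuracy (67.5%) but a log loss of 0.914 (rank 18). One possible explanation, which the source does not confirm, is that log loss heavily penalizes confident wrong predictions. The sketch below illustrates the effect on a hypothetical three-class toy example; the probabilities are invented for illustration and are not taken from any submission.

```python
import math


def log_loss(probs, labels, eps=1e-15):
    """Multiclass log loss: mean negative log-probability of the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(labels)


def accuracy(probs, labels):
    """Fraction of items whose highest-probability class is the true class."""
    hits = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == y)
    return hits / len(labels)


# Hypothetical toy data: both systems classify items 0-2 correctly and item 3 wrongly.
labels = [0, 1, 2, 0]
calibrated = [
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],  # wrong, but with moderate confidence
]
overconfident = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
    [0.001, 0.998, 0.001],  # wrong, and almost certain about it
]

for name, probs in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(f"{name:14s} accuracy={accuracy(probs, labels):.2f} log_loss={log_loss(probs, labels):.3f}")
```

Both toy systems have identical accuracy, yet the over-confident one has the higher log loss because of a single confidently wrong prediction.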
System characteristics
General characteristics
Submission label | Technical Report | Official system rank | Logloss (Eval) | Accuracy (Eval) | Sampling rate | Data augmentation | Features | Embeddings | |
---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 68.6 | 44.1kHz | mixup, temporal cropping, speed augmentation | log-mel energies | ||
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 67.5 | 44.1kHz | mixup, temporal cropping, speed augmentation | log-mel energies | ||
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 68.5 | 44.1kHz | mixup, temporal cropping, speed augmentation | log-mel energies | ||
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 68.8 | 44.1kHz | mixup, temporal cropping, speed augmentation | log-mel energies | ||
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 66.7 | 44.1kHz | mixup, time stretching, pitch shifting, spectrum correction | log-mel energies | ||
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 64.6 | 44.1kHz | mixup, time stretching, pitch shifting, spectrum correction | log-mel energies | ||
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 67.2 | 44.1kHz | mixup, time stretching, pitch shifting, spectrum correction | log-mel energies | ||
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 66.1 | 44.1kHz | mixup, time stretching, pitch shifting, spectrum correction | log-mel energies | ||
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 53.0 | 44.1kHz | log-mel energies | |||
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 51.1 | 44.1kHz | log-mel energies | |||
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 49.1 | 44.1kHz | log-mel energies | |||
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 51.4 | 44.1kHz | log-mel energies | |||
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 68.3 | 44.1kHz | reverb, filtering, random gain adjust, SpecAugment | log-mel energies | ||
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 53.9 | 16kHz | random noise, random gain, random cropping, mixup | raw waveform | AemNet | |
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 67.0 | 44.1kHz | mixup, spectrum augmentation, device augmentation | log-mel energies | ||
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 66.9 | 44.1kHz | mixup, tempo, channel corruption | log-mel energies | ||
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 70.0 | 44.1kHz | mixup, spectrum augmentation, device augmentation | log-mel energies | ||
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 70.1 | 44.1kHz | mixup, tempo, channel corruption | log-mel energies | ||
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 51.4 | 44.1kHz | mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change | log-mel energies, HPSS | ||
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 53.3 | 44.1kHz | mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change | log-mel energies, HPSS | ||
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 51.6 | 44.1kHz | mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change | log-mel energies, HPSS | ||
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 49.2 | 44.1kHz | mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change | log-mel energies, HPSS | ||
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 55.0 | 44.1kHz | log-mel energies | |||
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 51.3 | 44.1kHz | log-mel energies | |||
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 56.3 | 44.1kHz | log-mel energies | |||
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 66.0 | 44.1kHz | temporal cropping | log-mel energies, deltas, delta-deltas | ||
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 67.0 | 44.1kHz | temporal cropping, SpecAugment | log-mel energies, deltas, delta-deltas | ||
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 66.7 | 44.1kHz | temporal cropping | log-mel energies, deltas, delta-deltas | ||
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 66.1 | 44.1kHz | temporal cropping, SpecAugment | log-mel energies, deltas, delta-deltas | ||
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 66.8 | 44.1kHz | Wavelet Scattering | |||
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 63.5 | 44.1kHz | Wavelet Scattering | |||
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 61.5 | 22.05kHz | mixup, SpecAugment | Perceptually-weighted log-mel energies | VGGish | |
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 61.6 | 22.05kHz | mixup, SpecAugment | Perceptually-weighted log-mel energies | VGGish | |
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 62.0 | 22.05kHz | mixup, SpecAugment | Perceptually-weighted log-mel energies | VGGish | |
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 61.3 | 22.05kHz | mixup, SpecAugment | Perceptually-weighted log-mel energies | VGGish | |
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 64.7 | 44.1kHz | mixup | log-mel energies, delta-log-mel energies, delta-delta-log-mel energies | ||
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 63.8 | 44.1kHz | mixup | log-mel energies, delta-log-mel energies, delta-delta-log-mel energies | ||
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 61.3 | 44.1kHz | mixup | log-mel energies, delta-log-mel energies, delta-delta-log-mel energies | ||
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 62.9 | 44.1kHz | mixup | log-mel energies, delta-log-mel energies, delta-delta-log-mel energies | ||
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 75.0 | 16kHz | mixup, specaugment, time rolling | log-mel energies | ||
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 76.1 | 16kHz | mixup, specaugment, time rolling | log-mel energies | ||
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 76.1 | 16kHz | mixup, specaugment, time rolling | log-mel energies | ||
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 75.2 | 16kHz | mixup, specaugment, time rolling | log-mel energies | ||
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 70.9 | 22.05kHz | mixup, pitch shifting | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 71.8 | 22.05kHz | mixup, pitch shifting | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 72.1 | 22.05kHz | mixup, pitch shifting | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 71.8 | 22.05kHz | mixup, pitch shifting | Perceptually-weighted log-mel energies | ||
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 67.5 | 44.1kHz | spectrogram | |||
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 67.9 | 44.1kHz | spectrogram | |||
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 68.5 | 44.1kHz | spectrogram | |||
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 65.8 | 44.1kHz | spectrogram | |||
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 68.8 | 44.1kHz | HRTF, mixup, temporal cropping, spectrum correction | log-mel energies, deltas, delta-deltas | ||
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 68.2 | 44.1kHz | HRTF, mixup, temporal cropping, spectrum correction | log-mel energies, deltas, delta-deltas | ||
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 69.6 | 44.1kHz | mixup, temporal cropping | log-mel energies, deltas, delta-deltas | ||
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 42.0 | 44.1kHz | mixup | log-mel energies | ||
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 9.7 | 44.1kHz | time stretching, pitch shifting, dynamic range compression, background noise, mixup | wavelet based log-mel energies | ||
DCASE2021 baseline | 1.730 | 45.6 | 44.1kHz | log-mel energies | |||||
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 60.2 | 44.1kHz | mixup | gammatone spectrogram | ||
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 67.5 | 44.1kHz | mixup | CQT, Gammatonegram, log-mel energies | ||
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 68.4 | 44.1kHz | mixup | CQT, Gammatonegram, log-mel energies | ||
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 69.6 | 44.1kHz | mixup | CQT, Gammatonegram, log-mel energies | ||
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 63.3 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas | ||
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 63.3 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas | ||
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 65.3 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas | ||
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 65.3 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas | ||
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 66.6 | 44.1kHz | SpecAugment | log-mel energies | ||
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 65.4 | 44.1kHz | SpecAugment, mixup | log-mel energies | ||
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 66.2 | 44.1kHz | SpecAugment | log-mel energies | ||
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 52.2 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas | ||
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 70.3 | 44.1kHz | mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios | log-mel energies, deltas, delta-deltas | ||
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 71.4 | 44.1kHz | mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios | log-mel energies, deltas, delta-deltas | ||
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 71.3 | 44.1kHz | mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios | log-mel energies, deltas, delta-deltas | ||
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 71.8 | 44.1kHz | mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios | log-mel energies, deltas, delta-deltas | ||
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 47.2 | 44.1kHz | log-mel energies | |||
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 44.7 | 44.1kHz | log-mel energies | |||
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 46.1 | 44.1kHz | log-mel energies | |||
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 46.8 | 44.1kHz | log-mel energies | |||
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 63.8 | 44.1kHz | mixup, SpecAugment, time-shifting, spectrum modulation | log-mel powers | ||
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 65.2 | 44.1kHz | mixup, SpecAugment, time-shifting, spectrum modulation | log-mel powers | ||
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 65.3 | 44.1kHz | mixup, SpecAugment, time-shifting, spectrum modulation | log-mel powers | ||
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 64.7 | 44.1kHz | mixup, SpecAugment, time-shifting, spectrum modulation | log-mel powers | ||
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 61.4 | 44.1kHz | mixup, temporal cropping, SpecAugment | log-mel energies | ||
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 64.5 | 44.1kHz | mixup, temporal cropping, SpecAugment | log-mel energies | ||
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 67.3 | 44.1kHz | mixup, temporal cropping, SpecAugment | log-mel energies | ||
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 68.1 | 44.1kHz | mixup, temporal cropping, SpecAugment | log-mel energies | ||
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 73.1 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios | log-mel energies | ||
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 72.9 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios | log-mel energies | ||
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 72.9 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios | log-mel energies | ||
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 72.8 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios | log-mel energies | ||
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 51.9 | 16kHz | SpecAugment | log-mel energies | ||
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 55.2 | 16kHz | SpecAugment | log-mel energies | ||
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 53.5 | 16kHz | SpecAugment | log-mel energies | ||
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 63.0 | 44.1kHz | log-mel energies | |||
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 63.2 | 44.1kHz | log-mel energies | |||
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 52.2 | 44.1kHz | log-mel energies | |||
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 59.0 | 44.1kHz | log-mel energies | |||
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 61.2 | 44.1kHz | mixup, random cropping | log-mel energies, deltas, delta-deltas | ||
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 63.5 | 44.1kHz | mixup, random cropping | log-mel energies, deltas, delta-deltas | ||
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 63.5 | 44.1kHz | mixup, random cropping | log-mel energies, deltas, delta-deltas | ||
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 62.8 | 44.1kHz | mixup, random cropping | log-mel energies, deltas, delta-deltas |
Machine learning characteristics
Rank | Code | Technical Report | Official system rank | Logloss (Eval) | Accuracy (Eval) | External data usage | External data sources | Model complexity | Classifier | Ensemble subsystems | Decision making |
---|---|---|---|---|---|---|---|---|---|---|---|
Byttebier_IDLab_task1a_1 | Byttebier2021 | 21 | 0.936 | 68.6 | 114634 | SE-ResNet | maximum logit | ||||
Byttebier_IDLab_task1a_2 | Byttebier2021 | 18 | 0.914 | 67.5 | 114634 | SE-ResNet | multinomial logistic regression | ||||
Byttebier_IDLab_task1a_3 | Byttebier2021 | 23 | 0.944 | 68.5 | 114634 | SE-ResNet | ovr logistic regression | ||||
Byttebier_IDLab_task1a_4 | Byttebier2021 | 17 | 0.905 | 68.8 | 82910 | SE-ResNet | maximum logit | ||||
Cao_SCUT_task1a_1 | Cao2021 | 49 | 1.136 | 66.7 | embeddings | 36658 | CNN | ||||
Cao_SCUT_task1a_2 | Cao2021 | 56 | 1.200 | 64.6 | embeddings | 36658 | CNN | ||||
Cao_SCUT_task1a_3 | Cao2021 | 50 | 1.137 | 67.2 | embeddings | 36658 | CNN | ||||
Cao_SCUT_task1a_4 | Cao2021 | 53 | 1.147 | 66.1 | embeddings | 51926 | CNN | ||||
Ding_TJU_task1a_1 | Ding2021 | 85 | 1.544 | 53.0 | 40230 | CNN | |||||
Ding_TJU_task1a_2 | Ding2021 | 70 | 1.326 | 51.1 | 20250 | CNN | |||||
Ding_TJU_task1a_3 | Ding2021 | 61 | 1.226 | 49.1 | 63816 | CNN | majority vote | ||||
Ding_TJU_task1a_4 | Ding2021 | 67 | 1.296 | 51.4 | 20250 | CNN | |||||
Fan_NWPU_task1a_1 | Cui2021 | 64 | 1.261 | 68.3 | embeddings | 93323 | ResNet, Attention | ||||
Galindo-Meza_ITESO_task1a_1 | Galindo-Meza2021 | 97 | 2.221 | 53.9 | pre-trained model | Audioset | 127637 | CNN | Maximum softmax | ||
Heo_Clova_task1a_1 | Hee-Soo2021 | 42 | 1.087 | 67.0 | 65424 | CNN | |||||
Heo_Clova_task1a_2 | Hee-Soo2021 | 20 | 0.930 | 66.9 | 63547 | CNN | |||||
Heo_Clova_task1a_3 | Hee-Soo2021 | 34 | 1.045 | 70.0 | 65424 | CNN | |||||
Heo_Clova_task1a_4 | Hee-Soo2021 | 12 | 0.871 | 70.1 | 63547 | CNN | |||||
Horváth_HIT_task1a_1 | Horvth2021 | 86 | 1.597 | 51.4 | 47939 | MobileNetV2 | |||||
Horváth_HIT_task1a_2 | Horvth2021 | 92 | 2.031 | 53.3 | 47939 | MobileNetV2, ArcFace | |||||
Horváth_HIT_task1a_3 | Horvth2021 | 76 | 1.460 | 51.6 | 58266 | ResNet | |||||
Horváth_HIT_task1a_4 | Horvth2021 | 95 | 2.065 | 49.2 | 58266 | ResNet, ArcFace | |||||
Jeng_CHT+NSYSU_task1a_1 | Jeng2021 | 78 | 1.469 | 55.0 | 130457242 | CNN | logistic regression | ||||
Jeng_CHT+NSYSU_task1a_2 | Jeng2021 | 84 | 1.543 | 51.3 | 130457242 | CNN | logistic regression | ||||
Jeng_CHT+NSYSU_task1a_3 | Jeng2021 | 79 | 1.470 | 56.3 | 17186944 | CNN | logistic regression | ||||
Jeong_ETRI_task1a_1 | Jeong2021 | 33 | 1.041 | 66.0 | 54845 | ResNet | |||||
Jeong_ETRI_task1a_2 | Jeong2021 | 25 | 0.952 | 67.0 | 54845 | ResNet | |||||
Jeong_ETRI_task1a_3 | Jeong2021 | 30 | 1.023 | 66.7 | 60236 | ResNet | |||||
Jeong_ETRI_task1a_4 | Jeong2021 | 63 | 1.228 | 66.1 | 60236 | ResNet | |||||
Kek_NU_task1a_1 | Kek2021 | 72 | 1.355 | 66.8 | 63448 | CNN, MobileNetV2 | |||||
Kek_NU_task1a_2 | Kek2021 | 57 | 1.207 | 63.5 | 64850 | CNN, MobileNetV2, Group convolution, Channel attention | |||||
Kim_3M_task1a_1 | Kim2021 | 38 | 1.076 | 61.5 | pre-trained weights of Vggish | 168778 | CNN | ||||
Kim_3M_task1a_2 | Kim2021 | 39 | 1.077 | 61.6 | pre-trained weights of Vggish | 168778 | CNN | ||||
Kim_3M_task1a_3 | Kim2021 | 37 | 1.076 | 62.0 | pre-trained weights of Vggish | 168778 | CNN | ||||
Kim_3M_task1a_4 | Kim2021 | 40 | 1.078 | 61.3 | pre-trained weights of Vggish | 168778 | CNN | ||||
Kim_KNU_task1a_1 | Kim2021a | 46 | 1.115 | 64.7 | 58472 | ResNet | |||||
Kim_KNU_task1a_2 | Kim2021a | 28 | 1.010 | 63.8 | 64064 | CNN (Inception) | |||||
Kim_KNU_task1a_3 | Kim2021a | 55 | 1.188 | 61.3 | 58472 | ResNet | |||||
Kim_KNU_task1a_4 | Kim2021a | 52 | 1.143 | 62.9 | 58472 | ResNet | |||||
Kim_QTI_task1a_1 | Kim2021b | 8 | 0.793 | 75.0 | 630042 | CNN, BC-ResNet | 2 | maximum likelihood | |||
Kim_QTI_task1a_2 | Kim2021b | 1 | 0.724 | 76.1 | 630042 | CNN, BC-ResNet | 2 | maximum likelihood | |||
Kim_QTI_task1a_3 | Kim2021b | 2 | 0.735 | 76.1 | 630042 | CNN, BC-ResNet | 2 | maximum likelihood | |||
Kim_QTI_task1a_4 | Kim2021b | 5 | 0.764 | 75.2 | 314990 | CNN, BC-ResNet | maximum likelihood | ||||
Koutini_CPJKU_task1a_1 | Koutini2021 | 14 | 0.883 | 70.9 | 504104 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_2 | Koutini2021 | 10 | 0.842 | 71.8 | 678184 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_3 | Koutini2021 | 9 | 0.834 | 72.1 | 635176 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_4 | Koutini2021 | 11 | 0.847 | 71.8 | 641320 | RF-regularized CNNs | |||||
Lim_CAU_task1a_1 | Lim2021 | 90 | 1.956 | 67.5 | 89910 | CNN | |||||
Lim_CAU_task1a_2 | Lim2021 | 91 | 2.010 | 67.9 | 89910 | CNN | |||||
Lim_CAU_task1a_3 | Lim2021 | 80 | 1.479 | 68.5 | 134748 | CNN | |||||
Lim_CAU_task1a_4 | Lim2021 | 93 | 2.039 | 65.8 | 56046 | CNN | |||||
Liu_UESTC_task1a_1 | Liu2021 | 16 | 0.900 | 68.8 | 643194 | ResNet | |||||
Liu_UESTC_task1a_2 | Liu2021 | 15 | 0.895 | 68.2 | 268362 | ResNet | |||||
Liu_UESTC_task1a_3 | Liu2021 | 13 | 0.878 | 69.6 | 268362 | ResNet | |||||
Liu_UESTC_task1a_4 | Liu2021 | 87 | 1.626 | 42.0 | 60928 | CNN | |||||
Madhu_CET_task1a_1 | Madhu2021 | 99 | 3.950 | 9.7 | 42774 | CNN | |||||
DCASE2021 baseline | 1.730 | 45.6 | embeddings | 46246 | CNN | ||||||
Naranjo-Alcazar_ITI_task1a_1 | Naranjo-Alcazar2021_t1a | 51 | 1.140 | 60.2 | 50130 | CNN | |||||
Pham_AIT_task1a_1 | Pham2021 | 73 | 1.368 | 67.5 | 10909 | CNN | 3 | PROD late fusion | |||
Pham_AIT_task1a_2 | Pham2021 | 54 | 1.187 | 68.4 | 10909 | CNN | 3 | PROD late fusion | |||
Pham_AIT_task1a_3 | Pham2021 | 94 | 2.058 | 69.6 | 10909 | CNN | 3 | PROD late fusion | |||
Phan_UIUC_task1a_1 | Phan2021 | 65 | 1.272 | 63.3 | 41356 | CNN | |||||
Phan_UIUC_task1a_2 | Phan2021 | 71 | 1.335 | 63.3 | 41356 | CNN | |||||
Phan_UIUC_task1a_3 | Phan2021 | 60 | 1.223 | 65.3 | 41356 | CNN | |||||
Phan_UIUC_task1a_4 | Phan2021 | 66 | 1.292 | 65.3 | 41356 | CNN | |||||
Puy_VAI_task1a_1 | Puy2021 | 24 | 0.952 | 66.6 | 62474 | CNN | 30 | average | |||
Puy_VAI_task1a_2 | Puy2021 | 27 | 0.974 | 65.4 | 62474 | CNN | 30 | average | |||
Puy_VAI_task1a_3 | Puy2021 | 22 | 0.939 | 66.2 | 62474 | CNN | 30 | average | |||
Qiao_NCUT_task1a_1 | Qiao2021 | 88 | 1.630 | 52.2 | 31852 | ResNet ensemble | 2 | average | |||
Seo_SGU_task1a_1 | Seo2021 | 32 | 1.030 | 70.3 | 101173 | MobileNet | |||||
Seo_SGU_task1a_2 | Seo2021 | 41 | 1.080 | 71.4 | 99557 | MobileNet | |||||
Seo_SGU_task1a_3 | Seo2021 | 35 | 1.065 | 71.3 | 99614 | MobileNet | |||||
Seo_SGU_task1a_4 | Seo2021 | 44 | 1.087 | 71.8 | 99603 | MobileNet | |||||
Singh_IITMandi_task1a_1 | Singh2021 | 77 | 1.464 | 47.2 | embeddings | 14754 | CNN | ||||
Singh_IITMandi_task1a_2 | Singh2021 | 83 | 1.515 | 44.7 | embeddings | 27166 | CNN | ||||
Singh_IITMandi_task1a_3 | Singh2021 | 82 | 1.509 | 46.1 | embeddings | 38110 | CNN | ||||
Singh_IITMandi_task1a_4 | Singh2021 | 81 | 1.488 | 46.8 | embeddings | 36578 | CNN | ||||
Sugahara_RION_task1a_1 | Sugahara2021 | 43 | 1.087 | 63.8 | 339730 | ResNet, ensemble | 5 | weighted score average | |||
Sugahara_RION_task1a_2 | Sugahara2021 | 36 | 1.070 | 65.2 | 339730 | ResNet, ensemble | 5 | score average | |||
Sugahara_RION_task1a_3 | Sugahara2021 | 31 | 1.024 | 65.3 | 203838 | ResNet, ensemble | 3 | score average | |||
Sugahara_RION_task1a_4 | Sugahara2021 | 68 | 1.297 | 64.7 | 255940 | ResNet, ensemble | 3 | weighted score average | |||
Verbitskiy_DS_task1a_1 | Verbitskiy2021 | 48 | 1.127 | 61.4 | 62090 | CNN, EfficientNetV2 | |||||
Verbitskiy_DS_task1a_2 | Verbitskiy2021 | 29 | 1.019 | 64.5 | 62154 | CNN, EfficientNetV2 | |||||
Verbitskiy_DS_task1a_3 | Verbitskiy2021 | 26 | 0.966 | 67.3 | 62282 | CNN, EfficientNetV2 | |||||
Verbitskiy_DS_task1a_4 | Verbitskiy2021 | 19 | 0.924 | 68.1 | 62346 | CNN, EfficientNetV2 | |||||
Yang_GT_task1a_1 | Yang2021 | 6 | 0.768 | 73.1 | 4410180 | Inception | 5 | average | |||
Yang_GT_task1a_2 | Yang2021 | 4 | 0.764 | 72.9 | 14640720 | Inception | 20 | average | |||
Yang_GT_task1a_3 | Yang2021 | 3 | 0.758 | 72.9 | 7056288 | Inception | 8 | average | |||
Yang_GT_task1a_4 | Yang2021 | 7 | 0.774 | 72.8 | 7056288 | Inception | 8 | average | |||
Yihao_speakin_task1a_1 | Yihao2021 | 69 | 1.311 | 51.9 | 48075 | CNN | |||||
Yihao_speakin_task1a_2 | Yihao2021 | 59 | 1.222 | 55.2 | 63244 | CNN | |||||
Yihao_speakin_task1a_3 | Yihao2021 | 96 | 2.105 | 53.5 | 50952 | CNN | |||||
Zhang_BUPT&BYTEDANCE_task1a_1 | Zhang2021 | 47 | 1.124 | 63.0 | 83572 | ResNet | |||||
Zhang_BUPT&BYTEDANCE_task1a_2 | Zhang2021 | 45 | 1.113 | 63.2 | 83572 | ResNet | |||||
Zhang_BUPT&BYTEDANCE_task1a_3 | Zhang2021 | 98 | 3.359 | 52.2 | 87011 | ResNet | |||||
Zhang_BUPT&BYTEDANCE_task1a_4 | Zhang2021 | 89 | 1.946 | 59.0 | 86516 | ResNet | |||||
Zhao_Maxvision_task1a_1 | Zhao2021 | 75 | 1.440 | 61.2 | 59421 | MobileNet | 2 | model weights average | |||
Zhao_Maxvision_task1a_2 | Zhao2021 | 74 | 1.412 | 63.5 | 59421 | MobileNet | 2 | model weights average | |||
Zhao_Maxvision_task1a_3 | Zhao2021 | 62 | 1.227 | 63.5 | 59421 | MobileNet | 2 | model weights average | |||
Zhao_Maxvision_task1a_4 | Zhao2021 | 58 | 1.215 | 62.8 | 59421 | MobileNet | 2 | model weights average |
Technical reports
Small-Footprint Acoustic Scene Classification Through 8-Bit Quantization-Aware Training and Pruning of ResNet Models
Laurens Byttebier, Brecht Desplanques, Jenthe Thienpondt, Siyuan Song, Kris Demuynck and Nilesh Madhu
ELIS, Ghent University - imec, Ghent, Belgium
Byttebier_IDLab_task1a_1 Byttebier_IDLab_task1a_2 Byttebier_IDLab_task1a_3 Byttebier_IDLab_task1a_4
Abstract
This report describes the IDLab submissions for Task 1a of the DCASE Challenge 2021. The challenge consists of constructing an acoustic scene classification model with a size of less than 128 KB. All submitted systems consist of a ResNet-based model enhanced with Squeeze-and-Excitation (SE) blocks, trained with temporal cropping, time-domain mixup and speed-change augmentation strategies. Grouped convolutions are incorporated in all models to reduce the model complexity. Three submissions are based on 8-bit quantization-aware training with a fusion of batch norm and convolutional layers to reduce the parameter count even further. Two of these three systems explore multi-class score calibration by means of multinomial or one-vs-rest logistic regression. The calibration is then fused with the final linear output layer of the network to avoid an increase in model size. The fourth submission explores parameter pruning on a model with 16-bit weights as an alternative to the 8-bit weight quantization. The uncalibrated 8-bit model slightly outperforms the pruned 16-bit model and achieves a log loss of 0.82 and an accuracy of 71.2% on the standard test set of the TAU Urban Acoustic Scenes 2020 Mobile development dataset.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal cropping, speed augmentation |
Features | log-mel energies |
Classifier | SE-ResNet |
Decision making | maximum logit; multinomial logistic regression; ovr logistic regression |
Complexity management | weight quantization, grouped convolutions, Conv+BN fusion; weight quantization, grouped convolutions, pruning |
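The Byttebier2021 abstract notes that score calibration is fused with the final linear output layer so that the model does not grow. This works because two stacked affine maps compose into a single affine map with the same shape as the original layer. The sketch below illustrates the idea in plain Python, assuming a multinomial calibration of the form `w_cal * z + b_cal` applied to the logits `z = w * x + b`. The function names and toy numbers are illustrative and are not taken from the report.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [p + q for p, q in zip(u, v)]


def fuse_calibration(w, b, w_cal, b_cal):
    """Fold w_cal @ (w @ x + b) + b_cal into one affine layer.

    Returns (w_fused, b_fused) with the same shapes as (w, b).
    """
    w_fused = matmul(w_cal, w)
    b_fused = add(matvec(w_cal, b), b_cal)
    return w_fused, b_fused


# Toy example (assumed values): 3 classes, 2 input features.
w = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
b = [0.1, -0.2, 0.3]
w_cal = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.5], [0.1, 0.0, 1.0]]
b_cal = [0.0, 0.1, -0.1]
x = [0.7, -1.2]

# Output layer followed by a separate calibration step.
two_step = add(matvec(w_cal, add(matvec(w, x), b)), b_cal)

# Single fused layer, no extra parameters stored.
w_fused, b_fused = fuse_calibration(w, b, w_cal, b_cal)
one_step = add(matvec(w_fused, x), b_fused)
```

The fused layer has as many weights and biases as the uncalibrated one, which is why calibration of this form costs nothing against a size budget.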
Acoustic Scene Classification Using Lightweight ResNet with Attention
Wenchang Cao, Yanxiong Li and Qisheng Huang
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
Cao_SCUT_task1a_1 Cao_SCUT_task1a_2 Cao_SCUT_task1a_3 Cao_SCUT_task1a_4
Abstract
This technical report describes our system for subtask A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of Task 1 (Acoustic Scene Classification) of the DCASE2021 Challenge. Because of the limit on model size, we choose a ResNet with depthwise separable convolutions as our backbone network and add an attention mechanism to it. In addition, data augmentation techniques such as mixup and spectrum correction are adopted to increase the diversity of the dataset. Our system achieves an accuracy of 71.6% on the development dataset, and the model size meets the requirement of subtask A.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time stretching, pitch shifting, spectrum correction |
Features | log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
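Mixup, listed among the augmentations for this and many other submissions, trains on convex combinations of pairs of examples and their labels. The sketch below shows the idea, assuming one-hot labels and a mixing coefficient drawn from a Beta(alpha, alpha) distribution. The value `alpha=0.4`, the helper name and the toy inputs are illustrative choices, not values from the report.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their one-hot labels.

    lam ~ Beta(alpha, alpha); alpha is an assumed hyperparameter.
    Returns the mixed features, the mixed soft label, and lam.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy example: two 4-bin "spectrogram frames" from different classes.
rng = random.Random(0)
frame_a, label_a = [0.2, 0.4, 0.1, 0.9], [1.0, 0.0, 0.0]
frame_b, label_b = [0.8, 0.1, 0.5, 0.3], [0.0, 0.0, 1.0]
mixed_x, mixed_y, lam = mixup(frame_a, label_a, frame_b, label_b, rng=rng)
```

The mixed label is a soft target whose entries still sum to one, so it can be used directly with a cross-entropy loss.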
Consistency Learning Based Acoustic Scene Classification with Res-Attention
MengFan Cui, Fan Kui and Liyong Guo
Northwestern Polytechnic University, China
Fan_NWPU_task1a_1
Abstract
In this report, we propose a consistency learning based method with different data augmentation methods to tackle Task 1a (Acoustic Scene Classification) of the DCASE2021 Challenge. The task concerns classification of data from multiple devices (real and simulated), targets generalization across a number of different devices, and focuses on low-complexity solutions. Consistency learning is used to reduce the embedding distance between the augmented sample and the original sample, which makes the algorithm robust to device variation. To achieve low complexity and high accuracy, we propose a Res-Attention structure that combines a residual structure with separable convolution layers and an attention layer. On the Task 1a development dataset, the presented method reaches 69.71% accuracy (0.87 cross-entropy log loss) with a model size of 93.3 KB using int8 quantization.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | reverb, filtering, random gain adjust, SpecAugment |
Features | log-mel energies |
Classifier | ResNet, Attention |
Complexity management | weight quantization |
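The consistency learning described in the Cui2021 abstract penalizes the distance between the embedding of an original sample and the embedding of its augmented version. The abstract does not state which distance is used, so the sketch below assumes a mean squared Euclidean distance over a batch. The function name and the toy embeddings are hypothetical.

```python
def consistency_loss(emb_orig, emb_aug):
    """Mean squared Euclidean distance between paired embeddings.

    emb_orig[i] and emb_aug[i] are the embeddings of the same recording
    before and after augmentation. The choice of distance is an assumption.
    """
    total = 0.0
    for e1, e2 in zip(emb_orig, emb_aug):
        total += sum((a - b) ** 2 for a, b in zip(e1, e2))
    return total / len(emb_orig)


# Toy batch of two 2-dimensional embeddings.
original = [[0.0, 0.0], [1.0, 1.0]]
augmented = [[3.0, 4.0], [1.0, 1.0]]
loss = consistency_loss(original, augmented)  # (25 + 0) / 2 = 12.5
```

In training, a term like this would typically be added to the classification loss with a weighting factor, so that embeddings stay close under device-like perturbations.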
Low-Complexity Acoustic Scene Classification Using Simple CNN
Biyun Ding
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Ding_TJU_task1a_1 Ding_TJU_task1a_2 Ding_TJU_task1a_3 Ding_TJU_task1a_4
Abstract
This technical report describes our acoustic scene classification systems for DCASE2021 Challenge Task 1A: Low-Complexity Acoustic Scene Classification with Multiple Devices. Many factors affect performance in this task. To improve performance while keeping the model complexity within the limit, we try different choices of features, sampling rate, channel, classifier type, CNN architecture, and post-processing of predictions. In experiments on the TAU Urban Acoustic Scenes 2020 Mobile development dataset, the best accuracy of a single system we implemented is 55.89%, an improvement of 7% over the baseline CNN. The accuracy of the late fusion is 59.80%, an improvement of 11.35% over the baseline CNN.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
Decision making | majority vote |
Complexity management | weight quantization |
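The "majority vote" decision rule listed for Ding2021 can be illustrated with a few lines of Python. This is a generic sketch of late fusion by voting, not the author's implementation. The function name and the example predictions are made up for illustration, and ties are broken in favour of the label that appears first among the votes.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-system label predictions by majority vote.

    predictions: list of label lists, one list per system, all of
    equal length. Returns one fused label per test clip.
    """
    fused = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # among equal counts, the first-seen label wins.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three hypothetical systems classifying three clips.
system_outputs = [
    ["bus", "park", "tram"],
    ["bus", "metro", "tram"],
    ["park", "metro", "bus"],
]
fused_labels = majority_vote(system_outputs)
```

Because the challenge ranks systems by log loss, a vote over hard labels would still need to be turned into class probabilities before scoring.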
End-To-End CNN Optimization for Low-Complexity Acoustic Scene Classification in the DCASE 2021 Challenge
Carlos Alberto Galindo-Meza1, Juan Antonio Del Hoyo Ontiveros2, Jose Torres Ortega3 and Paulo Lopez-Meyer2
1Departamento de Electronica, Sistemas e Informatica, Instituto Tecnologico de Estudios Superiores de Occidente, Jalisco, Mexico, 2Intel Labs, Intel Corporation, Jalisco, Mexico, 3Intel Labs, Intel Corporation, California, USA
Galindo-Meza_ITESO_task1a_1
Abstract
For the DCASE 2021 challenge we implemented an optimization pipeline to comply with the low-complexity restrictions of Task 1a. Initially, we trained and validated an end-to-end convolutional neural network-based audio classification model following a typical deep learning training strategy. We then applied an efficient pruning procedure based on the lottery ticket hypothesis, and finally we executed a training-aware quantization to convert the model’s weights from FP32 to INT8 format. Experimentation showed the feasibility of this approach, with accuracy results above those of the baseline models reported in the challenge guidelines.
System characteristics
Sampling rate | 16kHz |
Data augmentation | random noise, random gain, random cropping, mixup |
Features | raw waveform |
Embeddings | AemNet |
Classifier | CNN |
Decision making | Maximum softmax |
Complexity management | pruning, int8 weight quantization |
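The two compression steps named in the Galindo-Meza2021 abstract, pruning and conversion of weights to INT8, can be illustrated on a flat list of weights. The sketch below makes strong simplifying assumptions: it performs a single magnitude-pruning pass (lottery-ticket style procedures usually alternate pruning with retraining) and a simple symmetric post-hoc quantization rather than the training-aware scheme the report uses. All names and numbers are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


# Toy weight vector (assumed values).
weights = [0.02, -0.5, 0.9, -0.01, 0.3, -1.27]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
nonzero = sum(1 for c in codes if c != 0)
```

Only the non-zero codes count towards the parameter budget in this task, so both steps reduce the reported model size: pruning lowers the count and INT8 storage lowers the bytes per parameter.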
Clova Submission for the DCASE 2021 Challenge: Acoustic Scene Classification Using Light Architectures and Device Augmentation
Heo Hee-Soo1, Jung Jee-weon1, Shim Hye-jin2 and Lee Bong-Jin1
1Naver Corporation, Seongnam, South Korea, 2Computer Science, University of Seoul, Seoul, South Korea
Heo_Clova_task1a_1 Heo_Clova_task1a_2 Heo_Clova_task1a_3 Heo_Clova_task1a_4
Abstract
This technical report describes the system submitted by Naver Clova for the DCASE 2021 challenge task 1-a. The aim is to develop an acoustic scene classification system that can generalize towards unknown devices using a DNN with a limited number of parameters. We propose two lightweight architectures using residual networks, a method referred to as attentive max feature map, and multitask learning. After the initial training, the model is further fine-tuned using knowledge distillation. Two augmentation methods are also explored to simulate various recording devices. The two proposed architectures have 63,547 and 65,424 non-zero parameters with a 16-bit resolution, both less than 128KB. Following the official train and test split of the TAU Urban Acoustic Scenes 2020 Mobile development dataset, the models achieve 70.48% and 69.68% accuracy, respectively.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, spectrum augmentation, device augmentation; mixup, tempo, channel corruption |
Features | log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
Using Arcface Metric Learning for Low-Complexity Acoustic Scene Classification
Kristóf Horváth, Harsh Purohit, Yohei Kawaguchi, Ryo Tanabe, Kota Dohi, Takashi Endo, Masaaki Yamamoto and Tomoya Nishida
Hitachi Ltd., Tokyo, Japan
Horváth_HIT_task1a_1 Horváth_HIT_task1a_2 Horváth_HIT_task1a_3 Horváth_HIT_task1a_4
Abstract
In this technical report we present our submissions for DCASE 2021 Challenge Task 1A. For the low-complexity model, we used both a MobileNetV2-based model and a ResNet-based model with a reduced number of layers and trained them using ArcFace metric learning. To increase the accuracy, we used test-time augmentation (TTA) during inference. On the development dataset, our models attain an ASC accuracy of around 54–55%, while having less than 128 kB of total parameters.
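For readers unfamiliar with ArcFace, the following NumPy sketch shows how the additive angular margin enters the logits: embeddings and class weights are L2-normalized, and the margin is added to the angle of the target class only. The scale and margin values are assumptions, not settings from the report, and practical implementations add safeguards for angles near π that are omitted here.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, scale=30.0, margin=0.5):
    """ArcFace-style logits: scale * cos(theta + margin) for the target class."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    theta[rows, labels] += margin  # penalize only the true-class angle
    return scale * np.cos(theta)

# Toy example: two 2-D embeddings perfectly aligned with their class vectors.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
class_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
logits = arcface_logits(embeddings, class_weights, labels)
print(logits)
```

The resulting logits are fed to a standard softmax cross-entropy loss; the margin forces the true-class angle to be smaller than it would need to be otherwise.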
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time stretching, pitch shifting, random noise, spectrum augmentation, random temporal shuffle, volume change |
Features | log-mel energies, HPSS |
Classifier | MobileNetV2; MobileNetV2, ArcFace; ResNet; ResNet, ArcFace |
Complexity management | weight quantization |
Diverse Sparsity System Using Convolution Neural Network
Hui Hsin Jeng1, Chia-Ping Chen1, Chung Li Lu2 and Bo-Cheng Chan2
1Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, 2Chunghwa Telecom, Taoyuan, Taiwan
Jeng_CHT+NSYSU_task1a_1 Jeng_CHT+NSYSU_task1a_2 Jeng_CHT+NSYSU_task1a_3
Abstract
In this technical report, we present our work on pruning convolutional neural networks and using quantization to reduce parameters. DCASE2021 subtask 1A limits the classifier size to only 128 KB, smaller than the limit in DCASE2020 subtask 1B. We therefore propose three pruning and quantization methods for convolutional neural networks. Two of them prune the bigger network (FCNN) with single sparsity or diverse sparsity and then apply quantization. The third simply prunes a smaller network (MobNet) with single sparsity and applies quantization. Our best system achieves a validation log loss of 1.428.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
Decision making | logistic regression |
Complexity management | sparsity, weight quantization |
Trident Resnets with Low Complexity for Acoustic Scene Classification
Youngho Jeong, Sooyoung Park and Taejin Lee
Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Jeong_ETRI_task1a_1 Jeong_ETRI_task1a_2 Jeong_ETRI_task1a_3 Jeong_ETRI_task1a_4
Abstract
This technical report describes our acoustic scene classification systems for DCASE2021 challenge Task 1 subtask A. We designed two Trident ResNets with three parallel paths, targeting low complexity. The trident structure with respect to the frequency domain is beneficial when analyzing samples collected from minority or unseen devices. To satisfy the model size requirement, we replaced standard convolutions with depthwise separable convolutions and applied weight quantization to the trained model. In our performance evaluation, Trident ResNet B trained with data augmentation showed a log loss of 0.968 and a classification accuracy of 65.8% on the test split.
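To see why swapping standard convolutions for depthwise separable ones helps with the size limit, the sketch below compares weight counts for a single layer. The kernel size and channel widths are illustrative assumptions, not values from the report, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 64  # hypothetical layer shape
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

For this layer shape the count drops from 36,864 to 4,672 weights, roughly a 7.9-fold reduction.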
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | temporal cropping; temporal cropping, SpecAugment |
Features | log-mel energies, deltas, delta-deltas |
Classifier | ResNet |
Complexity management | weight quantization, depthwise separable convolutions |
Technical Paper: Deep Scattering Spectrum with Mobile Network for Low Complexity Acoustic Scene Classification
Xing Yong Kek1, Cheng Siong Chin1 and Li Ye2
1Faculty of Science, Agriculture & Engineering, Newcastle University, Singapore, 2Xylem Inc, Singapore
Kek_NU_task1a_1 Kek_NU_task1a_2
Abstract
We present a technical paper that provides details of our classification model submitted to the DCASE 2021 Task 1a challenge. In this paper, we propose the use of the deep scattering spectrum (DSS) with a mobile network to tackle low-complexity computation.
System characteristics
Sampling rate | 44.1kHz |
Features | Wavelet Scattering |
Classifier | CNN, MobileNetV2; CNN, MobileNetV2, Group convolution, Channel attention |
Complexity management | weight quantization |
Building Light-Weight Convolutional Neural Networks for Acoustic Scene Classification Using Audio Embeddings
Bongjun Kim
3M, Saint Paul, United States
Kim_3M_task1a_1 Kim_3M_task1a_2 Kim_3M_task1a_3 Kim_3M_task1a_4
Abstract
This technical report describes the acoustic scene classification models from our submissions for DCASE challenge 2021 Task 1A. The task is to build a system to perform classification on acoustic scene data. The dataset has 10 acoustic scene labels. Our submissions are Convolutional Neural Network (CNN)-based models which consist of 3 convolutional layers and 1 fully-connected layer. We utilize a small subset of a deep audio embedding model that has been pre-trained on a large-scale dataset. We also perform quantization and pruning to reduce the complexity of the models to meet the size limit of 128KB for the challenge. We compare the performance of our models with the baseline approach on the provided test dataset. The results show that our models outperform the baseline system.
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | mixup, SpecAugment |
Features | Perceptually-weighted log-mel energies |
Embeddings | VGGish |
Classifier | CNN |
Complexity management | weight quantization, pruning |
Acoustic Scene Classification with Decomposed Convolution Neural Networks
Minhan Kim1, SeungHyeon Shin1, Seungjae Baek1, Seokjin Lee2, Sooyoung Park3 and Youngho Jeong3
1School of Electronic and Electrical Engineering, Kyungpook National University, Daegu, Republic of Korea, 2School of Electronics Engineering, School of Electronic and Electrical Engineering, Kyungpook National University, Daegu, Republic of Korea, 3Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Kim_KNU_task1a_1 Kim_KNU_task1a_2 Kim_KNU_task1a_3 Kim_KNU_task1a_4
Abstract
This report describes a model submitted to DCASE2021 Task 1 subtask A. Our model is developed by applying canonical polyadic decomposition to conventional convolutional-neural-network-based models to reduce the model size and meet the goal of Task 1A. More specifically, we apply the decomposition method to a dual ResNet, which divides the features into two parts along the frequency axis and processes them independently, and to a shallow inception model. To evaluate our model, an acoustic scene classification simulation was performed with the development dataset of DCASE 2021 Task 1A, and our model showed a log loss of about 1.03-1.06 and a macro accuracy of 62%-66%, far better than that of the baseline model. Also, the model size of our system is smaller than 128 kbytes, which is the limit of DCASE2021 Task 1A.
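The saving from canonical polyadic (CP) decomposition can be sketched as follows: a 4-way convolution kernel is represented by four factor matrices that share a rank, and the full kernel is their sum of rank-one outer products. The rank, layer shape, and function name are assumptions for illustration; the report's actual configuration is not given in the abstract.

```python
import numpy as np

def cp_kernel(rank, c_out, c_in, kh, kw, seed=0):
    """Build a conv kernel of shape (c_out, c_in, kh, kw) from rank-`rank` CP factors."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(dim, rank)) for dim in (c_out, c_in, kh, kw)]
    # Sum over the shared rank index r of the four-way outer products.
    kernel = np.einsum("or,ir,hr,wr->oihw", *factors)
    n_params = sum(f.size for f in factors)
    return kernel, n_params

kernel, n_cp = cp_kernel(rank=16, c_out=64, c_in=64, kh=3, kw=3)
n_full = kernel.size
print(n_full, n_cp)
```

In this hypothetical setting the factors hold 2,144 values in place of the 36,864 entries of the dense kernel.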
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, delta-log-mel energies, delta-delta-log-mel energies |
Classifier | ResNet; CNN (Inception) |
Complexity management | CP-decomposition, weight quantization; parameter sharing, weight quantization |
QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design
Byeonggeun Kim, Seunghan Yang, Jangho Kim and Simyung Chang
Qualcomm AI Research, Qualcomm Korea YH, Seoul, Korea
Kim_QTI_task1a_1 Kim_QTI_task1a_2 Kim_QTI_task1a_3 Kim_QTI_task1a_4
Abstract
This technical report describes the details of our TASK1A submission of the DCASE2021 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Second, we design an efficient architecture, BC-ResNet-Mod, a modified version of the baseline architecture with a limited receptive field. Third, we exploit spectrogram-to-spectrogram translation from one to multiple devices to augment training data. Finally, we utilize three model compression schemes: pruning, quantization, and knowledge distillation to reduce model complexity. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 62kB of non-zero parameters.
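A minimal sketch of the "instance normalization with a shortcut path" idea is given below. It assumes the statistics are computed per sample and per frequency bin over the channel and time axes, and that the shortcut is scaled by a fixed coefficient `lam`; both choices are assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def freq_instance_norm(x, eps=1e-5):
    """Normalize each (sample, frequency bin) over channels and time.

    x has shape (batch, channels, freq, time); the axis choice is an assumption.
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_norm(x, lam=0.1):
    """Normalized features plus a scaled identity shortcut."""
    return lam * x + freq_instance_norm(x)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 10))
y = residual_norm(x)
y_no_shortcut = residual_norm(x, lam=0.0)
print(y.shape)
```

The shortcut term lets some of the original signal through, so that the normalization removes device-dependent offsets without discarding everything the raw features carry.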
System characteristics
Sampling rate | 16kHz |
Data augmentation | mixup, specaugment, time rolling |
Features | log-mel energies |
Classifier | CNN, BC-ResNet |
Decision making | maximum likelihood |
Complexity management | weight quantization, pruning, knowledge distillation |
CPJKU Submission to DCASE21: Cross-Device Audio Scene Classification with Wide Sparse Frequency-Damped CNNs
Khaled Koutini1, Schlüter Jan2 and Gerhard Widmer2
1Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria, 2Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria
Koutini_CPJKU_task1a_1 Koutini_CPJKU_task1a_2 Koutini_CPJKU_task1a_3 Koutini_CPJKU_task1a_4
Abstract
We describe the CP-JKU team's submission for Task 1A Low-Complexity Acoustic Scene Classification with Multiple Devices of the DCASE2021 Challenge. We use a Receptive Field (RF) regularized Convolutional Neural Network (CNN) with Frequency Damping as a baseline. We investigate widening the convolutional layers without increasing the number of parameters by grouping and pruning. We apply iterative magnitude pruning to sparsify the weights of the models. Additionally, we investigate an adversarial domain adaptation approach.
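Iterative magnitude pruning can be sketched as repeatedly removing a fraction of the smallest surviving weights. The example below uses random weights and omits the retraining that a real pipeline would run between rounds; the pruning fraction and number of rounds are assumptions.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Disable `fraction` of the currently surviving weights with the smallest magnitude."""
    alive = np.flatnonzero(mask)
    n_prune = int(len(alive) * fraction)
    if n_prune == 0:
        return mask
    order = np.argsort(np.abs(weights.ravel()[alive]))
    new_mask = mask.copy().ravel()
    new_mask[alive[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))          # hypothetical layer weights
mask = np.ones_like(w, dtype=bool)

for _ in range(3):
    # A real pipeline would retrain the surviving weights here before pruning again.
    mask = magnitude_prune(w, mask, fraction=0.2)
    w = w * mask

print(int(mask.sum()), "of", mask.size, "weights remain")
```

Only the non-zero weights then need to be stored, which is what allows wider layers to fit within the same size budget.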
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | mixup, pitch shifting |
Features | Perceptually-weighted log-mel energies |
Classifier | RF-regularized CNNs |
Complexity management | float16, sparsity |
CAU-ET Submission to DCASE 2021: Light-Efficientnet for Acoustic Scene Classification
Soyoung Lim1, Yerin Lee1 and Il-Youp Kwak2
1Statistics Dept., Chung-Ang University, Seoul, South Korea, 2Department of Applied Statistics, Chung-Ang University, Seoul, South Korea
Lim_CAU_task1a_1 Lim_CAU_task1a_2 Lim_CAU_task1a_3 Lim_CAU_task1a_4
Abstract
Acoustic scene classification (ASC) categorizes an audio file based on the environment in which it has been recorded. This has long been studied in the detection and classification of acoustic scenes and events (DCASE). We present the solution to Task 1A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 challenge submitted by the Chung-Ang University team. We propose a light-efficientnet model with 3 scaling factors: width, depth, and resolution. Additionally, we use lightweight deep learning techniques such as pruning and quantization.
System characteristics
Sampling rate | 44.1kHz |
Features | spectrogram |
Classifier | CNN |
Complexity management | weight quantization, sparsity |
DCASE 2021 Task 1 Subtask A: Low-Complexity Acoustic Scene Classification
Yingzi Liu1, Jiangnan Liang1, Luojun Zhao2, Jia Liu2, Kexin Zhao2, Weiyu Liu2, Long Zhang2, Tanyue Xu2 and Chuang Shi1
1School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China, 2University of Electronic Science and Technology of China, Chengdu, China
Liu_UESTC_task1a_1 Liu_UESTC_task1a_2 Liu_UESTC_task1a_3 Liu_UESTC_task1a_4
Abstract
This technical report describes the systems for Task 1, subtask A of the DCASE 2021 challenge. To reduce the number of model parameters, we add feature reuse units to the deep residual network. One-bit-per-weight convolution layers are also used. Log-mel spectrograms, delta features and delta-delta features are extracted to train the acoustic scene classification model. HRTF and spectrum correction are used to augment the acoustic features. Our system achieves higher classification accuracy and lower log loss on the development dataset than the baseline system.
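The storage effect of one-bit-per-weight layers can be illustrated by keeping only the sign of each weight plus a single scale per layer. The mean-absolute-value scale used below is an assumption borrowed from common binarization schemes; the abstract does not state how the report scales its binary weights, and the layer shape is hypothetical.

```python
import numpy as np

def binarize(weights):
    """Represent weights as signs (+1/-1) times one per-layer scale (illustrative)."""
    scale = float(np.mean(np.abs(weights)))
    signs = np.where(weights >= 0, 1.0, -1.0)
    return signs, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16, 32)).astype(np.float32)  # hypothetical conv kernel
signs, scale = binarize(w)

packed = np.packbits(signs > 0)   # one bit per weight
w_hat = signs * scale             # weights as used at inference time

print("float32 bytes:", w.nbytes, "packed bytes:", packed.nbytes)
```

For this kernel the packed signs take 576 bytes against 18,432 bytes in float32, a 32-fold reduction before counting the scale.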
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | HRTF, mixup, temporal cropping, spectrum correction; mixup, temporal cropping; mixup |
Features | log-mel energies, deltas, delta-deltas; log-mel energies |
Classifier | ResNet; CNN |
Complexity management | 1-bit quantization, FR_unit; 1-bit quantization; weight quantization |
Wavelet Based Mel Scaled Representation for Low Complexity ASC with Multiple Devices
Aswathy Madhu1 and Suresh K2
1Electronics & Communication, College of Engineering Trivandrum, Thiruvananthapuram, Kerala, India, 2Electronics & Communication, Govt. Engineering College, Barton Hill, Thiruvananthapuram, Kerala, India
Madhu_CET_task1a_1
Abstract
This technical report presents our submission to the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 for Task 1 (Acoustic Scene Classification), subtask A (Low-Complexity Acoustic Scene Classification with Multiple Devices). The proposed system is a simple state-of-the-art approach employing wavelet based mel scaled representation for acoustic signals and a CNN classifier. We use data augmentation to handle device mismatch and post-training quantization of network weights to enforce low complexity in terms of model size. The submitted system surpasses the baseline system utilizing CNN developed for this subtask.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | time stretching, pitch shifting, dynamic range compression, background noise, mixup |
Features | wavelet based log-mel energies |
Classifier | CNN |
Complexity management | weight quantization |
Task 1A DCASE 2021: Acoustic Scene Classification with Mismatch-Devices Using Squeeze-Excitation Technique and Low-Complexity Constraint
Javier Naranjo-Alcazar1,2, Sergi Perez-Castanos1, Maximo Cobos1, Francesc J. Ferri1 and Pedro Zuccarello2
1Computer Science, Universitat de Valencia, Burjassot, Spain, 2Instituto Tecnológico de Informática, Valencia, Spain
Naranjo-Alcazar_ITI_task1a_1
Abstract
Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening. The objective of this problem is to classify an audio clip into one of the predefined scenes using only the audio data. This problem has progressed considerably over the years in the different editions of DCASE. It usually has several subtasks that allow this problem to be tackled with different approaches. The subtask presented in this report corresponds to an ASC problem that is constrained by the complexity of the model and uses audio recorded from different devices, known as mismatch devices (real and simulated). The work presented in this report follows the research line carried out by the team in previous years. Specifically, a system based on two steps is proposed: a two-dimensional representation of the audio using the Gammatone filter bank and a convolutional neural network using squeeze-excitation techniques. The presented system outperforms the baseline by about 17 percentage points.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | gammatone spectrogram |
Classifier | CNN |
Complexity management | weight quantization, tflite, float16 |
DCASE 2021 Task 1A: Technique Report
Lam Pham1, Alexander Schindler1, Hieu Tang2 and Truong Hoang3
1Center for Digital Safety & Security, Austrian Institute of Technology, Vienna, Austria, 2Department of Electronic and Electrical Engineering, Hongik University, Korea, 3FPT company, Ho Chi Minh, Vietnam
Pham_AIT_task1a_1 Pham_AIT_task1a_2 Pham_AIT_task1a_3
Abstract
In this report, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. In the first step, we use a Mel filter, a Gammatone filter and the Constant Q Transform (CQT) to transform raw audio signals into spectrograms. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs) for classification. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of the proposed CNN architecture, we apply two model compression techniques: model restriction and decomposed convolution. Our experiments, which are conducted on the DCASE 2021 Task 1A development dataset, achieve a low-complexity CNN-based framework with 128 KB trainable parameters and the best classification accuracy of 66.7%, improving the DCASE baseline by 19.0%.
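The late fusion step can be sketched as a product of the per-model class probabilities followed by renormalization, which matches the "PROD late fusion" label in the system characteristics. The probability values below are invented for illustration, and the log-domain computation is an implementation choice made here for numerical stability.

```python
import numpy as np

def prod_fusion(prob_list, eps=1e-12):
    """Fuse per-model class probabilities by their (renormalized) product."""
    log_sum = sum(np.log(p + eps) for p in prob_list)
    fused = np.exp(log_sum - log_sum.max(axis=1, keepdims=True))
    return fused / fused.sum(axis=1, keepdims=True)

# Hypothetical outputs of the three back-end CNNs for one clip and three classes.
p_mel = np.array([[0.6, 0.3, 0.1]])
p_gammatone = np.array([[0.5, 0.4, 0.1]])
p_cqt = np.array([[0.2, 0.7, 0.1]])

fused = prod_fusion([p_mel, p_gammatone, p_cqt])
print(fused, fused.argmax(axis=1))
```

Product fusion rewards classes that all models consider plausible: a single low probability from one model strongly suppresses that class in the fused result.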
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | CQT, Gammatonegram, log-mel energies |
Classifier | CNN |
Decision making | PROD late fusion |
Complexity management | channel restriction and decomposed convolution |
DCASE 2021 Task 1 Subtask A: Low-Complexity Acoustic Scene Classification
Duc Phan and Douglas Jones
ECE, University of Illinois, Urbana-Champaign, Illinois, US
Phan_UIUC_task1a_1 Phan_UIUC_task1a_2 Phan_UIUC_task1a_3 Phan_UIUC_task1a_4
Abstract
Decomposing 2D convolution into time and frequency separable 1D convolutions produces a low-complexity neural network with good performance for acoustic scene classification. The final proposed network has roughly 41K parameters with a size of 75KB. It significantly outperforms the DCASE 2021 baseline network [1], with an accuracy of 64 percent on the development dataset[2].
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, deltas, delta-deltas |
Classifier | CNN |
Complexity management | weight quantization, depthwise separable convolutions |
Separable Convolutions and Test-Time Augmentations for Low-Complexity and Calibrated Acoustic Scene Classification
Gilles Puy, Himalaya Jain and Andrei Bursuc
valeo.ai, Paris, France
Abstract
This report details the architecture we used to address Task 1a of the DCASE2021 challenge. Our architecture is based on a 4-layer convolutional neural network taking a log-mel spectrogram as input. The complexity of this network is controlled by using separable convolutions in the channel, time and frequency dimensions. We train different models to investigate the benefit of mixup, focal loss and test-time augmentations in improving the performance of the system.
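As a reminder of what focal loss changes relative to cross-entropy, the sketch below down-weights well-classified examples by the factor (1 - p_t)^gamma. The value of gamma and the example probabilities are assumptions; the abstract does not give the settings used.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean focal loss; with gamma = 0 it reduces to ordinary cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))

probs = np.array([[0.9, 0.05, 0.05],   # confident and correct
                  [0.2, 0.5, 0.3]])    # uncertain
labels = np.array([0, 1])

ce = focal_loss(probs, labels, gamma=0.0)
fl = focal_loss(probs, labels, gamma=2.0)
print(ce, fl)
```

With gamma = 2 the confident example contributes almost nothing, so training focuses on the harder clip.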
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | SpecAugment; SpecAugment, mixup |
Features | log-mel energies |
Classifier | CNN |
Decision making | average |
Complexity management | weight quantization |
Acoustic Scene Classification Model Based on Two Parallel Residual Networks
Ziling Qiao, Hongxia Dong, Xichang Cai and Menglong Wu
Electronic and Communication Engineering, North China University of Technology, Beijing, China
Qiao_NCUT_task1a_1
Abstract
This technical report describes our submission for Task 1a of the DCASE2021 challenge. We calculated 128 log-mel energies at the original sampling rate of 44.1 kHz for each time slice, using 2048 FFT points with 50% overlap. Additionally, deltas and delta-deltas were calculated from the log-mel spectrogram and stacked along the channel axis. The resulting spectrograms had 128 frequency bins, 423 time samples and 3 channels, representing the log-mel spectrogram, its delta features and its delta-delta features, respectively. The three-channel feature map is then divided into Mel bins 0-64 and 64-128 along the frequency axis, and the low- and high-frequency features are fed into two parallel residual networks with identical residual blocks and convolutional residual blocks for training; the outputs of the two networks are then concatenated along the channel axis. Finally, after 1×1 convolution and global average pooling, the classification results are obtained through a softmax output.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, deltas, delta-deltas |
Classifier | ResNet ensemble |
Decision making | average |
Complexity management | weight quantization |
Mobilenet Using Coordinate Attention and Fusions for Low-Complexity Acoustic Scene Classification with Multiple Devices
Soonshin Seo and Ji-Hwan Kim
Dept. of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea
Seo_SGU_task1a_1 Seo_SGU_task1a_2 Seo_SGU_task1a_3 Seo_SGU_task1a_4
Abstract
In this technical report, we describe our acoustic scene classification methods submitted to Task 1a of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge. We extracted log-Mel filter bank features with delta and delta-delta from the acoustic signals and applied normalization. A total of 6 data augmentations were applied: mixup, spectrum augmentation, spectrum correction, pitch shift, speed change, and mix audios. In addition, we designed a MobileNet using coordinate attention and fusions. Inspired by MobileNetV2, inverted residuals and linear bottlenecks are adapted for the mobile blocks of the proposed MobileNet. We applied coordinate attention and early/late fusion methods after the mobile blocks. In addition, we reduced the model size by applying weight quantization to the trained model. Experiments were conducted on the cross-validation setup of the official development set. We confirmed that our model achieved a log-loss of 1.040 and an accuracy of 72.6% within the 128 KB model size.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, spectrum augmentation, spectrum correction, pitch shifting, speed change, mix audios |
Features | log-mel energies, deltas, delta-deltas |
Classifier | MobileNet |
Complexity management | weight quantization |
Pruning and Quantization for Low-Complexity Acoustic Scene Classification
Arshdeep Singh1, Dhanunjaya Varma Devalraju2 and Padmanabhan Rajan2
1SCEE, Indian Institute of Technology, Mandi, Mandi, India, 2School of Computing and Electrical Engineering, Indian Institute of Technology, Mandi, Mandi, India
Singh_IITMandi_task1a_1 Singh_IITMandi_task1a_2 Singh_IITMandi_task1a_3 Singh_IITMandi_task1a_4
Abstract
This technical report describes the IITMandi AudioTeam’s submission for DCASE 2021 ASC Task 1, subtask Low-Complexity Acoustic Scene Classification with Multiple Devices. This report aims to design low-complexity systems for acoustic scene classification by eliminating filters in a pre-trained convolutional neural network. A filter pruning strategy is adopted, which consists of three steps. Step 1 identifies the redundant filters, which have a low norm. Step 2 explicitly removes the redundant filters and their connecting feature maps from the unpruned network to give a pruned network. Step 3 fine-tunes the pruned network to regain performance. Further, the trained parameters are quantized to 16-bit. On the DCASE 2021 Task 1A development dataset, the proposed framework reduces the number of parameters by 68% with competitive performance.
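Steps 1 and 2 of the pruning strategy can be sketched with NumPy as follows. The example ranks filters by their L1 norm, which is an assumption (the abstract only says "low norm"), uses random kernels with hypothetical shapes, and leaves out the fine-tuning of Step 3.

```python
import numpy as np

def prune_filters(kernel, keep_ratio):
    """Keep the filters with the largest L1 norm.

    kernel has shape (out_channels, in_channels, kh, kw).
    Returns the pruned kernel and the indices of the kept filters.
    """
    norms = np.abs(kernel).reshape(kernel.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(kernel.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return kernel[keep], keep

rng = np.random.default_rng(0)
layer1 = rng.normal(size=(16, 8, 3, 3))    # hypothetical first layer
layer2 = rng.normal(size=(32, 16, 3, 3))   # next layer consumes layer1's 16 outputs

layer1_pruned, keep = prune_filters(layer1, keep_ratio=0.5)
# Removing a filter also removes the matching input channel of the next layer.
layer2_pruned = layer2[:, keep]

print(layer1_pruned.shape, layer2_pruned.shape)
```

Because whole filters are removed, the pruned network is a smaller dense network and needs no sparse storage format.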
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
Complexity management | Filter pruning and quantization |
Ensemble of Simple Resnets with Various Mel-Spectrum Time-Frequency Resolutions for Acoustic Scene Classifications
Reiko Sugahara, Masatoshi Osawa and Ryo Sato
RION CO., LTD., Tokyo, Japan
Sugahara_RION_task1a_1 Sugahara_RION_task1a_2 Sugahara_RION_task1a_3 Sugahara_RION_task1a_4
Abstract
This technical report describes our procedure for Task 1A in DCASE 2021[1][2]. Our method adopts ResNet-based models with a mel spectrogram as input. The accuracy was improved by an ensemble of simple ResNet-based models with various mel-spectrum time-frequency resolutions. Data augmentations such as mixup, SpecAugment, time-shifting, and spectrum modulation were applied to prevent overfitting. The size of the model was reduced by quantization and pruning. Accordingly, our system achieved an accuracy of 70.1% with a model size of 95 KB on the development set.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment, time-shifting, spectrum modulation |
Features | log-mel powers |
Classifier | ResNet, ensemble |
Decision making | weighted score average; score average |
Complexity management | weight quantization, pruning |
Low-Complexity Acoustic Scene Classification Using Mobile Inverted Bottleneck Blocks
Sergey Verbitskiy and Viacheslav Vyshegorodtsev
Deepsound, Novosibirsk, Russia
Verbitskiy_DS_task1a_1 Verbitskiy_DS_task1a_2 Verbitskiy_DS_task1a_3 Verbitskiy_DS_task1a_4
Abstract
This technical report describes our approaches for Task 1A (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 Challenge. We propose a new architecture with mobile inverted bottleneck blocks (Fused-MBConv and MBConv) for acoustic scene classification tasks. This architecture is based on EfficientNetV2. Our models have a very small number of parameters. We also use several data augmentation techniques during the training of models. Our best model has 62,346 non-zero parameters and achieves a classification macro-average accuracy of 70.5% and an average multiclass cross-entropy (log loss) of 0.848 on the development dataset. The resulting model size is 121.8 KB (the model parameters are quantized to float16 after the training).
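The reported model size follows directly from the parameter count, assuming 2 bytes per float16 parameter and 1 KB = 1024 bytes:

```python
n_params = 62_346          # non-zero parameters reported in the abstract
bytes_per_param = 2        # float16

size_kb = n_params * bytes_per_param / 1024
print(f"{size_kb:.1f} KB")  # prints "121.8 KB"
```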
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal cropping, SpecAugment |
Features | log-mel energies |
Classifier | CNN, EfficientNetV2 |
Complexity management | weight quantization |
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification
Chao-Han Huck Yang1, Hu Hu1, Sabato Marco Siniscalchi2, Qing Wang3, Wang Yuyang3, Xianjun Xia4, Yuanjun Zhao4, Yuzhong Wu4, Yannan Wang4, Jun Du3 and Chin-Hui Lee1
1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, 2Kore University of Enna, Italy, 3University of Science and Technology of China, HeFei, China, 4Tencent Media Lab, Shenzhen, China
Yang_GT_task1a_1 Yang_GT_task1a_2 Yang_GT_task1a_3 Yang_GT_task1a_4
Abstract
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC). Specifically, we tackle the ASC task in a low-resource environment by leveraging a recently proposed neural network pruning mechanism, the Lottery Ticket Hypothesis (LTH), to find a sub-network with a small number of non-zero model parameters. The effectiveness of LTH for low-complexity acoustic modeling is assessed by investigating various data augmentation and compression schemes, and we report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery. Acoustic Lottery can compress an ASC model up to 1/10^4 and attain superior performance (validation accuracy of 74.01% and log loss of 0.76) compared to its uncompressed seed model. All results reported in this work are based on a joint effort of four groups, namely GT-USTC-UKE-Tencent, aiming to address the 'Low-Complexity Acoustic Scene Classification (ASC) with Multiple Devices' task in the DCASE 2021 Challenge Task 1a.
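The pruning idea named in this abstract can be illustrated with a generic lottery-ticket loop: train, prune the smallest-magnitude surviving weights, rewind the survivors to their initial values, and repeat. This is a minimal sketch of that general procedure, not the authors' Acoustic Lottery pipeline; `toy_train` is a hypothetical stand-in for real training, and the round count and prune fraction are arbitrary.

```python
import random


def magnitude_mask(weights, mask, prune_fraction):
    """Zero the mask for the smallest-magnitude weights that are still alive."""
    alive = sorted((abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m)
    n_prune = int(len(alive) * prune_fraction)
    new_mask = list(mask)
    for _, i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


def toy_train(weights, mask, seed):
    """Hypothetical stand-in for training: perturbs surviving weights only."""
    rng = random.Random(seed)
    return [(w + rng.gauss(0.0, 0.5)) * m for w, m in zip(weights, mask)]


def lottery_ticket(init_weights, rounds, prune_fraction):
    """Iterative magnitude pruning with rewinding to the initial weights."""
    mask = [1] * len(init_weights)
    for r in range(rounds):
        # Rewind: each round restarts from the original initialization.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = toy_train(start, mask, seed=r)
        mask = magnitude_mask(trained, mask, prune_fraction)
    ticket = [w * m for w, m in zip(init_weights, mask)]
    return ticket, mask


rng = random.Random(0)
init = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ticket, mask = lottery_ticket(init, rounds=5, prune_fraction=0.2)
print(f"{sum(mask)} of {len(mask)} parameters survive")
```

The returned `ticket` holds the initial values of the surviving weights, which is the sub-network that would then be trained to convergence.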
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shifting, speed change, random noise, mix audios |
Features | log-mel energies |
Classifier | Inception |
Decision making | average |
Complexity management | weight quantization, LTH pruning, teacher-student learning |
Low-Complexity Acoustic Scene Classification with Multiple Devices
Chen Yihao, Liu Min and Xu Minqiang
SpeakIn Technology, Shanghai, China
Yihao_speakin_task1a_1 Yihao_speakin_task1a_2 Yihao_speakin_task1a_3
Abstract
This report describes our submission to Task 1 (Acoustic Scene Classification) of the DCASE 2021 challenge. The final submission includes 4 results based on ResNet and SEResNet architectures. We analyze several backbones and run experiments to determine whether the pooling layer is needed. Because training data is limited, we try a variety of data augmentation methods, including SpecAugment [1], cutout [2], and audio speed-up and slow-down. To meet the model size requirement, we also prune the models.
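As an illustration of the masking-style augmentation mentioned in this abstract, the sketch below applies one frequency mask and one time mask to a spectrogram stored as a list of mel-bin rows. It is a simplified, generic version of the idea; the mask widths, the fill value, and the function name are assumptions and do not come from the report.

```python
import random


def spec_augment(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Return a copy of spec with one frequency band and one time span masked.

    spec is a list of rows, one row per mel bin, one column per time frame.
    """
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank out f consecutive mel bins starting at f0.
    f = rng.randint(0, max_freq_width)
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [fill] * n_frames

    # Time mask: blank out t consecutive frames starting at t0.
    t = rng.randint(0, max_time_width)
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        row[t0:t0 + t] = [fill] * t
    return out


# Toy log-mel patch: 8 mel bins by 20 frames, all values non-zero.
spec = [[1.0 + 0.1 * m + 0.01 * n for n in range(20)] for m in range(8)]
augmented = spec_augment(spec, max_freq_width=3, max_time_width=5,
                         rng=random.Random(0))
masked = sum(v == 0.0 for row in augmented for v in row)
print(f"{masked} of {8 * 20} bins masked")
```

Each training step would draw fresh mask positions, so the network sees a differently occluded version of the same recording every epoch.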
System characteristics
Sampling rate | 16kHz |
Data augmentation | SpecAugment |
Features | log-mel energies |
Classifier | CNN |
Complexity management | sparsity |
DCASE 2021 Challenge Task1a Technical Report
Jiawang Zhang1, Shengchen Li2 and Bilei Zhu3
1AI-Lab Speech & Audio Team, Beijing University of Posts and Telecommunications & ByteDance, Shanghai, China, 2Xi’an Jiaotong-Liverpool University, Suzhou, China, 3AI-Lab Speech & Audio Team, ByteDance, Shanghai, China
Zhang_BUPT&BYTEDANCE_task1a_1 Zhang_BUPT&BYTEDANCE_task1a_2 Zhang_BUPT&BYTEDANCE_task1a_3 Zhang_BUPT&BYTEDANCE_task1a_4
Abstract
This report describes our method for Task 1a (Low-Complexity Acoustic Scene Classification with Multiple Devices) of the DCASE 2021 challenge. The task targets low-complexity solutions for the classification problem. We use a Residual Network (ResNet) model with log-mel spectrogram features. To reduce system complexity, we apply post-training static quantization to 8 bits, which reduces the model size by a factor of four. The proposed method achieves an accuracy of 73% on the development dataset, which is 25% higher than the baseline.
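The four-fold size reduction follows from storing each weight in 1 byte instead of the 4 bytes of a float32. The sketch below shows the underlying arithmetic with a simple affine (scale and zero-point) int8 scheme. It is a generic illustration, not the quantization toolkit used in the report; the helper names and the toy weights are assumptions.

```python
import random
from array import array


def quantize_int8(values):
    """Map floats to int8 codes with an affine (scale, zero-point) scheme."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0           # avoid a zero scale
    zero_point = round(-128 - lo / scale)      # code that represents 0.0
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.1) for _ in range(10_000)]  # toy weight tensor
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)

# Compare the raw storage cost of both representations.
fp32_bytes = len(weights) * array("f").itemsize   # 4 bytes per value
int8_bytes = len(codes) * array("b").itemsize     # 1 byte per value
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max absolute rounding error: {max_error:.6f} (scale {scale:.6f})")
```

The price of the smaller representation is a rounding error of roughly half a quantization step per weight, which is why accuracy is usually re-checked after quantization.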
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | ResNet |
Complexity management | weight quantization |
Low-Complexity Acoustic Scene Classification Using Knowledge Distillation and Multiple Classifiers
Na Zhao
Algorithm, Maxvision, Wuhan, China
Zhao_Maxvision_task1a_1 Zhao_Maxvision_task1a_2 Zhao_Maxvision_task1a_3 Zhao_Maxvision_task1a_4
Abstract
This technical report describes our submission for Task 1a of the DCASE 2021 challenge. Based on the small-size Mobnet [1] of the Tencent team in DCASE 2020 Task 1b, we build our baseline model with only one frequency branch and two classifiers. The two classifiers are a ten-class classifier and a three-class classifier, and they jointly optimize the baseline model. Because of the model size limit, we first train a high-accuracy large-size model and then use distillation to transfer its knowledge to our baseline model. The final system is quantized from 32-bit floating point to 16-bit floating point. We achieved an accuracy of 59.9% with a model size smaller than 128 KB.
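The distillation step described in this abstract can be illustrated with a common formulation of the loss: a weighted sum of the usual cross-entropy on the true label and a temperature-softened divergence between teacher and student outputs. This is a generic sketch for a single example and a single classifier head; the temperature, the weight `alpha`, and the function names are assumptions, and the report's exact loss is not given here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft teacher-matching term."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened outputs.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The temperature**2 factor keeps the soft-term gradients comparable in scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


teacher = [2.0, 0.5, -1.0]
agreeing = distillation_loss(teacher, teacher, label=0)
disagreeing = distillation_loss([-1.0, 0.5, 2.0], teacher, label=0)
print(f"student matches teacher: {agreeing:.3f}")
print(f"student contradicts teacher: {disagreeing:.3f}")
```

When the student reproduces the teacher's logits the soft term vanishes and only the hard-label term remains, so the loss is lower than for a student that contradicts the teacher.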
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, random cropping |
Features | log-mel energies, deltas, delta-deltas |
Classifier | MobileNet |
Decision making | model weights average |
Complexity management | weight quantization |