Task description
This subtask concerns the classification of data from multiple devices (real and simulated), targeting the generalization of systems across different recording devices.
The development dataset consists of recordings from 10 European cities made with 9 different devices: 3 real devices (A, B, C) and 6 simulated devices (S1-S6). The data from devices B, C, and S1-S6 consists of randomly selected segments of the simultaneous recordings; these segments therefore all overlap with the device A data, but not necessarily with one another. The total amount of audio in the development set is 64 hours.
The evaluation dataset contains data from 12 cities, 10 acoustic scenes, and 11 devices. Five of the devices do not appear in the development set: real device D and simulated devices S7-S10. The evaluation data contains 33 hours of audio.
Device A consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder, using a 48 kHz sampling rate and 24-bit resolution. The other devices are commonly available consumer devices: device B is a Samsung Galaxy S7, device C is an iPhone SE, and device D is a GoPro Hero5 Session.
A more detailed task description can be found on the task description page.
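The tables below report two metrics per system: accuracy with a 95% confidence interval, and multiclass cross-entropy (log loss). As a minimal sketch of how such numbers can be computed — assuming a hypothetical record format of integer scene labels and per-clip probability vectors, and assuming the confidence interval is a normal-approximation binomial interval (not necessarily the official evaluation code):

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the correct class.

    y_true: list of integer class indices (hypothetical format).
    y_prob: list of per-clip probability vectors over the scene classes.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

def accuracy_ci95(correct, n):
    """Normal-approximation 95% confidence interval for accuracy."""
    p = correct / n
    half = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half
```

For example, a system that assigns probability 0.5 to the correct class on every clip scores a log loss of ln 2 ≈ 0.693, and an accuracy of 50/100 yields the interval (0.402, 0.598).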
Systems ranking
Submission label | Name | Technical Report | Official system rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset)
---|---|---|---|---|---|---|---
Abbasi_ARI_task1a_1 | 1a_CNN | Abbasi2020 | 78 | 59.7 (58.8 - 60.6) | 1.099 | 61.6 | 1.071 | |
Abbasi_ARI_task1a_2 | 1a_CNN | Abbasi2020 | 76 | 60.6 (59.7 - 61.5) | 1.063 | 62.1 | ||
Cao_JNU_task1a_1 | CaoJNU1 | Fei2020 | 63 | 65.7 (64.9 - 66.6) | 1.265 | 68.3 | 1.186 | |
Cao_JNU_task1a_2 | CaoJNU2 | Fei2020 | 64 | 65.7 (64.8 - 66.5) | 1.259 | 68.9 | 1.163 | |
Cao_JNU_task1a_3 | CaoJNU3 | Fei2020 | 61 | 66.0 (65.1 - 66.8) | 1.268 | 68.7 | 1.202 | |
Cao_JNU_task1a_4 | CaoJNU4 | Fei2020 | 62 | 65.9 (65.1 - 66.8) | 1.267 | 69.2 | 1.171 | |
FanVaf__task1a_1 | CRNN_4kHz | Fanioudakis2020 | 72 | 63.4 (62.5 - 64.2) | 1.106 | 65.4 | 2.070 | |
FanVaf__task1a_2 | CRNN_8kHz | Fanioudakis2020 | 75 | 60.7 (59.9 - 61.6) | 1.142 | 62.8 | ||
FanVaf__task1a_3 | CRNN_ens | Fanioudakis2020 | 66 | 64.8 (63.9 - 65.6) | 1.298 | 67.4 | ||
FanVaf__task1a_4 | CRNN_ens | Fanioudakis2020 | 54 | 67.5 (66.6 - 68.3) | 1.240 | |||
Gao_UNISA_task1a_1 | Baseline | Gao2020 | 9 | 75.0 (74.3 - 75.8) | 1.225 | 71.7 | ||
Gao_UNISA_task1a_2 | focal_ls | Gao2020 | 12 | 74.1 (73.3 - 74.9) | 1.242 | 71.8 | ||
Gao_UNISA_task1a_3 | da | Gao2020 | 11 | 74.7 (73.9 - 75.5) | 1.231 | 71.4 | ||
Gao_UNISA_task1a_4 | ensemble | Gao2020 | 8 | 75.2 (74.4 - 76.0) | 1.230 | 72.5 | ||
DCASE2020 baseline | Baseline | 51.4 (50.5 - 52.3) | 1.902 | 51.6 | 1.405 | |||
Helin_ADSPLAB_task1a_1 | Helin1 | Wang2020_t1 | 14 | 73.4 (72.6 - 74.2) | 0.850 | 84.2 | 0.569 | |
Helin_ADSPLAB_task1a_2 | Helin2 | Wang2020_t1 | 49 | 68.4 (67.6 - 69.3) | 0.991 | 81.8 | 0.694 | |
Helin_ADSPLAB_task1a_3 | Helin3 | Wang2020_t1 | 18 | 73.1 (72.3 - 73.9) | 0.889 | 84.5 | 0.611 | |
Helin_ADSPLAB_task1a_4 | Helin4 | Wang2020_t1 | 24 | 72.3 (71.5 - 73.1) | 0.899 | 84.2 | 0.601 | |
Hu_GT_task1a_1 | Hu_GT_1a_1 | Hu2020 | 6 | 75.7 (74.9 - 76.4) | 0.924 | |||
Hu_GT_task1a_2 | Hu_GT_1a_2 | Hu2020 | 4 | 75.9 (75.1 - 76.7) | 0.895 | 81.9 | 0.936 | |
Hu_GT_task1a_3 | Hu_GT_1a_3 | Hu2020 | 3 | 76.2 (75.4 - 77.0) | 0.898 | |||
Hu_GT_task1a_4 | Hu_GT_1a_4 | Hu2020 | 5 | 75.8 (75.0 - 76.5) | 0.900 | |||
JHKim_IVS_task1a_1 | EF5+SFA | Kim2020_t1 | 55 | 67.3 (66.5 - 68.2) | 5.219 | 70.1 | 0.013 | |
JHKim_IVS_task1a_2 | EF2+SFA | Kim2020_t1 | 60 | 66.2 (65.3 - 67.0) | 4.766 | 68.6 | 0.019 | |
Jie_Maxvision_task1a_1 | maxvision | Jie2020 | 10 | 75.0 (74.3 - 75.8) | 1.209 | 72.1 | 1.370 | |
Kim_SGU_task1a_1 | 5ch_m_2 | Changmin2020 | 33 | 71.6 (70.8 - 72.4) | 1.309 | 72.7 | 1.307 | |
Kim_SGU_task1a_2 | 7ch_m_2 | Changmin2020 | 38 | 70.7 (69.9 - 71.6) | 1.304 | 71.0 | 1.301 | |
Kim_SGU_task1a_3 | 7ch_m_4 | Changmin2020 | 39 | 70.7 (69.8 - 71.5) | 1.412 | 72.2 | 1.408 | |
Kim_SGU_task1a_4 | 9ch_2 | Changmin2020 | 57 | 66.4 (65.6 - 67.3) | 1.428 | 71.7 | 1.292 | |
Koutini_CPJKU_task1a_1 | fdamp | Koutini2020 | 29 | 71.9 (71.1 - 72.7) | 0.800 | 71.0 | 0.820 | |
Koutini_CPJKU_task1a_2 | FDswd | Koutini2020 | 32 | 71.6 (70.8 - 72.4) | 0.862 | 72.5 | 0.820 | |
Koutini_CPJKU_task1a_3 | ensemble | Koutini2020 | 13 | 73.6 (72.8 - 74.4) | 0.796 | 73.3 | 0.820 | |
Koutini_CPJKU_task1a_4 | DAensem | Koutini2020 | 15 | 73.4 (72.6 - 74.2) | 0.814 | 73.0 | 0.820 | |
Lee_CAU_task1a_1 | CAUET | Lee2020 | 47 | 69.2 (68.3 - 70.0) | 0.885 | 67.1 | 0.939 | |
Lee_CAU_task1a_2 | CAUET | Lee2020 | 41 | 69.6 (68.8 - 70.5) | 0.859 | 67.1 | 0.939 | |
Lee_CAU_task1a_3 | CAUET | Lee2020 | 27 | 72.0 (71.2 - 72.8) | 0.944 | 67.1 | 0.939 | |
Lee_CAU_task1a_4 | CAUET | Lee2020 | 20 | 72.9 (72.1 - 73.7) | 0.919 | 67.1 | 0.939 | |
Lee_GU_task1a_1 | PRML | Aryal2020 | 81 | 55.9 (55.0 - 56.8) | 1.969 | 59.6 | ||
Lee_GU_task1a_2 | PRML | Aryal2020 | 85 | 55.6 (54.7 - 56.5) | 1.818 | 59.6 | ||
Lee_GU_task1a_3 | PRML | Aryal2020 | 84 | 55.6 (54.7 - 56.5) | 2.987 | 59.6 | ||
Lee_GU_task1a_4 | PRML | Aryal2020 | 86 | 54.9 (54.1 - 55.8) | 2.847 | 59.6 | ||
Liu_SHNU_task1a_1 | ResNet | Liu2020 | 45 | 69.3 (68.5 - 70.1) | 1.396 | 70.2 | 0.939 | |
Liu_SHNU_task1a_2 | CNN-9 | Liu2020 | 50 | 68.0 (67.2 - 68.9) | 4.510 | 68.8 | 1.792 | |
Liu_SHNU_task1a_3 | E-T | Liu2020 | 83 | 55.7 (54.8 - 56.6) | 9.403 | 58.1 | ||
Liu_SHNU_task1a_4 | Fusion | Liu2020 | 26 | 72.0 (71.2 - 72.8) | 3.165 | 73.1 | ||
Liu_UESTC_task1a_1 | Averag_8 | Liu2020a | 16 | 73.2 (72.4 - 74.0) | 1.305 | 68.4 | 1.362 | |
Liu_UESTC_task1a_2 | Averag_18 | Liu2020a | 23 | 72.4 (71.6 - 73.2) | 1.303 | 69.0 | 1.367 | |
Liu_UESTC_task1a_3 | Rforest_8 | Liu2020a | 21 | 72.5 (71.7 - 73.3) | 0.755 | 68.6 | 0.841 | |
Liu_UESTC_task1a_4 | Rforest_18 | Liu2020a | 28 | 72.0 (71.2 - 72.8) | 0.767 | 68.4 | 0.839 | |
Lopez-Meyer_IL_task1a_1 | CNNensem | Lopez-Meyer2020_t1a | 68 | 64.3 (63.4 - 65.1) | 5.268 | 68.8 | ||
Lopez-Meyer_IL_task1a_2 | CNNensem | Lopez-Meyer2020_t1a | 70 | 64.1 (63.3 - 65.0) | 11.870 | 69.3 | ||
Lu_INTC_task1a_1 | city_cv | Hong2020 | 36 | 71.2 (70.4 - 72.0) | 0.809 | |||
Lu_INTC_task1a_2 | resnext | Hong2020 | 69 | 64.1 (63.3 - 65.0) | 1.383 | |||
Lu_INTC_task1a_3 | 2resnext | Hong2020 | 58 | 66.4 (65.5 - 67.2) | 1.192 | |||
Lu_INTC_task1a_4 | all | Hong2020 | 35 | 71.2 (70.4 - 72.1) | 0.806 | |||
Monteiro_INRS_task1a_1 | preResnet | Joao2020 | 74 | 61.7 (60.8 - 62.6) | 5.936 | |||
Monteiro_INRS_task1a_2 | TDNN | Joao2020 | 82 | 55.9 (55.0 - 56.8) | 5.198 | |||
Monteiro_INRS_task1a_3 | ModResNet | Joao2020 | 88 | 50.8 (49.9 - 51.7) | 2.766 | |||
Monteiro_INRS_task1a_4 | FuseCNN | Joao2020 | 59 | 66.3 (65.5 - 67.2) | 2.226 | |||
Naranjo-Alcazar_Vfy_task1a_1 | ASCCSSE | Naranjo-Alcazar2020_t1 | 73 | 61.9 (61.0 - 62.7) | 1.246 | 65.1 | 1.120 | |
Naranjo-Alcazar_Vfy_task1a_2 | ASCCSSE | Naranjo-Alcazar2020_t1 | 77 | 59.7 (58.8 - 60.6) | 1.314 | 65.1 | 1.120 | |
Paniagua_UPM_task1a_1 | Pan_UPM | Paniagua2020 | 92 | 43.8 (42.9 - 44.7) | 2.053 | 57.1 | ||
Shim_UOS_task1a_1 | UOS_totens | Shim2020 | 31 | 71.7 (70.9 - 72.5) | 1.190 | 71.9 | ||
Shim_UOS_task1a_2 | UOS_rbfens | Shim2020 | 34 | 71.5 (70.7 - 72.4) | 0.897 | 71.0 | ||
Shim_UOS_task1a_3 | UOS_lcnn | Shim2020 | 48 | 68.5 (67.6 - 69.3) | 0.911 | 70.5 | ||
Shim_UOS_task1a_4 | UOS_trgasc | Shim2020 | 37 | 71.0 (70.2 - 71.8) | 0.945 | 68.8 | ||
Suh_ETRI_task1a_1 | TRN_Dev | Suh2020 | 22 | 72.5 (71.7 - 73.3) | 1.290 | 73.7 | 1.285 | |
Suh_ETRI_task1a_2 | TRN_Eval | Suh2020 | 7 | 75.5 (74.7 - 76.2) | 1.221 | 73.7 | 1.285 | |
Suh_ETRI_task1a_3 | TRN_Ensem | Suh2020 | 1 | 76.5 (75.8 - 77.3) | 1.219 | 74.2 | 1.289 | |
Suh_ETRI_task1a_4 | TRN_wEnsem | Suh2020 | 2 | 76.5 (75.7 - 77.2) | 1.219 | 74.4 | 1.288 | |
Swiecicki_NON_task1a_1 | b3_train | Swiecicki2020 | 56 | 67.1 (66.2 - 67.9) | 0.926 | 69.3 | 0.846 | |
Swiecicki_NON_task1a_2 | b3_all | Swiecicki2020 | 42 | 69.5 (68.7 - 70.3) | 0.851 | 69.3 | 0.846 | |
Swiecicki_NON_task1a_3 | b3_all_lr | Swiecicki2020 | 40 | 70.3 (69.4 - 71.1) | 0.970 | 68.9 | 0.973 | |
Swiecicki_NON_task1a_4 | b3_all_mix | Swiecicki2020 | 30 | 71.8 (71.0 - 72.7) | 0.793 | 71.9 | 0.790 | |
Vilouras_AUTh_task1a_1 | VilEnsemb1 | Vilouras2020 | 53 | 67.7 (66.8 - 68.5) | 0.929 | 68.1 | 0.908 | |
Vilouras_AUTh_task1a_2 | VilEnsemb2 | Vilouras2020 | 52 | 67.8 (67.0 - 68.7) | 0.931 | 69.2 | 0.890 | |
Vilouras_AUTh_task1a_3 | VilEnsemb3 | Vilouras2020 | 44 | 69.3 (68.5 - 70.1) | 0.883 | 70.3 | 0.872 | |
Waldekar_IITKGP_task1a_1 | MFDWC20 | Waldekar2020 | 79 | 58.4 (57.5 - 59.2) | 1.427 | 55.0 | ||
Wang_RoyalFlush_task1a_1 | RoyalFlush | Wang2020a | 80 | 56.7 (55.8 - 57.6) | 1.576 | 63.9 | 1.826 | |
Wang_RoyalFlush_task1a_2 | RoyalFlush | Wang2020a | 65 | 65.2 (64.3 - 66.0) | 1.294 | 62.9 | 1.586 | |
Wang_RoyalFlush_task1a_3 | RoyalFlush | Wang2020a | 71 | 64.0 (63.1 - 64.8) | 1.239 | 62.1 | 1.334 | |
Wang_RoyalFlush_task1a_4 | RoyalFlush | Wang2020a | 91 | 45.5 (44.6 - 46.4) | 5.880 | 62.7 | 1.712 | |
Wu_CUHK_task1a_1 | CNN_RCE | Wu2020_t1a | 67 | 64.7 (63.9 - 65.6) | 1.148 | 65.2 | ||
Wu_CUHK_task1a_2 | ensemble_4 | Wu2020_t1a | 46 | 69.3 (68.4 - 70.1) | 1.070 | 67.3 | ||
Wu_CUHK_task1a_3 | ensemble_5 | Wu2020_t1a | 51 | 67.9 (67.1 - 68.8) | 1.100 | 67.6 | ||
Wu_CUHK_task1a_4 | ensemble_9 | Wu2020_t1a | 43 | 69.4 (68.6 - 70.2) | 1.080 | 68.3 | ||
Zhang_THUEE_task1a_1 | THUEE | Shao2020 | 19 | 73.0 (72.2 - 73.8) | 1.963 | 75.0 | 0.791 | |
Zhang_THUEE_task1a_2 | THUEE | Shao2020 | 17 | 73.2 (72.4 - 74.0) | 1.967 | 75.0 | 0.789 | |
Zhang_THUEE_task1a_3 | THUEE | Shao2020 | 25 | 72.3 (71.5 - 73.1) | 1.958 | 74.3 | 0.824 | |
Zhang_UESTC_task1a_1 | N1 | Zhang2020 | 89 | 50.4 (49.5 - 51.3) | 1.899 | 57.4 | 1.275 | |
Zhang_UESTC_task1a_2 | N2 | Zhang2020 | 87 | 51.7 (50.8 - 52.6) | 1.805 | 56.1 | 1.297 | |
Zhang_UESTC_task1a_3 | N3 | Zhang2020 | 90 | 47.4 (46.5 - 48.3) | 2.068 | 53.7 | 1.344 |
Teams ranking
The table includes only the best-performing system per submitting team.
Submission label | Name | Technical Report | Official system rank | Team rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset)
---|---|---|---|---|---|---|---|---
Abbasi_ARI_task1a_2 | 1a_CNN | Abbasi2020 | 76 | 24 | 60.6 (59.7 - 61.5) | 1.063 | 62.1 | ||
Cao_JNU_task1a_3 | CaoJNU3 | Fei2020 | 61 | 20 | 66.0 (65.1 - 66.8) | 1.268 | 68.7 | 1.202 | |
FanVaf__task1a_4 | CRNN_ens | Fanioudakis2020 | 54 | 17 | 67.5 (66.6 - 68.3) | 1.240 | |||
Gao_UNISA_task1a_4 | ensemble | Gao2020 | 8 | 3 | 75.2 (74.4 - 76.0) | 1.230 | 72.5 | ||
DCASE2020 baseline | Baseline | 51.4 (50.5 - 52.3) | 1.902 | 51.6 | 1.405 | ||||
Helin_ADSPLAB_task1a_1 | Helin1 | Wang2020_t1 | 14 | 6 | 73.4 (72.6 - 74.2) | 0.850 | 84.2 | 0.569 | |
Hu_GT_task1a_3 | Hu_GT_1a_3 | Hu2020 | 3 | 2 | 76.2 (75.4 - 77.0) | 0.898 | |||
JHKim_IVS_task1a_1 | EF5+SFA | Kim2020_t1 | 55 | 18 | 67.3 (66.5 - 68.2) | 5.219 | 70.1 | 0.013 | |
Jie_Maxvision_task1a_1 | maxvision | Jie2020 | 10 | 4 | 75.0 (74.3 - 75.8) | 1.209 | 72.1 | 1.370 | |
Kim_SGU_task1a_1 | 5ch_m_2 | Changmin2020 | 33 | 13 | 71.6 (70.8 - 72.4) | 1.309 | 72.7 | 1.307 | |
Koutini_CPJKU_task1a_3 | ensemble | Koutini2020 | 13 | 5 | 73.6 (72.8 - 74.4) | 0.796 | 73.3 | 0.820 | |
Lee_CAU_task1a_4 | CAUET | Lee2020 | 20 | 9 | 72.9 (72.1 - 73.7) | 0.919 | 67.1 | 0.939 | |
Lee_GU_task1a_1 | PRML | Aryal2020 | 81 | 26 | 55.9 (55.0 - 56.8) | 1.969 | 59.6 | ||
Liu_SHNU_task1a_4 | Fusion | Liu2020 | 26 | 10 | 72.0 (71.2 - 72.8) | 3.165 | 73.1 | ||
Liu_UESTC_task1a_1 | Averag_8 | Liu2020a | 16 | 7 | 73.2 (72.4 - 74.0) | 1.305 | 68.4 | 1.362 | |
Lopez-Meyer_IL_task1a_1 | CNNensem | Lopez-Meyer2020_t1a | 68 | 22 | 64.3 (63.4 - 65.1) | 5.268 | 68.8 | ||
Lu_INTC_task1a_4 | all | Hong2020 | 35 | 14 | 71.2 (70.4 - 72.1) | 0.806 | |||
Monteiro_INRS_task1a_4 | FuseCNN | Joao2020 | 59 | 19 | 66.3 (65.5 - 67.2) | 2.226 | |||
Naranjo-Alcazar_Vfy_task1a_1 | ASCCSSE | Naranjo-Alcazar2020_t1 | 73 | 23 | 61.9 (61.0 - 62.7) | 1.246 | 65.1 | 1.120 | |
Paniagua_UPM_task1a_1 | Pan_UPM | Paniagua2020 | 92 | 28 | 43.8 (42.9 - 44.7) | 2.053 | 57.1 | ||
Shim_UOS_task1a_1 | UOS_totens | Shim2020 | 31 | 12 | 71.7 (70.9 - 72.5) | 1.190 | 71.9 | ||
Suh_ETRI_task1a_3 | TRN_Ensem | Suh2020 | 1 | 1 | 76.5 (75.8 - 77.3) | 1.219 | 74.2 | 1.289 | |
Swiecicki_NON_task1a_4 | b3_all_mix | Swiecicki2020 | 30 | 11 | 71.8 (71.0 - 72.7) | 0.793 | 71.9 | 0.790 | |
Vilouras_AUTh_task1a_3 | VilEnsemb3 | Vilouras2020 | 44 | 16 | 69.3 (68.5 - 70.1) | 0.883 | 70.3 | 0.872 | |
Waldekar_IITKGP_task1a_1 | MFDWC20 | Waldekar2020 | 79 | 25 | 58.4 (57.5 - 59.2) | 1.427 | 55.0 | ||
Wang_RoyalFlush_task1a_2 | RoyalFlush | Wang2020a | 65 | 21 | 65.2 (64.3 - 66.0) | 1.294 | 62.9 | 1.586 | |
Wu_CUHK_task1a_4 | ensemble_9 | Wu2020_t1a | 43 | 15 | 69.4 (68.6 - 70.2) | 1.080 | 68.3 | ||
Zhang_THUEE_task1a_2 | THUEE | Shao2020 | 17 | 8 | 73.2 (72.4 - 74.0) | 1.967 | 75.0 | 0.789 | |
Zhang_UESTC_task1a_2 | N2 | Zhang2020 | 87 | 27 | 51.7 (50.8 - 52.6) | 1.805 | 56.1 | 1.297 |
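A team ranking like the one above can be derived from the full system list by keeping each team's highest-accuracy system and sorting. A minimal sketch, assuming a hypothetical list-of-dicts record format (the field names are illustrative, not the official data format):

```python
# Hypothetical records: one dict per submitted system.
results = [
    {"team": "Suh_ETRI", "label": "Suh_ETRI_task1a_3", "accuracy": 76.5},
    {"team": "Suh_ETRI", "label": "Suh_ETRI_task1a_4", "accuracy": 76.5},
    {"team": "Hu_GT",    "label": "Hu_GT_task1a_3",    "accuracy": 76.2},
]

# Keep the best system per team (first one wins on ties).
best_per_team = {}
for r in results:
    cur = best_per_team.get(r["team"])
    if cur is None or r["accuracy"] > cur["accuracy"]:
        best_per_team[r["team"]] = r

# Sort the surviving systems by accuracy to get the team ranking.
team_ranking = sorted(best_per_team.values(),
                      key=lambda r: r["accuracy"], reverse=True)
```

Note that ties (as between the two Suh_ETRI systems) would need a secondary criterion in practice; this sketch simply keeps the first entry encountered.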
Generalization performance
All results are computed on the evaluation dataset.
Submission label | Technical Report | Official system rank | Accuracy | Accuracy / unseen devices | Accuracy / seen devices | Accuracy / unseen cities | Accuracy / seen cities
---|---|---|---|---|---|---|---
Abbasi_ARI_task1a_1 | Abbasi2020 | 78 | 59.7 | 56.1 | 62.7 | 60.2 | 59.5 | |
Abbasi_ARI_task1a_2 | Abbasi2020 | 76 | 60.6 | 58.9 | 62.0 | 58.9 | 60.8 | |
Cao_JNU_task1a_1 | Fei2020 | 63 | 65.7 | 63.0 | 68.0 | 65.4 | 66.0 | |
Cao_JNU_task1a_2 | Fei2020 | 64 | 65.7 | 62.9 | 68.0 | 65.6 | 65.9 | |
Cao_JNU_task1a_3 | Fei2020 | 61 | 66.0 | 62.9 | 68.5 | 64.6 | 66.4 | |
Cao_JNU_task1a_4 | Fei2020 | 62 | 65.9 | 63.0 | 68.4 | 65.2 | 66.3 | |
FanVaf__task1a_1 | Fanioudakis2020 | 72 | 63.4 | 57.3 | 68.5 | 59.1 | 64.3 | |
FanVaf__task1a_2 | Fanioudakis2020 | 75 | 60.7 | 53.4 | 66.9 | 60.4 | 61.4 | |
FanVaf__task1a_3 | Fanioudakis2020 | 66 | 64.8 | 58.1 | 70.3 | 61.9 | 65.8 | |
FanVaf__task1a_4 | Fanioudakis2020 | 54 | 67.5 | 60.8 | 73.1 | 64.6 | 68.3 | |
Gao_UNISA_task1a_1 | Gao2020 | 9 | 75.0 | 73.3 | 76.5 | 73.7 | 75.7 | |
Gao_UNISA_task1a_2 | Gao2020 | 12 | 74.1 | 71.9 | 75.9 | 73.0 | 74.9 | |
Gao_UNISA_task1a_3 | Gao2020 | 11 | 74.7 | 72.9 | 76.1 | 73.3 | 75.5 | |
Gao_UNISA_task1a_4 | Gao2020 | 8 | 75.2 | 73.1 | 77.0 | 73.9 | 75.9 | |
DCASE2020 baseline | 51.4 | 37.2 | 63.1 | 51.8 | 51.5 | |||
Helin_ADSPLAB_task1a_1 | Wang2020_t1 | 14 | 73.4 | 70.1 | 76.2 | 71.2 | 74.1 | |
Helin_ADSPLAB_task1a_2 | Wang2020_t1 | 49 | 68.4 | 63.8 | 72.3 | 66.5 | 69.5 | |
Helin_ADSPLAB_task1a_3 | Wang2020_t1 | 18 | 73.1 | 70.2 | 75.5 | 70.8 | 74.0 | |
Helin_ADSPLAB_task1a_4 | Wang2020_t1 | 24 | 72.3 | 68.8 | 75.2 | 70.1 | 73.2 | |
Hu_GT_task1a_1 | Hu2020 | 6 | 75.7 | 74.3 | 76.8 | 73.0 | 76.3 | |
Hu_GT_task1a_2 | Hu2020 | 4 | 75.9 | 74.4 | 77.2 | 73.8 | 76.4 | |
Hu_GT_task1a_3 | Hu2020 | 3 | 76.2 | 74.7 | 77.5 | 74.1 | 76.9 | |
Hu_GT_task1a_4 | Hu2020 | 5 | 75.8 | 74.3 | 77.0 | 74.0 | 76.3 | |
JHKim_IVS_task1a_1 | Kim2020_t1 | 55 | 67.3 | 64.5 | 69.7 | 67.7 | 67.2 | |
JHKim_IVS_task1a_2 | Kim2020_t1 | 60 | 66.2 | 64.3 | 67.7 | 65.4 | 66.5 | |
Jie_Maxvision_task1a_1 | Jie2020 | 10 | 75.0 | 73.2 | 76.5 | 73.2 | 76.0 | |
Kim_SGU_task1a_1 | Changmin2020 | 33 | 71.6 | 69.2 | 73.5 | 69.5 | 72.5 | |
Kim_SGU_task1a_2 | Changmin2020 | 38 | 70.7 | 68.4 | 72.7 | 69.4 | 71.7 | |
Kim_SGU_task1a_3 | Changmin2020 | 39 | 70.7 | 68.3 | 72.6 | 70.1 | 71.4 | |
Kim_SGU_task1a_4 | Changmin2020 | 57 | 66.4 | 63.5 | 68.9 | 62.7 | 67.1 | |
Koutini_CPJKU_task1a_1 | Koutini2020 | 29 | 71.9 | 68.4 | 74.9 | 73.1 | 72.2 | |
Koutini_CPJKU_task1a_2 | Koutini2020 | 32 | 71.6 | 66.9 | 75.5 | 71.4 | 72.2 | |
Koutini_CPJKU_task1a_3 | Koutini2020 | 13 | 73.6 | 69.8 | 76.8 | 72.6 | 74.1 | |
Koutini_CPJKU_task1a_4 | Koutini2020 | 15 | 73.4 | 69.4 | 76.7 | 72.4 | 73.9 | |
Lee_CAU_task1a_1 | Lee2020 | 47 | 69.2 | 66.2 | 71.6 | 67.5 | 69.8 | |
Lee_CAU_task1a_2 | Lee2020 | 41 | 69.6 | 66.5 | 72.3 | 68.4 | 70.2 | |
Lee_CAU_task1a_3 | Lee2020 | 27 | 72.0 | 69.3 | 74.3 | 70.7 | 72.4 | |
Lee_CAU_task1a_4 | Lee2020 | 20 | 72.9 | 69.8 | 75.5 | 71.7 | 73.3 | |
Lee_GU_task1a_1 | Aryal2020 | 81 | 55.9 | 46.4 | 63.8 | 55.1 | 56.2 | |
Lee_GU_task1a_2 | Aryal2020 | 85 | 55.6 | 45.7 | 63.8 | 54.9 | 55.7 | |
Lee_GU_task1a_3 | Aryal2020 | 84 | 55.6 | 45.5 | 64.0 | 53.5 | 56.3 | |
Lee_GU_task1a_4 | Aryal2020 | 86 | 54.9 | 44.7 | 63.5 | 53.8 | 55.4 | |
Liu_SHNU_task1a_1 | Liu2020 | 45 | 69.3 | 65.1 | 72.7 | 67.6 | 70.0 | |
Liu_SHNU_task1a_2 | Liu2020 | 50 | 68.0 | 64.9 | 70.6 | 67.1 | 68.7 | |
Liu_SHNU_task1a_3 | Liu2020 | 83 | 55.7 | 46.8 | 63.1 | 49.4 | 57.0 | |
Liu_SHNU_task1a_4 | Liu2020 | 26 | 72.0 | 67.5 | 75.8 | 69.6 | 72.7 | |
Liu_UESTC_task1a_1 | Liu2020a | 16 | 73.2 | 71.9 | 74.3 | 73.3 | 73.4 | |
Liu_UESTC_task1a_2 | Liu2020a | 23 | 72.4 | 71.1 | 73.5 | 73.1 | 72.5 | |
Liu_UESTC_task1a_3 | Liu2020a | 21 | 72.5 | 71.3 | 73.5 | 71.7 | 72.9 | |
Liu_UESTC_task1a_4 | Liu2020a | 28 | 72.0 | 70.3 | 73.4 | 71.7 | 72.3 | |
Lopez-Meyer_IL_task1a_1 | Lopez-Meyer2020_t1a | 68 | 64.3 | 60.9 | 67.1 | 62.9 | 64.4 | |
Lopez-Meyer_IL_task1a_2 | Lopez-Meyer2020_t1a | 70 | 64.1 | 61.1 | 66.7 | 62.2 | 64.2 | |
Lu_INTC_task1a_1 | Hong2020 | 36 | 71.2 | 68.8 | 73.2 | 69.1 | 72.0 | |
Lu_INTC_task1a_2 | Hong2020 | 69 | 64.1 | 60.8 | 66.9 | 62.4 | 64.5 | |
Lu_INTC_task1a_3 | Hong2020 | 58 | 66.4 | 63.3 | 68.9 | 64.5 | 66.5 | |
Lu_INTC_task1a_4 | Hong2020 | 35 | 71.2 | 68.8 | 73.3 | 68.6 | 71.9 | |
Monteiro_INRS_task1a_1 | Joao2020 | 74 | 61.7 | 59.4 | 63.6 | 59.4 | 62.0 | |
Monteiro_INRS_task1a_2 | Joao2020 | 82 | 55.9 | 51.8 | 59.3 | 52.5 | 56.4 | |
Monteiro_INRS_task1a_3 | Joao2020 | 88 | 50.8 | 44.5 | 56.1 | 47.4 | 51.5 | |
Monteiro_INRS_task1a_4 | Joao2020 | 59 | 66.3 | 63.2 | 69.0 | 63.7 | 66.8 | |
Naranjo-Alcazar_Vfy_task1a_1 | Naranjo-Alcazar2020_t1 | 73 | 61.9 | 55.9 | 66.9 | 59.6 | 62.8 | |
Naranjo-Alcazar_Vfy_task1a_2 | Naranjo-Alcazar2020_t1 | 77 | 59.7 | 54.0 | 64.5 | 54.7 | 60.8 | |
Paniagua_UPM_task1a_1 | Paniagua2020 | 92 | 43.8 | 36.0 | 50.3 | 45.7 | 43.5 | |
Shim_UOS_task1a_1 | Shim2020 | 31 | 71.7 | 69.0 | 74.0 | 71.4 | 71.9 | |
Shim_UOS_task1a_2 | Shim2020 | 34 | 71.5 | 68.4 | 74.2 | 70.5 | 71.7 | |
Shim_UOS_task1a_3 | Shim2020 | 48 | 68.5 | 64.9 | 71.4 | 67.4 | 68.7 | |
Shim_UOS_task1a_4 | Shim2020 | 37 | 71.0 | 68.2 | 73.3 | 68.9 | 71.6 | |
Suh_ETRI_task1a_1 | Suh2020 | 22 | 72.5 | 69.9 | 74.6 | 70.4 | 73.1 | |
Suh_ETRI_task1a_2 | Suh2020 | 7 | 75.5 | 73.6 | 77.0 | 75.0 | 76.0 | |
Suh_ETRI_task1a_3 | Suh2020 | 1 | 76.5 | 74.6 | 78.1 | 75.8 | 77.3 | |
Suh_ETRI_task1a_4 | Suh2020 | 2 | 76.5 | 74.7 | 77.9 | 75.8 | 77.2 | |
Swiecicki_NON_task1a_1 | Swiecicki2020 | 56 | 67.1 | 64.0 | 69.6 | 65.7 | 66.9 | |
Swiecicki_NON_task1a_2 | Swiecicki2020 | 42 | 69.5 | 66.5 | 72.0 | 68.9 | 69.7 | |
Swiecicki_NON_task1a_3 | Swiecicki2020 | 40 | 70.3 | 68.2 | 72.0 | 66.5 | 71.1 | |
Swiecicki_NON_task1a_4 | Swiecicki2020 | 30 | 71.8 | 69.0 | 74.2 | 69.4 | 72.4 | |
Vilouras_AUTh_task1a_1 | Vilouras2020 | 53 | 67.7 | 63.5 | 71.2 | 65.8 | 68.1 | |
Vilouras_AUTh_task1a_2 | Vilouras2020 | 52 | 67.8 | 63.0 | 71.8 | 65.6 | 68.4 | |
Vilouras_AUTh_task1a_3 | Vilouras2020 | 44 | 69.3 | 65.3 | 72.6 | 66.9 | 70.1 | |
Waldekar_IITKGP_task1a_1 | Waldekar2020 | 79 | 58.4 | 52.9 | 62.9 | 52.8 | 59.6 | |
Wang_RoyalFlush_task1a_1 | Wang2020a | 80 | 56.7 | 54.8 | 58.2 | 54.9 | 57.4 | |
Wang_RoyalFlush_task1a_2 | Wang2020a | 65 | 65.2 | 63.0 | 67.0 | 64.1 | 65.5 | |
Wang_RoyalFlush_task1a_3 | Wang2020a | 71 | 64.0 | 60.0 | 67.3 | 63.7 | 64.4 | |
Wang_RoyalFlush_task1a_4 | Wang2020a | 91 | 45.5 | 42.9 | 47.7 | 45.2 | 45.7 | |
Wu_CUHK_task1a_1 | Wu2020_t1a | 67 | 64.7 | 60.0 | 68.7 | 63.7 | 65.2 | |
Wu_CUHK_task1a_2 | Wu2020_t1a | 46 | 69.3 | 63.0 | 74.5 | 65.1 | 70.4 | |
Wu_CUHK_task1a_3 | Wu2020_t1a | 51 | 67.9 | 62.7 | 72.3 | 66.3 | 68.3 | |
Wu_CUHK_task1a_4 | Wu2020_t1a | 43 | 69.4 | 63.6 | 74.3 | 66.1 | 70.3 | |
Zhang_THUEE_task1a_1 | Shao2020 | 19 | 73.0 | 69.9 | 75.6 | 71.8 | 73.8 | |
Zhang_THUEE_task1a_2 | Shao2020 | 17 | 73.2 | 70.0 | 75.8 | 71.6 | 74.1 | |
Zhang_THUEE_task1a_3 | Shao2020 | 25 | 72.3 | 68.8 | 75.2 | 70.2 | 73.2 | |
Zhang_UESTC_task1a_1 | Zhang2020 | 89 | 50.4 | 35.8 | 62.5 | 50.3 | 50.7 | |
Zhang_UESTC_task1a_2 | Zhang2020 | 87 | 51.7 | 37.5 | 63.5 | 50.8 | 52.5 | |
Zhang_UESTC_task1a_3 | Zhang2020 | 90 | 47.4 | 32.2 | 60.1 | 46.7 | 47.8 |
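The seen/unseen splits above are per-group accuracies over subsets of the evaluation clips. A minimal sketch of the grouping, assuming a hypothetical per-clip record format of (device ID, correct-or-not) pairs:

```python
from collections import defaultdict

# Devices that appear only in the evaluation set (unseen during development).
UNSEEN_DEVICES = {"D", "S7", "S8", "S9", "S10"}

def grouped_accuracy(records):
    """records: iterable of (device_id, is_correct) pairs (hypothetical format).

    Returns accuracy computed separately over seen and unseen devices.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for device, correct in records:
        group = "unseen" if device in UNSEEN_DEVICES else "seen"
        hits[group] += int(correct)
        counts[group] += 1
    return {g: hits[g] / counts[g] for g in counts}
```

The same grouping applied to city IDs instead of device IDs yields the seen/unseen city columns.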
Class-wise performance
Submission label | Technical Report | Official system rank | Accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Abbasi_ARI_task1a_1 | Abbasi2020 | 78 | 59.7 | 38.0 | 68.9 | 56.1 | 57.3 | 73.7 | 44.2 | 69.4 | 36.7 | 80.1 | 72.6 | |
Abbasi_ARI_task1a_2 | Abbasi2020 | 76 | 60.6 | 39.1 | 63.4 | 53.5 | 59.2 | 70.9 | 40.7 | 70.6 | 43.4 | 84.1 | 81.0 | |
Cao_JNU_task1a_1 | Fei2020 | 63 | 65.7 | 56.1 | 74.6 | 72.3 | 70.4 | 85.9 | 47.4 | 70.3 | 30.1 | 79.8 | 70.6 | |
Cao_JNU_task1a_2 | Fei2020 | 64 | 65.7 | 52.0 | 72.1 | 71.8 | 70.6 | 84.1 | 47.5 | 70.7 | 32.9 | 82.2 | 72.7 | |
Cao_JNU_task1a_3 | Fei2020 | 61 | 66.0 | 54.2 | 74.4 | 73.2 | 69.8 | 85.0 | 46.2 | 70.8 | 32.0 | 82.2 | 71.8 | |
Cao_JNU_task1a_4 | Fei2020 | 62 | 65.9 | 51.8 | 71.9 | 72.1 | 69.9 | 84.1 | 47.0 | 71.3 | 33.8 | 83.7 | 74.0 | |
FanVaf__task1a_1 | Fanioudakis2020 | 72 | 63.4 | 52.0 | 89.8 | 68.7 | 50.1 | 78.1 | 36.6 | 62.7 | 37.0 | 78.8 | 80.0 | |
FanVaf__task1a_2 | Fanioudakis2020 | 75 | 60.7 | 50.4 | 77.6 | 68.1 | 49.0 | 76.7 | 39.0 | 55.8 | 37.7 | 72.6 | 80.4 | |
FanVaf__task1a_3 | Fanioudakis2020 | 66 | 64.8 | 53.7 | 89.1 | 71.9 | 52.4 | 78.7 | 39.4 | 62.2 | 38.9 | 78.0 | 83.2 | |
FanVaf__task1a_4 | Fanioudakis2020 | 54 | 67.5 | 55.8 | 90.7 | 79.0 | 57.2 | 79.5 | 44.1 | 63.0 | 43.8 | 75.9 | 85.9 | |
Gao_UNISA_task1a_1 | Gao2020 | 9 | 75.0 | 58.2 | 87.8 | 76.2 | 75.6 | 92.7 | 57.5 | 75.3 | 52.3 | 90.7 | 84.3 | |
Gao_UNISA_task1a_2 | Gao2020 | 12 | 74.1 | 59.3 | 88.2 | 79.9 | 72.7 | 92.3 | 51.4 | 72.4 | 52.7 | 90.3 | 81.9 | |
Gao_UNISA_task1a_3 | Gao2020 | 11 | 74.7 | 57.3 | 86.7 | 78.4 | 74.5 | 93.0 | 57.1 | 73.3 | 52.3 | 90.3 | 83.8 | |
Gao_UNISA_task1a_4 | Gao2020 | 8 | 75.2 | 58.5 | 88.0 | 79.4 | 74.7 | 92.8 | 56.4 | 74.2 | 53.4 | 90.5 | 84.3 | |
DCASE2020 baseline | 51.4 | 26.3 | 82.3 | 45.4 | 53.8 | 67.3 | 34.7 | 40.3 | 30.4 | 69.4 | 63.7 | |||
Helin_ADSPLAB_task1a_1 | Wang2020_t1 | 14 | 73.4 | 60.2 | 82.7 | 81.4 | 72.8 | 93.4 | 52.9 | 74.6 | 44.4 | 85.0 | 86.7 | |
Helin_ADSPLAB_task1a_2 | Wang2020_t1 | 49 | 68.4 | 51.8 | 75.2 | 76.8 | 68.6 | 89.0 | 48.5 | 61.4 | 44.1 | 83.0 | 86.2 | |
Helin_ADSPLAB_task1a_3 | Wang2020_t1 | 18 | 73.1 | 56.9 | 82.4 | 81.7 | 73.7 | 92.8 | 53.8 | 72.6 | 44.9 | 85.7 | 86.7 | |
Helin_ADSPLAB_task1a_4 | Wang2020_t1 | 24 | 72.3 | 57.2 | 82.0 | 80.6 | 72.1 | 92.5 | 52.2 | 71.0 | 44.3 | 84.6 | 86.6 | |
Hu_GT_task1a_1 | Hu2020 | 6 | 75.7 | 62.3 | 89.8 | 82.1 | 72.4 | 92.8 | 56.2 | 83.4 | 40.4 | 91.5 | 85.7 | |
Hu_GT_task1a_2 | Hu2020 | 4 | 75.9 | 60.7 | 91.8 | 83.2 | 75.5 | 93.8 | 52.4 | 81.1 | 39.4 | 92.3 | 88.6 | |
Hu_GT_task1a_3 | Hu2020 | 3 | 76.2 | 61.7 | 92.1 | 84.0 | 74.5 | 93.9 | 53.8 | 81.5 | 39.4 | 92.6 | 88.6 | |
Hu_GT_task1a_4 | Hu2020 | 5 | 75.8 | 59.5 | 91.6 | 83.4 | 75.0 | 93.9 | 52.4 | 81.4 | 39.1 | 92.6 | 88.7 | |
JHKim_IVS_task1a_1 | Kim2020_t1 | 55 | 67.3 | 68.9 | 83.1 | 79.2 | 65.2 | 84.6 | 32.7 | 56.6 | 40.1 | 85.4 | 77.7 | |
JHKim_IVS_task1a_2 | Kim2020_t1 | 60 | 66.2 | 57.6 | 80.9 | 67.8 | 63.6 | 86.1 | 41.9 | 63.9 | 46.2 | 82.3 | 71.2 | |
Jie_Maxvision_task1a_1 | Jie2020 | 10 | 75.0 | 62.4 | 88.6 | 75.2 | 70.4 | 93.8 | 58.4 | 76.8 | 48.3 | 90.4 | 86.1 | |
Kim_SGU_task1a_1 | Changmin2020 | 33 | 71.6 | 51.2 | 78.5 | 82.0 | 71.9 | 93.0 | 46.4 | 76.3 | 48.7 | 90.4 | 77.4 | |
Kim_SGU_task1a_2 | Changmin2020 | 38 | 70.7 | 47.7 | 79.6 | 79.2 | 72.2 | 92.2 | 50.2 | 72.9 | 47.1 | 89.7 | 76.5 | |
Kim_SGU_task1a_3 | Changmin2020 | 39 | 70.7 | 47.6 | 76.0 | 77.2 | 74.4 | 92.8 | 48.6 | 74.3 | 42.8 | 90.9 | 81.9 | |
Kim_SGU_task1a_4 | Changmin2020 | 57 | 66.4 | 31.8 | 75.8 | 80.0 | 66.9 | 95.9 | 36.4 | 77.4 | 42.5 | 88.5 | 69.1 | |
Koutini_CPJKU_task1a_1 | Koutini2020 | 29 | 71.9 | 60.2 | 90.6 | 75.8 | 72.5 | 89.5 | 51.9 | 64.9 | 44.9 | 84.0 | 85.1 | |
Koutini_CPJKU_task1a_2 | Koutini2020 | 32 | 71.6 | 60.4 | 90.2 | 74.9 | 70.9 | 86.1 | 52.4 | 66.7 | 43.9 | 84.3 | 86.0 | |
Koutini_CPJKU_task1a_3 | Koutini2020 | 13 | 73.6 | 59.3 | 92.5 | 77.0 | 75.9 | 89.8 | 53.9 | 66.9 | 47.5 | 84.7 | 88.6 | |
Koutini_CPJKU_task1a_4 | Koutini2020 | 15 | 73.4 | 59.9 | 92.3 | 75.8 | 75.5 | 89.8 | 53.0 | 67.5 | 46.1 | 84.8 | 89.1 | |
Lee_CAU_task1a_1 | Lee2020 | 47 | 69.2 | 53.5 | 82.2 | 69.5 | 69.4 | 83.9 | 50.1 | 66.8 | 47.9 | 84.4 | 84.1 | |
Lee_CAU_task1a_2 | Lee2020 | 41 | 69.6 | 54.0 | 82.7 | 68.5 | 70.4 | 85.7 | 48.9 | 66.5 | 50.1 | 84.6 | 85.2 | |
Lee_CAU_task1a_3 | Lee2020 | 27 | 72.0 | 62.8 | 87.5 | 69.4 | 71.2 | 87.5 | 53.0 | 65.7 | 50.2 | 86.4 | 86.2 | |
Lee_CAU_task1a_4 | Lee2020 | 20 | 72.9 | 65.4 | 87.3 | 72.8 | 72.1 | 87.2 | 56.1 | 65.5 | 49.7 | 85.9 | 87.1 | |
Lee_GU_task1a_1 | Aryal2020 | 81 | 55.9 | 37.7 | 83.2 | 50.4 | 49.3 | 82.9 | 36.4 | 56.8 | 37.0 | 62.5 | 62.7 | |
Lee_GU_task1a_2 | Aryal2020 | 85 | 55.6 | 29.7 | 80.1 | 43.9 | 50.1 | 81.1 | 34.5 | 54.7 | 49.8 | 62.8 | 68.9 | |
Lee_GU_task1a_3 | Aryal2020 | 84 | 55.6 | 40.5 | 81.4 | 51.6 | 48.0 | 77.1 | 38.3 | 58.6 | 32.9 | 68.0 | 59.6 | |
Lee_GU_task1a_4 | Aryal2020 | 86 | 54.9 | 30.0 | 77.9 | 52.9 | 51.1 | 76.3 | 34.3 | 59.4 | 36.4 | 68.8 | 62.4 | |
Liu_SHNU_task1a_1 | Liu2020 | 45 | 69.3 | 55.6 | 85.4 | 74.4 | 64.3 | 88.5 | 47.3 | 67.0 | 45.2 | 83.7 | 81.4 | |
Liu_SHNU_task1a_2 | Liu2020 | 50 | 68.0 | 55.6 | 84.3 | 74.5 | 64.5 | 90.1 | 46.9 | 63.5 | 36.9 | 82.0 | 82.0 | |
Liu_SHNU_task1a_3 | Liu2020 | 83 | 55.7 | 35.5 | 65.4 | 54.2 | 50.8 | 80.6 | 35.6 | 52.4 | 38.9 | 71.5 | 71.9 | |
Liu_SHNU_task1a_4 | Liu2020 | 26 | 72.0 | 57.6 | 90.3 | 77.2 | 69.7 | 91.8 | 51.4 | 68.7 | 42.3 | 85.1 | 86.1 | |
Liu_UESTC_task1a_1 | Liu2020a | 16 | 73.2 | 55.1 | 79.1 | 80.2 | 71.5 | 86.3 | 58.3 | 85.9 | 46.0 | 90.6 | 79.1 | |
Liu_UESTC_task1a_2 | Liu2020a | 23 | 72.4 | 55.3 | 78.1 | 79.7 | 69.8 | 85.6 | 57.5 | 84.8 | 45.9 | 90.4 | 76.9 | |
Liu_UESTC_task1a_3 | Liu2020a | 21 | 72.5 | 55.1 | 80.4 | 78.4 | 74.2 | 86.2 | 60.8 | 77.4 | 50.3 | 83.3 | 78.8 | |
Liu_UESTC_task1a_4 | Liu2020a | 28 | 72.0 | 55.1 | 78.6 | 77.9 | 73.0 | 87.3 | 58.8 | 78.3 | 49.7 | 82.6 | 78.6 | |
Lopez-Meyer_IL_task1a_1 | Lopez-Meyer2020_t1a | 68 | 64.3 | 46.5 | 74.2 | 74.5 | 63.4 | 84.7 | 41.9 | 67.3 | 39.4 | 81.0 | 69.9 | |
Lopez-Meyer_IL_task1a_2 | Lopez-Meyer2020_t1a | 70 | 64.1 | 48.8 | 75.3 | 75.5 | 61.1 | 85.4 | 42.9 | 65.6 | 38.9 | 79.8 | 68.0 | |
Lu_INTC_task1a_1 | Hong2020 | 36 | 71.2 | 51.1 | 79.6 | 77.9 | 73.1 | 89.7 | 43.6 | 75.4 | 49.6 | 87.3 | 84.7 | |
Lu_INTC_task1a_2 | Hong2020 | 69 | 64.1 | 50.6 | 77.2 | 69.6 | 65.4 | 85.1 | 31.6 | 65.9 | 41.0 | 82.8 | 72.2 | |
Lu_INTC_task1a_3 | Hong2020 | 58 | 66.4 | 49.6 | 79.1 | 71.1 | 69.4 | 84.8 | 33.7 | 72.0 | 44.9 | 85.2 | 73.8 | |
Lu_INTC_task1a_4 | Hong2020 | 35 | 71.2 | 51.5 | 80.1 | 77.9 | 73.8 | 90.0 | 42.1 | 75.5 | 50.5 | 87.3 | 83.7 | |
Monteiro_INRS_task1a_1 | Joao2020 | 74 | 61.7 | 46.1 | 71.5 | 55.7 | 56.2 | 83.2 | 32.1 | 72.3 | 47.3 | 85.6 | 67.0 | |
Monteiro_INRS_task1a_2 | Joao2020 | 82 | 55.9 | 42.1 | 60.4 | 53.4 | 47.9 | 77.2 | 36.6 | 54.5 | 37.3 | 83.4 | 66.2 | |
Monteiro_INRS_task1a_3 | Joao2020 | 88 | 50.8 | 43.1 | 49.5 | 46.0 | 48.1 | 72.9 | 33.0 | 52.5 | 24.5 | 80.4 | 58.3 | |
Monteiro_INRS_task1a_4 | Joao2020 | 59 | 66.3 | 49.2 | 73.1 | 64.8 | 61.5 | 84.8 | 41.5 | 77.3 | 47.6 | 88.1 | 75.4 | |
Naranjo-Alcazar_Vfy_task1a_1 | Naranjo-Alcazar2020_t1 | 73 | 61.9 | 47.8 | 75.5 | 60.1 | 60.3 | 83.9 | 35.4 | 60.9 | 46.2 | 70.6 | 77.9 | |
Naranjo-Alcazar_Vfy_task1a_2 | Naranjo-Alcazar2020_t1 | 77 | 59.7 | 51.1 | 49.6 | 68.8 | 59.3 | 82.5 | 39.1 | 64.8 | 29.5 | 75.4 | 77.2 | |
Paniagua_UPM_task1a_1 | Paniagua2020 | 92 | 43.8 | 32.5 | 80.0 | 65.8 | 46.5 | 62.4 | 28.6 | 45.9 | 31.6 | 44.4 | 0.0 | |
Shim_UOS_task1a_1 | Shim2020 | 31 | 71.7 | 53.7 | 82.2 | 72.1 | 67.4 | 89.5 | 49.9 | 74.7 | 49.4 | 89.6 | 88.3 | |
Shim_UOS_task1a_2 | Shim2020 | 34 | 71.5 | 53.9 | 82.4 | 71.9 | 71.0 | 89.3 | 51.9 | 73.1 | 49.3 | 87.1 | 85.5 | |
Shim_UOS_task1a_3 | Shim2020 | 48 | 68.5 | 49.1 | 77.9 | 69.2 | 69.2 | 85.8 | 53.0 | 66.9 | 49.2 | 86.3 | 78.2 | |
Shim_UOS_task1a_4 | Shim2020 | 37 | 71.0 | 56.2 | 82.8 | 74.2 | 65.0 | 89.9 | 46.6 | 73.2 | 49.2 | 87.8 | 85.4 | |
Suh_ETRI_task1a_1 | Suh2020 | 22 | 72.5 | 52.9 | 82.2 | 82.7 | 73.5 | 93.5 | 41.8 | 79.3 | 47.1 | 92.8 | 79.0 | |
Suh_ETRI_task1a_2 | Suh2020 | 7 | 75.5 | 59.2 | 88.0 | 83.4 | 76.0 | 93.4 | 49.8 | 78.5 | 51.3 | 92.1 | 82.7 | |
Suh_ETRI_task1a_3 | Suh2020 | 1 | 76.5 | 60.8 | 88.8 | 82.9 | 76.6 | 93.7 | 52.9 | 81.4 | 50.9 | 92.3 | 84.8 | |
Suh_ETRI_task1a_4 | Suh2020 | 2 | 76.5 | 60.7 | 88.6 | 83.2 | 76.5 | 93.7 | 52.4 | 81.2 | 51.2 | 92.3 | 84.9 | |
Swiecicki_NON_task1a_1 | Swiecicki2020 | 56 | 67.1 | 52.8 | 71.1 | 67.6 | 65.8 | 87.1 | 49.2 | 72.0 | 41.6 | 84.6 | 78.8 | |
Swiecicki_NON_task1a_2 | Swiecicki2020 | 42 | 69.5 | 54.7 | 77.9 | 72.1 | 67.6 | 85.5 | 52.9 | 73.7 | 45.3 | 85.7 | 79.9 | |
Swiecicki_NON_task1a_3 | Swiecicki2020 | 40 | 70.3 | 60.4 | 79.7 | 78.5 | 67.2 | 86.4 | 53.8 | 69.7 | 43.4 | 85.8 | 77.9 | |
Swiecicki_NON_task1a_4 | Swiecicki2020 | 30 | 71.8 | 59.4 | 81.6 | 78.2 | 69.9 | 88.6 | 55.6 | 72.7 | 44.2 | 86.8 | 81.5 | |
Vilouras_AUTh_task1a_1 | Vilouras2020 | 53 | 67.7 | 45.8 | 88.6 | 60.1 | 58.2 | 90.3 | 57.4 | 77.4 | 47.6 | 81.2 | 70.4 | |
Vilouras_AUTh_task1a_2 | Vilouras2020 | 52 | 67.8 | 50.5 | 87.0 | 59.8 | 67.3 | 85.9 | 39.6 | 79.0 | 49.8 | 84.3 | 75.1 | |
Vilouras_AUTh_task1a_3 | Vilouras2020 | 44 | 69.3 | 50.5 | 89.1 | 61.1 | 65.1 | 89.3 | 49.5 | 79.9 | 49.7 | 84.2 | 74.6 | |
Waldekar_IITKGP_task1a_1 | Waldekar2020 | 79 | 58.4 | 50.4 | 70.4 | 53.5 | 51.9 | 84.0 | 38.9 | 59.6 | 33.4 | 72.9 | 68.5 | |
Wang_RoyalFlush_task1a_1 | Wang2020a | 80 | 56.7 | 39.7 | 82.4 | 69.9 | 41.1 | 82.9 | 37.2 | 53.5 | 27.9 | 75.3 | 56.6 | |
Wang_RoyalFlush_task1a_2 | Wang2020a | 65 | 65.2 | 56.4 | 74.9 | 67.7 | 49.5 | 84.5 | 50.8 | 68.0 | 43.9 | 79.7 | 76.2 | |
Wang_RoyalFlush_task1a_3 | Wang2020a | 71 | 64.0 | 60.4 | 63.1 | 65.7 | 49.5 | 83.9 | 46.1 | 59.8 | 49.2 | 77.1 | 84.7 | |
Wang_RoyalFlush_task1a_4 | Wang2020a | 91 | 45.5 | 5.6 | 89.6 | 69.1 | 62.1 | 84.8 | 34.2 | 6.3 | 0.0 | 78.6 | 24.7 | |
Wu_CUHK_task1a_1 | Wu2020_t1a | 67 | 64.7 | 51.4 | 84.1 | 56.0 | 55.7 | 86.0 | 43.9 | 59.7 | 48.0 | 83.9 | 78.5 | |
Wu_CUHK_task1a_2 | Wu2020_t1a | 46 | 69.3 | 46.3 | 87.1 | 76.8 | 68.2 | 86.9 | 43.3 | 65.7 | 49.8 | 84.7 | 83.8 | |
Wu_CUHK_task1a_3 | Wu2020_t1a | 51 | 67.9 | 46.8 | 85.1 | 66.5 | 63.9 | 87.4 | 45.0 | 65.7 | 50.1 | 86.0 | 82.9 | |
Wu_CUHK_task1a_4 | Wu2020_t1a | 43 | 69.4 | 46.9 | 86.7 | 72.6 | 66.6 | 88.0 | 45.6 | 65.5 | 51.6 | 86.1 | 84.5 | |
Zhang_THUEE_task1a_1 | Shao2020 | 19 | 73.0 | 57.0 | 85.5 | 78.4 | 73.2 | 92.5 | 55.1 | 69.4 | 52.8 | 84.3 | 82.0 | |
Zhang_THUEE_task1a_2 | Shao2020 | 17 | 73.2 | 57.4 | 85.7 | 79.1 | 73.2 | 92.3 | 55.1 | 68.9 | 53.8 | 84.3 | 81.8 | |
Zhang_THUEE_task1a_3 | Shao2020 | 25 | 72.3 | 55.6 | 85.6 | 77.4 | 72.6 | 90.7 | 54.5 | 67.3 | 53.5 | 84.3 | 81.3 | |
Zhang_UESTC_task1a_1 | Zhang2020 | 89 | 50.4 | 30.1 | 66.6 | 48.5 | 51.9 | 72.3 | 28.7 | 43.4 | 28.5 | 62.0 | 71.6 | |
Zhang_UESTC_task1a_2 | Zhang2020 | 87 | 51.7 | 32.1 | 82.8 | 53.8 | 47.8 | 65.1 | 31.1 | 47.3 | 37.0 | 64.4 | 55.6 | |
Zhang_UESTC_task1a_3 | Zhang2020 | 90 | 47.4 | 33.2 | 84.0 | 39.8 | 37.0 | 66.8 | 29.8 | 44.2 | 28.6 | 52.9 | 57.8 |
Device-wise performance
Unseen devices | Seen devices | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Rank | Submission label |
Technical Report |
Official system rank |
Accuracy |
Accuracy / Unseen |
Accuracy / Seen |
D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3 |
Abbasi_ARI_task1a_1 | Abbasi2020 | 78 | 59.7 | 56.1 | 62.7 | 42.4 | 61.1 | 61.9 | 62.8 | 52.3 | 69.3 | 59.8 | 64.0 | 60.5 | 61.3 | 61.3 | |
Abbasi_ARI_task1a_2 | Abbasi2020 | 76 | 60.6 | 58.9 | 62.0 | 52.4 | 63.2 | 59.2 | 63.9 | 55.6 | 65.5 | 61.0 | 60.8 | 59.0 | 64.6 | 61.2 | |
Cao_JNU_task1a_1 | Fei2020 | 63 | 65.7 | 63.0 | 68.0 | 56.8 | 70.0 | 66.0 | 64.3 | 58.0 | 74.4 | 65.7 | 70.5 | 65.5 | 65.6 | 66.6 | |
Cao_JNU_task1a_2 | Fei2020 | 64 | 65.7 | 62.9 | 68.0 | 56.8 | 69.9 | 66.0 | 63.7 | 57.9 | 73.9 | 66.6 | 70.3 | 66.1 | 65.8 | 65.5 | |
Cao_JNU_task1a_3 | Fei2020 | 61 | 66.0 | 62.9 | 68.5 | 56.7 | 69.8 | 66.3 | 63.3 | 58.5 | 74.4 | 66.5 | 71.1 | 66.3 | 65.2 | 67.4 | |
Cao_JNU_task1a_4 | Fei2020 | 62 | 65.9 | 63.0 | 68.4 | 57.3 | 70.0 | 65.0 | 63.3 | 59.2 | 73.9 | 66.1 | 70.7 | 66.9 | 65.4 | 67.5 | |
FanVaf__task1a_1 | Fanioudakis2020 | 72 | 63.4 | 57.3 | 68.5 | 40.9 | 66.6 | 62.6 | 60.6 | 55.7 | 75.1 | 64.7 | 70.2 | 67.7 | 64.9 | 68.1 | |
FanVaf__task1a_2 | Fanioudakis2020 | 75 | 60.7 | 53.4 | 66.9 | 40.7 | 61.9 | 59.8 | 54.6 | 49.7 | 74.7 | 63.2 | 68.3 | 66.4 | 64.1 | 64.5 | |
FanVaf__task1a_3 | Fanioudakis2020 | 66 | 64.8 | 58.1 | 70.3 | 41.1 | 67.5 | 64.0 | 61.9 | 56.2 | 75.4 | 65.4 | 72.6 | 70.8 | 67.4 | 70.2 | |
FanVaf__task1a_4 | Fanioudakis2020 | 54 | 67.5 | 60.8 | 73.1 | 40.5 | 69.7 | 69.4 | 63.3 | 61.0 | 78.1 | 69.5 | 73.9 | 71.7 | 72.7 | 72.7 | |
Gao_UNISA_task1a_1 | Gao2020 | 9 | 75.0 | 73.3 | 76.5 | 64.6 | 76.8 | 73.9 | 76.0 | 75.0 | 79.7 | 74.4 | 78.0 | 74.2 | 76.6 | 76.3 | |
Gao_UNISA_task1a_2 | Gao2020 | 12 | 74.1 | 71.9 | 75.9 | 61.0 | 76.6 | 72.9 | 74.6 | 74.5 | 80.1 | 73.5 | 78.1 | 73.3 | 75.8 | 74.8 | |
Gao_UNISA_task1a_3 | Gao2020 | 11 | 74.7 | 72.9 | 76.1 | 63.1 | 77.5 | 74.1 | 75.2 | 74.7 | 79.7 | 73.5 | 78.4 | 74.1 | 75.0 | 76.0 | |
Gao_UNISA_task1a_4 | Gao2020 | 8 | 75.2 | 73.1 | 77.0 | 62.5 | 77.6 | 74.4 | 75.6 | 75.4 | 80.3 | 74.9 | 78.9 | 74.7 | 76.5 | 76.7 | |
DCASE2020 baseline | 51.4 | 37.2 | 63.1 | 22.8 | 49.8 | 41.1 | 31.0 | 41.3 | 72.8 | 61.7 | 68.9 | 62.7 | 54.6 | 58.2 | |||
Helin_ADSPLAB_task1a_1 | Wang2020_t1 | 14 | 73.4 | 70.1 | 76.2 | 65.0 | 76.3 | 74.5 | 66.9 | 67.6 | 81.7 | 74.6 | 79.8 | 72.9 | 73.8 | 74.4 | |
Helin_ADSPLAB_task1a_2 | Wang2020_t1 | 49 | 68.4 | 63.8 | 72.3 | 62.0 | 72.2 | 69.1 | 57.8 | 58.0 | 80.7 | 72.9 | 76.9 | 67.6 | 66.9 | 68.9 | |
Helin_ADSPLAB_task1a_3 | Wang2020_t1 | 18 | 73.1 | 70.2 | 75.5 | 66.6 | 77.3 | 73.9 | 66.6 | 66.8 | 82.4 | 74.2 | 78.8 | 73.1 | 71.7 | 73.1 | |
Helin_ADSPLAB_task1a_4 | Wang2020_t1 | 24 | 72.3 | 68.8 | 75.2 | 65.6 | 75.9 | 73.1 | 64.4 | 65.1 | 81.8 | 73.9 | 79.3 | 71.8 | 70.9 | 73.7 | |
Hu_GT_task1a_1 | Hu2020 | 6 | 75.7 | 74.3 | 76.8 | 68.0 | 76.0 | 76.4 | 76.6 | 74.4 | 80.0 | 74.7 | 77.7 | 75.0 | 76.6 | 76.9 | |
Hu_GT_task1a_2 | Hu2020 | 4 | 75.9 | 74.4 | 77.2 | 67.6 | 76.7 | 77.7 | 77.2 | 72.7 | 81.6 | 75.6 | 79.7 | 74.6 | 75.6 | 75.8 | |
Hu_GT_task1a_3 | Hu2020 | 3 | 76.2 | 74.7 | 77.5 | 68.2 | 76.6 | 77.1 | 77.9 | 73.6 | 81.5 | 75.4 | 79.8 | 75.0 | 76.2 | 76.9 | |
Hu_GT_task1a_4 | Hu2020 | 5 | 75.8 | 74.3 | 77.0 | 67.4 | 76.2 | 77.2 | 77.4 | 73.2 | 81.3 | 75.4 | 79.5 | 74.7 | 75.4 | 75.6 | |
JHKim_IVS_task1a_1 | Kim2020_t1 | 55 | 67.3 | 64.5 | 69.7 | 55.2 | 70.7 | 68.4 | 64.8 | 63.4 | 74.4 | 67.4 | 72.4 | 68.1 | 67.9 | 67.8 | |
JHKim_IVS_task1a_2 | Kim2020_t1 | 60 | 66.2 | 64.3 | 67.7 | 58.7 | 68.1 | 66.2 | 66.5 | 61.9 | 73.6 | 64.3 | 69.6 | 64.9 | 66.8 | 67.2 | |
Jie_Maxvision_task1a_1 | Jie2020 | 10 | 75.0 | 73.2 | 76.5 | 65.8 | 76.8 | 75.0 | 74.6 | 74.0 | 78.5 | 74.7 | 79.1 | 73.8 | 76.1 | 76.9 | |
Kim_SGU_task1a_1 | Changmin2020 | 33 | 71.6 | 69.2 | 73.5 | 60.1 | 73.7 | 70.7 | 73.9 | 67.8 | 77.6 | 70.7 | 78.5 | 71.6 | 70.6 | 72.0 | |
Kim_SGU_task1a_2 | Changmin2020 | 38 | 70.7 | 68.4 | 72.7 | 56.9 | 75.1 | 70.0 | 72.8 | 67.4 | 77.7 | 69.3 | 75.6 | 69.2 | 71.9 | 72.3 | |
Kim_SGU_task1a_3 | Changmin2020 | 39 | 70.7 | 68.3 | 72.6 | 60.1 | 72.5 | 70.2 | 71.9 | 66.9 | 77.0 | 70.1 | 75.9 | 70.4 | 70.6 | 71.7 | |
Kim_SGU_task1a_4 | Changmin2020 | 57 | 66.4 | 63.5 | 68.9 | 54.7 | 70.5 | 67.4 | 67.5 | 57.5 | 74.7 | 66.0 | 72.4 | 64.4 | 67.6 | 68.0 | |
Koutini_CPJKU_task1a_1 | Koutini2020 | 29 | 71.9 | 68.4 | 74.9 | 54.4 | 74.4 | 72.0 | 70.6 | 70.5 | 79.7 | 72.7 | 73.9 | 74.1 | 73.4 | 75.6 | |
Koutini_CPJKU_task1a_2 | Koutini2020 | 32 | 71.6 | 66.9 | 75.5 | 49.4 | 74.3 | 71.1 | 71.5 | 68.1 | 78.1 | 72.9 | 77.4 | 74.8 | 74.9 | 74.8 | |
Koutini_CPJKU_task1a_3 | Koutini2020 | 13 | 73.6 | 69.8 | 76.8 | 52.9 | 77.6 | 74.0 | 74.3 | 70.1 | 80.6 | 74.8 | 77.6 | 76.2 | 75.4 | 76.4 | |
Koutini_CPJKU_task1a_4 | Koutini2020 | 15 | 73.4 | 69.4 | 76.7 | 52.9 | 76.8 | 73.9 | 73.4 | 70.1 | 80.5 | 74.9 | 78.1 | 75.4 | 75.1 | 76.5 | |
Lee_CAU_task1a_1 | Lee2020 | 47 | 69.2 | 66.2 | 71.6 | 53.7 | 71.6 | 69.4 | 68.8 | 67.7 | 76.1 | 69.4 | 74.2 | 70.4 | 70.6 | 69.1 | |
Lee_CAU_task1a_2 | Lee2020 | 41 | 69.6 | 66.5 | 72.3 | 54.5 | 71.0 | 70.4 | 67.8 | 68.8 | 77.5 | 69.4 | 74.4 | 71.4 | 70.6 | 70.4 | |
Lee_CAU_task1a_3 | Lee2020 | 27 | 72.0 | 69.3 | 74.3 | 61.1 | 75.4 | 71.2 | 68.1 | 70.6 | 78.1 | 71.6 | 76.0 | 74.4 | 71.8 | 73.8 | |
Lee_CAU_task1a_4 | Lee2020 | 20 | 72.9 | 69.8 | 75.5 | 60.2 | 76.1 | 72.5 | 69.7 | 70.4 | 79.8 | 73.2 | 77.2 | 74.5 | 74.2 | 74.2 | |
Lee_GU_task1a_1 | Aryal2020 | 81 | 55.9 | 46.4 | 63.8 | 30.4 | 52.0 | 52.6 | 44.0 | 53.0 | 73.9 | 64.1 | 68.8 | 62.7 | 55.6 | 58.1 | |
Lee_GU_task1a_2 | Aryal2020 | 85 | 55.6 | 45.7 | 63.8 | 29.8 | 50.4 | 52.8 | 43.7 | 52.0 | 75.4 | 62.5 | 69.6 | 61.4 | 56.1 | 57.6 | |
Lee_GU_task1a_3 | Aryal2020 | 84 | 55.6 | 45.5 | 64.0 | 28.1 | 54.1 | 55.5 | 40.7 | 48.9 | 74.4 | 64.0 | 67.5 | 61.1 | 57.9 | 59.4 | |
Lee_GU_task1a_4 | Aryal2020 | 86 | 54.9 | 44.7 | 63.5 | 26.6 | 52.4 | 55.0 | 41.7 | 47.9 | 74.3 | 63.1 | 68.8 | 60.4 | 56.9 | 57.5 | |
Liu_SHNU_task1a_1 | Liu2020 | 45 | 69.3 | 65.1 | 72.7 | 50.7 | 74.4 | 69.4 | 67.3 | 63.8 | 76.5 | 70.1 | 72.1 | 73.0 | 71.8 | 73.0 | |
Liu_SHNU_task1a_2 | Liu2020 | 50 | 68.0 | 64.9 | 70.6 | 55.9 | 70.0 | 66.1 | 66.3 | 66.0 | 74.7 | 69.8 | 71.3 | 68.6 | 70.1 | 69.3 | |
Liu_SHNU_task1a_3 | Liu2020 | 83 | 55.7 | 46.8 | 63.1 | 40.3 | 55.7 | 56.3 | 35.0 | 46.8 | 74.4 | 65.6 | 70.6 | 55.3 | 56.8 | 55.6 | |
Liu_SHNU_task1a_4 | Liu2020 | 26 | 72.0 | 67.5 | 75.8 | 52.6 | 76.4 | 71.8 | 70.0 | 66.8 | 79.0 | 73.7 | 76.4 | 76.3 | 74.1 | 75.3 | |
Liu_UESTC_task1a_1 | Liu2020a | 16 | 73.2 | 71.9 | 74.3 | 66.1 | 75.1 | 71.5 | 75.6 | 71.4 | 78.3 | 71.7 | 76.6 | 73.3 | 74.0 | 72.0 | |
Liu_UESTC_task1a_2 | Liu2020a | 23 | 72.4 | 71.1 | 73.5 | 64.4 | 74.2 | 71.1 | 74.8 | 71.0 | 77.4 | 70.6 | 75.3 | 73.3 | 72.6 | 71.7 | |
Liu_UESTC_task1a_3 | Liu2020a | 21 | 72.5 | 71.3 | 73.5 | 65.2 | 75.0 | 71.0 | 75.0 | 70.3 | 77.8 | 70.3 | 77.1 | 71.7 | 72.3 | 71.6 | |
Liu_UESTC_task1a_4 | Liu2020a | 28 | 72.0 | 70.3 | 73.4 | 63.8 | 73.2 | 70.9 | 73.3 | 70.4 | 78.0 | 70.2 | 76.8 | 71.0 | 72.8 | 71.6 | |
Lopez-Meyer_IL_task1a_1 | Lopez-Meyer2020_t1a | 68 | 64.3 | 60.9 | 67.1 | 54.2 | 68.2 | 65.1 | 60.0 | 57.0 | 76.4 | 64.4 | 68.5 | 63.8 | 62.7 | 66.8 | |
Lopez-Meyer_IL_task1a_2 | Lopez-Meyer2020_t1a | 70 | 64.1 | 61.1 | 66.7 | 54.1 | 67.8 | 64.5 | 60.6 | 58.6 | 75.5 | 64.4 | 68.9 | 62.9 | 62.1 | 66.2 | |
Lu_INTC_task1a_1 | Hong2020 | 36 | 71.2 | 68.8 | 73.2 | 66.0 | 74.4 | 71.0 | 62.6 | 69.7 | 79.2 | 72.5 | 73.2 | 70.5 | 70.7 | 73.2 | |
Lu_INTC_task1a_2 | Hong2020 | 69 | 64.1 | 60.8 | 66.9 | 61.2 | 65.6 | 64.4 | 53.2 | 59.6 | 73.1 | 68.1 | 68.8 | 63.3 | 62.4 | 65.6 | |
Lu_INTC_task1a_3 | Hong2020 | 58 | 66.4 | 63.3 | 68.9 | 63.1 | 68.1 | 67.1 | 56.5 | 61.8 | 74.7 | 68.0 | 70.8 | 66.7 | 64.4 | 68.9 | |
Lu_INTC_task1a_4 | Hong2020 | 35 | 71.2 | 68.8 | 73.3 | 66.0 | 74.4 | 70.7 | 63.6 | 69.4 | 79.6 | 73.1 | 73.8 | 70.6 | 70.3 | 72.2 | |
Monteiro_INRS_task1a_1 | Joao2020 | 74 | 61.7 | 59.4 | 63.6 | 46.8 | 63.0 | 60.6 | 65.6 | 60.9 | 68.0 | 61.5 | 63.2 | 62.3 | 64.2 | 62.7 | |
Monteiro_INRS_task1a_2 | Joao2020 | 82 | 55.9 | 51.8 | 59.3 | 32.1 | 59.0 | 56.9 | 58.7 | 52.2 | 65.6 | 56.1 | 57.3 | 57.9 | 59.9 | 59.0 | |
Monteiro_INRS_task1a_3 | Joao2020 | 88 | 50.8 | 44.5 | 56.1 | 37.5 | 47.4 | 45.6 | 46.8 | 45.2 | 64.8 | 57.7 | 60.4 | 51.8 | 51.0 | 51.1 | |
Monteiro_INRS_task1a_4 | Joao2020 | 59 | 66.3 | 63.2 | 69.0 | 50.2 | 66.5 | 65.5 | 68.3 | 65.6 | 74.7 | 66.7 | 68.5 | 68.0 | 68.9 | 67.0 | |
Naranjo-Alcazar_Vfy_task1a_1 | Naranjo-Alcazar2020_t1 | 73 | 61.9 | 55.9 | 66.9 | 45.6 | 62.5 | 60.3 | 53.3 | 57.6 | 74.9 | 65.9 | 70.8 | 63.3 | 63.2 | 63.1 | |
Naranjo-Alcazar_Vfy_task1a_2 | Naranjo-Alcazar2020_t1 | 77 | 59.7 | 54.0 | 64.5 | 52.4 | 60.0 | 61.2 | 54.1 | 42.3 | 73.6 | 66.0 | 68.2 | 59.5 | 59.7 | 59.8 | |
Paniagua_UPM_task1a_1 | Paniagua2020 | 92 | 43.8 | 36.0 | 50.3 | 28.1 | 42.2 | 40.1 | 33.9 | 35.5 | 60.6 | 46.9 | 52.0 | 47.2 | 47.0 | 47.8 | |
Shim_UOS_task1a_1 | Shim2020 | 31 | 71.7 | 69.0 | 74.0 | 57.4 | 72.8 | 71.4 | 72.8 | 70.6 | 78.7 | 72.2 | 75.4 | 73.1 | 72.9 | 71.5 | |
Shim_UOS_task1a_2 | Shim2020 | 34 | 71.5 | 68.4 | 74.2 | 56.2 | 72.4 | 70.8 | 72.5 | 69.8 | 79.4 | 71.7 | 76.1 | 73.4 | 73.1 | 71.6 | |
Shim_UOS_task1a_3 | Shim2020 | 48 | 68.5 | 64.9 | 71.4 | 49.6 | 69.1 | 68.1 | 69.8 | 68.0 | 75.7 | 69.7 | 72.9 | 71.0 | 70.3 | 69.0 | |
Shim_UOS_task1a_4 | Shim2020 | 37 | 71.0 | 68.2 | 73.3 | 56.2 | 71.9 | 71.9 | 72.4 | 68.8 | 79.3 | 71.1 | 76.2 | 71.7 | 72.0 | 69.8 | |
Suh_ETRI_task1a_1 | Suh2020 | 22 | 72.5 | 69.9 | 74.6 | 62.1 | 73.9 | 71.3 | 72.3 | 69.8 | 78.5 | 73.5 | 78.2 | 71.2 | 72.2 | 74.1 | |
Suh_ETRI_task1a_2 | Suh2020 | 7 | 75.5 | 73.6 | 77.0 | 66.0 | 76.1 | 75.5 | 76.1 | 74.5 | 79.7 | 74.6 | 79.0 | 75.3 | 76.5 | 76.7 | |
Suh_ETRI_task1a_3 | Suh2020 | 1 | 76.5 | 74.6 | 78.1 | 65.6 | 78.0 | 75.6 | 77.6 | 76.3 | 81.1 | 75.6 | 80.0 | 76.3 | 77.6 | 77.9 | |
Suh_ETRI_task1a_4 | Suh2020 | 2 | 76.5 | 74.7 | 77.9 | 65.8 | 78.2 | 75.6 | 77.6 | 76.4 | 81.1 | 75.6 | 79.6 | 76.4 | 77.4 | 77.5 | |
Swiecicki_NON_task1a_1 | Swiecicki2020 | 56 | 67.1 | 64.0 | 69.6 | 49.6 | 68.0 | 67.6 | 70.4 | 64.6 | 72.7 | 69.0 | 70.5 | 67.5 | 70.8 | 67.0 | |
Swiecicki_NON_task1a_2 | Swiecicki2020 | 42 | 69.5 | 66.5 | 72.0 | 55.0 | 70.4 | 71.5 | 71.1 | 64.5 | 77.3 | 71.1 | 71.9 | 70.7 | 71.3 | 69.9 | |
Swiecicki_NON_task1a_3 | Swiecicki2020 | 40 | 70.3 | 68.2 | 72.0 | 56.4 | 72.6 | 73.0 | 73.7 | 65.5 | 75.6 | 68.9 | 72.3 | 71.2 | 72.8 | 71.0 | |
Swiecicki_NON_task1a_4 | Swiecicki2020 | 30 | 71.8 | 69.0 | 74.2 | 55.9 | 74.4 | 74.4 | 73.6 | 66.9 | 78.1 | 72.2 | 74.0 | 73.1 | 75.0 | 72.7 | |
Vilouras_AUTh_task1a_1 | Vilouras2020 | 53 | 67.7 | 63.5 | 71.2 | 60.6 | 68.1 | 62.9 | 59.9 | 65.8 | 78.1 | 69.2 | 75.6 | 67.4 | 68.6 | 68.3 | |
Vilouras_AUTh_task1a_2 | Vilouras2020 | 52 | 67.8 | 63.0 | 71.8 | 54.4 | 68.4 | 63.4 | 63.0 | 65.9 | 77.4 | 69.8 | 74.4 | 69.7 | 70.6 | 68.8 | |
Vilouras_AUTh_task1a_3 | Vilouras2020 | 44 | 69.3 | 65.3 | 72.6 | 59.9 | 70.3 | 64.9 | 63.2 | 68.2 | 79.1 | 69.4 | 75.5 | 70.0 | 70.7 | 71.0 | |
Waldekar_IITKGP_task1a_1 | Waldekar2020 | 79 | 58.4 | 52.9 | 62.9 | 50.8 | 57.8 | 59.2 | 47.5 | 49.3 | 68.2 | 59.8 | 66.9 | 59.3 | 62.0 | 61.0 | |
Wang_RoyalFlush_task1a_1 | Wang2020a | 80 | 56.7 | 54.8 | 58.2 | 47.3 | 58.6 | 57.5 | 56.4 | 54.4 | 65.0 | 55.4 | 61.9 | 57.4 | 53.7 | 55.7 | |
Wang_RoyalFlush_task1a_2 | Wang2020a | 65 | 65.2 | 63.0 | 67.0 | 56.8 | 67.2 | 63.0 | 65.4 | 62.6 | 74.1 | 63.3 | 70.8 | 64.5 | 62.6 | 66.6 | |
Wang_RoyalFlush_task1a_3 | Wang2020a | 71 | 64.0 | 60.0 | 67.3 | 56.8 | 65.5 | 63.3 | 57.2 | 57.1 | 75.7 | 65.7 | 70.3 | 65.2 | 63.2 | 63.5 | |
Wang_RoyalFlush_task1a_4 | Wang2020a | 91 | 45.5 | 42.9 | 47.7 | 40.4 | 46.7 | 47.3 | 40.3 | 39.8 | 52.2 | 47.0 | 50.5 | 42.6 | 46.6 | 47.2 | |
Wu_CUHK_task1a_1 | Wu2020_t1a | 67 | 64.7 | 60.0 | 68.7 | 47.7 | 63.8 | 64.2 | 65.4 | 58.8 | 76.7 | 64.4 | 71.1 | 66.2 | 66.3 | 67.4 | |
Wu_CUHK_task1a_2 | Wu2020_t1a | 46 | 69.3 | 63.0 | 74.5 | 48.1 | 70.7 | 70.1 | 68.6 | 57.3 | 80.4 | 73.0 | 77.7 | 71.3 | 71.9 | 72.8 | |
Wu_CUHK_task1a_3 | Wu2020_t1a | 51 | 67.9 | 62.7 | 72.3 | 50.2 | 68.0 | 68.9 | 68.6 | 58.1 | 78.9 | 68.6 | 73.8 | 70.7 | 71.2 | 70.5 | |
Wu_CUHK_task1a_4 | Wu2020_t1a | 43 | 69.4 | 63.6 | 74.3 | 49.4 | 70.4 | 69.8 | 69.2 | 59.2 | 80.6 | 71.0 | 75.9 | 72.4 | 73.8 | 72.0 | |
Zhang_THUEE_task1a_1 | Shao2020 | 19 | 73.0 | 69.9 | 75.6 | 59.2 | 75.8 | 71.7 | 72.4 | 70.4 | 80.4 | 73.7 | 79.3 | 74.4 | 71.9 | 74.1 | |
Zhang_THUEE_task1a_2 | Shao2020 | 17 | 73.2 | 70.0 | 75.8 | 59.0 | 75.9 | 71.3 | 72.7 | 71.0 | 80.6 | 73.9 | 79.4 | 74.1 | 72.3 | 74.6 | |
Zhang_THUEE_task1a_3 | Shao2020 | 25 | 72.3 | 68.8 | 75.2 | 55.6 | 75.4 | 71.3 | 71.9 | 69.9 | 80.0 | 72.7 | 78.7 | 72.9 | 73.4 | 73.3 | |
Zhang_UESTC_task1a_1 | Zhang2020 | 89 | 50.4 | 35.8 | 62.5 | 20.9 | 46.8 | 45.1 | 28.3 | 38.0 | 73.5 | 62.7 | 68.5 | 58.1 | 52.4 | 59.7 | |
Zhang_UESTC_task1a_2 | Zhang2020 | 87 | 51.7 | 37.5 | 63.5 | 21.6 | 52.0 | 45.1 | 28.3 | 40.6 | 72.6 | 63.6 | 68.3 | 61.5 | 55.4 | 59.6 | |
Zhang_UESTC_task1a_3 | Zhang2020 | 90 | 47.4 | 32.2 | 60.1 | 19.1 | 44.0 | 37.2 | 22.4 | 38.2 | 71.0 | 61.5 | 64.8 | 55.0 | 53.9 | 54.4 |
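The "Accuracy / Unseen" and "Accuracy / Seen" summary columns in the table above are consistent with plain unweighted means over the listed per-device columns (unseen: D, S7-S10; seen: A, B, C, S1-S3). A minimal sketch checking this against the DCASE2020 baseline row:

```python
# Verify that the summary columns are unweighted means of the per-device
# accuracies, using the DCASE2020 baseline row of the device-wise table.
unseen = {"D": 22.8, "S7": 49.8, "S8": 41.1, "S9": 31.0, "S10": 41.3}
seen = {"A": 72.8, "B": 61.7, "C": 68.9, "S1": 62.7, "S2": 54.6, "S3": 58.2}

def mean_accuracy(per_device):
    """Unweighted mean over devices (each device contributes equally)."""
    return sum(per_device.values()) / len(per_device)

print(f"unseen: {mean_accuracy(unseen):.2f}")  # 37.20, shown as 37.2 in the table
print(f"seen:   {mean_accuracy(seen):.2f}")    # 63.15, shown as 63.1 in the table
```

Because every device is weighted equally, a system's headline accuracy can mask large gaps between the high-quality device A and the unseen devices, which is exactly what the per-device columns expose.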
System characteristics
General characteristics
Rank | Submission label |
Technical Report |
Official system rank |
Accuracy (Eval) |
Sampling rate |
Data augmentation |
Features | Embeddings |
---|---|---|---|---|---|---|---|---|
Abbasi_ARI_task1a_1 | Abbasi2020 | 78 | 59.7 | 44.1kHz | mixup | mel spectrogram | ||
Abbasi_ARI_task1a_2 | Abbasi2020 | 76 | 60.6 | 44.1kHz | mixup | mel spectrogram | ||
Cao_JNU_task1a_1 | Fei2020 | 63 | 65.7 | 22.05kHz | mixup | log-mel spectrogram, gamma-tone spectrogram, CQT | ||
Cao_JNU_task1a_2 | Fei2020 | 64 | 65.7 | 22.05kHz | mixup | log-mel spectrogram, gamma-tone spectrogram, CQT | ||
Cao_JNU_task1a_3 | Fei2020 | 61 | 66.0 | 22.05kHz | mixup | log-mel spectrogram, gamma-tone spectrogram, CQT | ||
Cao_JNU_task1a_4 | Fei2020 | 62 | 65.9 | 22.05kHz | mixup | log-mel spectrogram, gamma-tone spectrogram, CQT | ||
FanVaf__task1a_1 | Fanioudakis2020 | 72 | 63.4 | 4kHz | mixup, time shifting | spectrogram | ||
FanVaf__task1a_2 | Fanioudakis2020 | 75 | 60.7 | 8kHz | mixup, time shifting | spectrogram | ||
FanVaf__task1a_3 | Fanioudakis2020 | 66 | 64.8 | 4kHz, 8kHz | mixup, time shifting | spectrogram | ||
FanVaf__task1a_4 | Fanioudakis2020 | 54 | 67.5 | 4kHz, 8kHz | mixup, time shifting | spectrogram | ||
Gao_UNISA_task1a_1 | Gao2020 | 9 | 75.0 | 44.1kHz | mixup, temporal cropping | log-mel energies, deltas, delta-deltas | ||
Gao_UNISA_task1a_2 | Gao2020 | 12 | 74.1 | 44.1kHz | mixup, temporal cropping | log-mel energies, deltas, delta-deltas | ||
Gao_UNISA_task1a_3 | Gao2020 | 11 | 74.7 | 44.1kHz | mixup, temporal cropping | log-mel energies, deltas, delta-deltas | ||
Gao_UNISA_task1a_4 | Gao2020 | 8 | 75.2 | 44.1kHz | mixup, temporal cropping | log-mel energies, deltas, delta-deltas | ||
DCASE2020 baseline | 51.4 | 44.1kHz | | | OpenL3 |
Helin_ADSPLAB_task1a_1 | Wang2020_t1 | 14 | 73.4 | 44.1kHz | mixup | MFCC, log-mel energies, CQT, Gammatone | ||
Helin_ADSPLAB_task1a_2 | Wang2020_t1 | 49 | 68.4 | 44.1kHz | mixup | MFCC, log-mel energies, CQT, Gammatone | ||
Helin_ADSPLAB_task1a_3 | Wang2020_t1 | 18 | 73.1 | 44.1kHz | mixup | MFCC, log-mel energies, CQT, Gammatone | ||
Helin_ADSPLAB_task1a_4 | Wang2020_t1 | 24 | 72.3 | 44.1kHz | mixup | MFCC, log-mel energies, CQT, Gammatone | ||
Hu_GT_task1a_1 | Hu2020 | 6 | 75.7 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shift, speed change, random noise, mix audios | log-mel energies | ||
Hu_GT_task1a_2 | Hu2020 | 4 | 75.9 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shift, speed change, random noise, mix audios | log-mel energies | ||
Hu_GT_task1a_3 | Hu2020 | 3 | 76.2 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shift, speed change, random noise, mix audios | log-mel energies | ||
Hu_GT_task1a_4 | Hu2020 | 5 | 75.8 | 44.1kHz | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shift, speed change, random noise, mix audios | log-mel energies | ||
JHKim_IVS_task1a_1 | Kim2020_t1 | 55 | 67.3 | 44.1kHz | subtract filter | HPSS, log-mel energies | ||
JHKim_IVS_task1a_2 | Kim2020_t1 | 60 | 66.2 | 44.1kHz | subtract filter | HPSS, log-mel energies | ||
Jie_Maxvision_task1a_1 | Jie2020 | 10 | 75.0 | 44.1kHz | mixup, temporal cropping | log-mel energies | ||
Kim_SGU_task1a_1 | Changmin2020 | 33 | 71.6 | 44.1kHz | mixup, temporal cropping, class-wise random masking | log-mel energies, deltas, delta-deltas, multiple channel feature | ||
Kim_SGU_task1a_2 | Changmin2020 | 38 | 70.7 | 44.1kHz | mixup, temporal cropping, class-wise random masking | log-mel energies, deltas, delta-deltas, multiple channel feature | ||
Kim_SGU_task1a_3 | Changmin2020 | 39 | 70.7 | 44.1kHz | mixup, temporal cropping, class-wise random masking | log-mel energies, deltas, delta-deltas, multiple channel feature | ||
Kim_SGU_task1a_4 | Changmin2020 | 57 | 66.4 | 44.1kHz | mixup, temporal cropping, class-wise random masking | log-mel energies, deltas, delta-deltas, multiple channel feature | ||
Koutini_CPJKU_task1a_1 | Koutini2020 | 29 | 71.9 | 22.05kHz | mixup | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_2 | Koutini2020 | 32 | 71.6 | 22.05kHz | mixup | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_3 | Koutini2020 | 13 | 73.6 | 22.05kHz | mixup | Perceptually-weighted log-mel energies | ||
Koutini_CPJKU_task1a_4 | Koutini2020 | 15 | 73.4 | 22.05kHz | mixup | Perceptually-weighted log-mel energies | ||
Lee_CAU_task1a_1 | Lee2020 | 47 | 69.2 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas, HPSS | ||
Lee_CAU_task1a_2 | Lee2020 | 41 | 69.6 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas, HPSS | ||
Lee_CAU_task1a_3 | Lee2020 | 27 | 72.0 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas, HPSS | ||
Lee_CAU_task1a_4 | Lee2020 | 20 | 72.9 | 44.1kHz | mixup | log-mel energies, deltas, delta-deltas, HPSS | ||
Lee_GU_task1a_1 | Aryal2020 | 81 | 55.9 | 44.1kHz | mixup, time masking, frequency masking | OpenL3 (env) | ||
Lee_GU_task1a_2 | Aryal2020 | 85 | 55.6 | 44.1kHz | mixup, time masking, frequency masking | OpenL3 (env) | ||
Lee_GU_task1a_3 | Aryal2020 | 84 | 55.6 | 44.1kHz | mixup, time masking, frequency masking | OpenL3 (music) | ||
Lee_GU_task1a_4 | Aryal2020 | 86 | 54.9 | 44.1kHz | mixup, time masking, frequency masking | OpenL3 (music) | ||
Liu_SHNU_task1a_1 | Liu2020 | 45 | 69.3 | 22.05kHz | mixup, deviceaugment | perceptual weighted power spectrogram | ||
Liu_SHNU_task1a_2 | Liu2020 | 50 | 68.0 | 22.05kHz | mixup | perceptual weighted power spectrogram | ||
Liu_SHNU_task1a_3 | Liu2020 | 83 | 55.7 | 44.1kHz | SpecAugment | log-mel energies | ||
Liu_SHNU_task1a_4 | Liu2020 | 26 | 72.0 | 22.05kHz, 44.1kHz | mixup, deviceaugment | perceptual weighted power spectrogram | OpenL3 |
Liu_UESTC_task1a_1 | Liu2020a | 16 | 73.2 | 44.1kHz | HPSS, NNF, vocal separation, HRTF | log-mel energies | |
Liu_UESTC_task1a_2 | Liu2020a | 23 | 72.4 | 44.1kHz | HPSS, NNF, vocal separation, HRTF | log-mel energies | |
Liu_UESTC_task1a_3 | Liu2020a | 21 | 72.5 | 44.1kHz | HPSS, NNF, vocal separation, HRTF | log-mel energies | |
Liu_UESTC_task1a_4 | Liu2020a | 28 | 72.0 | 44.1kHz | HPSS, NNF, vocal separation, HRTF | log-mel energies | |
Lopez-Meyer_IL_task1a_1 | Lopez-Meyer2020_t1a | 68 | 64.3 | 16kHz | random noise, random gain, random cropping, mixup, SpecAugment | raw waveform, mel filterbank | ||
Lopez-Meyer_IL_task1a_2 | Lopez-Meyer2020_t1a | 70 | 64.1 | 16kHz | random noise, random gain, random cropping, mixup, SpecAugment | raw waveform, mel filterbank | ||
Lu_INTC_task1a_1 | Hong2020 | 36 | 71.2 | 32kHz | mixup, weight decay, dropout, SpecAugment | mel spectrogram, CQT | None | |
Lu_INTC_task1a_2 | Hong2020 | 69 | 64.1 | 32kHz | mixup, weight decay, dropout, SpecAugment | mel spectrogram, CQT | None | |
Lu_INTC_task1a_3 | Hong2020 | 58 | 66.4 | 32kHz | mixup, weight decay, dropout, SpecAugment | mel spectrogram, CQT | None | |
Lu_INTC_task1a_4 | Hong2020 | 35 | 71.2 | 32kHz | mixup, weight decay, dropout, SpecAugment | mel spectrogram, CQT | None | |
Monteiro_INRS_task1a_1 | Joao2020 | 74 | 61.7 | 44.1kHz | Sox distortions, SpecAugment | log-mel energies | ||
Monteiro_INRS_task1a_2 | Joao2020 | 82 | 55.9 | 44.1kHz | Sox distortions, SpecAugment | log-mel energies | ||
Monteiro_INRS_task1a_3 | Joao2020 | 88 | 50.8 | 44.1kHz | Sox distortions, SpecAugment | modulation spectra | ||
Monteiro_INRS_task1a_4 | Joao2020 | 59 | 66.3 | 44.1kHz | Sox distortions, SpecAugment | log-mel energies, modulation spectra | ||
Naranjo-Alcazar_Vfy_task1a_1 | Naranjo-Alcazar2020_t1 | 73 | 61.9 | 44.1kHz | mixup | Gammatone | ||
Naranjo-Alcazar_Vfy_task1a_2 | Naranjo-Alcazar2020_t1 | 77 | 59.7 | 44.1kHz | mixup | HPSS, log-mel energies | ||
Paniagua_UPM_task1a_1 | Paniagua2020 | 92 | 43.8 | 44.1kHz | | LTAS, envelope modulation spectrum | |
Shim_UOS_task1a_1 | Shim2020 | 31 | 71.7 | 44.1kHz | mixup, SpecAugment | mel spectrogram | ||
Shim_UOS_task1a_2 | Shim2020 | 34 | 71.5 | 44.1kHz | mixup, SpecAugment | mel spectrogram | ||
Shim_UOS_task1a_3 | Shim2020 | 48 | 68.5 | 44.1kHz | mixup, SpecAugment | mel spectrogram | ||
Shim_UOS_task1a_4 | Shim2020 | 37 | 71.0 | 44.1kHz | mixup | mel spectrogram | ||
Suh_ETRI_task1a_1 | Suh2020 | 22 | 72.5 | 44.1kHz | temporal cropping, mixup | log-mel energies, deltas, delta-deltas | ||
Suh_ETRI_task1a_2 | Suh2020 | 7 | 75.5 | 44.1kHz | temporal cropping, mixup | log-mel energies, deltas, delta-deltas | ||
Suh_ETRI_task1a_3 | Suh2020 | 1 | 76.5 | 44.1kHz | temporal cropping, mixup | log-mel energies, deltas, delta-deltas | ||
Suh_ETRI_task1a_4 | Suh2020 | 2 | 76.5 | 44.1kHz | temporal cropping, mixup | log-mel energies, deltas, delta-deltas | ||
Swiecicki_NON_task1a_1 | Swiecicki2020 | 56 | 67.1 | 44.1kHz | mixup, SpecAugment, random resize, random cropping | log-mel energies | ||
Swiecicki_NON_task1a_2 | Swiecicki2020 | 42 | 69.5 | 44.1kHz | mixup, SpecAugment, random resize, random cropping | log-mel energies | ||
Swiecicki_NON_task1a_3 | Swiecicki2020 | 40 | 70.3 | 44.1kHz | mixup, SpecAugment, random resize, random cropping | log-mel energies | ||
Swiecicki_NON_task1a_4 | Swiecicki2020 | 30 | 71.8 | 44.1kHz | mixup, SpecAugment, random resize, random cropping | log-mel energies | ||
Vilouras_AUTh_task1a_1 | Vilouras2020 | 53 | 67.7 | 44.1kHz | | log-mel energies, PCEN | |
Vilouras_AUTh_task1a_2 | Vilouras2020 | 52 | 67.8 | 44.1kHz | mixup, time stretching, frequency masking, shifting, clipping distortion | log-mel energies, PCEN | ||
Vilouras_AUTh_task1a_3 | Vilouras2020 | 44 | 69.3 | 44.1kHz | mixup, time stretching, frequency masking, shifting, clipping distortion | log-mel energies, PCEN | ||
Waldekar_IITKGP_task1a_1 | Waldekar2020 | 79 | 58.4 | 44.1kHz | | MFDWC | |
Wang_RoyalFlush_task1a_1 | Wang2020a | 80 | 56.7 | 44.1kHz | mixup, spectrum correction | log-mel energies | ||
Wang_RoyalFlush_task1a_2 | Wang2020a | 65 | 65.2 | 44.1kHz | mixup, spectrum correction | log-mel energies | ||
Wang_RoyalFlush_task1a_3 | Wang2020a | 71 | 64.0 | 44.1kHz | mixup, spectrum correction | log-mel energies | ||
Wang_RoyalFlush_task1a_4 | Wang2020a | 91 | 45.5 | 44.1kHz | mixup, spectrum correction | log-mel energies | ||
Wu_CUHK_task1a_1 | Wu2020_t1a | 67 | 64.7 | 44.1kHz | mixup | wavelet filter-bank features | ||
Wu_CUHK_task1a_2 | Wu2020_t1a | 46 | 69.3 | 44.1kHz | mixup | wavelet filter-bank features | ||
Wu_CUHK_task1a_3 | Wu2020_t1a | 51 | 67.9 | 44.1kHz | mixup | wavelet filter-bank features | ||
Wu_CUHK_task1a_4 | Wu2020_t1a | 43 | 69.4 | 44.1kHz | mixup | wavelet filter-bank features | ||
Zhang_THUEE_task1a_1 | Shao2020 | 19 | 73.0 | 44.1kHz | mixup, ImageDataGenerator, temporal cropping | log-mel energies | ||
Zhang_THUEE_task1a_2 | Shao2020 | 17 | 73.2 | 44.1kHz | mixup, ImageDataGenerator, temporal cropping | log-mel energies | ||
Zhang_THUEE_task1a_3 | Shao2020 | 25 | 72.3 | 44.1kHz | mixup, ImageDataGenerator, temporal cropping | log-mel energies | ||
Zhang_UESTC_task1a_1 | Zhang2020 | 89 | 50.4 | 44.1kHz | | log-mel energies | OpenL3 |
Zhang_UESTC_task1a_2 | Zhang2020 | 87 | 51.7 | 44.1kHz | | log-mel energies | OpenL3 |
Zhang_UESTC_task1a_3 | Zhang2020 | 90 | 47.4 | 44.1kHz | | log-mel energies | OpenL3 |
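Mixup is by far the most frequently listed augmentation in the table above. As an illustration of the standard technique only (the generic formulation, not any particular team's variant; the array shapes and class count below are hypothetical):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two training examples and their one-hot labels.

    A weight lam ~ Beta(alpha, alpha) is drawn, and the same convex
    combination is applied to features and labels so the soft target
    stays consistent with the mixed input.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Toy usage: two fake 40-band x 100-frame "spectrograms", 10 scene classes.
x1, x2 = np.ones((40, 100)), np.zeros((40, 100))
y1, y2 = np.eye(10)[0], np.eye(10)[3]
x, y = mixup(x1, y1, x2, y2)
```

With a small alpha such as 0.2 the Beta distribution concentrates near 0 and 1, so most mixed examples stay close to one of the originals.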
Machine learning characteristics
Rank | Code |
Technical Report |
Official system rank |
Accuracy (Eval) |
External data usage |
External data sources |
Model complexity |
Classifier |
Ensemble subsystems |
Decision making |
---|---|---|---|---|---|---|---|---|---|---|
Abbasi_ARI_task1a_1 | Abbasi2020 | 78 | 59.7 | 180310 | CNN, ensemble | 5 | average | | |
Abbasi_ARI_task1a_2 | Abbasi2020 | 76 | 60.6 | 180310 | CNN, ensemble, XGBoost | 5 | average | |||
Cao_JNU_task1a_1 | Fei2020 | 63 | 65.7 | 2631282 | CNN, 2-DenseNet | 5 | majority vote | |||
Cao_JNU_task1a_2 | Fei2020 | 64 | 65.7 | 2631282 | CNN, 2-DenseNet | 5 | majority vote | | |
Cao_JNU_task1a_3 | Fei2020 | 61 | 66.0 | 5094806 | CNN, 2-DenseNet | 7 | majority vote | | |
Cao_JNU_task1a_4 | Fei2020 | 62 | 65.9 | 5094806 | CNN, 2-DenseNet | 7 | majority vote | | |
FanVaf__task1a_1 | Fanioudakis2020 | 72 | 63.4 | 20477140 | CRNN | sample-based average | ||||
FanVaf__task1a_2 | Fanioudakis2020 | 75 | 60.7 | 20477140 | CRNN | sample-based average | ||||
FanVaf__task1a_3 | Fanioudakis2020 | 66 | 64.8 | 20477140 | CRNN | 2 | sample average with weights | |||
FanVaf__task1a_4 | Fanioudakis2020 | 54 | 67.5 | 20477140 | CRNN | 2 | sample average with weights | |||
Gao_UNISA_task1a_1 | Gao2020 | 9 | 75.0 | 4311732 | ResNet | |||||
Gao_UNISA_task1a_2 | Gao2020 | 12 | 74.1 | 4311732 | ResNet | |||||
Gao_UNISA_task1a_3 | Gao2020 | 11 | 74.7 | 4312628 | ResNet | | | |
Gao_UNISA_task1a_4 | Gao2020 | 8 | 75.2 | 12936092 | ResNet, ensemble | 3 | average | |||
DCASE2020 baseline | 51.4 | embeddings | 5012931 (OpenL3 embeddings = 4684224, classifier = 328707) | MLP | | | |
Helin_ADSPLAB_task1a_1 | Wang2020_t1 | 14 | 73.4 | directly | AudioSet | 341229835 | CNN, ensemble | 9 | average | |
Helin_ADSPLAB_task1a_2 | Wang2020_t1 | 49 | 68.4 | 839596544 | CNN, ensemble | 8 | average | |||
Helin_ADSPLAB_task1a_3 | Wang2020_t1 | 18 | 73.1 | directly | AudioSet | 361028107 | CNN, ensemble | 13 | average | |
Helin_ADSPLAB_task1a_4 | Wang2020_t1 | 24 | 72.3 | directly | AudioSet | 380826379 | CNN, ensemble | 17 | average | |
Hu_GT_task1a_1 | Hu2020 | 6 | 75.7 | 62525968 | CNN, ResNet, ensemble | 4 | average | |||
Hu_GT_task1a_2 | Hu2020 | 4 | 75.9 | 67763768 | CNN, ResNet, ensemble | 4 | average | |||
Hu_GT_task1a_3 | Hu2020 | 3 | 76.2 | 130289736 | CNN, ResNet, ensemble | 8 | average | |||
Hu_GT_task1a_4 | Hu2020 | 5 | 75.8 | 91251960 | CNN, ResNet, ensemble | 5 | average | |||
JHKim_IVS_task1a_1 | Kim2020_t1 | 55 | 67.3 | pre-trained model | 115300 | CNN | ||||
JHKim_IVS_task1a_2 | Kim2020_t1 | 60 | 66.2 | pre-trained model | 31600 | CNN | ||||
Jie_Maxvision_task1a_1 | Jie2020 | 10 | 75.0 | 3584924 | CNN | |||||
Kim_SGU_task1a_1 | Changmin2020 | 33 | 71.6 | 3254028 | Residual CNN | 2 | average | |||
Kim_SGU_task1a_2 | Changmin2020 | 38 | 70.7 | 3254908 | Residual CNN | 2 | average | |||
Kim_SGU_task1a_3 | Changmin2020 | 39 | 70.7 | 6352740 | Residual CNN | 4 | average | |||
Kim_SGU_task1a_4 | Changmin2020 | 57 | 66.4 | 3255788 | Residual CNN | 2 | average | |||
Koutini_CPJKU_task1a_1 | Koutini2020 | 29 | 71.9 | 19702400 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_2 | Koutini2020 | 32 | 71.6 | 36783360 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_3 | Koutini2020 | 13 | 73.6 | 225943040 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1a_4 | Koutini2020 | 15 | 73.4 | 225943040 | RF-regularized CNNs | |||||
Lee_CAU_task1a_1 | Lee2020 | 47 | 69.2 | 10088328 | CNN, ResNet, LCNN, InceptionLike, ensemble | 8 | average | |||
Lee_CAU_task1a_2 | Lee2020 | 41 | 69.6 | 10088328 | CNN | 8 | average | |||
Lee_CAU_task1a_3 | Lee2020 | 27 | 72.0 | 10088328 | CNN, ResNet, LCNN, InceptionLike, ensemble | 8 | average | |||
Lee_CAU_task1a_4 | Lee2020 | 20 | 72.9 | 10088328 | CNN, ResNet, LCNN, InceptionLike, ensemble | 8 | average | |||
Lee_GU_task1a_1 | Aryal2020 | 81 | 55.9 | embeddings | 15940046 | ResNet, Attention | ||||
Lee_GU_task1a_2 | Aryal2020 | 85 | 55.6 | embeddings | 15940046 | ResNet, Attention | ||||
Lee_GU_task1a_3 | Aryal2020 | 84 | 55.6 | embeddings | 15940046 | ResNet, Attention | ||||
Lee_GU_task1a_4 | Aryal2020 | 86 | 54.9 | embeddings | 15940046 | ResNet, Attention | ||||
Liu_SHNU_task1a_1 | Liu2020 | 45 | 69.3 | 3563412 | ResNet, Receptive Field Regularization | | | |
Liu_SHNU_task1a_2 | Liu2020 | 50 | 68.0 | 4691274 | CNN | |||||
Liu_SHNU_task1a_3 | Liu2020 | 83 | 55.7 | 8756749 | Self-attention | |||||
Liu_SHNU_task1a_4 | Liu2020 | 26 | 72.0 | embeddings | 13267617 | ResNet, Receptive Field Regularization, CNN, MLP | | | |
Liu_UESTC_task1a_1 | Liu2020a | 16 | 73.2 | 26023864 | ResNet | 8 | average | |||
Liu_UESTC_task1a_2 | Liu2020a | 23 | 72.4 | 58559744 | ResNet | 18 | average | |||
Liu_UESTC_task1a_3 | Liu2020a | 21 | 72.5 | 26023864 | ResNet | 8 | stacking | |||
Liu_UESTC_task1a_4 | Liu2020a | 28 | 72.0 | 58559744 | ResNet | 18 | average | |||
Lopez-Meyer_IL_task1a_1 | Lopez-Meyer2020_t1a | 68 | 64.3 | directly | AudioSet | 39998697 | CNN, ResNet, VGG, ensemble | 3 | average | |
Lopez-Meyer_IL_task1a_2 | Lopez-Meyer2020_t1a | 70 | 64.1 | directly | AudioSet | 39998697 | CNN, ResNet, VGG, ensemble | 3 | average | |
Lu_INTC_task1a_1 | Hong2020 | 36 | 71.2 | pre-trained model | AudioSet | 27184858 | ResNext | 10 | average | |
Lu_INTC_task1a_2 | Hong2020 | 69 | 64.1 | pre-trained model | AudioSet | 27184858 | ResNext | None | softmax | |
Lu_INTC_task1a_3 | Hong2020 | 58 | 66.4 | pre-trained model | AudioSet | 27184858 | ResNext | 2 | average | |
Lu_INTC_task1a_4 | Hong2020 | 35 | 71.2 | pre-trained model | AudioSet | 27184858 | ResNext | 12 | average | |
Monteiro_INRS_task1a_1 | Joao2020 | 74 | 61.7 | 4978634 | ResNet | |||||
Monteiro_INRS_task1a_2 | Joao2020 | 82 | 55.9 | 4522398 | TDNN | |||||
Monteiro_INRS_task1a_3 | Joao2020 | 88 | 50.8 | 20731100 | ResNet | |||||
Monteiro_INRS_task1a_4 | Joao2020 | 59 | 66.3 | 20731100 | CNN, ResNet12, ResNet18, TDNN | 5 | average | |||
Naranjo-Alcazar_Vfy_task1a_1 | Naranjo-Alcazar2020_t1 | 73 | 61.9 | 425294 | CNN | |||||
Naranjo-Alcazar_Vfy_task1a_2 | Naranjo-Alcazar2020_t1 | 77 | 59.7 | 528014 | CNN | |||||
Paniagua_UPM_task1a_1 | Paniagua2020 | 92 | 43.8 | 11264 | MLP | average log-likelihood | ||||
Shim_UOS_task1a_1 | Shim2020 | 31 | 71.7 | embeddings | 1115461 | ensemble | 16 | score-sum | ||
Shim_UOS_task1a_2 | Shim2020 | 34 | 71.5 | embeddings | 1115461 | ensemble | 8 | score-sum | ||
Shim_UOS_task1a_3 | Shim2020 | 48 | 68.5 | 856693 | LCNN | 4 | score-sum | |||
Shim_UOS_task1a_4 | Shim2020 | 37 | 71.0 | embeddings | 594923 | ResNet | 8 | score-sum | ||
Suh_ETRI_task1a_1 | Suh2020 | 22 | 72.5 | 13164184 | ResNet | |||||
Suh_ETRI_task1a_2 | Suh2020 | 7 | 75.5 | 13164184 | ResNet | |||||
Suh_ETRI_task1a_3 | Suh2020 | 1 | 76.5 | 39492555 | Snapshot | 3 | average | |||
Suh_ETRI_task1a_4 | Suh2020 | 2 | 76.5 | 39492555 | Snapshot | 3 | weighted score average | |||
Swiecicki_NON_task1a_1 | Swiecicki2020 | 56 | 67.1 | 10711602 | EfficientNet | average | ||||
Swiecicki_NON_task1a_2 | Swiecicki2020 | 42 | 69.5 | 10711602 | EfficientNet | average | ||||
Swiecicki_NON_task1a_3 | Swiecicki2020 | 40 | 70.3 | 10711602 | EfficientNet | average | ||||
Swiecicki_NON_task1a_4 | Swiecicki2020 | 30 | 71.8 | 21423204 | EfficientNet | 2 | average | |||
Vilouras_AUTh_task1a_1 | Vilouras2020 | 53 | 67.7 | 3343774 | ResNet, ensemble | 4 | average | |||
Vilouras_AUTh_task1a_2 | Vilouras2020 | 52 | 67.8 | 3343774 | ResNet, ensemble | 4 | average | |||
Vilouras_AUTh_task1a_3 | Vilouras2020 | 44 | 69.3 | 6687548 | ResNet, ensemble | 8 | average | |||
Waldekar_IITKGP_task1a_1 | Waldekar2020 | 79 | 58.4 | 32400 | SVM | |||||
Wang_RoyalFlush_task1a_1 | Wang2020a | 80 | 56.7 | 542190 | CNN, ensemble | 6 | average | |||
Wang_RoyalFlush_task1a_2 | Wang2020a | 65 | 65.2 | 542190 | CNN, ensemble | 5 | average | |||
Wang_RoyalFlush_task1a_3 | Wang2020a | 71 | 64.0 | 542190 | CNN, ensemble | 4 | average | |||
Wang_RoyalFlush_task1a_4 | Wang2020a | 91 | 45.5 | 650628 | CNN, ensemble | 6 | average | |||
Wu_CUHK_task1a_1 | Wu2020_t1a | 67 | 64.7 | 13143642 | CNN | |||||
Wu_CUHK_task1a_2 | Wu2020_t1a | 46 | 69.3 | 53300328 | CNN | 4 | average | |||
Wu_CUHK_task1a_3 | Wu2020_t1a | 51 | 67.9 | 65718210 | CNN | 5 | average | |||
Wu_CUHK_task1a_4 | Wu2020_t1a | 43 | 69.4 | 119018538 | CNN | 9 | average | |||
Zhang_THUEE_task1a_1 | Shao2020 | 19 | 73.0 | 3524258 | ResNet, Mini-SegNet | 11 | ||||
Zhang_THUEE_task1a_2 | Shao2020 | 17 | 73.2 | 2516564 | ResNet, Mini-SegNet | 13 | ||||
Zhang_THUEE_task1a_3 | Shao2020 | 25 | 72.3 | 2196170 | ResNet, Mini-SegNet | 8 | ||||
Zhang_UESTC_task1a_1 | Zhang2020 | 89 | 50.4 | embeddings | 329610 | MLP , CNN | maximum likelihood | |||
Zhang_UESTC_task1a_2 | Zhang2020 | 87 | 51.7 | embeddings | 329610 | MLP , CNN | maximum likelihood | |||
Zhang_UESTC_task1a_3 | Zhang2020 | 90 | 47.4 | embeddings | 518090 | MLP , CNN | maximum likelihood |
Technical reports
Acoustic Scene Classification by the Snapshot Ensemble of CNNs with XGBoost
Reyhaneh Abbasi and Peter Balazs
Mathematics and Signal Processing in Acoustics, Acoustics Research Institute of the OEAW, Vienna, Austria
Abbasi_ARI_task1a_1 Abbasi_ARI_task1a_2
Abstract
This is the report for DCASE Challenge Task 1A. The aim is to implement acoustic scene classification of audio recordings into 10 predefined classes: airport, shopping mall, metro station, street pedestrian, public square, street traffic, tram, bus, metro, and park. Two main difficulties of this task are that the recordings are provided by several devices of different quality and that some of these classes are very close in terms of acoustic information. To correct the bias of each device against the reference (here device A), we used the XGBoost algorithm fed with standardized Mel spectrograms. Our classifier consists of a CNN, mixup augmentation, and a snapshot ensemble (to decrease the total number of parameters and, consequently, the variance of the model prediction). Our model yielded an accuracy of 62.1% and a cross-entropy loss of 1.06, whereas the baseline model yielded an accuracy and cross-entropy loss of 54.1% and 1.36, respectively.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | mel spectrogram |
Classifier | CNN, ensemble; CNN, ensemble, XGBoost |
Decision making | average |
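Mixup augmentation, used by this and many of the submissions below, blends pairs of training examples and their one-hot labels with a Beta-distributed weight. A minimal NumPy sketch (the function name and the default alpha are illustrative, not taken from any report):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two training examples and their one-hot labels with a
    Beta(alpha, alpha) mixing weight."""
    rng = rng or np.random.default_rng(0)
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

In practice the mixed spectrogram and soft label replace one minibatch element during training.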
Attention-Based ResNet-18 Model for Acoustic Scene Classification
Nisan Aryal and Sang Woong Lee
Gachon University, South Korea
Lee_GU_task1a_1 Lee_GU_task1a_2 Lee_GU_task1a_3 Lee_GU_task1a_4
Abstract
This technical report describes our approach to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 1a. A ResNet-18 model with attention and OpenL3 embeddings is used to solve the acoustic scene classification problem. The model shows 59.6% accuracy on the training/validation split of the development set, 5.5% higher than that of the baseline network.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time masking, frequency masking |
Embeddings | OpenL3 (env); OpenL3 (music) |
Classifier | ResNet, Attention |
Multi-Channel Feature Using Inter-Class and Inter-Device Standard Deviations for Acoustic Scene Classification
Kim Changmin, Seo Soonshin and Kim Ji-Hwan
Dept. of Computer Science and Engineering, Sogang University, Seoul, South Korea
Kim_SGU_task1a_1 Kim_SGU_task1a_2 Kim_SGU_task1a_3 Kim_SGU_task1a_4
Abstract
In this technical report, we describe our acoustic scene classification methods submitted to the Detection and Classification of Acoustic Scenes and Events Challenge 2020 Task 1a. Our proposed methods aim to maximize the differences between acoustic scene classes and minimize the differences between devices. We obtained the inter-class and inter-device standard deviations of the training data and applied them to the log-mel spectrogram features; the resulting features are added as extra channels to the original log-mel spectrogram. In addition, we applied class-wise random masking to frequency regions with small standard deviations. The masked features are then divided into quarters along the frequency axis and trained with four-pathway residual convolutional neural networks. Our proposed methods achieved an overall accuracy of 72.7% on the official development dataset, an improvement of 18.6% over the official baseline.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal cropping, class-wise random masking |
Features | log-mel energies, deltas, delta-deltas, multiple channel feature |
Classifier | Residual CNN |
Decision making | average |
Investigating Temporal and Spectral Sequences Combining GRU-RNNS for Acoustic Scene Classification
Eleftherios Fanioudakis and Anastasios Vafeiadis
Greece
FanVaf__task1a_1 FanVaf__task1a_2 FanVaf__task1a_3 FanVaf__task1a_4
Abstract
This report describes our contribution to Task 1A of the 2020 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. We investigated the use of bi-directional Gated Recurrent Unit (GRU) - Recurrent Neural Networks (RNNs) in order to capture the spectral and temporal information of the input signal. The GRU-RNNs are used as an ensemble during training, having equal weights for the time and the frequency sequences. Our architecture is based on a Convolutional Recurrent Neural Network (CRNN), where the short-time Fourier magnitude spectrogram is used as an input to the network. By exploiting the mixup augmentation technique, randomly selecting the mixup coefficient α for every sample, and down-sampling the original signal from 44.1 kHz to 4 kHz, we achieved an average class accuracy of 65.4%. Since most of the information of the environmental sound signals was found in the lower frequencies, a CRNN model ensemble was performed, combining 4 and 8 kHz as the sampling frequencies. The latter system’s accuracy was boosted to 67.3%, a 24.4% increase over the development set baseline.
System characteristics
Sampling rate | 4kHz; 8kHz; 4kHz, 8kHz |
Data augmentation | mixup, time shifting |
Features | spectrogram |
Classifier | CRNN |
Decision making | sample-based average; sample average with weights |
Acoustic Scene Classification Based on 2-Order Dense Convolutional Network
Hongbo Fei, Zilong Huang, Yi Cao and Chen Liu
Mechanical engineering, Jiangnan University, Wuxi, China
Cao_JNU_task1a_1 Cao_JNU_task1a_2 Cao_JNU_task1a_3 Cao_JNU_task1a_4
Abstract
In this technical report, we describe our acoustic scene classification algorithm submitted to DCASE 2020 Task 1a. We focus on network innovation: a novel acoustic scene classification model based on a 2-order dense convolutional network is proposed, which addresses the insufficient classification accuracy and adaptability of current models. Starting from the dense convolutional neural network and combining it with the N-order Markov model, the traditional dense connection is improved to an N-order correlation connection, yielding the N-order dense convolutional network model. For audio feature extraction, we stitch together Log-Mel spectrograms and Gamma-Tone spectrograms. To further improve system performance, virtual data generation is adopted. Finally, the trained model is used for transfer learning. With the proposed systems, we achieved a classification accuracy of 69.16% on the officially provided evaluation dataset, 15.06% above the baseline system.
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | mixup |
Features | log-mel spectrogram, gamma-tone spectrogram, CQT |
Classifier | CNN, 2-DenseNet; CNN, 2-DenseNet |
Decision making | majority vote |
Acoustic Scene Classification Using Deep Residual Networks with Focal Loss and Mild Domain Adaptation
Wei Gao and Mark McDonnell
UniSA STEM, University of South Australia, Adelaide, Australia
Abstract
This technical report describes our approach to Task 1a in the 2020 DCASE acoustic scene classification challenge. We incorporated a few more training techniques on top of our previous contest entries. One was replacing cross-entropy loss with focal loss, which focuses on poorly classified samples while reducing the loss on well-classified samples predicted with high probability; another was adding an auxiliary binary classifier to serve the purpose of domain adaptation.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal cropping |
Features | log-mel energies, deltas, delta-deltas |
Classifier | ResNet; ResNet, ensemble |
Decision making | average |
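The focal loss mentioned in the abstract scales the cross-entropy term by (1 − p_t)^γ, so confidently classified samples contribute little to the loss. A minimal NumPy sketch (the function name and γ default are illustrative, not the report's exact settings):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean multi-class focal loss.  probs: (N, C) softmax outputs,
    targets: (N,) integer class labels.  The (1 - p_t)**gamma factor
    shrinks the contribution of well-classified samples."""
    p_t = probs[np.arange(len(targets)), targets]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))
```

With γ = 0 this reduces to ordinary cross-entropy; larger γ shifts the training signal toward the hard examples.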
Acoustic Scene Classification Using Mel-Spectrum and CQT Based Neural Network Ensemble
Lu Hong
Intel Labs, Intel Corporation, Santa Clara, USA
Lu_INTC_task1a_1 Lu_INTC_task1a_2 Lu_INTC_task1a_3 Lu_INTC_task1a_4
Abstract
In our submission to DCASE 2020 Task 1a, we explored the ResNeXt-50 architecture with a Log-Mel spectrum and Constant-Q transform (CQT) based front-end. To improve performance, we used transfer learning: the neural networks were pre-trained with AudioSet data and then fine-tuned on the DCASE Task 1a dataset. With the DCASE 2020 Task 1a default train/validation split, we obtained about 70% average accuracy across all 10 classes. To further improve performance, we applied a leave-one-city-out cross-validation (CV) scheme to train 10 more models, with one city's data held out in each CV fold. These models were combined with different ensemble strategies to produce the 4 final submission entries.
System characteristics
Sampling rate | 32kHz |
Data augmentation | mixup, weight decay, dropout, SpecAugment |
Features | mel spectrogram, CQT |
Embeddings | None |
Classifier | ResNext |
Decision making | average; softmax |
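The leave-one-city-out cross-validation described above can be sketched as follows, assuming DCASE-style filenames of the form scene-city-location-segment-device.wav (the helper name and the filename parsing are illustrative assumptions, not taken from the report):

```python
def leave_one_city_out_folds(filenames):
    """Build one (train, validation) split per city, where the
    validation fold holds out every segment recorded in that city."""
    city = lambda f: f.split("-")[1]          # second field is the city
    cities = sorted({city(f) for f in filenames})
    return [
        ([f for f in filenames if city(f) != c],   # train on other cities
         [f for f in filenames if city(f) == c])   # validate on held-out city
        for c in cities
    ]
```

Training one model per fold yields the 10 extra models that are then combined by the ensemble.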
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation
Hu Hu1, Chao-Han Huck Yang1, Xianjun Xia2, Xue Bai3, Xin Tang3, Yajian Wang3, Shutong Niu3, Li Chai3, Juanjuan Li2, Hongning Zhu2, Feng Bao4, Yuanjun Zhao2, Sabato Marco Siniscalchi5, Yannan Wang2, Jun Du3 and Chin-Hui Lee1
1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA, 2Tencent Media Lab, Shenzhen, China, 3University of Science and Technology of China, HeFei, China, 4Tencent Media Lab, Beijing, China, 5Computer Engineering School, University of Enna Kore, Italy
Hu_GT_task1a_1 Hu_GT_task1a_2 Hu_GT_task1a_3 Hu_GT_task1a_4
Abstract
In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging an ad-hoc score combination of two convolutional neural networks (CNNs), classifying the acoustic input according to three classes and then ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we leverage a quantization method to reduce the complexity of two of our top-accuracy three-class CNN-based architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is attained using our best single classifier and data augmentation. An accuracy of 81.9% is then attained by a final model fusion of our two-stage ASC classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500 KB.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, random cropping, channel confusion, SpecAugment, spectrum correction, reverberation-drc, pitch shift, speed change, random noise, mix audios |
Features | log-mel energies |
Classifier | CNN, ResNet, ensemble |
Decision making | average |
Acoustic Scene Classification with Residual Networks and Attention Mechanism
Liu Jie
Maxvision, Wuhan, China
Jie_Maxvision_task1a_1
Abstract
This technical report describes our submission for Task 1a of the DCASE 2020 challenge. We use log-mel spectrograms and a residual network. Following the idea of McDonnell [1] from DCASE 2019, we do not downsample along the frequency axis. In addition, we use an attention mechanism to improve the performance of the system.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, temporal cropping |
Features | log-mel energies |
Classifier | CNN |
Development of the INRS-EMT Scene Classification Systems for the 2020 Edition of the DCASE Challenge
Monteiro Joao, Shruti Kshirsagar, Anderson Avila, Amr Aaballah, Parth Tiwari and Tiago Falk
EMT, Institut National de la Recherche Scientifique, Montreal, Canada
Monteiro_INRS_task1a_1 Monteiro_INRS_task1a_2 Monteiro_INRS_task1a_3 Monteiro_INRS_task1a_4
Abstract
In this report we provide a brief overview of our submissions to the scene classification sub-tasks of the 2020 edition of the DCASE challenge. Our submissions comprise efforts at the feature representation level, where we explored modulation spectra and i-vectors (extracted from mel cepstral coefficients as well as from modulation spectra), and at the modeling level, where recent convolutional deep neural network models were used. Results on the challenge validation set show several of the submitted methods outperforming the baseline model.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | Sox distortions, SpecAugment |
Features | log-mel energies; modulation spectra; log-mel energies, modulation spectra |
Classifier | ResNet; TDNN; CNN, ResNet12, ResNet18, TDNN |
Decision making | average |
Acoustic Scene Classification Using Multi-Channel Audio Feature with Convolutional Neural Networks and Subtract Filter Augmentation
Jaehun Kim
AI Research Lab, IVS Inc, Seoul, South Korea
JHKim_IVS_task1a_1 JHKim_IVS_task1a_2
Abstract
This paper presents a multi-channel audio feature used with ImageNet-pre-trained convolutional neural networks for DCASE 2020 Task 1a, acoustic scene classification with multiple devices. We use the TAU Urban Acoustic Scenes 2020 Mobile dataset, which consists of 10-second audio clips covering 10 scenes. We propose a multi-channel audio feature that allows reuse of ImageNet pre-trained model weights, and a subtract-filter augmentation for audio recorded by the other devices. The multi-channel feature consists of log-Mel spectrograms of the raw signal and of its harmonic and percussive (HPSS) components. We also use EfficientNet pre-trained model weights.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | subtract filter |
Features | HPSS, log-mel energies |
Classifier | CNN |
CP-JKU Submissions to DCASE’20: Low-Complexity Cross-Device Acoustic Scene Classification with RF-Regularized CNNs
Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh and Gerhard Widmer
Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria
Koutini_CPJKU_task1a_1 Koutini_CPJKU_task1a_2 Koutini_CPJKU_task1a_3 Koutini_CPJKU_task1a_4
Abstract
This technical report describes the CP-JKU team’s submission for Task 1 - Subtask A (Acoustic Scene Classification with Multiple Devices) and Subtask B (Low-Complexity Acoustic Scene Classification) of the DCASE 2020 challenge. For Subtask 1A, we provide our Receptive Field (RF) regularized CNN model as a baseline, and additionally explore the use of two different domain adaptation objectives in the form of the Maximum Mean Discrepancy (MMD) and the Sliced Wasserstein Distance (SWD). For Subtask 1B, we investigate different parameter reduction methods such as pruning and Knowledge Distillation (KD). Additionally, we incorporate a decomposed convolutional layer that reduces the number of non-zero parameters in our models while only slightly decreasing accuracy compared to the full-parameter baseline.
System characteristics
Sampling rate | 22.05kHz |
Data augmentation | mixup |
Features | Perceptually-weighted log-mel energies |
Classifier | RF-regularized CNNs |
The CAU-ET Acoustic Scenery Classification System for DCASE 2020 Challenge
Yerin Lee1, Soyoung Lim1 and Il-Youp Kwak2
1Statistics Dept., Chung-Ang University, Seoul, South Korea, 2Department of Applied Statistics, Chung-Ang University, Seoul, South Korea
Lee_CAU_task1a_1 Lee_CAU_task1a_2 Lee_CAU_task1a_3 Lee_CAU_task1a_4
Abstract
The acoustic scene classification problem is an interesting topic that has been studied for a long time through the DCASE competition. This technical report presents the CAU-ET scene classification system submitted to the DCASE 2020 Challenge, Task 1. In our method we generate a log-mel spectrogram from the audio and derive deltas, delta-deltas, and harmonic-percussive source separation (HPSS) features as inputs to our deep neural network models. The classification result of the proposed system was 66.26% on the development dataset in Subtask A and 95.27% in Subtask B.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, deltas, delta-deltas, HPSS |
Classifier | CNN, ResNet, LCNN, InceptionLike, ensemble; CNN |
Decision making | average |
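The deltas/delta-deltas front-end listed above stacks the log-mel energies with their first and second temporal differences as input channels. A minimal NumPy sketch (np.gradient is used here as a simple stand-in for the regression-based delta filter often used in practice):

```python
import numpy as np

def add_deltas(logmel):
    """Stack log-mel energies with their first and second temporal
    differences as three input channels: (3, n_mels, n_frames)."""
    d1 = np.gradient(logmel, axis=1)   # first difference along time
    d2 = np.gradient(d1, axis=1)       # second difference along time
    return np.stack([logmel, d1, d2], axis=0)
```

The resulting three-channel tensor feeds a CNN the same way an RGB image would.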
Acoustic Scene Classification with Various Deep Classifiers
Yue Liu, XinYuan Zhou and YanHua Long
The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, China
Liu_SHNU_task1a_1 Liu_SHNU_task1a_2 Liu_SHNU_task1a_3 Liu_SHNU_task1a_4
Abstract
In this report, we describe the SHNU team’s submission to the DCASE 2020 challenge Task 1a (Acoustic Scene Classification with Multiple Devices). In our submissions, three different deep models are investigated. The first is a ResNet-based model with receptive-field regularization. The second is a common two-dimensional CNN model with a perceptually weighted power spectrogram as input. The third is a self-attention based model with only a Transformer encoder architecture, specially designed for acoustic scene classification. In addition, we propose a device-enhancement data augmentation method, used together with conventional mixup and SpecAugment to improve model robustness to multiple devices. Experimental results on the fold-1 validation set show that these models are complementary to some extent. We prepared all of our submissions without the use of any external data except for the official baseline embeddings. Logistic regression score fusion is used to fuse the softmax outputs of the single systems.
System characteristics
Sampling rate | 22.05kHz; 44.1kHz; 22.05kHz, 44.1kHz |
Data augmentation | mixup, deviceaugment; mixup; SpecAugment |
Features | perceptual weighted power spectrogram; log-mel energies |
Embeddings | OpenL3 |
Classifier | ResNet, Receptive Field Regularization; CNN; Self-attention; ResNet, Receptive Field Regularization, CNN, MLP |
Acoustic Scene Classification Using Ensembles of Deep Residual Networks and Spectrogram Decompositions
Yingzi Liu, Shengwang Jiang, Chuang Shi and Huiyong Li
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Liu_UESTC_task1a_1 Liu_UESTC_task1a_2 Liu_UESTC_task1a_3 Liu_UESTC_task1a_4
Abstract
This technical report describes ensembles of convolutional neural networks (CNNs) for Task 1 / Subtask B of the DCASE 2020 challenge, with emphasis on a deep residual network applied to different spectrogram decompositions. Harmonic-percussive source separation (HPSS), nearest neighbor filtering (NNF), vocal separation, and head-related transfer functions (HRTF) are used to augment the acoustic features. Our system achieves higher classification accuracies and lower log loss on the development dataset than the baseline system.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | HPSS, NNF, vocal separation, HRTF |
Features | log-mel energies |
Classifier | ResNet |
Decision making | average; stacking |
Ensemble of Convolutional Neural Networks for the DCASE 2020 Acoustic Scene Classification Challenge
Paulo Lopez-Meyer1, Juan Antonio Del Hoyo Ontiveros1, Georg Stemmer2, Lama Nachman3 and Jonathan Huang4
1Intel Labs, Intel Corporation, Jalisco, Mexico, 2Intel Labs, Intel Corporation, Neubiberg, Germany, 3Intel Labs, Intel Corporation, California, USA, 4California, USA
Lopez-Meyer_IL_task1a_1 Lopez-Meyer_IL_task1a_2
Abstract
For DCASE 2020 Task 1a, we propose the use of three different deep learning based convolutional neural network architectures: AclNet, AclResNet50, and Vgg12. These three architectures were pre-trained with AudioSet data for embedding generation, and then fine-tuned with an added classification layer through the development dataset provided by the task. The outputs produced by these trained models proved to be complementary when ensembled, as expected, given the different feature front-ends and the architecture diversity. The ensemble average of these models’ outputs improved classification accuracy significantly, from 67.55% for the best single model to 69.74% on the evaluation dataset, when trained with the challenge-suggested development partitioning.
System characteristics
Sampling rate | 16kHz |
Data augmentation | random noise, random gain, random cropping, mixup, SpecAugment |
Features | raw waveform, mel filterbank |
Classifier | CNN, ResNet, VGG, ensemble |
Decision making | average |
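The "average" decision making used here (and by many of the entries above) simply averages the per-model class probabilities before taking the argmax. A minimal sketch (function name is illustrative):

```python
import numpy as np

def ensemble_average(model_probs):
    """Average per-model class probability vectors and return the
    averaged probabilities together with the winning class index."""
    avg = np.mean(np.stack(model_probs), axis=0)
    return avg, int(np.argmax(avg))
```

Averaging probabilities rather than hard votes lets a confident model outweigh several uncertain ones.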
Task 1 DCASE 2020: ASC with Mismatch Devices and Reduced Size Model Using Residual Squeeze-Excitation CNNs
Javier Naranjo-Alcazar1,2, Sergi Perez-Castanos3, Pedro Zuccarello3 and Maximo Cobos2
1AI department, Visualfy, Benisano, Spain, 2Computer Science Department, Universitat de Valencia, Burjassot, Spain, 3AI department, Visualfy, Benisano, Valencia
Naranjo-Alcazar_Vfy_task1a_1 Naranjo-Alcazar_Vfy_task1a_2
Abstract
Acoustic Scene Classification (ASC) is a problem in the field of machine listening whose objective is to classify/tag an audio clip with a predefined label describing a scene location, such as park or airport, among others. With the emergence of more extensive audio datasets, solutions based on Deep Learning techniques have become the state-of-the-art. The most common choices implement a convolutional neural network (CNN), having previously transformed the audio signal into a 2D representation. This two-dimensional audio representation is currently a subject of research, and there are solutions that concatenate several 2D representations, thus creating an input with several channels. This article proposes two novel stereo audio representations to maximize the accuracy of an ASC framework: a 3-channel representation consisting of the left channel, the right channel, and the difference between channels (L − R) using the Gammatone filter bank, and one consisting of the harmonic source, the percussive source, and the difference between channels using the Mel filter bank. Both representations are also concatenated, creating a 6-channel representation with different audio filter banks. Furthermore, the proposed CNN is a residual network that employs squeeze-excitation techniques in its residual blocks in a novel way to force the network to extract meaningful features from the audio representation. The proposed network is used in both subtasks with different modifications to meet the requirements of each one; however, since stereo audio is not available in Subtask A, the representations are slightly modified for that task. This technical report first presents what the two submissions share, and then details the changes specific to each subtask in its own section. The baselines are surpassed in both tasks by approximately 10 percentage points.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | Gammatone; HPSS, log-mel energies |
Classifier | CNN |
Classification of Acoustic Scenes Based on Modulation Spectra and the Cepstrum of the Cross Correlation Between Binaural Audio Channels
Arturo Paniagua, Rubén Fraile, Juana M. Gutiérrez-Arriola, Nicolás Sáenz-Lechón and Víctor J. Osma-Ruiz
CITSEM, Universidad Politécnica de Madrid, Madrid, Spain
Paniagua_UPM_task1a_1
Abstract
A system for the automatic classification of acoustic scenes is proposed that uses one audio channel for calculating the spectral distribution of energy across auditory-relevant frequency bands, and some descriptors of the envelope modulation spectrum (EMS) obtained by means of the discrete cosine transform. When the stereophonic signal captured by a binaural microphone is available, this parameter set is augmented by including the first coefficients of the cepstrum of the cross-correlation between both audio channels. This cross-correlation contains information on the angular distribution of acoustic sources. These three types of features (energy spectrum, EMS and cepstrum of cross-correlation) are used as inputs for a multilayer perceptron with two hidden layers and a number of adjustable parameters below 15,000.
System characteristics
Sampling rate | 44.1kHz |
Features | LTAS, envelope modulation spectrum |
Classifier | MLP |
Decision making | average log-likelihood |
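The cepstrum-of-cross-correlation feature described in the abstract can be sketched as follows (FFT-based cross-correlation and keeping 12 coefficients are illustrative choices, not the report's exact parameters):

```python
import numpy as np

def crosscorr_cepstrum(left, right, n_coef=12):
    """First cepstral coefficients of the cross-correlation between the
    two binaural channels (computed via FFT); these carry information
    about the angular distribution of acoustic sources."""
    # cross-correlation via the frequency domain
    xcorr = np.fft.irfft(np.fft.rfft(left) * np.conj(np.fft.rfft(right)))
    # real cepstrum of that cross-correlation
    log_mag = np.log(np.abs(np.fft.rfft(xcorr)) + 1e-12)
    return np.fft.irfft(log_mag)[:n_coef]
```

These coefficients would be appended to the energy-spectrum and EMS descriptors before the MLP classifier.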
THUEE Submission for DCASE 2020 Challenge Task 1a
Yunfei Shao1, Xinxin Ma2, Yong Ma2 and Wei-Qiang Zhang1
1Department of Electronic Engineering, Tsinghua University, Beijing, China, 2School of Physics and Electronic Engineering, Jiangsu Normal University, Xuzhou, China
Zhang_THUEE_task1a_1 Zhang_THUEE_task1a_2 Zhang_THUEE_task1a_3
Abstract
In this report, we describe our submission for Task 1a of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge: Acoustic Scene Classification with Multiple Devices. Our methods are mainly based on two types of deep learning models: ResNet and Mini-SegNet. In our submissions, we designed two classification systems. First, we applied spectrum correction to combat mismatched frequency responses, and further processed the corrected features in the log-mel domain. These features are then fed to ResNet or Mini-SegNet models for feature learning. To prevent overfitting, we adopted mixup, ImageDataGenerator, and temporal crop augmentation. Besides, we used an ensemble of multiple subsystems to enhance the generalization capability of our system. Our final system achieved an average accuracy of 75.02% across devices on the development dataset.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, ImageDataGenerator, temporal cropping |
Features | log-mel energies |
Classifier | ResNet, Mini-SegNet |
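Spectrum correction, used here and in several other entries, rescales each device's spectra toward a reference device by comparing average magnitude spectra. A minimal sketch; the averaging scheme and helper name are our own assumptions, not the authors' code:

```python
import numpy as np

def spectrum_correction_coeffs(ref_specs, dev_specs, eps=1e-10):
    """Per-frequency correction mapping a device's average magnitude
    spectrum onto the reference device's (hypothetical helper).
    Inputs have shape (n_clips, n_freq, n_frames)."""
    ref_mean = np.mean(ref_specs, axis=(0, 2))  # average over clips and frames
    dev_mean = np.mean(dev_specs, axis=(0, 2))
    return ref_mean / (dev_mean + eps)          # shape: (n_freq,)

# toy data: 4 clips x 257 frequency bins x 100 frames per device
rng = np.random.default_rng(0)
ref = rng.uniform(0.5, 1.0, (4, 257, 100))
dev = 0.5 * ref                                 # device attenuates everything by half
coeffs = spectrum_correction_coeffs(ref, dev)
corrected = dev * coeffs[None, :, None]         # apply to the device's spectra
```

In this toy case the recovered coefficients are close to 2, undoing the simulated attenuation before features are computed in the log-mel domain.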
Audio Tagging and Deep Architectures for Acoustic Scene Classification: Uos Submission for the DCASE 2020 Challenge
Hye-jin Shim, Ju-ho Kim, Jee-weon Jung and Ha-jin Yu
School of Computer Science, University of Seoul, Seoul, South Korea
Shim_UOS_task1a_1 Shim_UOS_task1a_2 Shim_UOS_task1a_3 Shim_UOS_task1a_4
Abstract
In this technical report, we present the UOS submission for the Detection and Classification of Acoustic Scenes and Events 2020 Challenge Task 1-a. We propose to utilize representation vectors, extracted from a pre-trained audio tagging system, for the acoustic scene classification task. Audio tagging denotes the existence of various sound events and is known to help the classification of acoustic scenes. To select suitable features for the acoustic scene classification task, we also explore deep architectures such as light convolutional neural networks and the convolutional block attention module. Experiments are conducted using the official fold-1 configuration test set. Audio tagging representations and deep architectures achieve accuracies of 68.8% and 70.5%, respectively, compared to the baseline's 65.3%. Additionally, a score-sum ensemble of the two proposed systems reaches an accuracy of 71.9%, a 10.1% relative improvement.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment; mixup |
Features | mel spectrogram |
Classifier | ensemble; LCNN; ResNet |
Decision making | score-sum |
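The score-sum decision listed above simply adds the per-system class scores before taking the argmax. A minimal sketch with toy scores for two systems and two clips:

```python
import numpy as np

def score_sum_ensemble(score_list):
    # Sum class scores across systems, then pick the best class per clip.
    # Each array in score_list has shape (n_clips, n_classes).
    return np.argmax(np.sum(score_list, axis=0), axis=-1)

scores_a = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1]])
scores_b = np.array([[0.6, 0.3, 0.1],
                     [0.4, 0.5, 0.1]])
preds = score_sum_ensemble([scores_a, scores_b])
print(preds)  # [0 1]
```

Summing raw scores (rather than hard votes) lets a confident system outweigh an uncertain one on a per-clip basis.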
Designing Acoustic Scene Classification Models with CNN Variants
Sangwon Suh, Sooyoung Park, Youngho Jeong and Taejin Lee
Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon, South Korea
Suh_ETRI_task1a_1 Suh_ETRI_task1a_2 Suh_ETRI_task1a_3 Suh_ETRI_task1a_4
Abstract
This technical report describes our acoustic scene classification systems for the DCASE 2020 Challenge Task 1. For subtask A, we designed a single model built from three parallel ResNets, named Trident ResNet. We confirmed that this structure is beneficial when analyzing samples collected from minority or unseen devices, achieving 73.7% classification accuracy on the test split. For subtask B, we used the Inception module to build a Shallow Inception model that has fewer parameters than the CNN of the DCASE baseline system. Owing to the sparse structure of the Inception module, we raised the model's accuracy to 97.6% while reducing the number of parameters.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | temporal cropping, mixup |
Features | log-mel energies, deltas, delta-deltas |
Classifier | ResNet; Snapshot |
Decision making | average; weighted score average |
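Mixup, one of the augmentations listed above and used by many entries in this task, blends pairs of training examples and their labels. A minimal sketch (hyperparameters are illustrative, not the authors' settings):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup (Zhang et al., 2018): convex combination of two training
    examples and their one-hot labels, with weight drawn from Beta(a, a)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# toy mel-spectrogram patches (40 bands x 100 frames) and one-hot scene labels
rng = np.random.default_rng(0)
xa, xb = rng.standard_normal((2, 40, 100))
ya, yb = np.eye(10)[3], np.eye(10)[7]
xm, ym = mixup(xa, ya, xb, yb, rng=rng)
```

The soft label `ym` keeps the two class weights summing to one, which is what allows training with the usual cross-entropy loss.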
Acoustic Scene Classification Using Efficientnet
Jakub Swiecicki
None, Warsaw, Poland
Swiecicki_NON_task1a_1 Swiecicki_NON_task1a_2 Swiecicki_NON_task1a_3 Swiecicki_NON_task1a_4
Abstract
This technical report describes our solution to task 1b of the DCASE 2020 acoustic scene classification challenge. Our primary focus was to develop a single efficient model; we concentrated on a single model in order to reflect the typical business situation. In our solution we chose log-mel spectrograms with delta and delta-delta features as the sound sample representation. We augmented the data with multiple techniques: mixup, SpecAugment, and spectrogram resizing. Our final model uses the EfficientNet [1] architecture.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, SpecAugment, random resize, random cropping |
Features | log-mel energies |
Classifier | EfficientNet |
Decision making | average |
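SpecAugment-style masking, one of the augmentations listed above, zeroes out random frequency and time regions of the spectrogram. A minimal sketch; mask sizes are illustrative, not the author's settings:

```python
import numpy as np

def spec_augment(mel, max_f=8, max_t=20, rng=None):
    # Zero one random frequency band and one random time span
    # (a SpecAugment-style masking sketch, not the authors' exact code).
    rng = rng or np.random.default_rng()
    m = mel.copy()
    f = rng.integers(0, max_f + 1)          # frequency mask width
    f0 = rng.integers(0, m.shape[0] - f + 1)
    m[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_t + 1)          # time mask width
    t0 = rng.integers(0, m.shape[1] - t + 1)
    m[:, t0:t0 + t] = 0.0
    return m

mel = np.ones((40, 500))                    # 40 mel bands x 500 frames
aug = spec_augment(mel, rng=np.random.default_rng(0))
```

Masking forces the network not to rely on any single narrow band or moment, which helps with device mismatch as well as overfitting.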
Acoustic Scene Classification Using Fully Convolutional Neural Networks and Per-Channel Energy Normalization
Konstantinos Vilouras
Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
Vilouras_AUTh_task1a_1 Vilouras_AUTh_task1a_2 Vilouras_AUTh_task1a_3
Abstract
This technical report describes our approach to Task 1 "Acoustic Scene Classification" of the DCASE 2020 challenge. For subtask A, we introduce per-channel energy normalization (PCEN) as an additional preprocessing step along with log-Mel spectrograms. We also propose two residual network architectures utilizing "Shake-Shake" regularization and the "Squeeze-and-Excitation" block, respectively. Our best submission (an ensemble of 8 classifiers) outperforms the corresponding baseline system by 16.2% in terms of macro-average accuracy. For subtask B, we mainly focus on a low-complexity, fully convolutional neural network architecture, which leads to a 5% relative improvement over baseline accuracy.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, time stretching, frequency masking, shifting, clipping distortion |
Features | log-mel energies, PCEN |
Classifier | ResNet, ensemble |
Decision making | average |
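PCEN, used here alongside log-mel energies, replaces static log compression with an adaptive gain control. A minimal sketch following the standard formulation; the default constants are commonly cited values, not necessarily the author's:

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (Wang et al., 2017): a slow
    first-order IIR smoother M provides automatic gain control, followed
    by root compression. E is a mel energy matrix (n_mels, n_frames)."""
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# toy mel energies: 40 bands x 100 frames
E = np.abs(np.random.default_rng(0).standard_normal((40, 100))) + 1e-3
P = pcen(E)
```

Because each channel is divided by its own smoothed energy, per-device gain differences are largely cancelled, which is why PCEN is attractive for the multi-device setting.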
Mel-Scaled Wavelet-Based Features for Sub-Task A and Texture Features for Sub-Task B of DCASE 2020 Task 1
Shefali Waldekar, Kishore Kumar A and Goutam Saha
Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India
Waldekar_IITKGP_task1a_1
Abstract
This report describes a submission for the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 for Task 1 (acoustic scene classification (ASC)), sub-task A (ASC with Multiple Devices) and sub-task B (Low-Complexity ASC). The systems exploit time-frequency representations of audio to obtain the scene labels. The system for Task 1A follows a simple pattern classification framework employing wavelet-transform-based mel-scaled features with a support vector machine (SVM) classifier. Texture features, namely Local Binary Patterns (LBP) extracted from the log of mel-band energies, are used in a similar classification framework for Task 1B. The proposed systems outperform the deep-learning-based baseline system on the development dataset provided for the respective sub-tasks.
System characteristics
Sampling rate | 44.1kHz |
Features | MFDWC |
Classifier | SVM |
Acoustic Scene Classification with Multiple Decision Schemes
Helin Wang, Dading Chong and Yuexian Zou
School of ECE, Peking University, Shenzhen, China
Helin_ADSPLAB_task1a_1 Helin_ADSPLAB_task1a_2 Helin_ADSPLAB_task1a_3 Helin_ADSPLAB_task1a_4
Abstract
This technical report describes the ADSPLAB team's submission for Task 1 of the DCASE 2020 challenge. Our acoustic scene classification (ASC) system is based on convolutional neural networks (CNN). Multiple decision schemes are proposed in our system, including decision schemes over multiple representations, multiple frequency bands, and multiple temporal frames. The final system is a fusion of models with multiple decision schemes and models pre-trained on AudioSet. The experimental results show that our system achieves accuracies of 84.5% (official baseline: 54.1%) and 92.1% (official baseline: 87.3%) on the officially provided fold-1 evaluation dataset of Task 1A and Task 1B, respectively.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | MFCC, log-mel energies, CQT, Gammatone |
Classifier | CNN, ensemble |
Decision making | average |
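One of the decision schemes described above, deciding over multiple frequency bands, can be sketched as splitting the spectrogram into sub-bands, scoring each with its own classifier, and averaging. The classifiers below are hypothetical stand-ins, not the authors' models:

```python
import numpy as np

def band_score_average(mel, band_classifiers):
    # Split the spectrogram into frequency sub-bands, score each band with
    # its own classifier, and average the class scores across bands.
    bands = np.array_split(mel, len(band_classifiers), axis=0)
    scores = np.array([clf(b) for clf, b in zip(band_classifiers, bands)])
    return scores.mean(axis=0)

# toy band classifiers returning fixed 10-class score vectors (hypothetical)
low_band = lambda b: np.eye(10)[2]
high_band = lambda b: 0.5 * np.eye(10)[2] + 0.5 * np.eye(10)[5]
mel = np.zeros((40, 100))
avg = band_score_average(mel, [low_band, high_band])
```

Averaging over bands means a device-specific artifact confined to one frequency range cannot dominate the final decision on its own.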
Acoustic Scene Classification with Device Mismatch Using Data Augmentation by Spectrum Correction
Peiyao Wang, Zhiyuan Cheng and Xinkang Xu
Speech Group, Hithink RoyalFlush Information Network Co.,Ltd, Hangzhou, China
Wang_RoyalFlush_task1a_1 Wang_RoyalFlush_task1a_2 Wang_RoyalFlush_task1a_3 Wang_RoyalFlush_task1a_4
Abstract
This report describes the RoyalFlush submissions for DCASE 2020 Task 1a. Our aim is to build an acoustic scene classification system that is robust across multiple devices. We use log-mel energies and their first and second derivatives as input features, and fully convolutional deep neural networks as the classification model, applying strategies such as pre-activation, L2 regularization, dropout, and feature normalization. To mitigate the data imbalance caused by the different devices, we generated additional training data using a device-related spectrum correction method.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup, spectrum correction |
Features | log-mel energies |
Classifier | CNN, ensemble |
Decision making | average |
Robust Feature Learning for Acoustic Scene Classification with Multiple Devices
Yuzhong Wu and Tan Lee
Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Abstract
This technical report describes our submission for Task 1A of the DCASE 2020 challenge. The objective of the task is to identify acoustic scenes from audio recorded by various devices. In our ASC systems, we use a sound-duration-based decomposition method to split the time-frequency (TF) features into three components. Our observations show that the low-frequency bins of the long-duration component image are most easily affected by a change of recording device. We use an AlexNet-like CNN model with the decomposed TF features to build the ASC systems. To prevent the CNN classifier from over-fitting to the recording devices seen in the training dataset, we apply an auxiliary classifier to the embedding feature extracted from the long-duration component image, and propose a regularized cross-entropy (RCE) loss to train it. Experimental results on the development dataset show that the regularized cross-entropy loss significantly improves CNN accuracy on audio from unseen devices.
System characteristics
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | wavelet filter-bank features |
Classifier | CNN |
Decision making | average |
Simple Convolutional Networks Attempting Acoustic Scene Classification Cross Devices
Chi Zhang1, Hanxin Zhu2 and Cheng Ting3
1Electronic Information Engineering, University of Electronic Science and Technology of China, Chengdu, China, 2Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China, 3University of Electronic Science and Technology of China, Chengdu, China
Zhang_UESTC_task1a_1 Zhang_UESTC_task1a_2 Zhang_UESTC_task1a_3
Abstract
This technical report describes our submission for Task 1a (Acoustic Scene Classification with Multiple Devices) of the DCASE 2020 Challenge. The DCASE 2019 results show that convolutional neural networks (CNNs) can achieve excellent classification accuracy, so our work is likewise based on convolutional neural networks. We consider two feature extraction methods provided by the OpenL3 library. Our method improves classification accuracy by 2% compared to the baseline system.
System characteristics
Sampling rate | 44.1kHz |
Features | log-mel energies |
Embeddings | OpenL3 |
Classifier | MLP, CNN |
Decision making | maximum likelihood |