Task description
This subtask is concerned with the classification of data into three higher-level classes while focusing on low-complexity solutions. All submitted systems had to comply with the task rules by limiting the size of the acoustic model size to be under 500 KB (only classification related non-zero parameters counted). See model size calculation examples here.
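To complement those examples, here is a minimal sketch of how the sizes in the System complexity table below can be reproduced. This is our reading of the rule, not official tooling: the 1 KB = 1024 bytes convention and the per-parameter bit widths are assumptions, and a few submissions appear to follow slightly different conventions.

```python
# Minimal sketch (an assumption of ours): only the non-zero classification
# parameters count, at the bit width they are stored in, with 1 KB = 1024 bytes.

def model_size_kb(non_zero_params: int, bits_per_param: int) -> float:
    """Counted size in KB of a model's non-zero parameters."""
    return non_zero_params * bits_per_param / 8 / 1024

# Two rows of the "System complexity" table below reproduce under this rule:
print(round(model_size_kb(115_219, 32), 1))    # DCASE2020 baseline, float32 -> 450.1
print(round(model_size_kb(3_987_000, 1), 1))   # McDonnell_USA, 1-bit weights -> 486.7
```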
The development set contains data from 10 cities, totalling 40 hours of audio. The evaluation set contains data from 12 cities (2 of them unseen in the development set) and comprises 30 hours of audio.
A more detailed task description can be found on the task description page.
Systems ranking
Submission label | Name | Technical Report | Official system rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset)
---|---|---|---|---|---|---|---
Chang_QTI_task1b_1 | QTI1 | Chang2020 | 12 | 95.0 (94.6 - 95.5) | 0.228 | 97.9 | 0.172 | |
Chang_QTI_task1b_2 | QTI2 | Chang2020 | 30 | 93.2 (92.9 - 93.5) | 0.232 | 97.8 | 0.159 | |
Chang_QTI_task1b_3 | QTI3 | Chang2020 | 15 | 94.8 (94.2 - 95.3) | 0.224 | 98.0 | 0.144 | |
Chang_QTI_task1b_4 | QTI4 | Chang2020 | 19 | 94.4 (93.8 - 95.1) | 0.237 | 97.8 | 0.170 | |
Dat_HCMUni_task1b_1 | HCM_Group | Dat2020 | 57 | 89.5 (89.5 - 89.5) | 0.648 | 94.5 | ||
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | IMT_BRAINa | Pajusco2020 | 77 | 85.4 (84.9 - 85.8) | 0.379 | 87.6 | 0.360 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | IMT_BRAINb | Pajusco2020 | 48 | 90.6 (90.0 - 91.2) | 0.270 | 90.9 | 0.288 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | IMT_BRAINc | Pajusco2020 | 73 | 86.6 (85.9 - 87.3) | 0.384 | 87.6 | 0.380 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | IMT_BRAINd | Pajusco2020 | 66 | 88.4 (87.9 - 88.9) | 0.286 | 91.2 | 0.269 | |
Feng_TJU_task1b_1 | CNN-BDG | Feng2020 | 86 | 72.3 (70.7 - 73.9) | 1.728 | 91.8 | 0.469 |
Feng_TJU_task1b_2 | CNN-BDG | Feng2020 | 83 | 81.9 (81.6 - 82.2) | 1.189 | 90.6 | 0.594 |
Feng_TJU_task1b_3 | CNN-BDG | Feng2020 | 84 | 80.7 (80.4 - 81.0) | 1.302 | 90.6 | 0.644 |
Feng_TJU_task1b_4 | CNN-BDG | Feng2020 | 85 | 79.9 (79.3 - 80.4) | 1.281 | 90.2 | 0.629 |
DCASE2020 baseline | Baseline | 89.5 (88.8 - 90.2) | 0.401 | 88.0 | 0.481 | |||
Helin_ADSPLAB_task1b_1 | Helin1b | Wang2020_t1 | 42 | 91.6 (91.1 - 92.0) | 0.227 | 92.1 | 0.312 | |
Helin_ADSPLAB_task1b_2 | Helin2b | Wang2020_t1 | 41 | 91.6 (91.2 - 92.0) | 0.233 | 92.1 | 0.312 | |
Helin_ADSPLAB_task1b_3 | Helin3b | Wang2020_t1 | 43 | 91.6 (91.1 - 92.0) | 0.230 | 92.1 | 0.312 | |
Helin_ADSPLAB_task1b_4 | Helin4b | Wang2020_t1 | 44 | 91.3 (91.0 - 91.6) | 0.264 | 92.1 | 0.312 | |
Hu_GT_task1b_1 | Hu_GT_1b_1 | Hu2020 | 7 | 95.8 (95.5 - 96.1) | 0.357 | 96.3 | 0.349 | |
Hu_GT_task1b_2 | Hu_GT_1b_2 | Hu2020 | 10 | 95.5 (95.1 - 95.8) | 0.367 | |||
Hu_GT_task1b_3 | Hu_GT_1b_3 | Hu2020 | 3 | 96.0 (95.5 - 96.5) | 0.122 | 96.7 | ||
Hu_GT_task1b_4 | Hu_GT_1b_4 | Hu2020 | 5 | 95.8 (95.3 - 96.3) | 0.131 | 96.3 | ||
Kalinowski_SRPOL_task1b_4 | kalinowski | Kalinowski2020 | 31 | 93.1 (92.7 - 93.5) | 1.532 | 95.4 | 0.217 | |
Koutini_CPJKU_task1b_1 | decomposed | Koutini2020 | 16 | 94.7 (94.5 - 94.9) | 0.164 | 96.1 | 0.140 | |
Koutini_CPJKU_task1b_2 | RFD-prune | Koutini2020 | 1 | 96.5 (96.2 - 96.8) | 0.101 | 97.3 | 0.080 | |
Koutini_CPJKU_task1b_3 | RFDsmall | Koutini2020 | 8 | 95.7 (95.5 - 95.9) | 0.113 | 97.1 | 0.090 | |
Koutini_CPJKU_task1b_4 | RFensem | Koutini2020 | 2 | 96.2 (95.9 - 96.5) | 0.105 | 97.0 | 0.090 | |
Kowaleczko_SRPOL_task1b_3 | pkowdcase | Kalinowski2020 | 52 | 90.1 (89.6 - 90.7) | 0.356 | 92.8 | 0.256 | |
Kwiatkowska_SRPOL_task1b_1 | ens3-10mix | Kalinowski2020 | 36 | 92.6 (92.0 - 93.2) | 0.200 | 94.6 | 0.175 | |
Kwiatkowska_SRPOL_task1b_2 | ens3to10 | Kalinowski2020 | 27 | 93.5 (93.0 - 94.0) | 0.168 | 94.5 | 0.170 | |
LamPham_Kent_task1b_1 | LamPham | Pham2020 | 59 | 89.4 (89.2 - 89.7) | 0.332 | 93.0 | ||
LamPham_Kent_task1b_2 | LamPham | Pham2020 | 71 | 87.0 (86.1 - 87.8) | 0.349 | 91.9 | ||
LamPham_Kent_task1b_3 | LamPham | Pham2020 | 79 | 84.7 (84.5 - 85.0) | 0.402 | 90.5 | |
Lee_CAU_task1b_1 | CAUET | Lee2020 | 47 | 90.7 (90.7 - 90.7) | 0.302 | 95.3 | 0.167 | |
Lee_CAU_task1b_2 | CAUET | Lee2020 | 23 | 93.9 (93.7 - 94.1) | 0.156 | 95.3 | 0.167 | |
Lee_CAU_task1b_3 | CAUET | Lee2020 | 46 | 91.1 (91.0 - 91.2) | 0.246 | 93.7 | 0.193 | |
Lee_CAU_task1b_4 | CAUET | Lee2020 | 45 | 91.2 (91.2 - 91.2) | 0.864 | 92.8 | 0.500 | |
Lopez-Meyer_IL_task1b_1 | INT8CNN | Lopez-Meyer2020_t1b | 50 | 90.4 (89.6 - 91.1) | 0.681 | 90.9 | 0.645 | |
Lopez-Meyer_IL_task1b_2 | PrunCNN | Lopez-Meyer2020_t1b | 53 | 90.1 (89.7 - 90.5) | 0.677 | 91.5 | 0.637 | |
Lopez-Meyer_IL_task1b_3 | KD-CNN | Lopez-Meyer2020_t1b | 49 | 90.5 (89.8 - 91.2) | 0.276 | 90.3 | 0.673 | |
Lopez-Meyer_IL_task1b_4 | GCC-CNN | Lopez-Meyer2020_t1b | 56 | 89.7 (88.8 - 90.5) | 0.983 | 91.2 | 0.510 | |
McDonnell_USA_task1b_1 | UniSA_1b1 | McDonnell2020 | 13 | 94.9 (94.9 - 95.0) | 0.135 | 97.1 | 0.094 | |
McDonnell_USA_task1b_2 | UniSA_1b2 | McDonnell2020 | 9 | 95.5 (95.3 - 95.7) | 0.118 | 97.1 | 0.094 | |
McDonnell_USA_task1b_3 | UniSA_1b3 | McDonnell2020 | 4 | 95.9 (95.7 - 96.1) | 0.117 | 97.1 | 0.094 | |
McDonnell_USA_task1b_4 | UniSA_1b4 | McDonnell2020 | 6 | 95.8 (95.6 - 96.0) | 0.119 | 97.1 | 0.094 | |
Monteiro_INRS_task1b_1 | MelCNN | Joao2020 | 69 | 87.4 (86.5 - 88.3) | 0.327 | |||
Naranjo-Alcazar_Vfy_task1b_1 | ASCCSSE | Naranjo-Alcazar2020_t1 | 24 | 93.6 (93.4 - 93.7) | 0.202 | 97.1 | 0.132 | |
Naranjo-Alcazar_Vfy_task1b_2 | ASCCSSE | Naranjo-Alcazar2020_t1 | 25 | 93.6 (93.4 - 93.8) | 0.190 | 97.0 | 0.104 | |
NguyenHongDuc_SU_task1b_1 | NHD_1B_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 (92.6 - 93.5) | 0.215 | 92.4 | 0.230 | |
NguyenHongDuc_SU_task1b_2 | NHD_1B_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 (91.9 - 92.6) | 0.214 | 92.3 | 0.230 | |
Ooi_NTU_task1b_1 | Ooi_model1 | Ooi2020 | 67 | 87.8 (87.1 - 88.6) | 0.337 | 89.4 | ||
Ooi_NTU_task1b_2 | Ooi_model2 | Ooi2020 | 70 | 87.3 (86.6 - 88.1) | 0.367 | 88.6 | ||
Ooi_NTU_task1b_3 | Ooi_model3 | Ooi2020 | 55 | 89.8 (89.0 - 90.5) | 0.257 | 91.5 | ||
Ooi_NTU_task1b_4 | Ooi_model4 | Ooi2020 | 54 | 89.8 (89.1 - 90.5) | 0.305 | 90.5 | ||
Paniagua_UPM_task1b_1 | Pan_UPM | Paniagua2020 | 60 | 89.4 (89.0 - 89.8) | 0.347 | 87.8 | ||
Patki_SELF_task1b_1 | PATKI | Patki2020 | 76 | 86.0 (85.8 - 86.3) | 1.372 | 88.7 | 0.000 | |
Patki_SELF_task1b_2 | PATKI | Patki2020 | 61 | 89.4 (89.0 - 89.7) | 0.951 | 88.9 | 0.000 | |
Patki_SELF_task1b_3 | PATKI | Patki2020 | 82 | 83.7 (81.8 - 85.7) | 1.837 | 87.0 | 0.170 | |
Phan_UIUC_task1b_1 | DD_1b_1 | Phan2020_t1 | 65 | 88.5 (87.8 - 89.2) | 0.319 | 89.5 | 0.289 | |
Phan_UIUC_task1b_2 | DD_1b_2 | Phan2020_t1 | 62 | 89.2 (88.7 - 89.8) | 0.283 | 89.5 | 0.292 | |
Phan_UIUC_task1b_3 | DD_1b_3 | Phan2020_t1 | 63 | 89.0 (88.7 - 89.3) | 0.301 | 90.3 | 0.254 | |
Phan_UIUC_task1b_4 | DD_1b_4 | Phan2020_t1 | 58 | 89.5 (88.9 - 90.0) | 0.282 | 90.4 | 0.275 | |
Sampathkumar_TUC_task1b_1 | AALNet-94 | Sampathkumar2020 | 68 | 87.5 (87.1 - 87.9) | 0.864 | 89.4 | 0.635 | |
Singh_IITMandi_task1b_1 | IITMandi | Singh2020 | 81 | 84.5 (83.5 - 85.6) | 0.418 | 84.9 | 0.422 | |
Singh_IITMandi_task1b_2 | IITMandi | Singh2020 | 80 | 84.7 (83.5 - 85.9) | 0.420 | 85.9 | 0.416 | |
Singh_IITMandi_task1b_3 | IITMandi | Singh2020 | 78 | 85.2 (84.6 - 85.8) | 0.402 | 86.8 | 0.399 | |
Singh_IITMandi_task1b_4 | IITMandi | Singh2020 | 75 | 86.4 (85.0 - 87.8) | 0.385 | 87.2 | 0.378 | |
Suh_ETRI_task1b_1 | Incep_Dev | Suh2020 | 29 | 93.3 (93.2 - 93.4) | 0.302 | 97.6 | 0.259 | |
Suh_ETRI_task1b_2 | Incep_Eval | Suh2020 | 18 | 94.6 (94.4 - 94.7) | 0.270 | 97.6 | 0.259 | |
Suh_ETRI_task1b_3 | Incep_Ensb | Suh2020 | 11 | 95.1 (94.9 - 95.2) | 0.277 | 97.5 | 0.271 | |
Suh_ETRI_task1b_4 | Incep_wEsb | Suh2020 | 17 | 94.6 (94.5 - 94.8) | 0.271 | 97.7 | 0.260 | |
Vilouras_AUTh_task1b_1 | VilFCN | Vilouras2020 | 40 | 91.8 (91.2 - 92.5) | 0.215 | 92.3 | 0.211 | |
Waldekar_IITKGP_task1b_1 | LogMBE-LBP | Waldekar2020 | 64 | 88.6 (88.2 - 89.1) | 7.923 | 90.0 | ||
Wu_CUHK_task1b_1 | CNN4Blocks | Wu2020_t1b | 22 | 94.2 (94.0 - 94.3) | 0.188 | 95.8 | ||
Wu_CUHK_task1b_2 | ensemble_2 | Wu2020_t1b | 21 | 94.2 (94.1 - 94.3) | 0.201 | 96.2 | ||
Wu_CUHK_task1b_3 | ensemble_3 | Wu2020_t1b | 20 | 94.3 (94.3 - 94.4) | 0.185 | 96.3 | ||
Wu_CUHK_task1b_4 | diff_feat2 | Wu2020_t1b | 14 | 94.9 (94.7 - 95.1) | 0.218 | 96.5 | ||
Yang_UESTC_task1b_1 | CNNs | Haocong2020 | 38 | 92.1 (91.7 - 92.4) | 0.272 | 94.9 | 0.237 | |
Yang_UESTC_task1b_2 | CNNs_PAE | Haocong2020 | 28 | 93.5 (93.3 - 93.7) | 0.247 | 95.9 | 0.200 | |
Yang_UESTC_task1b_3 | CNNs_Cyc | Haocong2020 | 26 | 93.5 (93.3 - 93.8) | 0.228 | 96.0 | 0.187 | |
Yang_UESTC_task1b_4 | CNNs_4CV | Haocong2020 | 51 | 90.4 (88.7 - 92.0) | 0.327 | 92.0 | 0.305 | |
Zhang_BUPT_task1b_1 | BUPTSystem | Zhang2020 | 39 | 92.0 (91.6 - 92.4) | 0.346 | 93.5 | 0.481 | |
Zhang_BUPT_task1b_2 | BUPTSystem | Zhang2020 | 35 | 92.7 (92.1 - 93.2) | 0.334 | 93.5 | 0.481 | |
Zhang_BUPT_task1b_3 | BUPTSystem | Zhang2020 | 34 | 92.9 (92.3 - 93.5) | 0.316 | 93.5 | 0.481 | |
Zhang_BUPT_task1b_4 | BUPTSystem | Zhang2020 | 33 | 93.0 (92.4 - 93.6) | 0.316 | 93.5 | 0.481 | |
Zhao_JNU_task1b_1 | DD-CNN | Zhao2020 | 74 | 86.6 (86.5 - 86.7) | 0.867 | 92.0 | 0.257 | |
Zhao_JNU_task1b_2 | DD-CNN | Zhao2020 | 72 | 86.9 (86.8 - 87.0) | 0.873 | 91.1 | 0.343 |
Teams ranking
This table includes only the best-performing system from each submitting team.
Submission label | Name | Technical Report | Official system rank | Team rank | Accuracy with 95% confidence interval (Evaluation dataset) | Logloss (Evaluation dataset) | Accuracy (Development dataset) | Logloss (Development dataset)
---|---|---|---|---|---|---|---|---
Chang_QTI_task1b_1 | QTI1 | Chang2020 | 12 | 5 | 95.0 (94.6 - 95.5) | 0.228 | 97.9 | 0.172 | |
Dat_HCMUni_task1b_1 | HCM_Group | Dat2020 | 57 | 20 | 89.5 (89.5 - 89.5) | 0.648 | 94.5 | ||
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | IMT_BRAINb | Pajusco2020 | 48 | 16 | 90.6 (90.0 - 91.2) | 0.270 | 90.9 | 0.288 | |
Feng_TJU_task1b_2 | CNN-BDG | Feng2020 | 83 | 30 | 81.9 (81.6 - 82.2) | 1.189 | 90.6 | 0.594 |
DCASE2020 baseline | Baseline | 89.5 (88.8 - 90.2) | 0.401 | 88.0 | 0.481 | ||||
Helin_ADSPLAB_task1b_2 | Helin2b | Wang2020_t1 | 41 | 15 | 91.6 (91.2 - 92.0) | 0.233 | 92.1 | 0.312 | |
Hu_GT_task1b_3 | Hu_GT_1b_3 | Hu2020 | 3 | 2 | 96.0 (95.5 - 96.5) | 0.122 | 96.7 | ||
Kalinowski_SRPOL_task1b_4 | kalinowski | Kalinowski2020 | 31 | 11 | 93.1 (92.7 - 93.5) | 1.532 | 95.4 | 0.217 | |
Koutini_CPJKU_task1b_2 | RFD-prune | Koutini2020 | 1 | 1 | 96.5 (96.2 - 96.8) | 0.101 | 97.3 | 0.080 | |
Kowaleczko_SRPOL_task1b_3 | pkowdcase | Kalinowski2020 | 52 | 18 | 90.1 (89.6 - 90.7) | 0.356 | 92.8 | 0.256 | |
Kwiatkowska_SRPOL_task1b_2 | ens3to10 | Kalinowski2020 | 27 | 10 | 93.5 (93.0 - 94.0) | 0.168 | 94.5 | 0.170 | |
LamPham_Kent_task1b_1 | LamPham | Pham2020 | 59 | 22 | 89.4 (89.2 - 89.7) | 0.332 | 93.0 | ||
Lee_CAU_task1b_2 | CAUET | Lee2020 | 23 | 7 | 93.9 (93.7 - 94.1) | 0.156 | 95.3 | 0.167 | |
Lopez-Meyer_IL_task1b_3 | KD-CNN | Lopez-Meyer2020_t1b | 49 | 17 | 90.5 (89.8 - 91.2) | 0.276 | 90.3 | 0.673 | |
McDonnell_USA_task1b_3 | UniSA_1b3 | McDonnell2020 | 4 | 3 | 95.9 (95.7 - 96.1) | 0.117 | 97.1 | 0.094 | |
Monteiro_INRS_task1b_1 | MelCNN | Joao2020 | 69 | 27 | 87.4 (86.5 - 88.3) | 0.327 | |||
Naranjo-Alcazar_Vfy_task1b_1 | ASCCSSE | Naranjo-Alcazar2020_t1 | 24 | 8 | 93.6 (93.4 - 93.7) | 0.202 | 97.1 | 0.132 | |
NguyenHongDuc_SU_task1b_1 | NHD_1B_1 | Nguyen_Hong_Duc2020 | 32 | 12 | 93.1 (92.6 - 93.5) | 0.215 | 92.4 | 0.230 | |
Ooi_NTU_task1b_4 | Ooi_model4 | Ooi2020 | 54 | 19 | 89.8 (89.1 - 90.5) | 0.305 | 90.5 | ||
Paniagua_UPM_task1b_1 | Pan_UPM | Paniagua2020 | 60 | 23 | 89.4 (89.0 - 89.8) | 0.347 | 87.8 | ||
Patki_SELF_task1b_2 | PATKI | Patki2020 | 61 | 24 | 89.4 (89.0 - 89.7) | 0.951 | 88.9 | 0.000 | |
Phan_UIUC_task1b_4 | DD_1b_4 | Phan2020_t1 | 58 | 21 | 89.5 (88.9 - 90.0) | 0.282 | 90.4 | 0.275 | |
Sampathkumar_TUC_task1b_1 | AALNet-94 | Sampathkumar2020 | 68 | 26 | 87.5 (87.1 - 87.9) | 0.864 | 89.4 | 0.635 | |
Singh_IITMandi_task1b_4 | IITMandi | Singh2020 | 75 | 29 | 86.4 (85.0 - 87.8) | 0.385 | 87.2 | 0.378 | |
Suh_ETRI_task1b_3 | Incep_Ensb | Suh2020 | 11 | 4 | 95.1 (94.9 - 95.2) | 0.277 | 97.5 | 0.271 | |
Vilouras_AUTh_task1b_1 | VilFCN | Vilouras2020 | 40 | 14 | 91.8 (91.2 - 92.5) | 0.215 | 92.3 | 0.211 | |
Waldekar_IITKGP_task1b_1 | LogMBE-LBP | Waldekar2020 | 64 | 25 | 88.6 (88.2 - 89.1) | 7.923 | 90.0 | ||
Wu_CUHK_task1b_4 | diff_feat2 | Wu2020_t1b | 14 | 6 | 94.9 (94.7 - 95.1) | 0.218 | 96.5 | ||
Yang_UESTC_task1b_3 | CNNs_Cyc | Haocong2020 | 26 | 9 | 93.5 (93.3 - 93.8) | 0.228 | 96.0 | 0.187 | |
Zhang_BUPT_task1b_4 | BUPTSystem | Zhang2020 | 33 | 13 | 93.0 (92.4 - 93.6) | 0.316 | 93.5 | 0.481 | |
Zhao_JNU_task1b_2 | DD-CNN | Zhao2020 | 72 | 28 | 86.9 (86.8 - 87.0) | 0.873 | 91.1 | 0.343 |
System complexity
Submission label | Technical Report | Official system rank | Accuracy (Eval) | Logloss (Eval) | Parameters | Non-zero parameters | Sparsity | Size (KB) * | Complexity management
---|---|---|---|---|---|---|---|---|---
Chang_QTI_task1b_1 | Chang2020 | 12 | 95.0 | 0.228 | 601866 | 245591 | 0.592 | 491.2 | sparsity |
Chang_QTI_task1b_2 | Chang2020 | 30 | 93.2 | 0.232 | 601866 | 245591 | 0.592 | 491.2 | sparsity |
Chang_QTI_task1b_3 | Chang2020 | 15 | 94.8 | 0.224 | 601866 | 245591 | 0.592 | 491.2 | sparsity |
Chang_QTI_task1b_4 | Chang2020 | 19 | 94.4 | 0.237 | 601866 | 245591 | 0.592 | 491.2 | sparsity |
Dat_HCMUni_task1b_1 | Dat2020 | 57 | 89.5 | 0.648 | 111366 | 111366 | 0.0 | 445.0 | ||
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | Pajusco2020 | 77 | 85.4 | 0.379 | 13632 | 12160 | 0.108 | 23.8 | float16, quantization, pruning |
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | Pajusco2020 | 48 | 90.6 | 0.270 | 29888 | 29888 | 0.0 | 58.4 | float16, quantization | |
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | Pajusco2020 | 73 | 86.6 | 0.384 | 398400 | 130730 | 0.672 | 255.3 | float16, quantization, pruning |
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | Pajusco2020 | 66 | 88.4 | 0.286 | 373696 | 238896 | 0.361 | 466.6 | float16, quantization, pruning |
Feng_TJU_task1b_1 | Feng2020 | 86 | 72.3 | 1.728 | 35059 | 35059 | 0.0 | 136.9 | optimize the convolution operation and the network structure | |
Feng_TJU_task1b_2 | Feng2020 | 83 | 81.9 | 1.189 | 60403 | 60403 | 0.0 | 235.9 | optimize the convolution operation and the network structure | |
Feng_TJU_task1b_3 | Feng2020 | 84 | 80.7 | 1.302 | 85747 | 85747 | 0.0 | 334.9 | optimize the convolution operation and the network structure | |
Feng_TJU_task1b_4 | Feng2020 | 85 | 79.9 | 1.281 | 111091 | 111091 | 0.0 | 433.9 | optimize the convolution operation and the network structure | |
DCASE2020 baseline | 89.5 | 0.401 | 115219 | 115219 | 0.0 | 450.1 | ||||
Helin_ADSPLAB_task1b_1 | Wang2020_t1 | 42 | 91.6 | 0.227 | 123576 | 123576 | 0.0 | 490.8 | ||
Helin_ADSPLAB_task1b_2 | Wang2020_t1 | 41 | 91.6 | 0.233 | 123576 | 123576 | 0.0 | 490.8 | ||
Helin_ADSPLAB_task1b_3 | Wang2020_t1 | 43 | 91.6 | 0.230 | 123576 | 123576 | 0.0 | 490.8 | ||
Helin_ADSPLAB_task1b_4 | Wang2020_t1 | 44 | 91.3 | 0.264 | 123576 | 123576 | 0.0 | 490.8 | ||
Hu_GT_task1b_1 | Hu2020 | 7 | 95.8 | 0.357 | 94028 | 94028 | 0.0 | 375.0 | int8, quantization | |
Hu_GT_task1b_2 | Hu2020 | 10 | 95.5 | 0.367 | 122900 | 122900 | 0.0 | 490.0 | int8, quantization | |
Hu_GT_task1b_3 | Hu2020 | 3 | 96.0 | 0.122 | 122900 | 122900 | 0.0 | 490.0 | int8, quantization | |
Hu_GT_task1b_4 | Hu2020 | 5 | 95.8 | 0.131 | 125121 | 125121 | 0.0 | 499.0 | int8, quantization | |
Kalinowski_SRPOL_task1b_4 | Kalinowski2020 | 31 | 93.1 | 1.532 | 110899 | 110899 | 0.0 | 433.2 | ||
Koutini_CPJKU_task1b_1 | Koutini2020 | 16 | 94.7 | 0.164 | 17520 | 17520 | 0.0 | 34.2 | float16, conv layers decomposition | |
Koutini_CPJKU_task1b_2 | Koutini2020 | 1 | 96.5 | 0.101 | 345990 | 247562 | 0.284 | 483.5 | pruning, float16 |
Koutini_CPJKU_task1b_3 | Koutini2020 | 8 | 95.7 | 0.113 | 242592 | 242592 | 0.0 | 473.8 | float16, smaller width/depth | |
Koutini_CPJKU_task1b_4 | Koutini2020 | 2 | 96.2 | 0.105 | 556480 | 249386 | 0.552 | 487.1 | float16, smaller width/depth |
Kowaleczko_SRPOL_task1b_3 | Kalinowski2020 | 52 | 90.1 | 0.356 | 110899 | 110899 | 0.0 | 433.2 | using rectangular convolution kernels | |
Kwiatkowska_SRPOL_task1b_1 | Kalinowski2020 | 36 | 92.6 | 0.200 | 107494 | 107494 | 0.0 | 421.0 | constraints-aware modelling | |
Kwiatkowska_SRPOL_task1b_2 | Kalinowski2020 | 27 | 93.5 | 0.168 | 107494 | 107494 | 0.0 | 421.0 | constraints-aware modelling | |
LamPham_Kent_task1b_1 | Pham2020 | 59 | 89.4 | 0.332 | 61636 | 61636 | 0.0 | 246.6 | ||
LamPham_Kent_task1b_2 | Pham2020 | 71 | 87.0 | 0.349 | 61636 | 61636 | 0.0 | 61.6 | quantization | |
LamPham_Kent_task1b_3 | Pham2020 | 79 | 84.7 | 0.402 | 61636 | 30818 | 0.5 | 123.2 | pruning | |
Lee_CAU_task1b_1 | Lee2020 | 47 | 90.7 | 0.302 | 126979 | 126528 | 0.004 | 494.2 | |
Lee_CAU_task1b_2 | Lee2020 | 23 | 93.9 | 0.156 | 126979 | 126528 | 0.004 | 494.2 | |
Lee_CAU_task1b_3 | Lee2020 | 46 | 91.1 | 0.246 | 125827 | 125376 | 0.004 | 489.8 | |
Lee_CAU_task1b_4 | Lee2020 | 45 | 91.2 | 0.864 | 127539 | 126864 | 0.005 | 495.6 | |
Lopez-Meyer_IL_task1b_1 | Lopez-Meyer2020_t1b | 50 | 90.4 | 0.681 | 317038 | 317038 | 0.0 | 309.6 | quantization | |
Lopez-Meyer_IL_task1b_2 | Lopez-Meyer2020_t1b | 53 | 90.1 | 0.677 | 317038 | 255740 | 0.193 | 499.5 | pruning, quantization |
Lopez-Meyer_IL_task1b_3 | Lopez-Meyer2020_t1b | 49 | 90.5 | 0.276 | 252712 | 252712 | 0.0 | 493.6 | knowledge distillation, quantization | |
Lopez-Meyer_IL_task1b_4 | Lopez-Meyer2020_t1b | 56 | 89.7 | 0.983 | 252491 | 252491 | 0.0 | 493.1 | quantization | |
McDonnell_USA_task1b_1 | McDonnell2020 | 13 | 94.9 | 0.135 | 3987000 | 3987000 | 0.0 | 486.7 | 1-bit quantization | |
McDonnell_USA_task1b_2 | McDonnell2020 | 9 | 95.5 | 0.118 | 3987000 | 3987000 | 0.0 | 486.7 | 1-bit quantization | |
McDonnell_USA_task1b_3 | McDonnell2020 | 4 | 95.9 | 0.117 | 3987000 | 3987000 | 0.0 | 486.7 | 1-bit quantization | |
McDonnell_USA_task1b_4 | McDonnell2020 | 6 | 95.8 | 0.119 | 3987000 | 3987000 | 0.0 | 486.7 | 1-bit quantization | |
Monteiro_INRS_task1b_1 | Joao2020 | 69 | 87.4 | 0.327 | 54468 | 54468 | 0.0 | 218.8 | ||
Naranjo-Alcazar_Vfy_task1b_1 | Naranjo-Alcazar2020_t1 | 24 | 93.6 | 0.202 | 127055 | 127055 | 0.0 | 496.3 | ||
Naranjo-Alcazar_Vfy_task1b_2 | Naranjo-Alcazar2020_t1 | 25 | 93.6 | 0.190 | 126927 | 126927 | 0.0 | 495.8 | ||
NguyenHongDuc_SU_task1b_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 | 0.215 | 122999 | 122493 | 0.004 | 478.5 | |
NguyenHongDuc_SU_task1b_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 | 0.214 | 77028 | 77028 | 0.0 | 300.9 | ||
Ooi_NTU_task1b_1 | Ooi2020 | 67 | 87.8 | 0.337 | 80839 | 17115 | 0.788 | 66.9 | sparsity |
Ooi_NTU_task1b_2 | Ooi2020 | 70 | 87.3 | 0.367 | 167571 | 34181 | 0.796 | 133.5 | sparsity |
Ooi_NTU_task1b_3 | Ooi2020 | 55 | 89.8 | 0.257 | 571766 | 119756 | 0.791 | 467.8 | sparsity |
Ooi_NTU_task1b_4 | Ooi2020 | 54 | 89.8 | 0.305 | 571766 | 119756 | 0.791 | 467.8 | sparsity |
Paniagua_UPM_task1b_1 | Paniagua2020 | 60 | 89.4 | 0.347 | 13197 | 13197 | 0.0 | 103.1 | ||
Patki_SELF_task1b_1 | Patki2020 | 76 | 86.0 | 1.372 | 9010 | 9010 | 0.0 | 17.5 | ||
Patki_SELF_task1b_2 | Patki2020 | 61 | 89.4 | 0.951 | 18020 | 18020 | 0.0 | 26.3 | ||
Patki_SELF_task1b_3 | Patki2020 | 82 | 83.7 | 1.837 | 9010 | 9010 | 0.0 | 8.8 | ||
Phan_UIUC_task1b_1 | Phan2020_t1 | 65 | 88.5 | 0.319 | 6979 | 6944 | 0.005 | 27.3 | |
Phan_UIUC_task1b_2 | Phan2020_t1 | 62 | 89.2 | 0.283 | 6979 | 6944 | 0.005 | 27.3 | |
Phan_UIUC_task1b_3 | Phan2020_t1 | 63 | 89.0 | 0.301 | 17859 | 17792 | 0.004 | 69.8 | |
Phan_UIUC_task1b_4 | Phan2020_t1 | 58 | 89.5 | 0.282 | 17859 | 17792 | 0.004 | 69.8 | |
Sampathkumar_TUC_task1b_1 | Sampathkumar2020 | 68 | 87.5 | 0.864 | 123487 | 122387 | 0.009 | 489.4 | |
Singh_IITMandi_task1b_1 | Singh2020 | 81 | 84.5 | 0.418 | 52467 | 52467 | 0.0 | 204.9 | ||
Singh_IITMandi_task1b_2 | Singh2020 | 80 | 84.7 | 0.420 | 18611 | 18611 | 0.0 | 72.7 | ||
Singh_IITMandi_task1b_3 | Singh2020 | 78 | 85.2 | 0.402 | 19763 | 19763 | 0.0 | 77.2 | ||
Singh_IITMandi_task1b_4 | Singh2020 | 75 | 86.4 | 0.385 | 70947 | 70947 | 0.0 | 277.1 | ||
Suh_ETRI_task1b_1 | Suh2020 | 29 | 93.3 | 0.302 | 103778 | 103778 | 0.0 | 405.4 | ||
Suh_ETRI_task1b_2 | Suh2020 | 18 | 94.6 | 0.270 | 103778 | 103778 | 0.0 | 405.4 | ||
Suh_ETRI_task1b_3 | Suh2020 | 11 | 95.1 | 0.277 | 207556 | 207556 | 0.0 | 413.0 | float16 | |
Suh_ETRI_task1b_4 | Suh2020 | 17 | 94.6 | 0.271 | 207556 | 207556 | 0.0 | 413.0 | float16 | |
Vilouras_AUTh_task1b_1 | Vilouras2020 | 40 | 91.8 | 0.215 | 127467 | 127021 | 0.003 | 496.2 | |
Waldekar_IITKGP_task1b_1 | Waldekar2020 | 64 | 88.6 | 7.923 | 10092 | 10092 | 0.0 | 40.0 | ||
Wu_CUHK_task1b_1 | Wu2020_t1b | 22 | 94.2 | 0.188 | 76611 | 76611 | 0.0 | 299.3 | ||
Wu_CUHK_task1b_2 | Wu2020_t1b | 21 | 94.2 | 0.201 | 187917 | 187917 | 0.0 | 367.0 | float16 | |
Wu_CUHK_task1b_3 | Wu2020_t1b | 20 | 94.3 | 0.185 | 229883 | 229883 | 0.0 | 449.0 | float16 | |
Wu_CUHK_task1b_4 | Wu2020_t1b | 14 | 94.9 | 0.218 | 153222 | 153222 | 0.0 | 299.3 | float16 | |
Yang_UESTC_task1b_1 | Haocong2020 | 38 | 92.1 | 0.272 | 119382 | 119382 | 0.0 | 258.0 | float16 | |
Yang_UESTC_task1b_2 | Haocong2020 | 28 | 93.5 | 0.247 | 119382 | 119382 | 0.0 | 258.0 | float16 | |
Yang_UESTC_task1b_3 | Haocong2020 | 26 | 93.5 | 0.228 | 119382 | 119382 | 0.0 | 258.0 | float16 | |
Yang_UESTC_task1b_4 | Haocong2020 | 51 | 90.4 | 0.327 | 182184 | 182184 | 0.0 | 448.0 | float16 | |
Zhang_BUPT_task1b_1 | Zhang2020 | 39 | 92.0 | 0.346 | 83974 | 83974 | 0.0 | 83.4 | 8-bit quantization | |
Zhang_BUPT_task1b_2 | Zhang2020 | 35 | 92.7 | 0.334 | 83974 | 83974 | 0.0 | 83.4 | 8-bit quantization | |
Zhang_BUPT_task1b_3 | Zhang2020 | 34 | 92.9 | 0.316 | 83974 | 83974 | 0.0 | 83.4 | 8-bit quantization | |
Zhang_BUPT_task1b_4 | Zhang2020 | 33 | 93.0 | 0.316 | 83974 | 83974 | 0.0 | 83.4 | 8-bit quantization | |
Zhao_JNU_task1b_1 | Zhao2020 | 74 | 86.6 | 0.867 | 127491 | 127491 | 0.0 | 498.0 | disout | |
Zhao_JNU_task1b_2 | Zhao2020 | 72 | 86.9 | 0.873 | 127491 | 127491 | 0.0 | 498.0 | disout |
*) Model size is calculated according to the task-specific rules and will differ from the actual storage size of the model. See model size calculation examples here.
Generalization performance
All results are computed on the evaluation dataset.
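The seen/unseen split reported below can be reproduced from per-file predictions along the following lines; the column names (`city`, `y_true`, `y_pred`) and the city set are illustrative assumptions, not part of the official evaluation tooling.

```python
import pandas as pd

# Hypothetical set of the 2 evaluation-only cities (the real names are not
# listed in this table).
UNSEEN_CITIES = {"city_a", "city_b"}

def split_accuracy(df: pd.DataFrame) -> dict:
    """Overall, unseen-city and seen-city accuracy from per-file predictions."""
    correct = df["y_true"] == df["y_pred"]
    unseen = df["city"].isin(UNSEEN_CITIES)
    return {
        "overall": correct.mean(),
        "unseen_cities": correct[unseen].mean(),
        "seen_cities": correct[~unseen].mean(),
    }
```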
Submission label | Technical Report | Official system rank | Accuracy (Evaluation dataset) | Accuracy / unseen cities (Evaluation dataset) | Accuracy / seen cities (Evaluation dataset)
---|---|---|---|---|---
Chang_QTI_task1b_1 | Chang2020 | 12 | 95.0 | 91.3 | 95.8 | |
Chang_QTI_task1b_2 | Chang2020 | 30 | 93.2 | 91.8 | 93.5 | |
Chang_QTI_task1b_3 | Chang2020 | 15 | 94.8 | 91.6 | 95.4 | |
Chang_QTI_task1b_4 | Chang2020 | 19 | 94.4 | 90.8 | 95.2 | |
Dat_HCMUni_task1b_1 | Dat2020 | 57 | 89.5 | 88.0 | 89.8 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | Pajusco2020 | 77 | 85.4 | 79.8 | 86.5 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | Pajusco2020 | 48 | 90.6 | 87.3 | 91.3 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | Pajusco2020 | 73 | 86.6 | 83.3 | 87.3 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | Pajusco2020 | 66 | 88.4 | 81.4 | 89.8 | |
Feng_TJU_task1b_1 | Feng2020 | 86 | 72.3 | 74.9 | 71.8 | |
Feng_TJU_task1b_2 | Feng2020 | 83 | 81.9 | 82.4 | 81.8 | |
Feng_TJU_task1b_3 | Feng2020 | 84 | 80.7 | 79.1 | 81.0 | |
Feng_TJU_task1b_4 | Feng2020 | 85 | 79.9 | 77.8 | 80.3 | |
DCASE2020 baseline | 89.5 | 84.9 | 90.4 | |||
Helin_ADSPLAB_task1b_1 | Wang2020_t1 | 42 | 91.6 | 85.9 | 92.7 | |
Helin_ADSPLAB_task1b_2 | Wang2020_t1 | 41 | 91.6 | 86.1 | 92.7 | |
Helin_ADSPLAB_task1b_3 | Wang2020_t1 | 43 | 91.6 | 86.1 | 92.6 | |
Helin_ADSPLAB_task1b_4 | Wang2020_t1 | 44 | 91.3 | 85.9 | 92.4 | |
Hu_GT_task1b_1 | Hu2020 | 7 | 95.8 | 93.3 | 96.3 | |
Hu_GT_task1b_2 | Hu2020 | 10 | 95.5 | 92.1 | 96.1 | |
Hu_GT_task1b_3 | Hu2020 | 3 | 96.0 | 93.0 | 96.7 | |
Hu_GT_task1b_4 | Hu2020 | 5 | 95.8 | 93.5 | 96.3 | |
Kalinowski_SRPOL_task1b_4 | Kalinowski2020 | 31 | 93.1 | 90.1 | 93.7 | |
Koutini_CPJKU_task1b_1 | Koutini2020 | 16 | 94.7 | 91.1 | 95.4 | |
Koutini_CPJKU_task1b_2 | Koutini2020 | 1 | 96.5 | 95.3 | 96.7 | |
Koutini_CPJKU_task1b_3 | Koutini2020 | 8 | 95.7 | 94.7 | 95.9 | |
Koutini_CPJKU_task1b_4 | Koutini2020 | 2 | 96.2 | 94.4 | 96.6 | |
Kowaleczko_SRPOL_task1b_3 | Kalinowski2020 | 52 | 90.1 | 86.8 | 90.8 | |
Kwiatkowska_SRPOL_task1b_1 | Kalinowski2020 | 36 | 92.6 | 88.7 | 93.4 | |
Kwiatkowska_SRPOL_task1b_2 | Kalinowski2020 | 27 | 93.5 | 88.9 | 94.4 | |
LamPham_Kent_task1b_1 | Pham2020 | 59 | 89.4 | 85.5 | 90.2 | |
LamPham_Kent_task1b_2 | Pham2020 | 71 | 87.0 | 84.6 | 87.4 | |
LamPham_Kent_task1b_3 | Pham2020 | 79 | 84.7 | 82.1 | 85.3 | |
Lee_CAU_task1b_1 | Lee2020 | 47 | 90.7 | 87.4 | 91.3 | |
Lee_CAU_task1b_2 | Lee2020 | 23 | 93.9 | 90.0 | 94.6 | |
Lee_CAU_task1b_3 | Lee2020 | 46 | 91.1 | 87.3 | 91.8 | |
Lee_CAU_task1b_4 | Lee2020 | 45 | 91.2 | 87.5 | 91.9 | |
Lopez-Meyer_IL_task1b_1 | Lopez-Meyer2020_t1b | 50 | 90.4 | 88.2 | 90.8 | |
Lopez-Meyer_IL_task1b_2 | Lopez-Meyer2020_t1b | 53 | 90.1 | 87.1 | 90.7 | |
Lopez-Meyer_IL_task1b_3 | Lopez-Meyer2020_t1b | 49 | 90.5 | 85.2 | 91.6 | |
Lopez-Meyer_IL_task1b_4 | Lopez-Meyer2020_t1b | 56 | 89.7 | 89.1 | 89.8 | |
McDonnell_USA_task1b_1 | McDonnell2020 | 13 | 94.9 | 93.8 | 95.1 | |
McDonnell_USA_task1b_2 | McDonnell2020 | 9 | 95.5 | 92.9 | 96.0 | |
McDonnell_USA_task1b_3 | McDonnell2020 | 4 | 95.9 | 94.7 | 96.2 | |
McDonnell_USA_task1b_4 | McDonnell2020 | 6 | 95.8 | 93.8 | 96.2 | |
Monteiro_INRS_task1b_1 | Joao2020 | 69 | 87.4 | 83.9 | 88.1 | |
Naranjo-Alcazar_Vfy_task1b_1 | Naranjo-Alcazar2020_t1 | 24 | 93.6 | 90.8 | 94.1 | |
Naranjo-Alcazar_Vfy_task1b_2 | Naranjo-Alcazar2020_t1 | 25 | 93.6 | 91.3 | 94.0 | |
NguyenHongDuc_SU_task1b_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 | 90.0 | 93.7 | |
NguyenHongDuc_SU_task1b_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 | 90.0 | 92.7 | |
Ooi_NTU_task1b_1 | Ooi2020 | 67 | 87.8 | 81.1 | 89.1 | |
Ooi_NTU_task1b_2 | Ooi2020 | 70 | 87.3 | 85.7 | 87.7 | |
Ooi_NTU_task1b_3 | Ooi2020 | 55 | 89.8 | 86.0 | 90.5 | |
Ooi_NTU_task1b_4 | Ooi2020 | 54 | 89.8 | 87.7 | 90.2 | |
Paniagua_UPM_task1b_1 | Paniagua2020 | 60 | 89.4 | 89.4 | 89.4 | |
Patki_SELF_task1b_1 | Patki2020 | 76 | 86.0 | 89.7 | 85.3 | |
Patki_SELF_task1b_2 | Patki2020 | 61 | 89.4 | 89.7 | 89.3 | |
Patki_SELF_task1b_3 | Patki2020 | 82 | 83.7 | 84.9 | 83.5 | |
Phan_UIUC_task1b_1 | Phan2020_t1 | 65 | 88.5 | 84.2 | 89.4 | |
Phan_UIUC_task1b_2 | Phan2020_t1 | 62 | 89.2 | 86.8 | 89.7 | |
Phan_UIUC_task1b_3 | Phan2020_t1 | 63 | 89.0 | 85.4 | 89.7 | |
Phan_UIUC_task1b_4 | Phan2020_t1 | 58 | 89.5 | 85.4 | 90.3 | |
Sampathkumar_TUC_task1b_1 | Sampathkumar2020 | 68 | 87.5 | 85.7 | 87.8 | |
Singh_IITMandi_task1b_1 | Singh2020 | 81 | 84.5 | 81.8 | 85.1 | |
Singh_IITMandi_task1b_2 | Singh2020 | 80 | 84.7 | 81.1 | 85.4 | |
Singh_IITMandi_task1b_3 | Singh2020 | 78 | 85.2 | 80.8 | 86.1 | |
Singh_IITMandi_task1b_4 | Singh2020 | 75 | 86.4 | 82.8 | 87.1 | |
Suh_ETRI_task1b_1 | Suh2020 | 29 | 93.3 | 89.9 | 94.0 | |
Suh_ETRI_task1b_2 | Suh2020 | 18 | 94.6 | 91.6 | 95.2 | |
Suh_ETRI_task1b_3 | Suh2020 | 11 | 95.1 | 92.8 | 95.5 | |
Suh_ETRI_task1b_4 | Suh2020 | 17 | 94.6 | 91.6 | 95.2 | |
Vilouras_AUTh_task1b_1 | Vilouras2020 | 40 | 91.8 | 89.6 | 92.3 | |
Waldekar_IITKGP_task1b_1 | Waldekar2020 | 64 | 88.6 | 84.6 | 89.4 | |
Wu_CUHK_task1b_1 | Wu2020_t1b | 22 | 94.2 | 92.9 | 94.4 | |
Wu_CUHK_task1b_2 | Wu2020_t1b | 21 | 94.2 | 93.5 | 94.3 | |
Wu_CUHK_task1b_3 | Wu2020_t1b | 20 | 94.3 | 93.1 | 94.5 | |
Wu_CUHK_task1b_4 | Wu2020_t1b | 14 | 94.9 | 94.1 | 95.1 | |
Yang_UESTC_task1b_1 | Haocong2020 | 38 | 92.1 | 86.1 | 93.2 | |
Yang_UESTC_task1b_2 | Haocong2020 | 28 | 93.5 | 89.3 | 94.3 | |
Yang_UESTC_task1b_3 | Haocong2020 | 26 | 93.5 | 89.5 | 94.3 | |
Yang_UESTC_task1b_4 | Haocong2020 | 51 | 90.4 | 86.3 | 91.2 | |
Zhang_BUPT_task1b_1 | Zhang2020 | 39 | 92.0 | 86.3 | 93.1 | |
Zhang_BUPT_task1b_2 | Zhang2020 | 35 | 92.7 | 87.1 | 93.8 | |
Zhang_BUPT_task1b_3 | Zhang2020 | 34 | 92.9 | 87.5 | 94.0 | |
Zhang_BUPT_task1b_4 | Zhang2020 | 33 | 93.0 | 87.5 | 94.1 | |
Zhao_JNU_task1b_1 | Zhao2020 | 74 | 86.6 | 84.3 | 87.0 | |
Zhao_JNU_task1b_2 | Zhao2020 | 72 | 86.9 | 84.9 | 87.4 |
Class-wise performance
Submission label | Technical Report | Official system rank | Accuracy | Indoor | Outdoor | Transportation
---|---|---|---|---|---|---
Chang_QTI_task1b_1 | Chang2020 | 12 | 95.0 | 91.5 | 95.3 | 98.3 | |
Chang_QTI_task1b_2 | Chang2020 | 30 | 93.2 | 86.1 | 95.3 | 98.1 | |
Chang_QTI_task1b_3 | Chang2020 | 15 | 94.8 | 91.2 | 94.3 | 98.8 | |
Chang_QTI_task1b_4 | Chang2020 | 19 | 94.4 | 92.2 | 92.8 | 98.2 | |
Dat_HCMUni_task1b_1 | Dat2020 | 57 | 89.5 | 78.9 | 95.5 | 94.1 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | Pajusco2020 | 77 | 85.4 | 75.8 | 88.5 | 91.9 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | Pajusco2020 | 48 | 90.6 | 87.7 | 90.7 | 93.5 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | Pajusco2020 | 73 | 86.6 | 79.8 | 87.2 | 92.7 | |
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | Pajusco2020 | 66 | 88.4 | 81.1 | 90.2 | 93.9 | |
Feng_TJU_task1b_1 | Feng2020 | 86 | 72.3 | 41.8 | 97.5 | 77.5 | |
Feng_TJU_task1b_2 | Feng2020 | 83 | 81.9 | 68.7 | 92.4 | 84.6 | |
Feng_TJU_task1b_3 | Feng2020 | 84 | 80.7 | 66.2 | 91.8 | 84.1 | |
Feng_TJU_task1b_4 | Feng2020 | 85 | 79.9 | 59.0 | 93.5 | 87.2 | |
DCASE2020 baseline | 89.5 | 84.5 | 89.1 | 94.9 | |||
Helin_ADSPLAB_task1b_1 | Wang2020_t1 | 42 | 91.6 | 85.0 | 93.0 | 96.8 | |
Helin_ADSPLAB_task1b_2 | Wang2020_t1 | 41 | 91.6 | 84.7 | 93.5 | 96.7 | |
Helin_ADSPLAB_task1b_3 | Wang2020_t1 | 43 | 91.6 | 84.5 | 93.3 | 96.9 | |
Helin_ADSPLAB_task1b_4 | Wang2020_t1 | 44 | 91.3 | 83.1 | 94.3 | 96.6 | |
Hu_GT_task1b_1 | Hu2020 | 7 | 95.8 | 91.9 | 96.6 | 98.8 | |
Hu_GT_task1b_2 | Hu2020 | 10 | 95.5 | 92.8 | 96.3 | 97.3 | |
Hu_GT_task1b_3 | Hu2020 | 3 | 96.0 | 95.3 | 95.3 | 97.5 | |
Hu_GT_task1b_4 | Hu2020 | 5 | 95.8 | 94.8 | 95.1 | 97.5 | |
Kalinowski_SRPOL_task1b_4 | Kalinowski2020 | 31 | 93.1 | 88.7 | 94.5 | 96.2 | |
Koutini_CPJKU_task1b_1 | Koutini2020 | 16 | 94.7 | 89.1 | 97.0 | 98.0 | |
Koutini_CPJKU_task1b_2 | Koutini2020 | 1 | 96.5 | 92.7 | 97.4 | 99.2 | |
Koutini_CPJKU_task1b_3 | Koutini2020 | 8 | 95.7 | 90.1 | 97.8 | 99.2 | |
Koutini_CPJKU_task1b_4 | Koutini2020 | 2 | 96.2 | 92.2 | 97.3 | 99.0 | |
Kowaleczko_SRPOL_task1b_3 | Kalinowski2020 | 52 | 90.1 | 91.0 | 90.6 | 88.9 | |
Kwiatkowska_SRPOL_task1b_1 | Kalinowski2020 | 36 | 92.6 | 89.0 | 91.9 | 96.9 | |
Kwiatkowska_SRPOL_task1b_2 | Kalinowski2020 | 27 | 93.5 | 89.0 | 93.7 | 97.8 | |
LamPham_Kent_task1b_1 | Pham2020 | 59 | 89.4 | 77.2 | 93.2 | 97.9 | |
LamPham_Kent_task1b_2 | Pham2020 | 71 | 87.0 | 84.8 | 85.9 | 90.2 | |
LamPham_Kent_task1b_3 | Pham2020 | 79 | 84.7 | 67.4 | 94.9 | 91.9 | |
Lee_CAU_task1b_1 | Lee2020 | 47 | 90.7 | 78.1 | 96.9 | 97.0 | |
Lee_CAU_task1b_2 | Lee2020 | 23 | 93.9 | 86.7 | 97.1 | 97.9 | |
Lee_CAU_task1b_3 | Lee2020 | 46 | 91.1 | 81.1 | 96.3 | 95.9 | |
Lee_CAU_task1b_4 | Lee2020 | 45 | 91.2 | 80.3 | 96.7 | 96.5 | |
Lopez-Meyer_IL_task1b_1 | Lopez-Meyer2020_t1b | 50 | 90.4 | 87.9 | 88.9 | 94.3 | |
Lopez-Meyer_IL_task1b_2 | Lopez-Meyer2020_t1b | 53 | 90.1 | 82.5 | 92.6 | 95.2 | |
Lopez-Meyer_IL_task1b_3 | Lopez-Meyer2020_t1b | 49 | 90.5 | 85.8 | 89.6 | 96.2 | |
Lopez-Meyer_IL_task1b_4 | Lopez-Meyer2020_t1b | 56 | 89.7 | 84.4 | 88.0 | 96.6 | |
McDonnell_USA_task1b_1 | McDonnell2020 | 13 | 94.9 | 87.7 | 98.8 | 98.3 | |
McDonnell_USA_task1b_2 | McDonnell2020 | 9 | 95.5 | 90.4 | 97.5 | 98.7 | |
McDonnell_USA_task1b_3 | McDonnell2020 | 4 | 95.9 | 90.7 | 97.9 | 99.2 | |
McDonnell_USA_task1b_4 | McDonnell2020 | 6 | 95.8 | 90.2 | 98.1 | 99.1 | |
Monteiro_INRS_task1b_1 | Joao2020 | 69 | 87.4 | 82.6 | 85.8 | 93.7 | |
Naranjo-Alcazar_Vfy_task1b_1 | Naranjo-Alcazar2020_t1 | 24 | 93.6 | 85.0 | 97.0 | 98.7 | |
Naranjo-Alcazar_Vfy_task1b_2 | Naranjo-Alcazar2020_t1 | 25 | 93.6 | 86.0 | 96.7 | 98.0 | |
NguyenHongDuc_SU_task1b_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 | 88.8 | 93.8 | 96.6 | |
NguyenHongDuc_SU_task1b_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 | 85.8 | 94.4 | 96.5 | |
Ooi_NTU_task1b_1 | Ooi2020 | 67 | 87.8 | 82.6 | 87.2 | 93.6 | |
Ooi_NTU_task1b_2 | Ooi2020 | 70 | 87.3 | 86.5 | 86.8 | 88.6 | |
Ooi_NTU_task1b_3 | Ooi2020 | 55 | 89.8 | 86.1 | 89.1 | 94.1 | |
Ooi_NTU_task1b_4 | Ooi2020 | 54 | 89.8 | 85.6 | 89.3 | 94.4 | |
Paniagua_UPM_task1b_1 | Paniagua2020 | 60 | 89.4 | 82.5 | 91.7 | 94.0 | |
Patki_SELF_task1b_1 | Patki2020 | 76 | 86.0 | 75.9 | 90.7 | 91.6 | |
Patki_SELF_task1b_2 | Patki2020 | 61 | 89.4 | 84.2 | 92.4 | 91.4 | |
Patki_SELF_task1b_3 | Patki2020 | 82 | 83.7 | 82.1 | 72.3 | 96.9 | |
Phan_UIUC_task1b_1 | Phan2020_t1 | 65 | 88.5 | 82.8 | 88.6 | 94.1 | |
Phan_UIUC_task1b_2 | Phan2020_t1 | 62 | 89.2 | 84.0 | 89.9 | 93.9 | |
Phan_UIUC_task1b_3 | Phan2020_t1 | 63 | 89.0 | 78.8 | 92.5 | 95.6 | |
Phan_UIUC_task1b_4 | Phan2020_t1 | 58 | 89.5 | 82.4 | 90.2 | 95.8 | |
Sampathkumar_TUC_task1b_1 | Sampathkumar2020 | 68 | 87.5 | 76.5 | 90.2 | 95.7 | |
Singh_IITMandi_task1b_1 | Singh2020 | 81 | 84.5 | 77.0 | 81.8 | 94.7 | |
Singh_IITMandi_task1b_2 | Singh2020 | 80 | 84.7 | 79.9 | 80.3 | 93.7 | |
Singh_IITMandi_task1b_3 | Singh2020 | 78 | 85.2 | 75.4 | 86.7 | 93.6 | |
Singh_IITMandi_task1b_4 | Singh2020 | 75 | 86.4 | 85.0 | 79.9 | 94.3 | |
Suh_ETRI_task1b_1 | Suh2020 | 29 | 93.3 | 83.5 | 97.2 | 99.2 | |
Suh_ETRI_task1b_2 | Suh2020 | 18 | 94.6 | 87.0 | 97.6 | 99.2 | |
Suh_ETRI_task1b_3 | Suh2020 | 11 | 95.1 | 88.3 | 97.9 | 99.1 | |
Suh_ETRI_task1b_4 | Suh2020 | 17 | 94.6 | 87.0 | 97.7 | 99.2 | |
Vilouras_AUTh_task1b_1 | Vilouras2020 | 40 | 91.8 | 87.2 | 91.1 | 97.2 | |
Waldekar_IITKGP_task1b_1 | Waldekar2020 | 64 | 88.6 | 82.9 | 90.8 | 92.2 | |
Wu_CUHK_task1b_1 | Wu2020_t1b | 22 | 94.2 | 86.1 | 97.9 | 98.5 | |
Wu_CUHK_task1b_2 | Wu2020_t1b | 21 | 94.2 | 86.1 | 97.8 | 98.6 | |
Wu_CUHK_task1b_3 | Wu2020_t1b | 20 | 94.3 | 85.9 | 98.5 | 98.5 | |
Wu_CUHK_task1b_4 | Wu2020_t1b | 14 | 94.9 | 88.8 | 97.3 | 98.6 | |
Yang_UESTC_task1b_1 | Haocong2020 | 38 | 92.1 | 86.6 | 94.1 | 95.5 | |
Yang_UESTC_task1b_2 | Haocong2020 | 28 | 93.5 | 89.4 | 96.3 | 94.8 | |
Yang_UESTC_task1b_3 | Haocong2020 | 26 | 93.5 | 89.9 | 96.0 | 94.7 | |
Yang_UESTC_task1b_4 | Haocong2020 | 51 | 90.4 | 95.0 | 80.8 | 95.3 | |
Zhang_BUPT_task1b_1 | Zhang2020 | 39 | 92.0 | 84.3 | 93.5 | 98.2 | |
Zhang_BUPT_task1b_2 | Zhang2020 | 35 | 92.7 | 88.0 | 92.6 | 97.4 | |
Zhang_BUPT_task1b_3 | Zhang2020 | 34 | 92.9 | 88.3 | 92.6 | 97.8 | |
Zhang_BUPT_task1b_4 | Zhang2020 | 33 | 93.0 | 88.9 | 92.5 | 97.7 | |
Zhao_JNU_task1b_1 | Zhao2020 | 74 | 86.6 | 70.4 | 92.6 | 96.7 | |
Zhao_JNU_task1b_2 | Zhao2020 | 72 | 86.9 | 71.0 | 92.9 | 96.9 |
System characteristics
General characteristics
Submission label | Technical Report | Official system rank | Accuracy (Eval) | Input | Sampling rate | Data augmentation | Features
---|---|---|---|---|---|---|---
Chang_QTI_task1b_1 | Chang2020 | 12 | 95.0 | binaural | 22.05kHz | mixup+FreqMix | perceptual weighted power spectrogram | |
Chang_QTI_task1b_2 | Chang2020 | 30 | 93.2 | binaural | 22.05kHz | mixup+FreqMix | perceptual weighted power spectrogram | |
Chang_QTI_task1b_3 | Chang2020 | 15 | 94.8 | binaural | 22.05kHz | mixup+FreqMix | perceptual weighted power spectrogram | |
Chang_QTI_task1b_4 | Chang2020 | 19 | 94.4 | binaural | 22.05kHz | mixup+FreqMix | perceptual weighted power spectrogram | |
Dat_HCMUni_task1b_1 | Dat2020 | 57 | 89.5 | left, right, average of left+right | 48kHz | Random oversample & mixup | Gammatone energy | |
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | Pajusco2020 | 77 | 85.4 | binaural | 18kHz | temporal masking, filtering, additive noise | raw waveform | |
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | Pajusco2020 | 48 | 90.6 | binaural | 18kHz | temporal masking, filtering, additive noise | raw waveform | |
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | Pajusco2020 | 73 | 86.6 | binaural | 18kHz | cutmix | raw waveform | |
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | Pajusco2020 | 66 | 88.4 | binaural | 18kHz | cutmix | raw waveform | |
Feng_TJU_task1b_1 | Feng2020 | 86 | 72.3 | mono | 48kHz | same class mix | mel spectrogram | |
Feng_TJU_task1b_2 | Feng2020 | 83 | 81.9 | mono | 48kHz | same class mix | mel spectrogram | |
Feng_TJU_task1b_3 | Feng2020 | 84 | 80.7 | mono | 48kHz | same class mix | mel spectrogram | |
Feng_TJU_task1b_4 | Feng2020 | 85 | 79.9 | mono | 48kHz | same class mix | mel spectrogram | |
DCASE2020 baseline | 89.5 | mono | 48kHz | log-mel energies | ||||
Helin_ADSPLAB_task1b_1 | Wang2020_t1 | 42 | 91.6 | mono | 44.1kHz | mixup | log-mel energies, CQT, Gammatone | |
Helin_ADSPLAB_task1b_2 | Wang2020_t1 | 41 | 91.6 | mono | 44.1kHz | mixup | log-mel energies, CQT, Gammatone | |
Helin_ADSPLAB_task1b_3 | Wang2020_t1 | 43 | 91.6 | mono | 44.1kHz | mixup | log-mel energies, CQT, Gammatone | |
Helin_ADSPLAB_task1b_4 | Wang2020_t1 | 44 | 91.3 | mono | 44.1kHz | mixup | log-mel energies, CQT, Gammatone | |
Hu_GT_task1b_1 | Hu2020 | 7 | 95.8 | binaural | 48kHz | mixup, channel confusion, SpecAugment | log-mel energies | |
Hu_GT_task1b_2 | Hu2020 | 10 | 95.5 | binaural | 48kHz | mixup, channel confusion, SpecAugment | log-mel energies | |
Hu_GT_task1b_3 | Hu2020 | 3 | 96.0 | binaural | 48kHz | mixup, channel confusion, SpecAugment | log-mel energies | |
Hu_GT_task1b_4 | Hu2020 | 5 | 95.8 | binaural | 48kHz | mixup, channel confusion, SpecAugment | log-mel energies | |
Kalinowski_SRPOL_task1b_4 | Kalinowski2020 | 31 | 93.1 | mono | 48kHz | time warping, frequency warping, loudness control, time length control, time masking, frequency masking | log-mel spectrogram | |
Koutini_CPJKU_task1b_1 | Koutini2020 | 16 | 94.7 | stereo | 22.05kHz | mixup | Perceptually-weighted log-mel energies | |
Koutini_CPJKU_task1b_2 | Koutini2020 | 1 | 96.5 | stereo | 22.05kHz | mixup | Perceptually-weighted log-mel energies | |
Koutini_CPJKU_task1b_3 | Koutini2020 | 8 | 95.7 | stereo | 22.05kHz | mixup | Perceptually-weighted log-mel energies | |
Koutini_CPJKU_task1b_4 | Koutini2020 | 2 | 96.2 | stereo | 22.05kHz | mixup | Perceptually-weighted log-mel energies | |
Kowaleczko_SRPOL_task1b_3 | Kalinowski2020 | 52 | 90.1 | mono | 48kHz | log-mel spectrogram | ||
Kwiatkowska_SRPOL_task1b_1 | Kalinowski2020 | 36 | 92.6 | mono | 48kHz | mixup | log-mel energies | |
Kwiatkowska_SRPOL_task1b_2 | Kalinowski2020 | 27 | 93.5 | mono | 48kHz | log-mel energies | ||
LamPham_Kent_task1b_1 | Pham2020 | 59 | 89.4 | left | 48kHz | mixup | Gammatone energy | |
LamPham_Kent_task1b_2 | Pham2020 | 71 | 87.0 | left | 48kHz | mixup | Gammatone energy | |
LamPham_Kent_task1b_3 | Pham2020 | 79 | 84.7 | left | 48kHz | mixup | Gammatone energy | |
Lee_CAU_task1b_1 | Lee2020 | 47 | 90.7 | binaural | 48kHz | mixup | log-mel energies, deltas, delta-deltas | |
Lee_CAU_task1b_2 | Lee2020 | 23 | 93.9 | binaural | 48kHz | mixup | log-mel energies, deltas, delta-deltas | |
Lee_CAU_task1b_3 | Lee2020 | 46 | 91.1 | binaural | 48kHz | mixup | HPSS | |
Lee_CAU_task1b_4 | Lee2020 | 45 | 91.2 | binaural | 48kHz | HPSS, log-mel energies, deltas, delta-deltas | ||
Lopez-Meyer_IL_task1b_1 | Lopez-Meyer2020_t1b | 50 | 90.4 | mono | 16kHz | random noise, random gain, random cropping, mixup | raw waveform | |
Lopez-Meyer_IL_task1b_2 | Lopez-Meyer2020_t1b | 53 | 90.1 | mono | 16kHz | random noise, random gain, random cropping, mixup | raw waveform | |
Lopez-Meyer_IL_task1b_3 | Lopez-Meyer2020_t1b | 49 | 90.5 | mono | 48kHz | SpecAugment | mel filterbank | |
Lopez-Meyer_IL_task1b_4 | Lopez-Meyer2020_t1b | 56 | 89.7 | binaural | 16kHz | SpecAugment | log-mel filterbanks, GCC-grams | |
McDonnell_USA_task1b_1 | McDonnell2020 | 13 | 94.9 | left, right | 48kHz | mixup, temporal cropping, channel swapping | log-mel energies | |
McDonnell_USA_task1b_2 | McDonnell2020 | 9 | 95.5 | left, right | 48kHz | mixup, temporal cropping, channel swapping | log-mel energies | |
McDonnell_USA_task1b_3 | McDonnell2020 | 4 | 95.9 | left, right | 48kHz | mixup, temporal cropping, channel swapping | log-mel energies | |
McDonnell_USA_task1b_4 | McDonnell2020 | 6 | 95.8 | left, right | 48kHz | mixup, temporal cropping, channel swapping | log-mel energies | |
Monteiro_INRS_task1b_1 | Joao2020 | 69 | 87.4 | mono | 44.1kHz | Sox distortions, SpecAugment | log-mel energies | |
Naranjo-Alcazar_Vfy_task1b_1 | Naranjo-Alcazar2020_t1 | 24 | 93.6 | left, right, difference | 48kHz | gammatone | ||
Naranjo-Alcazar_Vfy_task1b_2 | Naranjo-Alcazar2020_t1 | 25 | 93.6 | left, right, difference, mono | 48kHz | gammatone, HPSS, log-mel energies | ||
NguyenHongDuc_SU_task1b_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 | mono, binaural | 48kHz | mixup | RMS level, third-octave levels, Leq, interaural cross correlation coefficient, hardness, depth, brightness, roughness, warmth, sharpness, boominess, reverb, log-mel spectrogram | |
NguyenHongDuc_SU_task1b_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 | mono, binaural | 48kHz | mixup | RMS level, third-octave levels, Leq, interaural cross correlation coefficient, hardness, depth, brightness, roughness, warmth, sharpness, boominess, reverb, log-mel spectrogram | |
Ooi_NTU_task1b_1 | Ooi2020 | 67 | 87.8 | mono | 48kHz | log-mel energies | ||
Ooi_NTU_task1b_2 | Ooi2020 | 70 | 87.3 | mono | 48kHz | log-mel energies | ||
Ooi_NTU_task1b_3 | Ooi2020 | 55 | 89.8 | mono | 48kHz | log-mel energies | ||
Ooi_NTU_task1b_4 | Ooi2020 | 54 | 89.8 | mono | 48kHz | block mixing | log-mel energies | |
Paniagua_UPM_task1b_1 | Paniagua2020 | 60 | 89.4 | 48kHz | LTAS, envelope modulation spectrum, cepstrum of cross-correlation | |||
Patki_SELF_task1b_1 | Patki2020 | 76 | 86.0 | left+right, left-right | 48kHz | log-mel spectrogram | ||
Patki_SELF_task1b_2 | Patki2020 | 61 | 89.4 | left+right, left-right | 48kHz | log-mel spectrogram | ||
Patki_SELF_task1b_3 | Patki2020 | 82 | 83.7 | mono | 48kHz | log-mel spectrogram | ||
Phan_UIUC_task1b_1 | Phan2020_t1 | 65 | 88.5 | mono | 48kHz | log-mel energies | ||
Phan_UIUC_task1b_2 | Phan2020_t1 | 62 | 89.2 | mono | 48kHz | log-mel energies | ||
Phan_UIUC_task1b_3 | Phan2020_t1 | 63 | 89.0 | mono | 48kHz | log-mel energies | ||
Phan_UIUC_task1b_4 | Phan2020_t1 | 58 | 89.5 | mono | 48kHz | log-mel energies | ||
Sampathkumar_TUC_task1b_1 | Sampathkumar2020 | 68 | 87.5 | mono | 48kHz | log-mel energies | ||
Singh_IITMandi_task1b_1 | Singh2020 | 81 | 84.5 | mono | 16kHz | raw waveform segment | ||
Singh_IITMandi_task1b_2 | Singh2020 | 80 | 84.7 | mono | 16kHz | raw waveform segment | ||
Singh_IITMandi_task1b_3 | Singh2020 | 78 | 85.2 | mono | 16kHz | raw waveform segment | ||
Singh_IITMandi_task1b_4 | Singh2020 | 75 | 86.4 | mono | 16kHz | raw waveform segment | ||
Suh_ETRI_task1b_1 | Suh2020 | 29 | 93.3 | stereo | 48kHz | temporal cropping, mixup | log-mel energies | |
Suh_ETRI_task1b_2 | Suh2020 | 18 | 94.6 | stereo | 48kHz | temporal cropping, mixup | log-mel energies | |
Suh_ETRI_task1b_3 | Suh2020 | 11 | 95.1 | stereo | 48kHz | temporal cropping, mixup | log-mel energies | |
Suh_ETRI_task1b_4 | Suh2020 | 17 | 94.6 | stereo | 48kHz | temporal cropping, mixup | log-mel energies | |
Vilouras_AUTh_task1b_1 | Vilouras2020 | 40 | 91.8 | mono | 48kHz | log-mel energies | ||
Waldekar_IITKGP_task1b_1 | Waldekar2020 | 64 | 88.6 | mono | 48kHz | histogram of uniform LBP of log-mel energies | ||
Wu_CUHK_task1b_1 | Wu2020_t1b | 22 | 94.2 | binaural | 48kHz | mixup | wavelet filter-bank features | |
Wu_CUHK_task1b_2 | Wu2020_t1b | 21 | 94.2 | binaural | 48kHz | mixup | wavelet filter-bank features | |
Wu_CUHK_task1b_3 | Wu2020_t1b | 20 | 94.3 | binaural | 48kHz | mixup | wavelet filter-bank features | |
Wu_CUHK_task1b_4 | Wu2020_t1b | 14 | 94.9 | binaural | 48kHz | mixup | wavelet filter-bank features | |
Yang_UESTC_task1b_1 | Haocong2020 | 38 | 92.1 | binaural | 22.05kHz | CQT | ||
Yang_UESTC_task1b_2 | Haocong2020 | 28 | 93.5 | mixed | 22.05kHz | CQT | ||
Yang_UESTC_task1b_3 | Haocong2020 | 26 | 93.5 | binaural | 22.05kHz | CQT | ||
Yang_UESTC_task1b_4 | Haocong2020 | 51 | 90.4 | binaural | 22.05kHz | CQT | ||
Zhang_BUPT_task1b_1 | Zhang2020 | 39 | 92.0 | mixed | 44.1kHz | mixup | log-mel energies | |
Zhang_BUPT_task1b_2 | Zhang2020 | 35 | 92.7 | mixed | 44.1kHz | mixup | log-mel energies | |
Zhang_BUPT_task1b_3 | Zhang2020 | 34 | 92.9 | mixed | 44.1kHz | mixup | log-mel energies | |
Zhang_BUPT_task1b_4 | Zhang2020 | 33 | 93.0 | mixed | 44.1kHz | mixup | log-mel energies | |
Zhao_JNU_task1b_1 | Zhao2020 | 74 | 86.6 | mono | 48kHz | SpecAugment | log-mel energies | |
Zhao_JNU_task1b_2 | Zhao2020 | 72 | 86.9 | mono | 48kHz | SpecAugment | log-mel energies |
Machine learning characteristics
Submission label | Technical Report | Official system rank | Accuracy (Eval) | External data usage | External data sources | Classifier | Ensemble subsystems | Decision making
---|---|---|---|---|---|---|---|---
Chang_QTI_task1b_1 | Chang2020 | 12 | 95.0 | CNN | |||||
Chang_QTI_task1b_2 | Chang2020 | 30 | 93.2 | CNN | |||||
Chang_QTI_task1b_3 | Chang2020 | 15 | 94.8 | CNN | |||||
Chang_QTI_task1b_4 | Chang2020 | 19 | 94.4 | CNN | |||||
Dat_HCMUni_task1b_1 | Dat2020 | 57 | 89.5 | CNN | |||||
Farrugia_IMT-Atlantique-BRAIn_task1b_1 | Pajusco2020 | 77 | 85.4 | CNN | |||||
Farrugia_IMT-Atlantique-BRAIn_task1b_2 | Pajusco2020 | 48 | 90.6 | CNN | |||||
Farrugia_IMT-Atlantique-BRAIn_task1b_3 | Pajusco2020 | 73 | 86.6 | ResNet | |||||
Farrugia_IMT-Atlantique-BRAIn_task1b_4 | Pajusco2020 | 66 | 88.4 | ResNet | |||||
Feng_TJU_task1b_1 | Feng2020 | 86 | 72.3 | CNN | |||||
Feng_TJU_task1b_2 | Feng2020 | 83 | 81.9 | CNN | |||||
Feng_TJU_task1b_3 | Feng2020 | 84 | 80.7 | CNN | |||||
Feng_TJU_task1b_4 | Feng2020 | 85 | 79.9 | CNN | |||||
DCASE2020 baseline | 89.5 | embeddings | CNN | ||||||
Helin_ADSPLAB_task1b_1 | Wang2020_t1 | 42 | 91.6 | CNN | |||||
Helin_ADSPLAB_task1b_2 | Wang2020_t1 | 41 | 91.6 | CNN | |||||
Helin_ADSPLAB_task1b_3 | Wang2020_t1 | 43 | 91.6 | CNN | |||||
Helin_ADSPLAB_task1b_4 | Wang2020_t1 | 44 | 91.3 | CNN | |||||
Hu_GT_task1b_1 | Hu2020 | 7 | 95.8 | CNN | |||||
Hu_GT_task1b_2 | Hu2020 | 10 | 95.5 | CNN, MobileNet, ensemble | 2 | average | |||
Hu_GT_task1b_3 | Hu2020 | 3 | 96.0 | CNN, MobileNet, ensemble | 2 | logistic regression | | |
Hu_GT_task1b_4 | Hu2020 | 5 | 95.8 | CNN, ensemble | 2 | logistic regression | | |
Kalinowski_SRPOL_task1b_4 | Kalinowski2020 | 31 | 93.1 | CNN, VGG | softmax | ||||
Koutini_CPJKU_task1b_1 | Koutini2020 | 16 | 94.7 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1b_2 | Koutini2020 | 1 | 96.5 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1b_3 | Koutini2020 | 8 | 95.7 | RF-regularized CNNs | |||||
Koutini_CPJKU_task1b_4 | Koutini2020 | 2 | 96.2 | RF-regularized CNNs | |||||
Kowaleczko_SRPOL_task1b_3 | Kalinowski2020 | 52 | 90.1 | CNN | softmax | ||||
Kwiatkowska_SRPOL_task1b_1 | Kalinowski2020 | 36 | 92.6 | CNN, ensemble | 2 | soft voting | |||
Kwiatkowska_SRPOL_task1b_2 | Kalinowski2020 | 27 | 93.5 | CNN, ensemble | 2 | soft voting | |||
LamPham_Kent_task1b_1 | Pham2020 | 59 | 89.4 | CNN | |||||
LamPham_Kent_task1b_2 | Pham2020 | 71 | 87.0 | CNN | |||||
LamPham_Kent_task1b_3 | Pham2020 | 79 | 84.7 | CNN | |||||
Lee_CAU_task1b_1 | Lee2020 | 47 | 90.7 | ResNet | |||||
Lee_CAU_task1b_2 | Lee2020 | 23 | 93.9 | ResNet | |||||
Lee_CAU_task1b_3 | Lee2020 | 46 | 91.1 | ResNet | |||||
Lee_CAU_task1b_4 | Lee2020 | 45 | 91.2 | Multi-input model, ResNet | |||||
Lopez-Meyer_IL_task1b_1 | Lopez-Meyer2020_t1b | 50 | 90.4 | directly | Audioset | CNN | maximum softmax | ||
Lopez-Meyer_IL_task1b_2 | Lopez-Meyer2020_t1b | 53 | 90.1 | directly | Audioset | CNN | maximum softmax | ||
Lopez-Meyer_IL_task1b_3 | Lopez-Meyer2020_t1b | 49 | 90.5 | directly | Audioset | CNN | maximum softmax | ||
Lopez-Meyer_IL_task1b_4 | Lopez-Meyer2020_t1b | 56 | 89.7 | CNN | maximum softmax | ||||
McDonnell_USA_task1b_1 | McDonnell2020 | 13 | 94.9 | CNN | |||||
McDonnell_USA_task1b_2 | McDonnell2020 | 9 | 95.5 | CNN | |||||
McDonnell_USA_task1b_3 | McDonnell2020 | 4 | 95.9 | CNN | |||||
McDonnell_USA_task1b_4 | McDonnell2020 | 6 | 95.8 | CNN | |||||
Monteiro_INRS_task1b_1 | Joao2020 | 69 | 87.4 | CNN | |||||
Naranjo-Alcazar_Vfy_task1b_1 | Naranjo-Alcazar2020_t1 | 24 | 93.6 | CNN | |||||
Naranjo-Alcazar_Vfy_task1b_2 | Naranjo-Alcazar2020_t1 | 25 | 93.6 | CNN | |||||
NguyenHongDuc_SU_task1b_1 | Nguyen_Hong_Duc2020 | 32 | 93.1 | directly | CNN, GRU, MLP | 3 | average | ||
NguyenHongDuc_SU_task1b_2 | Nguyen_Hong_Duc2020 | 37 | 92.3 | directly | CNN, GRU, MLP | 2 | average | ||
Ooi_NTU_task1b_1 | Ooi2020 | 67 | 87.8 | VGGNet | |||||
Ooi_NTU_task1b_2 | Ooi2020 | 70 | 87.3 | InceptionNet | |||||
Ooi_NTU_task1b_3 | Ooi2020 | 55 | 89.8 | VGGNet, InceptionNet, ensemble | 6 | average | |||
Ooi_NTU_task1b_4 | Ooi2020 | 54 | 89.8 | VGGNet, InceptionNet, ensemble | 6 | average | |||
Paniagua_UPM_task1b_1 | Paniagua2020 | 60 | 89.4 | MLP | average log-likelihood | ||||
Patki_SELF_task1b_1 | Patki2020 | 76 | 86.0 | embeddings | SVM | ||||
Patki_SELF_task1b_2 | Patki2020 | 61 | 89.4 | embeddings | SVM | ||||
Patki_SELF_task1b_3 | Patki2020 | 82 | 83.7 | embeddings | SVM | ||||
Phan_UIUC_task1b_1 | Phan2020_t1 | 65 | 88.5 | CNN | |||||
Phan_UIUC_task1b_2 | Phan2020_t1 | 62 | 89.2 | CNN | |||||
Phan_UIUC_task1b_3 | Phan2020_t1 | 63 | 89.0 | CNN | |||||
Phan_UIUC_task1b_4 | Phan2020_t1 | 58 | 89.5 | CNN | |||||
Sampathkumar_TUC_task1b_1 | Sampathkumar2020 | 68 | 87.5 | CNN | |||||
Singh_IITMandi_task1b_1 | Singh2020 | 81 | 84.5 | CNN | maximum likelihood | ||||
Singh_IITMandi_task1b_2 | Singh2020 | 80 | 84.7 | pre-trained weights of SoundNet | CNN | maximum likelihood | |||
Singh_IITMandi_task1b_3 | Singh2020 | 78 | 85.2 | pre-trained weights of SoundNet for initialization | SoundNet | CNN | maximum likelihood | |
Singh_IITMandi_task1b_4 | Singh2020 | 75 | 86.4 | pre-trained weights of SoundNet | SoundNet | CNN | maximum likelihood | ||
Suh_ETRI_task1b_1 | Suh2020 | 29 | 93.3 | CNN(Inception) | |||||
Suh_ETRI_task1b_2 | Suh2020 | 18 | 94.6 | CNN(Inception) | |||||
Suh_ETRI_task1b_3 | Suh2020 | 11 | 95.1 | CNN(Inception) | 2 | average | |||
Suh_ETRI_task1b_4 | Suh2020 | 17 | 94.6 | CNN(Inception) | 2 | weighted score average | |||
Vilouras_AUTh_task1b_1 | Vilouras2020 | 40 | 91.8 | CNN | |||||
Waldekar_IITKGP_task1b_1 | Waldekar2020 | 64 | 88.6 | SVM | |||||
Wu_CUHK_task1b_1 | Wu2020_t1b | 22 | 94.2 | CNN | |||||
Wu_CUHK_task1b_2 | Wu2020_t1b | 21 | 94.2 | CNN | 2 | average | |||
Wu_CUHK_task1b_3 | Wu2020_t1b | 20 | 94.3 | CNN | 3 | average | |||
Wu_CUHK_task1b_4 | Wu2020_t1b | 14 | 94.9 | CNN | 3 | average | |||
Yang_UESTC_task1b_1 | Haocong2020 | 38 | 92.1 | CNN | |||||
Yang_UESTC_task1b_2 | Haocong2020 | 28 | 93.5 | CNN | |||||
Yang_UESTC_task1b_3 | Haocong2020 | 26 | 93.5 | CNN | |||||
Yang_UESTC_task1b_4 | Haocong2020 | 51 | 90.4 | CNN | |||||
Zhang_BUPT_task1b_1 | Zhang2020 | 39 | 92.0 | ResNet | |||||
Zhang_BUPT_task1b_2 | Zhang2020 | 35 | 92.7 | ResNet | |||||
Zhang_BUPT_task1b_3 | Zhang2020 | 34 | 92.9 | ResNet | |||||
Zhang_BUPT_task1b_4 | Zhang2020 | 33 | 93.0 | ResNet | |||||
Zhao_JNU_task1b_1 | Zhao2020 | 74 | 86.6 | embeddings | CNN | ||||
Zhao_JNU_task1b_2 | Zhao2020 | 72 | 86.9 | embeddings | CNN |
Technical reports
QTI Submission to DCASE 2020: Model Efficient Acoustic Scene Classification
Simyung Chang, Janghoon Cho, Hyoungwoo Park, Hyunsin Park, Sungrack Yun and Kyuwoong Hwang
Qualcomm AI Research, Qualcomm Korea YH, Seoul, South Korea
Chang_QTI_task1b_1 Chang_QTI_task1b_2 Chang_QTI_task1b_3 Chang_QTI_task1b_4
Abstract
This technical report describes the details of our (QAIR team's) submission for Task 1B of the DCASE 2020 challenge. In this report, we introduce three methods for efficient acoustic scene classification with low model complexity. First, inspired by CutMix, which was proposed for image recognition tasks, we introduce FreqMix, a data augmentation method that mixes specific frequency bands of two different samples instead of cutting and pasting box patches. Second, as a novel feature normalization, we introduce SubSpectral Normalization, which can reduce the correlation between sub-spectral groups by performing normalization on each separated group. Last, to reduce the number of model parameters, we propose a Shared Residual architecture in which the weights of all layers (except the normalization layers) are shared. All submitted models were trained without any external data, and each is a single model rather than an ensemble, so as to satisfy the model complexity condition.
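The report gives no code; the following is a rough sketch of the FreqMix idea as described above, assuming spectrogram batches of shape (batch, channels, freq, time). The Beta-distributed band width, the random band placement, and the label-mixing rule are our assumptions, not the authors' recipe.

```python
import torch

def freqmix(x, y, alpha=1.0):
    """Sketch of FreqMix: swap a contiguous band of frequency bins between
    two batches of spectrograms and mix the (one-hot) labels accordingly.

    x: (batch, channels, freq, time) spectrograms; y: (batch, classes) labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))          # pairing of samples to mix
    n_freq = x.size(2)
    band = int(round((1.0 - lam) * n_freq))   # width of the replaced band
    f0 = torch.randint(0, n_freq - band + 1, (1,)).item() if band < n_freq else 0
    x_mix = x.clone()
    x_mix[:, :, f0:f0 + band, :] = x[perm, :, f0:f0 + band, :]
    lam_adj = 1.0 - band / n_freq             # actual mixing proportion
    y_mix = lam_adj * y + (1.0 - lam_adj) * y[perm]
    return x_mix, y_mix
```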
System characteristics
Input | binaural |
Sampling rate | 22.05kHz |
Data augmentation | mixup+FreqMix |
Features | perceptual weighted power spectrogram |
Classifier | CNN |
Complexity management | sparsity |
CNN-Based Framework for DCASE 2020 Task 1B Challenge
Ngo Dat, Pham Lam, Nguyen Anh and Hoang Hao
Electrical & Electronic Engineering, Ho Chi Minh University of Technology, Ho Chi Minh, Vietnam
Dat_HCMUni_task1b_1
Abstract
This technical report presents a low-complexity CNN-based deep learning framework for acoustic scene classification. In particular, the proposed architecture consists of two main steps: front-end feature extraction and a back-end network. First, spectrogram representations are extracted as front-end features. Next, the extracted spectrograms are fed into a CNN-based architecture for classification. Experimental results obtained on the DCASE 2020 Task 1B dataset improve on the DCASE baseline by 7.2%.
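As an illustration of the front-end/back-end split described above, here is a hypothetical low-complexity back-end in PyTorch; the layer sizes are our own assumptions, chosen only to show how comfortably such a network fits the 500 KB budget (the actual front-end produces gammatone spectrograms, for which we simply assume a (1, 64, T) input).

```python
import torch.nn as nn

# Illustrative back-end classifier; not the architecture from the report.
backend = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),                        # 3 higher-level classes
)

n_params = sum(p.numel() for p in backend.parameters())
print(n_params, "params ->", round(n_params * 4 / 1024, 1), "KB at float32")
```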
System characteristics
Input | left, right, average of left+right |
Sampling rate | 48kHz |
Data augmentation | Random oversample & mixup |
Features | Gammatone energy |
Classifier | CNN |
Acoustic Scene Classification Based on Lightweight CNN with Efficient Convolutions
Guoqing Feng, Jinhua Liang and Biyun Ding
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Feng_TJU_task1b_1 Feng_TJU_task1b_2 Feng_TJU_task1b_3 Feng_TJU_task1b_4
Abstract
This technical report addresses Task 1B (Acoustic Scene Classification) of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). Targeting low-complexity solutions to the classification problem in terms of model size, a lightweight Convolutional Neural Network (CNN) with efficient convolutions is designed. The network is constructed from an improved bottleneck block based on the inverted residual linear bottleneck block. In the improved bottleneck block, Depthwise Channel Ascent (DCA) and Group Channel Descent (GCD) operations replace the pointwise convolution to realize efficient channel transformation. The designed network is denoted CNN-BDG in this report. CNN-BDG achieves a performance 4.46% higher than the baseline model on the validation set, while its parameter count is reduced to about 30% of the baseline model's.
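The following sketch shows our reading of the improved bottleneck block: channel expansion through a depthwise convolution with a channel multiplier (DCA) and channel reduction through a grouped pointwise convolution (GCD). The expansion factor, group count, and kernel size are assumptions, not values from the report.

```python
import torch.nn as nn

class DCABottleneck(nn.Module):
    """Sketch of the improved bottleneck: DCA expands, GCD reduces.

    `channels` must be divisible by `groups` for the grouped 1x1 conv.
    """

    def __init__(self, channels: int, expand: int = 4, groups: int = 4):
        super().__init__()
        mid = channels * expand
        self.block = nn.Sequential(
            # DCA: depthwise conv whose channel multiplier raises the width
            nn.Conv2d(channels, mid, 3, padding=1, groups=channels),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            # GCD: grouped pointwise conv brings the width back down cheaply
            nn.Conv2d(mid, channels, 1, groups=groups),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # inverted-residual style skip connection
```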
System characteristics
Input | mono |
Sampling rate | 48kHz |
Data augmentation | same class mix |
Features | mel spectrogram |
Classifier | CNN |
Complexity management | optimize the convolution operation and the network structure |
Low-Complexity Acoustic Scene Classification Using Primary Ambient Extraction and CycleGAN
Yang Haocong, Shi Chuang and Li Huiyong
Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yang_UESTC_task1b_1 Yang_UESTC_task1b_2 Yang_UESTC_task1b_3 Yang_UESTC_task1b_4
Abstract
This report describes our submissions for DCASE 2020 Challenge Task 1b (Low-Complexity Acoustic Scene Classification). In each submission, the constant-Q transform is used as the acoustic feature, and the classifier is a fully convolutional neural network based on residual blocks. The classifier parameters are stored as half-precision (16-bit) floating-point numbers to limit the model size and accelerate training. We use primary ambient extraction in the audio front-end and generate virtual samples from the phase information of the binaural audio; these virtual samples are used in one of the submissions. Virtual samples generated by CycleGAN are used in another submission. Finally, we provide a 4-fold cross-validation submission that meets the complexity limit. The highest macro recognition accuracy of the above methods on the development dataset is 96.05%, with a log loss of 0.120.
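Storing parameters in half precision halves the memory per weight, which is the basis of the size reduction mentioned above. A minimal PyTorch sketch of the effect, using a stand-in layer rather than the submitted network:

```python
import torch

def size_kb(model):
    # Bytes occupied by the parameters, in KB (a rough stand-in for the
    # task's official non-zero-parameter counting).
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1024

net = torch.nn.Conv2d(64, 128, 3)          # illustrative layer only
print(f"float32: {size_kb(net):.1f} KB")   # ~288.5 KB
net.half()                                 # cast parameters to float16
print(f"float16: {size_kb(net):.1f} KB")   # ~144.3 KB, i.e. halved
```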
System characteristics
Input | binaural; mixed |
Sampling rate | 22.05kHz |
Features | CQT |
Classifier | CNN |
Complexity management | float16 |
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation
Hu Hu1, Chao-Han Huck Yang1, Xianjun Xia2, Xue Bai3, Xin Tang3, Yajian Wang3, Shutong Niu3, Li Chai3, Juanjuan Li2, Hongning Zhu2, Feng Bao4, Yuanjun Zhao2, Sabato Marco Siniscalchi5, Yannan Wang2, Jun Du3 and Chin-Hui Lee1
1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA, 2Tencent Media Lab, Shenzhen, China, 3University of Science and Technology of China, HeFei, China, 4Tencent Media Lab, Beijing, China, 5Computer Engineering School, University of Enna Kore, Italy
Hu_GT_task1b_1 Hu_GT_task1b_2 Hu_GT_task1b_3 Hu_GT_task1b_4
Abstract
In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging an ad-hoc score combination of two convolutional neural networks (CNNs), classifying the acoustic input according to three classes, and then ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we leverage a quantization method to reduce the complexity of two of our top-accuracy three-class CNN-based architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is attained using our best single classifier and data augmentation. An accuracy of 81.9% is then attained by a final model fusion of our two-stage ASC classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500 KB.
System characteristics
Input | binaural |
Sampling rate | 48kHz |
Data augmentation | mixup, channel confusion, SpecAugment |
Features | log-mel energies |
Classifier | CNN; CNN, MobileNet, ensemble; CNN, ensemble |
Decision making | average; logistic regression |
Complexity management | int8, quantization |
Development of the INRS-EMT Scene Classification Systems for the 2020 Edition of the DCASE Challenge
Monteiro Joao, Shruti Kshirsagar, Anderson Avila, Amr Aaballah, Parth Tiwari and Tiago Falk
EMT, Institut National de la Recherche Scientifique, Montreal, Canada
Abstract
In this report we provide a brief overview of a set of submissions for the scene classification sub-tasks of the 2020 edition of the DCASE challenge. Our submissions comprise efforts at the feature representation level, where we explored the use of modulation spectra and i-vectors (extracted from mel cepstral coefficients as well as modulation spectra), and at the modeling level, where recent convolutional deep neural network models were used. Results on the challenge validation set show several of the submitted methods outperforming the baseline model.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | Sox distortions, SpecAugment |
Features | log-mel energies |
Classifier | CNN |
Low-Complexity Acoustic Scene Classification with Small Convolutional Neural Networks and Curriculum Learning
Beniamin Kalinowski
Audio Intelligence, Samsung R&D Poland, Warsaw, Poland
Kalinowski_SRPOL_task1b_4 Kowaleczko_SRPOL_task1b_3 Kwiatkowska_SRPOL_task1b_1 Kwiatkowska_SRPOL_task1b_2
Abstract
The report presents the results of our submission to Task 1B of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge. The main issue in this task was the 500 KB size limitation on each submitted model; such limitations matter when a model is to be deployed on a device with little memory. For this task, four different models based on convolutional neural networks were developed, varying in data preprocessing methods, architectures, etc. The crucial techniques used for complexity management were curriculum learning and the use of depthwise and separable convolutions, along with ensembling models trained on 3 and 10 classes to preserve performance. The best models improved on the baseline by a 10% increase in accuracy and a 60% decrease in log loss.
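The report does not detail its curriculum; as a generic sketch of the idea, the helper below trains on the easiest samples first (difficulty proxied by warm-up loss, which is our assumption) and grows the training pool each epoch.

```python
import numpy as np

def curriculum_subset(losses, epoch, n_epochs, start_frac=0.3):
    """Toy curriculum: keep the easiest fraction of samples (lowest
    warm-up loss, our proxy for difficulty) and grow the pool linearly
    until the whole training set is used."""
    frac = min(1.0, start_frac + (1 - start_frac) * epoch / max(1, n_epochs - 1))
    order = np.argsort(losses)                  # easy (low loss) first
    return order[: max(1, int(frac * len(losses)))]

# Example: sample indices to train on at epoch 0 of 10, given warm-up losses.
idx = curriculum_subset(np.random.rand(1000), epoch=0, n_epochs=10)
```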
System characteristics
Input | mono |
Sampling rate | 48kHz |
Data augmentation | time warping, frequency warping, loudness control, time length control, time masking, frequency masking; mixup |
Features | log-mel spectrogram; log-mel energies |
Classifier | CNN, VGG; CNN; CNN, ensemble |
Decision making | softmax; soft voting |
Complexity management | using rectangular convolution kernels; constraints-aware modelling |
CP-JKU Submissions to DCASE’20: Low-Complexity Cross-Device Acoustic Scene Classification with RF-Regularized CNNs
Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh and Gerhard Widmer
Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria
Koutini_CPJKU_task1b_1 Koutini_CPJKU_task1b_2 Koutini_CPJKU_task1b_3 Koutini_CPJKU_task1b_4
Abstract
This technical report describes the CP-JKU team's submission for Task 1 - Subtask A (Acoustic Scene Classification with Multiple Devices) and Subtask B (Low-Complexity Acoustic Scene Classification) of the DCASE-2020 challenge. For Subtask 1A, we provide our Receptive Field (RF) regularized CNN model as a baseline, and additionally explore the use of two different domain adaptation objectives in the form of the Maximum Mean Discrepancy (MMD) and the Sliced Wasserstein Distance (SWD). For Subtask 1B, we investigate different parameter reduction methods such as pruning and Knowledge Distillation (KD). Additionally, we incorporate a decomposed convolutional layer that reduces the number of non-zero parameters in our models while only slightly decreasing accuracy compared to the full-parameter baseline.
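The exact decomposition is described in the report itself; one common factorization with the stated effect, shown here purely as an illustration, splits a 3x3 convolution into a 3x1 followed by a 1x3:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

full = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)

# One common factorization: a 3x1 followed by a 1x3 convolution.
# The CP-JKU decomposition may differ; this only shows the effect on size.
decomposed = nn.Sequential(
    nn.Conv2d(128, 128, kernel_size=(3, 1), padding=(1, 0), bias=False),
    nn.Conv2d(128, 128, kernel_size=(1, 3), padding=(0, 1), bias=False),
)

print(n_params(full), n_params(decomposed))   # 147456 vs 98304 (-33%)
```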
System characteristics
Input | stereo |
Sampling rate | 22.05kHz |
Data augmentation | mixup |
Features | Perceptually-weighted log-mel energies |
Classifier | RF-regularized CNNs |
Complexity management | float16, conv layers decomposition; pruning, float16; float16, smaller width/depth |
The CAU-ET Acoustic Scenery Classification System for DCASE 2020 Challenge
Yerin Lee1, Soyoung Lim1 and Il-Youp Kwak2
1Statistics, Chung-Ang University, Seoul, South Korea, 2Department of Applied Statistics, Chung-Ang University, Seoul, South Korea
Lee_CAU_task1b_1 Lee_CAU_task1b_2 Lee_CAU_task1b_3 Lee_CAU_task1b_4
Abstract
The acoustic scenery classification problem is an interesting topic that has been studied for a long time through the DCASE competition. This technical report presents the CAU-ET scenery classification system submitted to the DCASE 2020 challenge, Task 1. In our method, we generate a mel-spectrogram from the audio. From the log-mel spectrogram, we obtain deltas, delta-deltas, and Harmonic-Percussive Source Separation (HPSS) features as inputs to our deep neural network models. The classification accuracy of the proposed system on the development dataset was 66.26% in subtask A and 95.27% in subtask B.
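The listed features map directly onto standard librosa calls; a minimal sketch, where the file name and mel settings are our choices:

```python
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=48000, mono=True)   # hypothetical clip
logmel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))

delta = librosa.feature.delta(logmel, order=1)           # deltas
delta2 = librosa.feature.delta(logmel, order=2)          # delta-deltas
harmonic, percussive = librosa.effects.hpss(y)           # HPSS waveforms

features = np.stack([logmel, delta, delta2])             # 3-channel CNN input
```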
System characteristics
Input | binaural |
Sampling rate | 48kHz |
Data augmentation | mixup |
Features | log-mel energies, deltas, delta-deltas; HPSS; HPSS, log-mel energies, deltas, delta-deltas |
Classifier | ResNet; Multi-input model, ResNet |
Low-Memory Convolutional Neural Networks for Acoustic Scene Classification
Paulo Lopez-Meyer1, Juan Antonio Del Hoyo Ontiveros1, Hong Lu2, Hector Alfonso Cordourier Maruri1, Georg Stemmer3, Lama Nachman2 and Jonathan Huang4
1Intel Labs, Intel Corporation, Jalisco, Mexico, 2Intel Labs, Intel Corporation, California, USA, 3Intel Labs, Intel Corporation, Neubiberg, Germany, 4California, USA
Lopez-Meyer_IL_task1b_1 Lopez-Meyer_IL_task1b_2 Lopez-Meyer_IL_task1b_3 Lopez-Meyer_IL_task1b_4
Abstract
In this work, we describe the implementation of four different convolutional neural networks for acoustic scene classification, complying with the memory size restrictions defined in the DCASE2020 Task 1b challenge guidelines. Quantization, pruning, knowledge distillation, and GCC-grams as input features were explored as means to achieve the highest accuracy possible while reducing the resources required in terms of trainable parameters and memory. Our experimental results exceed the 87.30% accuracy reported for the challenge's baseline: all four of our submissions achieved > 90.00% acoustic classification accuracy using CNN models smaller than 500 KB.
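The report's GCC-grams are presumably built from per-frame generalized cross-correlation between the two channels; a plausible numpy sketch using the common PHAT weighting, with frame and lag settings chosen by us:

```python
import numpy as np

def gcc_phat_gram(left, right, n_fft=1024, hop=512, max_lag=64):
    """Plausible GCC-gram: per-frame GCC-PHAT between the two channels,
    stacked over time into a lag-by-frame image."""
    gram = []
    for i in range(0, len(left) - n_fft, hop):
        L = np.fft.rfft(left[i:i + n_fft])
        R = np.fft.rfft(right[i:i + n_fft])
        cross = L * np.conj(R)
        cross /= np.abs(cross) + 1e-8                # PHAT weighting
        cc = np.fft.irfft(cross)
        gram.append(np.concatenate([cc[-max_lag:], cc[:max_lag + 1]]))
    return np.array(gram).T                          # (2*max_lag+1, n_frames)
```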
System characteristics
Input | mono; binaural |
Sampling rate | 16kHz; 48kHz |
Data augmentation | random noise, random gain, random cropping, mixup; SpecAugment |
Features | raw waveform; mel filterbank; log-mel filterbanks, GCC-grams |
Classifier | CNN |
Decision making | maximum softmax |
Complexity management | quantization; pruning, quantization; knowledge distillation, quantization |
Low-Complexity Acoustic Scene Classification Using One-Bit-Per-Weight Deep Convolutional Neural Networks
Mark McDonnell
Computational Learning Systems Laboratory, University of South Australia, Mawson Lakes, Australia
McDonnell_USA_task1b_1 McDonnell_USA_task1b_2 McDonnell_USA_task1b_3 McDonnell_USA_task1b_4
Abstract
This technical report describes a submission to Task 1b ("Low-Complexity Acoustic Scene Classification") in the DCASE2020 Acoustic Scene Challenge. Solutions for this task were required to have parameters totalling no more than 500 KB. The strategy described in this report was to train a deep convolutional neural network applied to spectrograms formed from the acoustic scene files, such that each convolutional weight was set to one of two values following training, and hence could be stored using a single bit. This strategy allowed a single 36-layer all-convolutional deep neural network to be trained, consisting of a total of 3,987,000 binary weights and totalling 486.69 KB. The model achieved a macro-average accuracy (balanced accuracy score) of 96.6±0.5% across the three classes on the 2020 DCASE Task 1b validation set.
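The arithmetic in the abstract checks out at one bit per weight: 3,987,000 / 8 / 1024 ≈ 486.69 KB. The sketch below shows a generic one-bit binarization (sign times a per-layer scale); the report's exact pair of values may differ.

```python
import numpy as np

def binarize(w):
    """Generic one-bit-per-weight scheme: keep only sign(w), scaled by the
    mean magnitude of the layer (our assumption, not the report's rule)."""
    alpha = np.abs(w).mean()
    return np.where(w >= 0, alpha, -alpha).astype(np.float32)

w = np.random.randn(3_987_000).astype(np.float32)  # weight count from the report
print(f"float32 storage: {w.nbytes / 1024:.0f} KB")      # ~15574 KB
print(f"1 bit per weight: {w.size / 8 / 1024:.2f} KB")   # 486.69 KB
```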
System characteristics
Input | left, right |
Sampling rate | 48kHz |
Data augmentation | mixup, temporal cropping, channel swapping |
Features | log-mel energies |
Classifier | CNN |
Complexity management | 1-bit quantization |
Task 1 DCASE 2020: ASC with Mismatched Devices and Reduced Size Model Using Residual Squeeze-Excitation CNNs
Javier Naranjo-Alcazar1,2, Sergi Perez-Castanos3, Pedro Zuccarello3 and Maximo Cobos2
1AI department, Visualfy, Benisano, Spain, 2Computer Science Department, Universitat de Valencia, Burjassot, Spain, 3AI department, Visualfy, Benisano, Valencia
Naranjo-Alcazar_Vfy_task1b_1 Naranjo-Alcazar_Vfy_task1b_2
Abstract
Acoustic Scene Classification (ASC) is a machine listening problem whose objective is to classify/tag an audio clip with a predefined label describing a scene location, such as park or airport, among others. Due to the emergence of more extensive audio datasets, solutions based on deep learning techniques have become the state of the art. The most common choice is to implement a convolutional neural network (CNN) after transforming the audio signal into a 2D representation; this two-dimensional audio representation is currently a subject of research. In addition, there are solutions that concatenate several 2D representations, thus creating an input with several channels. This article proposes two novel stereo audio representations to maximize the accuracy of an ASC framework. The first is a 3-channel representation comprising the left channel, the right channel, and the difference between channels (L − R) using the Gammatone filter bank; the second comprises the harmonic and percussive sources and the difference between channels using the Mel filter bank. Both representations are also concatenated, creating a 6-channel input with different audio filter banks. Furthermore, the proposed CNN is a residual network that employs squeeze-excitation techniques in its residual blocks in a novel way, forcing the network to extract meaningful features from the audio representation. The proposed network is used in both subtasks with different modifications to meet the requirements of each one; since stereo audio is not available in Subtask A, the representations are slightly modified for that task. This technical report first presents the overlaps between the two tasks and then describes the relevant changes in one section per task. The baselines are surpassed in both tasks by approximately 10 percentage points.
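For reference, a standard squeeze-and-excitation block (Hu et al., 2018) looks as follows in PyTorch; the report applies SE inside its residual blocks in its own way, so treat this only as the generic building block:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation: pool each channel to a scalar,
    pass through a small bottleneck MLP, and rescale the channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, F, T)
        s = x.mean(dim=(2, 3))                  # squeeze: global average pool
        s = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        return x * s                            # excite: channel-wise rescale
```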
System characteristics
Input | left, right, difference; left, right, difference, mono |
Sampling rate | 48kHz |
Features | gammatone; gammatone, HPSS, log-mel energies |
Classifier | CNN |
Acoustic Scene Classification Using Long-Term and Fine-Scale Audio Representations
Paul Nguyen Hong Duc1, Dorian Cazau2, Olivier Adam3, Odile Gerard4 and Paul R. White5
1Institut d’Alembert, Sorbonne Universite, Paris, France, 2Lab-Sticc, ENSTA Bretagne, Brest, France, 3Institut d'Alembert, Sorbonne Université, Paris, France, 4Techniques Navales, DGA-TN, Toulon, France, 5ISVR, University of Southampton, Southampton, United Kingdom
NguyenHongDuc_SU_task1b_1 NguyenHongDuc_SU_task1b_2
Abstract
Audio scene classification (ASC) is an emerging field of research in different scientific communities, such as urban soundscape characterization and bioacoustics. It has gained visibility and relevance through open challenges, especially with the benchmark datasets and evaluations from DCASE. This paper presents our deep learning model addressing the ASC task of the DCASE 2020 challenge edition. The model exploits multiple long-term and fine-scale audio representations as inputs to the neural network. Each representation is fed into a different network branch, and the audio embeddings of the branches are fused before a Multi-Layer Perceptron predicts the final classes.
System characteristics
Input | mono, binaural |
Sampling rate | 48kHz |
Data augmentation | mixup |
Features | RMS level, third-octave levels, Leq, interaural cross correlation coefficient, hardness, depth, brightness, roughness, warmth, sharpness, boominess, reverb, log-mel spectrogram |
Classifier | CNN, GRU, MLP |
Decision making | average |
Ensemble of Pruned Models for Low-Complexity Acoustic Scene Classification
Kenneth Ooi, Santi Peksi and Gan Woon-Seng
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Abstract
For the DCASE 2020 Challenge, the focus of Task 1B is to develop low-complexity models for classification of 3 different types of acoustic scenes, which have potential applications in resource-scarce edge devices deployed in a large-scale acoustic network. For this report, we present the training methodology for our submissions for the challenge, with the best-performing system consisting of an ensemble of VGGNet- and InceptionNet-based lightweight classification models. The subsystems in the ensemble classifier were trained with log-mel spectrograms of the raw audio data, and were subsequently pruned by setting low-magnitude weights periodically to zero with a polynomial decay schedule for an 80% reduction in individual subsystem size. The resultant ensemble classifier outperformed the baseline model on the validation set over 5 runs and had 119758 nonzero parameters which took up 468 KB of memory, thus showing the efficacy of the pruning technique. No external data was used, and source code for the submission can be found at https://github.com/kenowr/DCASE-2020-Task-1B.
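The polynomial decay schedule named above is the form implemented by common pruning toolkits (e.g., TensorFlow Model Optimization); a numpy sketch of the schedule and the periodic magnitude masking, with hyperparameters chosen by us:

```python
import numpy as np

def target_sparsity(step, begin_step, end_step, final=0.8, power=3):
    """Polynomial decay schedule, the form used by common pruning toolkits
    (assumed here): sparsity ramps from 0 to `final` over training."""
    t = np.clip((step - begin_step) / (end_step - begin_step), 0.0, 1.0)
    return final * (1.0 - (1.0 - t) ** power)

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights (applied periodically)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= thresh, 0.0, weights)
```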
System characteristics
Input | mono |
Sampling rate | 48kHz |
Data augmentation | block mixing |
Features | log-mel energies |
Classifier | VGGNet; InceptionNet; VGGNet, InceptionNet, ensemble |
Decision making | average |
Complexity management | sparsity |
Lightweight Convolutional Neural Networks on Binaural Waveforms for Low Complexity Acoustic Scene Classification
Nicolas Pajusco, Richard Huang and Nicolas Farrugia
Electronics, IMT Atlantique, Brest, France
Farrugia_IMT-Atlantique-BRAIn_task1b_1 Farrugia_IMT-Atlantique-BRAIn_task1b_2 Farrugia_IMT-Atlantique-BRAIn_task1b_3 Farrugia_IMT-Atlantique-BRAIn_task1b_4
Abstract
This report describes our submission to DCASE 2020 Task 1, Subtask B, an acoustic scene classification task with the objective of minimizing parameter count. While the vast majority of proposed approaches rely on fixed feature extraction based on time-frequency representations such as spectrograms, we propose to fully exploit the information in binaural waveforms directly. To do so, we train one-dimensional Convolutional Neural Networks (1D-CNNs) on raw, subsampled binaural audio waveforms, thus exploiting phase information within and across the two input channels. In addition, our approach relies heavily on data augmentation in the temporal domain. Finally, we apply iterative structured parameter pruning to remove the least important convolutional kernels, and perform weight quantization in half-precision floating point. We apply this approach to train two network architectures: a 1D-CNN based on VGG-like blocks, as well as a ResNet architecture with 1D convolutions. Our results show that we can train, prune, and quantize a small VGG model to make it 20 times smaller than the 500 KB limit (model A) with accuracy at the baseline level (87.6%), as well as a larger model achieving 91% accuracy while being 8 times smaller than the challenge limit. ResNets could also be successfully trained, pruned, and quantized to fit below the 500 KB limit, achieving up to 91.2% accuracy.
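Structured pruning of whole convolutional kernels can be sketched as follows: rank the output kernels of a 1D convolution by L1 norm and zero the weakest. The report prunes iteratively during training; this one-shot mask is only an illustration.

```python
import torch
import torch.nn as nn

def prune_kernels(conv, keep_ratio=0.8):
    """Illustrative structured pruning: rank output kernels of a 1D conv
    by L1 norm and zero the weakest ones."""
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2))   # one norm per kernel
        n_keep = max(1, int(keep_ratio * norms.numel()))
        keep = norms.argsort(descending=True)[:n_keep]
        mask = torch.zeros_like(norms, dtype=torch.bool)
        mask[keep] = True
        conv.weight[~mask] = 0.0                    # zero pruned kernels
        if conv.bias is not None:
            conv.bias[~mask] = 0.0
    return mask

conv = nn.Conv1d(2, 64, kernel_size=9)    # binaural input, 64 kernels out
mask = prune_kernels(conv, keep_ratio=0.75)
```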
System characteristics
Input | binaural |
Sampling rate | 18kHz |
Data augmentation | temporal masking, filtering, additive noise; cutmix |
Features | raw waveform |
Classifier | CNN; ResNet |
Complexity management | float16, quantization, pruning; float16, quantization |
Classification of Acoustic Scenes Based on Modulation Spectra and the Cepstrum of the Cross Correlation Between Binaural Audio Channels
Arturo Paniagua, Rubén Fraile, Juana M. Gutiérrez-Arriola, Nicolás Sáenz-Lechón and Víctor J. Osma-Ruiz
CITSEM, Universidad Politécnica de Madrid, Madrid, Spain
Paniagua_UPM_task1b_1
Abstract
A system for the automatic classification of acoustic scenes is proposed that uses one audio channel to calculate the spectral distribution of energy across auditory-relevant frequency bands, together with descriptors of the envelope modulation spectrum (EMS) obtained by means of the discrete cosine transform. When the stereophonic signal captured by a binaural microphone is available, this parameter set is augmented with the first coefficients of the cepstrum of the cross-correlation between the two audio channels; this cross-correlation contains information on the angular distribution of acoustic sources. These three types of features (energy spectrum, EMS, and cepstrum of the cross-correlation) are used as inputs to a multilayer perceptron with two hidden layers and a number of adjustable parameters below 15,000.
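The cepstrum of the cross-correlation can be computed with a few FFTs; a numpy sketch, where the coefficient count is our choice:

```python
import numpy as np

def crosscorr_cepstrum(left, right, n_coeffs=12):
    """Sketch: first cepstral coefficients of the cross-correlation
    between the binaural channels (n_coeffs is our assumption)."""
    n = len(left) + len(right) - 1
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)
    xcorr = np.fft.irfft(L * np.conj(R), n)           # cross-correlation
    spectrum = np.abs(np.fft.rfft(xcorr)) + 1e-12
    return np.fft.irfft(np.log(spectrum))[:n_coeffs]  # real cepstrum
```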
System characteristics
Sampling rate | 48kHz |
Features | LTAS, envelope modulation spectrum, cepstrum of cross-correlation |
Classifier | MLP |
Decision making | average log-likelihood |
Exploring Compact Alternatives to Deep Learning in Task 1B
Prachi Patki
Patki_SELF_task1b_1 Patki_SELF_task1b_2 Patki_SELF_task1b_3
Abstract
Task 1b appeared primarily geared toward finding compact deep learning models; however, our experience is that other methodologies may sometimes achieve similar accuracies with substantially smaller parameter counts. We focused on finding alternative classifier formulations that significantly reduce complexity while still achieving superior results. Our primary submission, based on a multi-channel SVM formulation, performs better than the reference design on test data, but requires only ~17.5 KB in parameter complexity.
System characteristics
Input | left+right, left-right; mono |
Sampling rate | 48kHz |
Features | log-mel spectrogram |
Classifier | SVM |
DCASE 2020 Challenge Task 1B: Low-Complexity CNN-Based Framework for Acoustic Scene Classification
Lam Pham1, Ngo Dat2, Phan Huy3 and Duong Ngoc4
1School of Computing, University of Kent, Kent, UK, 2Electrical & Electronic Engineering, Ho Chi Minh University of Technology, Ho Chi Minh, Vietnam, 3School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK, 4InterDigital R&D, InterDigital Company, Rennes, France
LamPham_Kent_task1b_1 LamPham_Kent_task1b_2 LamPham_Kent_task1b_3
Abstract
This report presents a low-complexity CNN-based deep learning framework for the acoustic scene classification (ASC) task. In particular, the framework uses spectrogram representations as front-end feature extraction. The extracted spectrograms are fed into a CNN-based architecture for classification, referred to as the baseline. Next, quantization and pruning techniques are applied to the pre-trained baseline to fine-tune and further compress the network, eventually achieving low-complexity models with competitive performance.
System characteristics
Input | left |
Sampling rate | 48kHz |
Data augmentation | mixup |
Features | Gammatone energy |
Classifier | CNN |
Complexity management | quantization; pruning |
DCASE 2020 Task 1 Subtask B: Low-Complexity Acoustic Scene Classification
Duc Phan and Douglas Jones
ECE, University of Illinois at Urbana Champaign, Illinois, USA
Phan_UIUC_task1b_1 Phan_UIUC_task1b_2 Phan_UIUC_task1b_3 Phan_UIUC_task1b_4
Abstract
A deep network with depthwise separable convolutions [1] and skip connections is introduced for low-complexity acoustic scene classification. The proposed network is not only more than 15 times smaller than the baseline convolutional neural network [2] but also outperforms the baseline by two percentage points on average.
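A depthwise separable convolution with a skip connection, in the spirit of the abstract, can be written compactly in PyTorch; the channel count and kernel size below are illustrative.

```python
import torch.nn as nn

class DSBlock(nn.Module):
    """Depthwise separable convolution plus identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.bn(self.pointwise(self.depthwise(x))))
```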
System characteristics
Input | mono |
Sampling rate | 48kHz |
Features | log-mel energies |
Classifier | CNN |
Low Complexity Acoustic Scene Classification Using AALNet-94
Arunodhayan Sampathkumar and Danny Kowerko
Juniorprofessur Media Computing, Technische Universität Chemnitz, Chemnitz, Germany
Sampathkumar_TUC_task1b_1
Abstract
One of the manifold application fields of Deep Neural Networks (DNNs) is the classification of audio signals such as indoor, outdoor, transportation, human, and animal sounds. DCASE2020 provided a dataset consisting of 3 classes for classification using low-complexity solutions. We trained AALNet-94, from our previous research work, on this dataset; the architecture has performed well on publicly available datasets such as ESC-50, UrbanSound8K, and AudioSet. The results obtained compare well with the baseline.
System characteristics
Input | mono |
Sampling rate | 48kHz |
Features | log-mel energies |
Classifier | CNN |
End2end CNN-Based Low-Complexity Acoustic Scene Classification
Arshdeep Singh, Dhanunjaya Varma Devalraju and Padmanabhan Rajan
School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi, India
Singh_IITMandi_task1b_1 Singh_IITMandi_task1b_2 Singh_IITMandi_task1b_3 Singh_IITMandi_task1b_4
Abstract
This technical report describes the IITMandi AudioTeam's submission for ASC Task 1, Subtask B of the DCASE2020 challenge. The report aims to design low-complexity systems for acoustic scene classification. We propose a convolutional neural network based end-to-end classification framework that learns directly from raw audio. We present a performance analysis of various frameworks with model sizes smaller than 500 KB. The three acoustic scenes, namely indoor, outdoor, and transportation, are considered. Our experimental analysis shows that the proposed end-to-end framework, with features learned directly from raw audio and a model size of approx. 77 KB, gives performance on the development dataset similar to that of the baseline system proposed for the same task.
System characteristics
Input | mono |
Sampling rate | 16kHz |
Features | raw waveform segment |
Embeddings | Singh_IITMandi_task1b_1, Singh_IITMandi_task1b_3 |
Classifier | CNN |
Decision making | maximum likelihood |
Designing Acoustic Scene Classification Models with CNN Variants
Sangwon Suh, Sooyoung Park, Youngho Jeong and Taejin Lee
Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon, South Korea
Suh_ETRI_task1b_1 Suh_ETRI_task1b_2 Suh_ETRI_task1b_3 Suh_ETRI_task1b_4
Abstract
This technical report describes our Acoustic Scene Classification systems for the DCASE2020 challenge Task 1. For subtask A, we designed a single model implemented with three parallel ResNets, named Trident ResNet. We confirmed that this structure is beneficial when analyzing samples collected from minority or unseen devices, reaching 73.7% classification accuracy on the test split. For subtask B, we used the Inception module to build a Shallow Inception model that has fewer parameters than the CNN of the DCASE baseline system. Thanks to the sparse structure of the Inception module, we enhanced the accuracy of the model up to 97.6% while reducing the number of parameters.
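A minimal Inception-style module of the kind referenced above concatenates parallel 1x1, 3x3, 5x5, and pooled branches; the branch widths below are our guesses, not ETRI's configuration.

```python
import torch
import torch.nn as nn

class ShallowInceptionBlock(nn.Module):
    """Generic Inception module: four parallel branches joined on the
    channel axis (an illustration, not the submitted model)."""
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, branch_ch, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)
```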
System characteristics
Input | stereo |
Sampling rate | 48kHz |
Data augmentation | temporal cropping, mixup |
Features | log-mel energies |
Classifier | CNN(Inception) |
Decision making | average; weighted score average |
Complexity management | float16 |
Acoustic Scene Classification Using Fully Convolutional Neural Networks and Per-Channel Energy Normalization
Konstantinos Vilouras
Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
Vilouras_AUTh_task1b_1
Abstract
This technical report describes our approach to Task 1 "Acoustic Scene Classification" of the DCASE 2020 challenge. For subtask A, we introduce per-channel energy normalization (PCEN) as an additional preprocessing step alongside log-Mel spectrograms. We also propose two residual network architectures utilizing "Shake-Shake" regularization and the "Squeeze-and-Excitation" block, respectively. Our best submission (an ensemble of 8 classifiers) outperforms the corresponding baseline system by 16.2% in terms of macro-average accuracy. For subtask B, we mainly focus on a low-complexity, fully convolutional neural network architecture, which yields a 5% relative improvement over the baseline accuracy.
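PCEN is available directly in librosa; a sketch of computing it alongside a log-mel input, where the file name and mel settings are ours:

```python
import librosa

y, sr = librosa.load("scene.wav", sr=48000)              # hypothetical clip
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, power=1)
pcen = librosa.pcen(S * (2 ** 31), sr=sr)    # PCEN; input scaling per librosa docs
logmel = librosa.amplitude_to_db(S)          # log-mel used alongside PCEN
```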
System characteristics
Input | mono |
Sampling rate | 48kHz |
Features | log-mel energies |
Classifier | CNN |
Mel-Scaled Wavelet-Based Features for Sub-Task A and Texture Features for Sub-Task B of DCASE 2020 Task 1
Shefali Waldekar, Kishore Kumar A and Goutam Saha
Electronics and Electrical Communication Engineering Dept., Indian Institute of Technology Kharagpur, Kharagpur, India
Waldekar_IITKGP_task1b_1
Abstract
This report describes a submission to the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 for Task 1 (acoustic scene classification (ASC)): sub-task A (ASC with Multiple Devices) and sub-task B (Low-Complexity ASC). The systems exploit time-frequency representations of audio to obtain the scene labels. The system for Task 1A follows a simple pattern classification framework employing wavelet-transform-based mel-scaled features along with a support vector machine (SVM) classifier. Texture features, namely Local Binary Patterns (LBP) extracted from the log of mel-band energies, are used in a similar classification framework for Task 1B. The proposed systems outperform the deep-learning-based baseline system on the development datasets provided for the respective sub-tasks.
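The texture descriptor can be reproduced with scikit-image: compute uniform LBP codes over the log-mel energies treated as a grayscale image, then histogram them. The parameters P and R and the 8-bit scaling are our assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(logmel, P=8, R=1):
    # Scale the log-mel "image" to 8-bit gray levels before LBP.
    img = np.round(255 * (logmel - logmel.min())
                   / (np.ptp(logmel) + 1e-9)).astype(np.uint8)
    codes = local_binary_pattern(img, P, R, method="uniform")
    n_bins = P + 2                       # uniform LBP yields P + 2 codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()             # normalized histogram for the SVM
```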
System characteristics
Input | mono |
Sampling rate | 48kHz |
Features | histogram of uniform LBP of log-mel energies |
Classifier | SVM |
Acoustic Scene Classification with Multiple Decision Schemes
Helin Wang, Dading Chong and Yuexian Zou
School of ECE, Peking University, Shenzhen, China
Helin_ADSPLAB_task1b_1 Helin_ADSPLAB_task1b_2 Helin_ADSPLAB_task1b_3 Helin_ADSPLAB_task1b_4
Abstract
This technical report describes the ADSPLAB team's submission for Task 1 of the DCASE2020 challenge. Our acoustic scene classification (ASC) system is based on convolutional neural networks (CNNs). Multiple decision schemes are proposed in our system, including decision schemes over multiple representations, multiple frequency bands, and multiple temporal frames. The final system is a fusion of models with multiple decision schemes and models pre-trained on AudioSet. The experimental results show that our system achieves accuracies of 84.5% (official baseline: 54.1%) and 92.1% (official baseline: 87.3%) on the officially provided fold 1 evaluation datasets of Task 1A and Task 1B, respectively.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, CQT, Gammatone |
Classifier | CNN |
Searching for Efficient Network Architectures for Acoustic Scene Classification
Yuzhong Wu and Tan Lee
Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Abstract
This technical report describes our submission for Task 1B of the DCASE2020 challenge. The objective of Task 1B is to construct an acoustic scene classification (ASC) system with low model complexity. In our ASC system, average-difference time-frequency features are extracted from the binaural audio waveforms. A random search policy is used to find the best-performing CNN architecture satisfying the model size requirement. The search is limited to several predefined efficient convolutional modules based on depthwise convolution and the swish activation function, to constrain the size of the search space. Experimental results on the development dataset show that the CNN model obtained by this search strategy has higher accuracy than an AlexNet-like CNN benchmark.
System characteristics
Input | binaural |
Sampling rate | 48kHz |
Data augmentation | mixup |
Features | wavelet filter-bank features |
Classifier | CNN |
Decision making | average |
Complexity management | float16 |
BUPT Submissions to DCASE 2020: Low-Complexity Acoustic Scene Classification with Post-Training Static Quantization and Pruning
Jiawang Zhang, Chunxia Ren and Shengchen Li
BUPT, Beijing University of Posts and Telecommunications, Beijing, China
Zhang_BUPT_task1b_1 Zhang_BUPT_task1b_2 Zhang_BUPT_task1b_3 Zhang_BUPT_task1b_4
Abstract
This report describes a method for Task 1b (Low-Complexity Acoustic Scene Classification) of the DCASE 2020 challenge, which targets low-complexity solutions for the classification problem. The proposed model has five residual blocks with average pooling. To improve the performance of the proposed system, binaural features from the dataset are used, together with log-mel spectrograms and mixup data augmentation. To reduce system complexity, the proposed method uses post-training static quantization and pruning: post-training static quantization performs 8-bit quantization, which can reduce the model size by a factor of four, while pruning removes redundant low-magnitude weights, so that only a small fraction of the original weight parameters is needed for performance close to that of the original network. The accuracy of the proposed method on the development dataset is 92.9%, which is 5.6% higher than the baseline, while the model is 81% smaller than the baseline model.
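Post-training static quantization is available in PyTorch's eager-mode API; a self-contained toy sketch, with the real five-block residual network stood in for by a tiny net and random calibration data:

```python
import torch
import torch.nn as nn

# Toy stand-in for the five-block residual network described above.
model = nn.Sequential(
    torch.quantization.QuantStub(),      # marks where float -> int8 happens
    nn.Conv2d(1, 8, 3), nn.ReLU(),
    torch.quantization.DeQuantStub(),    # back to float at the output
)
model.eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)       # insert observers

with torch.no_grad():                              # calibration pass
    for _ in range(8):
        prepared(torch.randn(1, 1, 64, 64))

quantized = torch.quantization.convert(prepared)   # int8 weights: ~4x smaller
out = quantized(torch.randn(1, 1, 64, 64))
```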
System characteristics
Input | mixed |
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies |
Classifier | ResNet |
Complexity management | 8-bit quantization |
DD-CNN: Depthwise Disout Convolutional Neural Network for Low-Complexity Acoustic Scene Classification
Jingqiao Zhao1, Xiao-Jun Wu1, Xiaoning Song1, Zhen-Hua Feng2 and Qiuqiang Kong3
1Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China, 2Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK, 3AI Lab, ByteDance, Shanghai, China
Zhao_JNU_task1b_1 Zhao_JNU_task1b_2
Abstract
This report presents our Depthwise Disout Convolutional Neural Network (DD-CNN) for the detection and classification of urban acoustic scenes in the DCASE2020 Challenge (Task 1, Subtask B). Specifically, we use log-mel features as the input representation of the acoustic signals. In DD-CNN, depthwise separable convolution is used to reduce the network complexity. In addition, SpecAugment and Disout are used for further performance boosting. Experimental results demonstrate that DD-CNN can learn discriminative acoustic characteristics from audio fragments and effectively reduce the network complexity. Our method achieves 92.04% accuracy on the validation set.
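SpecAugment's masking step is easy to sketch: the helper below zeroes random frequency and time stripes of a log-mel spectrogram. Mask counts and sizes are our choices, and Disout is not reproduced here.

```python
import numpy as np

def spec_augment(logmel, n_freq_masks=2, n_time_masks=2, F=8, T=20, rng=None):
    """Simplified SpecAugment: zero random frequency and time stripes."""
    rng = rng or np.random.default_rng()
    out = logmel.copy()
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, F + 1))          # stripe height in mel bins
        f0 = int(rng.integers(0, n_mels - f + 1))
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = int(rng.integers(0, T + 1))          # stripe width in frames
        t0 = int(rng.integers(0, n_frames - t + 1))
        out[:, t0:t0 + t] = 0.0
    return out
```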
System characteristics
Input | mono |
Sampling rate | 48kHz |
Data augmentation | SpecAugment |
Features | log-mel energies |
Classifier | CNN |
Complexity management | disout |