Task description
The task evaluates systems for the large-scale detection of sound events using weakly labeled data (without timestamps). Systems must provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. A further challenge of the task is to explore the possibility of exploiting a large amount of unbalanced and unlabeled training data, together with a small weakly annotated training set, to improve system performance. The labels in the annotated subset are verified and can be considered reliable.
A more detailed task description can be found on the task description page.
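Submissions are ranked by the event-based F-score, macro-averaged over classes, with a 200 ms onset collar and an offset collar of 20% of the event length. The snippet below is a minimal sketch of how this metric can be computed with the sed_eval toolbox; the file name and event lists are invented for illustration.

```python
# Minimal sketch of the event-based evaluation used for ranking, computed
# with the sed_eval toolbox (pip install sed_eval).
# The event lists below are invented examples, not challenge data.
import sed_eval

reference = [
    {'file': 'clip1.wav', 'event_label': 'Speech', 'event_onset': 1.0, 'event_offset': 2.5},
    {'file': 'clip1.wav', 'event_label': 'Dog',    'event_onset': 4.0, 'event_offset': 4.8},
]
estimated = [
    {'file': 'clip1.wav', 'event_label': 'Speech', 'event_onset': 1.1, 'event_offset': 2.4},
]

metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=['Speech', 'Dog'],
    t_collar=0.200,             # 200 ms onset collar (DCASE 2019 task 4 setting)
    percentage_of_length=0.2,   # offset collar: 20% of the event length
)
metrics.evaluate(reference_event_list=reference, estimated_event_list=estimated)

# Macro-averaged (class-wise average) F-score, as reported in the tables below.
print(metrics.results_class_wise_average_metrics()['f_measure']['f_measure'])
```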
Systems ranking
Submission code | Submission name | Technical Report | Event-based F-score (Evaluation dataset) | Event-based F-score (Development dataset)
---|---|---|---|---
Wang_NUDT_task4_4 | NUDT System for DCASE2019 Task4 | Wang2019 | 16.8 | 23.8 | |
Wang_NUDT_task4_3 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.5 | 22.4 | |
Wang_NUDT_task4_2 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 22.5 | |
Wang_NUDT_task4_1 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 22.7 | |
Delphin_OL_task4_2 | DCASE2019 mean-teacher with shifted and noisy data augmentation system | Delphin-Poulat2019 | 42.1 | 43.6 | |
Delphin_OL_task4_1 | DCASE2019 mean-teacher with shifted data augmentation system | Delphin-Poulat2019 | 38.3 | 42.1 | |
Kong_SURREY_task4_1 | CVSSP cross-task CNN baseline | Kong2019 | 22.3 | 21.3 | |
CTK_NU_task4_2 | CTK_NU_task4_2 | Chan2019 | 29.7 | 29.7 | |
CTK_NU_task4_3 | CTK_NU_task4_3 | Chan2019 | 27.7 | 27.8 | |
CTK_NU_task4_4 | CTK_NU_task4_4 | Chan2019 | 26.9 | 27.2 | |
CTK_NU_task4_1 | CTK_NU_task4_1 | Chan2019 | 31.0 | 30.4 | |
Mishima_NEC_task4_3 | msm_ResNet_3_augmentation | Mishima2019 | 18.3 | 25.9 | |
Mishima_NEC_task4_4 | msm_ResNet_4_augmentation_pseudo | Mishima2019 | 19.8 | 24.7 | |
Mishima_NEC_task4_2 | msm_ResNet_2_pseudo | Mishima2019 | 17.7 | 24.8 | |
Mishima_NEC_task4_1 | msm_ResNet_1_simple | Mishima2019 | 16.7 | 24.0 | |
CANCES_IRIT_task4_2 | CANCES multi-task | Cances2019 | 28.4 | 33.8 | |
CANCES_IRIT_task4_1 | CANCES multi-task | Cances2019 | 26.1 | 28.8 | |
PELLEGRINI_IRIT_task4_1 | PELLEGRINI multi-task | Cances2019 | 39.7 | 39.9 | |
Lin_ICT_task4_2 | Guiding_learning_2 | Lin2019 | 40.9 | 44.0 | |
Lin_ICT_task4_4 | Guiding_learning_4 | Lin2019 | 41.8 | 45.4 | |
Lin_ICT_task4_3 | Guiding_learning_3 | Lin2019 | 42.7 | 45.3 | |
Lin_ICT_task4_1 | Guiding_learning_1 | Lin2019 | 40.7 | 44.5 | |
Baseline_dcase2019 | DCASE2019 baseline system | Turpault2019 | 25.8 | 23.7 | |
bolun_NWPU_task4_1 | DCASE2019 task4 system | Bolun2019 | 21.7 | 25.0 | |
bolun_NWPU_task4_4 | DCASE2019 task4 system | Bolun2019 | 25.3 | 31.9 | |
bolun_NWPU_task4_3 | DCASE2019 task4 system | Bolun2019 | 23.8 | 25.0 | |
bolun_NWPU_task4_2 | DCASE2019 task4 system | Bolun2019 | 27.8 | 31.9 | |
Agnone_PDL_task4_1 | Mean VAT Teacher | Agnone2019 | 25.0 | 59.6 | |
Kiyokawa_NEC_task4_1 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 27.8 | 31.6 | |
Kiyokawa_NEC_task4_4 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 32.4 | 36.1 | |
Kiyokawa_NEC_task4_3 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 29.4 | 34.5 | |
Kiyokawa_NEC_task4_2 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 28.3 | 31.8 | |
Kothinti_JHU_task4_2 | JHU DCASE2019 task4 system | Kothinti2019 | 30.5 | 35.3 | |
Kothinti_JHU_task4_3 | JHU DCASE2019 task4 system | Kothinti2019 | 29.0 | 34.4 | |
Kothinti_JHU_task4_4 | JHU DCASE2019 task4 system | Kothinti2019 | 29.4 | 35.0 | |
Kothinti_JHU_task4_1 | JHU DCASE2019 task4 system | Kothinti2019 | 30.7 | 34.6 | |
Shi_FRDC_task4_2 | BossLee_FRDC_2 | Shi2019 | 42.0 | 42.5 | |
Shi_FRDC_task4_3 | BossLee_FRDC_3 | Shi2019 | 40.9 | 38.9 | |
Shi_FRDC_task4_4 | BossLee_FRDC_4 | Shi2019 | 41.5 | 41.7 | |
Shi_FRDC_task4_1 | BossLee_FRDC_1 | Shi2019 | 37.0 | 36.7 | |
ZYL_UESTC_task4_1 | UESTC_SICE_task4_1 | Zhang2019 | 29.4 | 36.0 | |
ZYL_UESTC_task4_2 | UESTC_SICE_task4_2 | Zhang2019 | 30.8 | 35.6 | |
Wang_YSU_task4_1 | Wang_YSU_task4_1 | Yang2019 | 6.5 | 19.4 | |
Wang_YSU_task4_2 | Wang_YSU_task4_2 | Yang2019 | 6.2 | 20.9 | |
Wang_YSU_task4_3 | Wang_YSU_task4_3 | Yang2019 | 6.7 | 22.7 | |
Yan_USTC_task4_1 | USTC_CRNN_MT system1 | Yan2019 | 35.8 | 41.4 | |
Yan_USTC_task4_3 | USTC_CRNN_MT system3 | Yan2019 | 35.6 | 42.1 | |
Yan_USTC_task4_4 | USTC_CRNN_MT system4 | Yan2019 | 33.5 | 39.4 | |
Yan_USTC_task4_2 | USTC_CRNN_MT system2 | Yan2019 | 36.2 | 42.6 | |
Lee_KNU_task4_2 | KNUwaveCNN2 | Lee2019 | 25.8 | 31.6 | |
Lee_KNU_task4_4 | KNUwaveCNN4 | Lee2019 | 24.6 | 28.7 | |
Lee_KNU_task4_3 | KNUwaveCNN3 | Lee2019 | 26.7 | 31.6 | |
Lee_KNU_task4_1 | KNUwaveCNN1 | Lee2019 | 26.4 | 28.8 | |
Rakowski_SRPOL_task4_1 | Regularized Surrey9 | Rakowski2019 | 24.2 | 24.3 | |
Lim_ETRI_task4_1 | Lim_task4_1 | Lim2019 | 32.6 | 38.8 | |
Lim_ETRI_task4_2 | Lim_task4_2 | Lim2019 | 33.2 | 39.5 | |
Lim_ETRI_task4_3 | Lim_task4_3 | Lim2019 | 32.5 | 39.4 | |
Lim_ETRI_task4_4 | Lim_task4_4 | Lim2019 | 34.4 | 40.9 |
Supplementary metrics
Submission code | Submission name | Technical Report | Event-based F-score (Evaluation dataset) | Event-based F-score (YouTube dataset) | Event-based F-score (Vimeo dataset) | Segment-based F-score (Evaluation dataset)
---|---|---|---|---|---|---
Wang_NUDT_task4_4 | NUDT System for DCASE2019 Task4 | Wang2019 | 16.8 | 18.3 | 13.2 | 64.8 | |
Wang_NUDT_task4_3 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.5 | 19.2 | 13.3 | 63.0 | |
Wang_NUDT_task4_2 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 18.4 | 14.4 | 65.0 | |
Wang_NUDT_task4_1 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 18.7 | 13.5 | 64.4 | |
Delphin_OL_task4_2 | DCASE2019 mean-teacher with shifted and noisy data augmentation system | Delphin-Poulat2019 | 42.1 | 45.8 | 33.3 | 71.4 | |
Delphin_OL_task4_1 | DCASE2019 mean-teacher with shifted data augmentation system | Delphin-Poulat2019 | 38.3 | 41.9 | 29.2 | 68.6 | |
Kong_SURREY_task4_1 | CVSSP cross-task CNN baseline | Kong2019 | 22.3 | 24.1 | 17.0 | 59.4 | |
CTK_NU_task4_2 | CTK_NU_task4_2 | Chan2019 | 29.7 | 33.2 | 21.0 | 55.6 | |
CTK_NU_task4_3 | CTK_NU_task4_3 | Chan2019 | 27.7 | 30.8 | 19.8 | 50.5 | |
CTK_NU_task4_4 | CTK_NU_task4_4 | Chan2019 | 26.9 | 30.1 | 18.8 | 48.7 | |
CTK_NU_task4_1 | CTK_NU_task4_1 | Chan2019 | 31.0 | 34.7 | 21.6 | 58.2 | |
Mishima_NEC_task4_3 | msm_ResNet_3_augmentation | Mishima2019 | 18.3 | 20.6 | 12.6 | 58.8 | |
Mishima_NEC_task4_4 | msm_ResNet_4_augmentation_pseudo | Mishima2019 | 19.8 | 21.8 | 15.0 | 58.7 | |
Mishima_NEC_task4_2 | msm_ResNet_2_pseudo | Mishima2019 | 17.7 | 19.0 | 14.1 | 56.1 | |
Mishima_NEC_task4_1 | msm_ResNet_1_simple | Mishima2019 | 16.7 | 18.8 | 11.7 | 56.2 | |
CANCES_IRIT_task4_2 | CANCES multi-task | Cances2019 | 28.4 | 31.1 | 21.3 | 61.2 | |
CANCES_IRIT_task4_1 | CANCES multi-task | Cances2019 | 26.1 | 29.2 | 18.1 | 62.5 | |
PELLEGRINI_IRIT_task4_1 | PELLEGRINI multi-task | Cances2019 | 39.7 | 43.0 | 30.9 | 64.7 | |
Lin_ICT_task4_2 | Guiding_learning_2 | Lin2019 | 40.9 | 45.0 | 29.8 | 62.7 | |
Lin_ICT_task4_4 | Guiding_learning_4 | Lin2019 | 41.8 | 46.7 | 28.6 | 64.5 | |
Lin_ICT_task4_3 | Guiding_learning_3 | Lin2019 | 42.7 | 47.7 | 29.4 | 64.8 | |
Lin_ICT_task4_1 | Guiding_learning_1 | Lin2019 | 40.7 | 45.5 | 27.6 | 61.5 | |
Baseline_dcase2019 | DCASE2019 baseline system | Turpault2019 | 25.8 | 29.0 | 18.1 | 53.7 | |
bolun_NWPU_task4_1 | DCASE2019 task4 system | Bolun2019 | 21.7 | 23.0 | 18.2 | 63.3 | |
bolun_NWPU_task4_4 | DCASE2019 task4 system | Bolun2019 | 25.3 | 28.6 | 16.1 | 58.7 | |
bolun_NWPU_task4_3 | DCASE2019 task4 system | Bolun2019 | 23.8 | 26.2 | 17.5 | 61.7 | |
bolun_NWPU_task4_2 | DCASE2019 task4 system | Bolun2019 | 27.8 | 30.1 | 21.7 | 61.6 | |
Agnone_PDL_task4_1 | Mean VAT Teacher | Agnone2019 | 25.0 | 27.1 | 20.0 | 60.4 | |
Kiyokawa_NEC_task4_1 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 27.8 | 30.4 | 22.1 | 66.1 | |
Kiyokawa_NEC_task4_4 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 32.4 | 36.2 | 23.8 | 65.3 | |
Kiyokawa_NEC_task4_3 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 29.4 | 32.9 | 21.2 | 65.7 | |
Kiyokawa_NEC_task4_2 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 28.3 | 32.1 | 19.3 | 62.4 | |
Kothinti_JHU_task4_2 | JHU DCASE2019 task4 system | Kothinti2019 | 30.5 | 32.5 | 24.7 | 53.5 | |
Kothinti_JHU_task4_3 | JHU DCASE2019 task4 system | Kothinti2019 | 29.0 | 31.2 | 23.0 | 52.0 | |
Kothinti_JHU_task4_4 | JHU DCASE2019 task4 system | Kothinti2019 | 29.4 | 31.2 | 24.4 | 52.4 | |
Kothinti_JHU_task4_1 | JHU DCASE2019 task4 system | Kothinti2019 | 30.7 | 33.2 | 23.8 | 53.1 | |
Shi_FRDC_task4_2 | BossLee_FRDC_2 | Shi2019 | 42.0 | 46.1 | 31.5 | 69.8 | |
Shi_FRDC_task4_3 | BossLee_FRDC_3 | Shi2019 | 40.9 | 45.5 | 29.8 | 68.7 | |
Shi_FRDC_task4_4 | BossLee_FRDC_4 | Shi2019 | 41.5 | 46.4 | 29.3 | 67.8 | |
Shi_FRDC_task4_1 | BossLee_FRDC_1 | Shi2019 | 37.0 | 40.2 | 28.9 | 63.0 | |
ZYL_UESTC_task4_1 | UESTC_SICE_task4_1 | Zhang2019 | 29.4 | 31.9 | 23.3 | 62.0 | |
ZYL_UESTC_task4_2 | UESTC_SICE_task4_2 | Zhang2019 | 30.8 | 34.5 | 21.1 | 60.9 | |
Wang_YSU_task4_1 | Wang_YSU_task4_1 | Yang2019 | 6.5 | 7.4 | 4.1 | 26.1 | |
Wang_YSU_task4_2 | Wang_YSU_task4_2 | Yang2019 | 6.2 | 7.2 | 4.0 | 25.4 | |
Wang_YSU_task4_3 | Wang_YSU_task4_3 | Yang2019 | 6.7 | 7.6 | 4.6 | 26.3 | |
Yan_USTC_task4_1 | USTC_CRNN_MT system1 | Yan2019 | 35.8 | 38.2 | 29.3 | 66.1 | |
Yan_USTC_task4_3 | USTC_CRNN_MT system3 | Yan2019 | 35.6 | 38.2 | 28.2 | 64.6 | |
Yan_USTC_task4_4 | USTC_CRNN_MT system4 | Yan2019 | 33.5 | 35.6 | 27.3 | 64.1 | |
Yan_USTC_task4_2 | USTC_CRNN_MT system2 | Yan2019 | 36.2 | 38.8 | 28.7 | 65.2 | |
Lee_KNU_task4_2 | KNUwaveCNN2 | Lee2019 | 25.8 | 27.4 | 21.5 | 49.0 | |
Lee_KNU_task4_4 | KNUwaveCNN4 | Lee2019 | 24.6 | 26.1 | 20.5 | 48.3 | |
Lee_KNU_task4_3 | KNUwaveCNN3 | Lee2019 | 26.7 | 28.1 | 22.9 | 50.2 | |
Lee_KNU_task4_1 | KNUwaveCNN1 | Lee2019 | 26.4 | 27.8 | 22.6 | 49.0 | |
Rakowski_SRPOL_task4_1 | Regularized Surrey9 | Rakowski2019 | 24.2 | 26.2 | 19.2 | 63.4 | |
Lim_ETRI_task4_1 | Lim_task4_1 | Lim2019 | 32.6 | 35.3 | 25.8 | 67.1 | |
Lim_ETRI_task4_2 | Lim_task4_2 | Lim2019 | 33.2 | 36.7 | 24.8 | 69.2 | |
Lim_ETRI_task4_3 | Lim_task4_3 | Lim2019 | 32.5 | 36.3 | 22.4 | 63.2 | |
Lim_ETRI_task4_4 | Lim_task4_4 | Lim2019 | 34.4 | 38.6 | 23.7 | 66.4 |
Teams ranking
The table includes only the best performing system from each submitting team.
Submission code | Submission name | Technical Report | Event-based F-score (Evaluation dataset) | Event-based F-score (Development dataset)
---|---|---|---|---
Wang_NUDT_task4_3 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.5 | 22.4 | |
Delphin_OL_task4_2 | DCASE2019 mean-teacher with shifted and noisy data augmentation system | Delphin-Poulat2019 | 42.1 | 43.6 | |
Kong_SURREY_task4_1 | CVSSP cross-task CNN baseline | Kong2019 | 22.3 | 21.3 | |
CTK_NU_task4_1 | CTK_NU_task4_1 | Chan2019 | 31.0 | 30.4 | |
Mishima_NEC_task4_4 | msm_ResNet_4_augmentation_pseudo | Mishima2019 | 19.8 | 24.7 | |
PELLEGRINI_IRIT_task4_1 | PELLEGRINI multi-task | Cances2019 | 39.7 | 39.9 | |
Lin_ICT_task4_3 | Guiding_learning_3 | Lin2019 | 42.7 | 45.3 | |
Baseline_dcase2019 | DCASE2019 baseline system | Turpault2019 | 25.8 | 23.7 | |
bolun_NWPU_task4_2 | DCASE2019 task4 system | Bolun2019 | 27.8 | 31.9 | |
Agnone_PDL_task4_1 | Mean VAT Teacher | Agnone2019 | 25.0 | 59.6 | |
Kiyokawa_NEC_task4_4 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 32.4 | 36.1 | |
Kothinti_JHU_task4_1 | JHU DCASE2019 task4 system | Kothinti2019 | 30.7 | 34.6 | |
Shi_FRDC_task4_2 | BossLee_FRDC_2 | Shi2019 | 42.0 | 42.5 | |
ZYL_UESTC_task4_2 | UESTC_SICE_task4_2 | Zhang2019 | 30.8 | 35.6 | |
Wang_YSU_task4_3 | Wang_YSU_task4_3 | Yang2019 | 6.7 | 22.7 | |
Yan_USTC_task4_2 | USTC_CRNN_MT system2 | Yan2019 | 36.2 | 42.6 | |
Lee_KNU_task4_3 | KNUwaveCNN3 | Lee2019 | 26.7 | 31.6 | |
Rakowski_SRPOL_task4_1 | Regularized Surrey9 | Rakowski2019 | 24.2 | 24.3 | |
Lim_ETRI_task4_4 | Lim_task4_4 | Lim2019 | 34.4 | 40.9 |
Supplementary metrics
Submission code | Submission name | Technical Report | Event-based F-score (Evaluation dataset) | Event-based F-score (YouTube dataset) | Event-based F-score (Vimeo dataset) | Segment-based F-score (Evaluation dataset)
---|---|---|---|---|---|---
Wang_NUDT_task4_3 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.5 | 19.2 | 13.3 | 63.0 | |
Delphin_OL_task4_2 | DCASE2019 mean-teacher with shifted and noisy data augmentation system | Delphin-Poulat2019 | 42.1 | 45.8 | 33.3 | 71.4 | |
Kong_SURREY_task4_1 | CVSSP cross-task CNN baseline | Kong2019 | 22.3 | 24.1 | 17.0 | 59.4 | |
CTK_NU_task4_1 | CTK_NU_task4_1 | Chan2019 | 31.0 | 34.7 | 21.6 | 58.2 | |
Mishima_NEC_task4_4 | msm_ResNet_4_augmentation_pseudo | Mishima2019 | 19.8 | 21.8 | 15.0 | 58.7 | |
PELLEGRINI_IRIT_task4_1 | PELLEGRINI multi-task | Cances2019 | 39.7 | 43.0 | 30.9 | 64.7 | |
Lin_ICT_task4_3 | Guiding_learning_3 | Lin2019 | 42.7 | 47.7 | 29.4 | 64.8 | |
Baseline_dcase2019 | DCASE2019 baseline system | Turpault2019 | 25.8 | 29.0 | 18.1 | 53.7 | |
bolun_NWPU_task4_2 | DCASE2019 task4 system | Bolun2019 | 27.8 | 30.1 | 21.7 | 61.6 | |
Agnone_PDL_task4_1 | Mean VAT Teacher | Agnone2019 | 25.0 | 27.1 | 20.0 | 60.4 | |
Kiyokawa_NEC_task4_4 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 32.4 | 36.2 | 23.8 | 65.3 | |
Kothinti_JHU_task4_1 | JHU DCASE2019 task4 system | Kothinti2019 | 30.7 | 33.2 | 23.8 | 53.1 | |
Shi_FRDC_task4_2 | BossLee_FRDC_2 | Shi2019 | 42.0 | 46.1 | 31.5 | 69.8 | |
ZYL_UESTC_task4_2 | UESTC_SICE_task4_2 | Zhang2019 | 30.8 | 34.5 | 21.1 | 60.9 | |
Wang_YSU_task4_3 | Wang_YSU_task4_3 | Yang2019 | 6.7 | 7.6 | 4.6 | 26.3 | |
Yan_USTC_task4_2 | USTC_CRNN_MT system2 | Yan2019 | 36.2 | 38.8 | 28.7 | 65.2 | |
Lee_KNU_task4_3 | KNUwaveCNN3 | Lee2019 | 26.7 | 28.1 | 22.9 | 50.2 | |
Rakowski_SRPOL_task4_1 | Regularized Surrey9 | Rakowski2019 | 24.2 | 26.2 | 19.2 | 63.4 | |
Lim_ETRI_task4_4 | Lim_task4_4 | Lim2019 | 34.4 | 38.6 | 23.7 | 66.4 |
Class-wise performance
Submission code | Submission name | Technical Report | Event-based F-score (Evaluation dataset) | Alarm bell ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Wang_NUDT_task4_4 | NUDT System for DCASE2019 Task4 | Wang2019 | 16.8 | 14.0 | 21.5 | 0.4 | 0.2 | 0.3 | 21.5 | 25.0 | 24.6 | 10.7 | 50.2 | |
Wang_NUDT_task4_3 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.5 | 14.0 | 26.1 | 0.4 | 0.0 | 0.3 | 22.5 | 26.8 | 26.3 | 10.7 | 47.9 | |
Wang_NUDT_task4_2 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 13.2 | 22.1 | 0.4 | 0.2 | 0.3 | 23.6 | 27.8 | 23.1 | 11.6 | 49.8 | |
Wang_NUDT_task4_1 | NUDT System for DCASE2019 Task4 | Wang2019 | 17.2 | 11.9 | 20.9 | 0.4 | 0.2 | 0.0 | 24.9 | 27.7 | 27.0 | 11.7 | 47.5 | |
Delphin_OL_task4_2 | DCASE2019 mean-teacher with shifted and noisy data augmentation system | Delphin-Poulat2019 | 42.1 | 42.6 | 49.2 | 52.9 | 35.2 | 40.9 | 47.5 | 41.4 | 31.9 | 43.9 | 35.7 | |
Delphin_OL_task4_1 | DCASE2019 mean-teacher with shifted data augmentation system | Delphin-Poulat2019 | 38.3 | 41.6 | 40.8 | 51.9 | 37.2 | 37.8 | 41.1 | 39.0 | 22.1 | 41.2 | 29.8 | |
Kong_SURREY_task4_1 | CVSSP cross-task CNN baseline | Kong2019 | 22.3 | 6.2 | 14.2 | 41.7 | 11.1 | 17.1 | 28.7 | 3.0 | 20.8 | 50.3 | 30.3 | |
CTK_NU_task4_2 | CTK_NU_task4_2 | Chan2019 | 29.7 | 20.5 | 42.9 | 40.3 | 0.7 | 22.9 | 37.4 | 30.5 | 20.0 | 39.7 | 41.8 | |
CTK_NU_task4_3 | CTK_NU_task4_3 | Chan2019 | 27.7 | 32.5 | 38.8 | 33.8 | 0.0 | 17.6 | 40.2 | 29.7 | 19.5 | 23.0 | 42.1 | |
CTK_NU_task4_4 | CTK_NU_task4_4 | Chan2019 | 26.9 | 24.0 | 38.9 | 30.7 | 0.7 | 17.2 | 35.3 | 27.6 | 18.5 | 36.3 | 39.9 | |
CTK_NU_task4_1 | CTK_NU_task4_1 | Chan2019 | 31.0 | 25.1 | 38.2 | 27.2 | 7.7 | 25.6 | 50.0 | 35.0 | 24.2 | 26.6 | 50.7 | |
Mishima_NEC_task4_3 | msm_ResNet_3_augmentation | Mishima2019 | 18.3 | 17.9 | 5.3 | 36.5 | 24.9 | 28.7 | 13.9 | 5.0 | 4.6 | 38.0 | 8.4 | |
Mishima_NEC_task4_4 | msm_ResNet_4_augmentation_pseudo | Mishima2019 | 19.8 | 18.7 | 7.9 | 41.2 | 25.1 | 18.2 | 14.5 | 9.5 | 3.6 | 48.6 | 10.5 | |
Mishima_NEC_task4_2 | msm_ResNet_2_pseudo | Mishima2019 | 17.7 | 22.4 | 0.8 | 40.9 | 18.1 | 31.5 | 7.5 | 0.5 | 1.3 | 51.7 | 2.4 | |
Mishima_NEC_task4_1 | msm_ResNet_1_simple | Mishima2019 | 16.7 | 18.3 | 2.4 | 35.8 | 18.6 | 27.1 | 8.0 | 0.8 | 1.4 | 50.5 | 4.5 | |
CANCES_IRIT_task4_2 | CANCES multi-task | Cances2019 | 28.4 | 23.2 | 24.8 | 38.0 | 22.0 | 24.5 | 25.2 | 29.6 | 21.3 | 44.0 | 31.1 | |
CANCES_IRIT_task4_1 | CANCES multi-task | Cances2019 | 26.1 | 18.8 | 26.9 | 20.5 | 19.4 | 11.1 | 27.6 | 40.9 | 14.1 | 45.5 | 36.1 | |
PELLEGRINI_IRIT_task4_1 | PELLEGRINI multi-task | Cances2019 | 39.7 | 35.8 | 35.1 | 60.2 | 32.5 | 35.5 | 35.9 | 37.5 | 27.7 | 47.4 | 49.1 | |
Lin_ICT_task4_2 | Guiding_learning_2 | Lin2019 | 40.9 | 36.4 | 40.5 | 54.2 | 27.0 | 41.5 | 42.0 | 41.7 | 25.7 | 46.2 | 54.2 | |
Lin_ICT_task4_4 | Guiding_learning_4 | Lin2019 | 41.8 | 42.5 | 36.8 | 55.1 | 26.5 | 43.1 | 41.8 | 39.3 | 20.4 | 54.2 | 57.9 | |
Lin_ICT_task4_3 | Guiding_learning_3 | Lin2019 | 42.7 | 42.3 | 40.5 | 55.1 | 26.4 | 42.0 | 44.6 | 41.5 | 21.8 | 54.6 | 58.6 | |
Lin_ICT_task4_1 | Guiding_learning_1 | Lin2019 | 40.7 | 40.2 | 35.8 | 55.3 | 24.6 | 38.7 | 43.1 | 42.0 | 23.6 | 55.9 | 48.3 | |
Baseline_dcase2019 | DCASE2019 baseline system | Turpault2019 | 25.8 | 26.6 | 32.2 | 53.6 | 13.7 | 9.9 | 13.3 | 24.1 | 10.9 | 37.7 | 35.5 | |
bolun_NWPU_task4_1 | DCASE2019 task4 system | Bolun2019 | 21.7 | 6.4 | 18.6 | 40.9 | 12.3 | 9.0 | 32.8 | 12.7 | 19.0 | 46.0 | 19.5 | |
bolun_NWPU_task4_4 | DCASE2019 task4 system | Bolun2019 | 25.3 | 3.7 | 14.6 | 51.6 | 5.9 | 5.6 | 39.8 | 37.4 | 23.6 | 45.7 | 24.8 | |
bolun_NWPU_task4_3 | DCASE2019 task4 system | Bolun2019 | 23.8 | 17.0 | 14.6 | 34.6 | 14.8 | 13.8 | 31.9 | 16.3 | 23.6 | 46.0 | 25.5 | |
bolun_NWPU_task4_2 | DCASE2019 task4 system | Bolun2019 | 27.8 | 16.6 | 18.6 | 46.3 | 20.1 | 21.5 | 34.9 | 28.6 | 19.0 | 46.8 | 25.9 | |
Agnone_PDL_task4_1 | Mean VAT Teacher | Agnone2019 | 25.0 | 33.9 | 34.9 | 44.0 | 19.5 | 2.8 | 12.6 | 23.4 | 11.8 | 39.4 | 27.4 | |
Kiyokawa_NEC_task4_1 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 27.8 | 21.0 | 32.1 | 34.1 | 22.6 | 20.2 | 25.1 | 14.5 | 22.1 | 46.4 | 40.1 | |
Kiyokawa_NEC_task4_4 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 32.4 | 32.4 | 33.2 | 38.3 | 24.3 | 27.6 | 32.7 | 17.0 | 25.0 | 45.8 | 48.1 | |
Kiyokawa_NEC_task4_3 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 29.4 | 32.4 | 27.0 | 38.3 | 24.3 | 27.6 | 25.1 | 13.7 | 19.4 | 45.8 | 40.1 | |
Kiyokawa_NEC_task4_2 | DCASE2019 SED ResNet self-mask kiyo | Kiyokawa2019 | 28.3 | 32.4 | 31.7 | 32.8 | 20.5 | 27.6 | 19.5 | 18.9 | 20.8 | 45.8 | 33.5 | |
Kothinti_JHU_task4_2 | JHU DCASE2019 task4 system | Kothinti2019 | 30.5 | 23.3 | 46.9 | 29.5 | 16.4 | 41.6 | 20.9 | 30.2 | 19.4 | 43.6 | 32.8 | |
Kothinti_JHU_task4_3 | JHU DCASE2019 task4 system | Kothinti2019 | 29.0 | 22.1 | 43.9 | 32.5 | 12.7 | 36.7 | 18.9 | 25.8 | 19.4 | 43.5 | 34.4 | |
Kothinti_JHU_task4_4 | JHU DCASE2019 task4 system | Kothinti2019 | 29.4 | 23.1 | 46.0 | 32.0 | 14.8 | 37.4 | 19.7 | 28.2 | 17.8 | 44.6 | 30.4 | |
Kothinti_JHU_task4_1 | JHU DCASE2019 task4 system | Kothinti2019 | 30.7 | 21.2 | 45.2 | 30.6 | 16.4 | 40.8 | 20.7 | 28.8 | 23.6 | 42.5 | 36.7 | |
Shi_FRDC_task4_2 | BossLee_FRDC_2 | Shi2019 | 42.0 | 45.8 | 44.2 | 63.2 | 32.2 | 27.5 | 41.9 | 27.6 | 21.8 | 58.7 | 56.5 | |
Shi_FRDC_task4_3 | BossLee_FRDC_3 | Shi2019 | 40.9 | 45.5 | 44.4 | 65.2 | 27.1 | 32.5 | 36.1 | 29.1 | 19.1 | 58.0 | 52.4 | |
Shi_FRDC_task4_4 | BossLee_FRDC_4 | Shi2019 | 41.5 | 48.0 | 42.6 | 62.6 | 29.4 | 33.6 | 35.5 | 29.9 | 19.0 | 59.1 | 55.0 | |
Shi_FRDC_task4_1 | BossLee_FRDC_1 | Shi2019 | 37.0 | 45.1 | 39.3 | 58.4 | 27.2 | 33.9 | 28.2 | 28.9 | 10.0 | 56.0 | 43.0 | |
ZYL_UESTC_task4_1 | UESTC_SICE_task4_1 | Zhang2019 | 29.4 | 32.8 | 41.5 | 54.7 | 8.1 | 1.0 | 25.5 | 21.6 | 17.3 | 56.6 | 35.3 | |
ZYL_UESTC_task4_2 | UESTC_SICE_task4_2 | Zhang2019 | 30.8 | 30.9 | 29.3 | 52.4 | 17.1 | 21.0 | 31.4 | 21.1 | 21.9 | 44.8 | 38.1 | |
Wang_YSU_task4_1 | Wang_YSU_task4_1 | Yang2019 | 6.5 | 1.7 | 0.0 | 3.3 | 1.4 | 3.4 | 7.0 | 11.9 | 3.1 | 19.2 | 13.6 | |
Wang_YSU_task4_2 | Wang_YSU_task4_2 | Yang2019 | 6.2 | 2.3 | 6.2 | 7.6 | 2.9 | 0.7 | 4.3 | 7.7 | 3.4 | 16.0 | 11.4 | |
Wang_YSU_task4_3 | Wang_YSU_task4_3 | Yang2019 | 6.7 | 1.9 | 6.2 | 7.6 | 2.9 | 1.0 | 4.3 | 9.6 | 6.0 | 16.0 | 11.0 | |
Yan_USTC_task4_1 | USTC_CRNN_MT system1 | Yan2019 | 35.8 | 23.3 | 36.6 | 52.8 | 28.0 | 41.4 | 22.6 | 42.6 | 34.9 | 37.0 | 39.2 | |
Yan_USTC_task4_3 | USTC_CRNN_MT system3 | Yan2019 | 35.6 | 23.3 | 33.5 | 57.1 | 32.1 | 41.8 | 23.1 | 42.6 | 31.9 | 32.6 | 37.5 | |
Yan_USTC_task4_4 | USTC_CRNN_MT system4 | Yan2019 | 33.5 | 23.3 | 33.5 | 57.1 | 32.1 | 41.8 | 23.1 | 22.2 | 31.9 | 32.6 | 37.5 | |
Yan_USTC_task4_2 | USTC_CRNN_MT system2 | Yan2019 | 36.2 | 18.5 | 39.2 | 52.3 | 26.8 | 41.4 | 30.5 | 42.6 | 34.9 | 38.3 | 37.3 | |
Lee_KNU_task4_2 | KNUwaveCNN2 | Lee2019 | 25.8 | 25.8 | 21.2 | 24.4 | 11.4 | 25.7 | 28.0 | 34.7 | 16.6 | 38.2 | 32.4 | |
Lee_KNU_task4_4 | KNUwaveCNN4 | Lee2019 | 24.6 | 26.9 | 30.0 | 20.1 | 10.4 | 26.5 | 27.7 | 30.9 | 16.4 | 31.4 | 26.0 | |
Lee_KNU_task4_3 | KNUwaveCNN3 | Lee2019 | 26.7 | 25.8 | 23.4 | 24.3 | 11.4 | 25.8 | 28.9 | 36.3 | 16.7 | 38.4 | 36.0 | |
Lee_KNU_task4_1 | KNUwaveCNN1 | Lee2019 | 26.4 | 25.8 | 27.0 | 24.4 | 11.4 | 25.7 | 26.9 | 35.1 | 16.2 | 38.2 | 33.5 | |
Rakowski_SRPOL_task4_1 | Regularized Surrey9 | Rakowski2019 | 24.2 | 25.6 | 21.6 | 25.1 | 14.4 | 12.9 | 25.7 | 24.9 | 17.5 | 50.6 | 23.7 | |
Lim_ETRI_task4_1 | Lim_task4_1 | Lim2019 | 32.6 | 22.2 | 41.7 | 53.1 | 17.2 | 29.2 | 12.6 | 36.0 | 21.8 | 50.8 | 41.4 | |
Lim_ETRI_task4_2 | Lim_task4_2 | Lim2019 | 33.2 | 26.9 | 36.7 | 53.7 | 19.3 | 27.1 | 14.0 | 35.9 | 23.0 | 52.4 | 42.9 | |
Lim_ETRI_task4_3 | Lim_task4_3 | Lim2019 | 32.5 | 25.7 | 31.6 | 52.6 | 20.1 | 35.2 | 15.9 | 33.2 | 19.5 | 58.4 | 32.6 | |
Lim_ETRI_task4_4 | Lim_task4_4 | Lim2019 | 34.4 | 26.2 | 35.5 | 57.2 | 24.1 | 33.1 | 17.4 | 33.3 | 21.5 | 58.5 | 37.1 |
System characteristics
General characteristics
Code | Technical Report | Event-based F-score (Eval) | Sampling rate | Data augmentation | Features
---|---|---|---|---|---
Wang_NUDT_task4_4 | Wang2019 | 16.8 | 44.1kHz | mixup | log-mel energies, delta features | |
Wang_NUDT_task4_3 | Wang2019 | 17.5 | 44.1kHz | mixup | log-mel energies, delta features | |
Wang_NUDT_task4_2 | Wang2019 | 17.2 | 44.1kHz | mixup | log-mel energies, delta features | |
Wang_NUDT_task4_1 | Wang2019 | 17.2 | 44.1kHz | mixup | log-mel energies, delta features | |
Delphin_OL_task4_2 | Delphin-Poulat2019 | 42.1 | 22.05kHz | noise addition, time shifting, frequency shifting | log-mel energies | |
Delphin_OL_task4_1 | Delphin-Poulat2019 | 38.3 | 22.05kHz | time shifting, frequency shifting | log-mel energies | |
Kong_SURREY_task4_1 | Kong2019 | 22.3 | 32kHz | | log-mel energies
CTK_NU_task4_2 | Chan2019 | 29.7 | 32kHz | | log-mel energies
CTK_NU_task4_3 | Chan2019 | 27.7 | 32kHz | | log-mel energies
CTK_NU_task4_4 | Chan2019 | 26.9 | 32kHz | | log-mel energies
CTK_NU_task4_1 | Chan2019 | 31.0 | 32kHz | | log-mel energies
Mishima_NEC_task4_3 | Mishima2019 | 18.3 | 44.1kHz | block mixing, mixup | log-mel energies
Mishima_NEC_task4_4 | Mishima2019 | 19.8 | 44.1kHz | block mixing, mixup | log-mel energies
Mishima_NEC_task4_2 | Mishima2019 | 17.7 | 44.1kHz | | log-mel energies
Mishima_NEC_task4_1 | Mishima2019 | 16.7 | 44.1kHz | | log-mel energies
CANCES_IRIT_task4_2 | Cances2019 | 28.4 | 22kHz | pitch shifting, time stretching, level, noise | log-mel energies
CANCES_IRIT_task4_1 | Cances2019 | 26.1 | 22kHz | pitch shifting, time stretching, level, noise | log-mel energies
PELLEGRINI_IRIT_task4_1 | Cances2019 | 39.7 | 22kHz | pitch shifting, time stretching, level | log-mel energies
Lin_ICT_task4_2 | Lin2019 | 40.9 | 44.1kHz | | log-mel energies
Lin_ICT_task4_4 | Lin2019 | 41.8 | 44.1kHz | | log-mel energies
Lin_ICT_task4_3 | Lin2019 | 42.7 | 44.1kHz | | log-mel energies
Lin_ICT_task4_1 | Lin2019 | 40.7 | 44.1kHz | | log-mel energies
Baseline_dcase2019 | Turpault2019 | 25.8 | 44.1kHz | | log-mel energies
bolun_NWPU_task4_1 | Bolun2019 | 21.7 | 32kHz | event adding | log-mel energies | |
bolun_NWPU_task4_4 | Bolun2019 | 25.3 | 32kHz | event adding | log-mel energies | |
bolun_NWPU_task4_3 | Bolun2019 | 23.8 | 32kHz | event adding | log-mel energies | |
bolun_NWPU_task4_2 | Bolun2019 | 27.8 | 32kHz | event adding | log-mel energies | |
Agnone_PDL_task4_1 | Agnone2019 | 25.0 | 44.1kHz | VAT | log-mel energies | |
Kiyokawa_NEC_task4_1 | Kiyokawa2019 | 27.8 | 44.1kHz | mixup | log-mel energies | |
Kiyokawa_NEC_task4_4 | Kiyokawa2019 | 32.4 | 44.1kHz | mixup | log-mel energies | |
Kiyokawa_NEC_task4_3 | Kiyokawa2019 | 29.4 | 44.1kHz | mixup | log-mel energies | |
Kiyokawa_NEC_task4_2 | Kiyokawa2019 | 28.3 | 44.1kHz | mixup | log-mel energies | |
Kothinti_JHU_task4_2 | Kothinti2019 | 30.5 | 44.1kHz | | log-mel energies, auditory spectrogram
Kothinti_JHU_task4_3 | Kothinti2019 | 29.0 | 44.1kHz | | log-mel energies, auditory spectrogram
Kothinti_JHU_task4_4 | Kothinti2019 | 29.4 | 44.1kHz | | log-mel energies, auditory spectrogram
Kothinti_JHU_task4_1 | Kothinti2019 | 30.7 | 44.1kHz | | log-mel energies, auditory spectrogram
Shi_FRDC_task4_2 | Shi2019 | 42.0 | 44.1kHz | Gaussian noise | log-mel energies | |
Shi_FRDC_task4_3 | Shi2019 | 40.9 | 44.1kHz | Gaussian noise | log-mel energies | |
Shi_FRDC_task4_4 | Shi2019 | 41.5 | 44.1kHz | Gaussian noise | log-mel energies | |
Shi_FRDC_task4_1 | Shi2019 | 37.0 | 44.1kHz | Gaussian noise | log-mel energies | |
ZYL_UESTC_task4_1 | Zhang2019 | 29.4 | 44.1kHz | | mel-spectrogram
ZYL_UESTC_task4_2 | Zhang2019 | 30.8 | 44.1kHz | | mel-spectrogram
Wang_YSU_task4_1 | Yang2019 | 6.5 | 44.1kHz | | log-mel energies
Wang_YSU_task4_2 | Yang2019 | 6.2 | 44.1kHz | | log-mel energies
Wang_YSU_task4_3 | Yang2019 | 6.7 | 44.1kHz | | log-mel energies
Yan_USTC_task4_1 | Yan2019 | 35.8 | 44.1kHz | SpecAugment | log-mel energies | |
Yan_USTC_task4_3 | Yan2019 | 35.6 | 44.1kHz | SpecAugment | log-mel energies | |
Yan_USTC_task4_4 | Yan2019 | 33.5 | 44.1kHz | SpecAugment | log-mel energies | |
Yan_USTC_task4_2 | Yan2019 | 36.2 | 44.1kHz | SpecAugment | log-mel energies | |
Lee_KNU_task4_2 | Lee2019 | 25.8 | 44.1kHz | | waveform
Lee_KNU_task4_4 | Lee2019 | 24.6 | 44.1kHz | notch filter | waveform
Lee_KNU_task4_3 | Lee2019 | 26.7 | 44.1kHz | | waveform
Lee_KNU_task4_1 | Lee2019 | 26.4 | 44.1kHz | | waveform
Rakowski_SRPOL_task4_1 | Rakowski2019 | 24.2 | 32kHz | occlusions | log-mel energies | |
Lim_ETRI_task4_1 | Lim2019 | 32.6 | 44.1kHz | SpecAugment | log-mel energies | |
Lim_ETRI_task4_2 | Lim2019 | 33.2 | 44.1kHz | SpecAugment | log-mel energies | |
Lim_ETRI_task4_3 | Lim2019 | 32.5 | 44.1kHz | SpecAugment | log-mel energies | |
Lim_ETRI_task4_4 | Lim2019 | 34.4 | 44.1kHz | SpecAugment | log-mel energies |
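Log-mel energies dominate the Features column above. The sketch below shows a typical extraction with librosa; the window, hop, and mel-band settings are common illustrative values, not any particular team's configuration.

```python
# Illustrative log-mel energy extraction with librosa; the window, hop and
# mel-band counts here are common choices, not any specific team's settings.
import librosa

y, sr = librosa.load('clip.wav', sr=44100, mono=True)  # mono input, as in the table above
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=2048, hop_length=1024, n_mels=64)
log_mel = librosa.power_to_db(mel)                     # log-mel energies, shape (64, frames)
print(log_mel.shape)
```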
Machine learning characteristics
Code | Technical Report | Event-based F-score (Eval) | Classifier | Semi-supervised approach | Post-processing | Segmentation method | Decision making
---|---|---|---|---|---|---|---
Wang_NUDT_task4_4 | Wang2019 | 16.8 | CRNN | pseudo-labelling | | | mean probabilities
Wang_NUDT_task4_3 | Wang2019 | 17.5 | CRNN | pseudo-labelling | | | mean probabilities
Wang_NUDT_task4_2 | Wang2019 | 17.2 | CRNN | pseudo-labelling | | | mean probabilities
Wang_NUDT_task4_1 | Wang2019 | 17.2 | CRNN | pseudo-labelling | | | mean probabilities
Delphin_OL_task4_2 | Delphin-Poulat2019 | 42.1 | CRNN | mean-teacher student | median filtering (class-dependent) | |||
Delphin_OL_task4_1 | Delphin-Poulat2019 | 38.3 | CRNN | mean-teacher student | median filtering (class-dependent) | |||
Kong_SURREY_task4_1 | Kong2019 | 22.3 | CNN | supervised | median filtering | |||
CTK_NU_task4_2 | Chan2019 | 29.7 | NMF, CNN | | | non-negative matrix factorization |
CTK_NU_task4_3 | Chan2019 | 27.7 | NMF, CNN | | | non-negative matrix factorization |
CTK_NU_task4_4 | Chan2019 | 26.9 | NMF, CNN | | | non-negative matrix factorization |
CTK_NU_task4_1 | Chan2019 | 31.0 | NMF, CNN | | | non-negative matrix factorization |
Mishima_NEC_task4_3 | Mishima2019 | 18.3 | ResNet | | median filtering | |
Mishima_NEC_task4_4 | Mishima2019 | 19.8 | ResNet | pseudo-labelling | median filtering | |
Mishima_NEC_task4_2 | Mishima2019 | 17.7 | ResNet | pseudo-labelling | time aggregation, median filtering | |
Mishima_NEC_task4_1 | Mishima2019 | 16.7 | ResNet | | median filtering | |
CANCES_IRIT_task4_2 | Cances2019 | 28.4 | CRNN | | smoothed moving average | |
CANCES_IRIT_task4_1 | Cances2019 | 26.1 | CRNN | | smoothed moving average | |
PELLEGRINI_IRIT_task4_1 | Cances2019 | 39.7 | CRNN | | smoothed moving average | |
Lin_ICT_task4_2 | Lin2019 | 40.9 | CNN | guiding learning with a more professional teacher | median filtering (with adaptive window size) | attention layer | ||
Lin_ICT_task4_4 | Lin2019 | 41.8 | CNN | guiding learning with a more professional teacher | median filtering (with adaptive window size) | attention layer | ||
Lin_ICT_task4_3 | Lin2019 | 42.7 | CNN | guiding learning with a more professional teacher | median filtering (with adaptive window size) | attention layer | ||
Lin_ICT_task4_1 | Lin2019 | 40.7 | CNN | guiding learning with a more professional teacher | median filtering (with adaptive window size) | attention layer | ||
Baseline_dcase2019 | Turpault2019 | 25.8 | CRNN | mean-teacher student | median filtering | |||
bolun_NWPU_task4_1 | Bolun2019 | 21.7 | CNN | mean-teacher student | median filtering | |||
bolun_NWPU_task4_4 | Bolun2019 | 25.3 | CNN, RNN, ensemble | mean-teacher student | median filtering | |||
bolun_NWPU_task4_3 | Bolun2019 | 23.8 | CNN | mean-teacher student | median filtering | |||
bolun_NWPU_task4_2 | Bolun2019 | 27.8 | CNN, RNN, ensemble | mean-teacher student | median filtering | |||
Agnone_PDL_task4_1 | Agnone2019 | 25.0 | CRNN | mean-teacher student, VAT | median filtering | attention layer | ||
Kiyokawa_NEC_task4_1 | Kiyokawa2019 | 27.8 | ResNet, SENet | | time thresholding | |
Kiyokawa_NEC_task4_4 | Kiyokawa2019 | 32.4 | ResNet, SENet | | time thresholding | |
Kiyokawa_NEC_task4_3 | Kiyokawa2019 | 29.4 | ResNet, SENet | | time thresholding | |
Kiyokawa_NEC_task4_2 | Kiyokawa2019 | 28.3 | ResNet, SENet | | time thresholding | |
Kothinti_JHU_task4_2 | Kothinti2019 | 30.5 | CRNN, RBM, CRBM, PCA | mean-teacher student, pseudo-labelling | | saliency | majority vote
Kothinti_JHU_task4_3 | Kothinti2019 | 29.0 | CRNN, RBM, CRBM, PCA, Kalman Filter | mean-teacher student, pseudo-labelling | | saliency |
Kothinti_JHU_task4_4 | Kothinti2019 | 29.4 | CRNN, RBM, CRBM, PCA, Kalman Filter | mean-teacher student, pseudo-labelling | | saliency | majority vote
Kothinti_JHU_task4_1 | Kothinti2019 | 30.7 | CRNN, RBM, CRBM, PCA | mean-teacher student, pseudo-labelling | | saliency |
Shi_FRDC_task4_2 | Shi2019 | 42.0 | CRNN | interpolation consistency training | median filtering | |||
Shi_FRDC_task4_3 | Shi2019 | 40.9 | CRNN | MixMatch | median filtering | |||
Shi_FRDC_task4_4 | Shi2019 | 41.5 | CRNN | mean-teacher student, interpolation consistency training, MixMatch | median filtering | |||
Shi_FRDC_task4_1 | Shi2019 | 37.0 | CRNN | mean-teacher student | median filtering | |||
ZYL_UESTC_task4_1 | Zhang2019 | 29.4 | CNN, ResNet, RNN | mean-teacher student | median filtering | attention layer |
ZYL_UESTC_task4_2 | Zhang2019 | 30.8 | CNN, ResNet, RNN | mean-teacher student | median filtering | attention layer |
Wang_YSU_task4_1 | Yang2019 | 6.5 | CMRANN-MT | mean-teacher student | median filtering | |||
Wang_YSU_task4_2 | Yang2019 | 6.2 | CMRANN-MT | mean-teacher student | median filtering | |||
Wang_YSU_task4_3 | Yang2019 | 6.7 | CMRANN-MT | mean-teacher student | median filtering | |||
Yan_USTC_task4_1 | Yan2019 | 35.8 | CRNN | mean-teacher student | median filtering | |||
Yan_USTC_task4_3 | Yan2019 | 35.6 | CRNN | mean-teacher student | median filtering | |||
Yan_USTC_task4_4 | Yan2019 | 33.5 | CRNN | mean-teacher student | median filtering | |||
Yan_USTC_task4_2 | Yan2019 | 36.2 | CRNN | mean-teacher student | median filtering | |||
Lee_KNU_task4_2 | Lee2019 | 25.8 | CNN | pseudo-labelling, mean-teacher student | minimum gap/length compensation | | double thresholding
Lee_KNU_task4_4 | Lee2019 | 24.6 | CNN | pseudo-labelling, mean-teacher student | minimum gap/length compensation | | double thresholding
Lee_KNU_task4_3 | Lee2019 | 26.7 | CNN | pseudo-labelling, mean-teacher student | minimum gap/length compensation | | double thresholding
Lee_KNU_task4_1 | Lee2019 | 26.4 | CNN | pseudo-labelling, mean-teacher student | | | double thresholding
Rakowski_SRPOL_task4_1 | Rakowski2019 | 24.2 | CNN | | | voice activity detection |
Lim_ETRI_task4_1 | Lim2019 | 32.6 | CRNN, Ensemble | | median filtering | | mean probabilities, thresholding
Lim_ETRI_task4_2 | Lim2019 | 33.2 | CRNN, Ensemble | | median filtering | | mean probabilities, thresholding
Lim_ETRI_task4_3 | Lim2019 | 32.5 | CRNN, Ensemble | mean-teacher student | median filtering | | mean probabilities, thresholding
Lim_ETRI_task4_4 | Lim2019 | 34.4 | CRNN, Ensemble | mean-teacher student | median filtering | | mean probabilities, thresholding
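Median filtering is by far the most common post-processing step in the table above. The sketch below shows the usual chain for one class: smooth the frame-level probabilities with a median filter, threshold them, and merge consecutive active frames into (onset, offset) events. The threshold and window size are illustrative values, not any team's tuned settings.

```python
# Sketch of the common post-processing chain: per-class median filtering of
# frame probabilities, thresholding, and conversion to (onset, offset) events.
# Threshold and window size are illustrative, not any team's tuned values.
import numpy as np
from scipy.signal import medfilt

def probs_to_events(probs, hop_seconds, threshold=0.5, win=9):
    """probs: (frames,) array of one class's frame-level probabilities."""
    smoothed = medfilt(probs, kernel_size=win)       # median filtering
    active = smoothed > threshold                    # binary frame activity
    events, onset = [], None
    for i, a in enumerate(active):
        if a and onset is None:
            onset = i
        elif not a and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None
    if onset is not None:
        events.append((onset * hop_seconds, len(active) * hop_seconds))
    return events

print(probs_to_events(np.random.rand(100), hop_seconds=0.02))
```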
Complexity
Code | Technical Report | Event-based F-score (Eval) | Model complexity | Ensemble subsystems | Training time
---|---|---|---|---|---
Wang_NUDT_task4_4 | Wang2019 | 16.8 | 4034268 | 3 | 2h (1 GTX 1080 Ti) | |
Wang_NUDT_task4_3 | Wang2019 | 17.5 | 4034268 | 3 | 2h (1 GTX 1080 Ti) | |
Wang_NUDT_task4_2 | Wang2019 | 17.2 | 4034268 | 3 | 2h (1 GTX 1080 Ti) | |
Wang_NUDT_task4_1 | Wang2019 | 17.2 | 4034268 | 3 | 2h (1 GTX 1080 Ti) | |
Delphin_OL_task4_2 | Delphin-Poulat2019 | 42.1 | 1582036 | | 21h (1 GTX 1080)
Delphin_OL_task4_1 | Delphin-Poulat2019 | 38.3 | 1582036 | | 17h (1 GTX 1080)
Kong_SURREY_task4_1 | Kong2019 | 22.3 | 4686144 | | 1h (1 Titan Xp)
CTK_NU_task4_2 | Chan2019 | 29.7 | 4309450 | | 0.5h (1 GTX 1060)
CTK_NU_task4_3 | Chan2019 | 27.7 | 4309450 | | 0.5h (1 GTX 1060)
CTK_NU_task4_4 | Chan2019 | 26.9 | 4309450 | | 0.5h (1 GTX 1060)
CTK_NU_task4_1 | Chan2019 | 31.0 | 4309450 | | 0.5h (1 GTX 1060)
Mishima_NEC_task4_3 | Mishima2019 | 18.3 | 23865546 | | 6h (4 GTX 1080 Ti)
Mishima_NEC_task4_4 | Mishima2019 | 19.8 | 23865546 | | 6h (4 GTX 1080 Ti)
Mishima_NEC_task4_2 | Mishima2019 | 17.7 | 23865546 | | 8h (4 GTX 1080 Ti)
Mishima_NEC_task4_1 | Mishima2019 | 16.7 | 23865546 | | 6h (4 GTX 1080 Ti)
CANCES_IRIT_task4_2 | Cances2019 | 28.4 | 420116 | | 1h (1 GTX 1080 Ti)
CANCES_IRIT_task4_1 | Cances2019 | 26.1 | 470036 | | 1h (1 GTX 1080 Ti)
PELLEGRINI_IRIT_task4_1 | Cances2019 | 39.7 | 165460 | | 12h (1 GTX 1080 Ti)
Lin_ICT_task4_2 | Lin2019 | 40.9 | 1209744 | | 3h (1 GTX 1080 Ti)
Lin_ICT_task4_4 | Lin2019 | 41.8 | 6048720 | 5 | 3h (1 GTX 1080 Ti) | |
Lin_ICT_task4_3 | Lin2019 | 42.7 | 7258464 | 6 | 3h (1 GTX 1080 Ti) | |
Lin_ICT_task4_1 | Lin2019 | 40.7 | 1209744 | | 3h (1 GTX 1080 Ti)
Baseline_dcase2019 | Turpault2019 | 25.8 | 214356 | | 3h (1 GTX 1080 Ti)
bolun_NWPU_task4_1 | Bolun2019 | 21.7 | ||||
bolun_NWPU_task4_4 | Bolun2019 | 25.3 | ||||
bolun_NWPU_task4_3 | Bolun2019 | 23.8 | ||||
bolun_NWPU_task4_2 | Bolun2019 | 27.8 | ||||
Agnone_PDL_task4_1 | Agnone2019 | 25.0 | 214356 | | 2h (1 GTX 1080 Ti)
Kiyokawa_NEC_task4_1 | Kiyokawa2019 | 27.8 | 11408962 | | 8h (4 GTX 1080 Ti)
Kiyokawa_NEC_task4_4 | Kiyokawa2019 | 32.4 | 11408962 | | 12h (4 GTX 1080 Ti)
Kiyokawa_NEC_task4_3 | Kiyokawa2019 | 29.4 | 11408962 | | 12h (4 GTX 1080 Ti)
Kiyokawa_NEC_task4_2 | Kiyokawa2019 | 28.3 | 11408962 | | 12h (4 GTX 1080 Ti)
Kothinti_JHU_task4_2 | Kothinti2019 | 30.5 | 1200000 | 3 | 2h (1 RTX 2080 Ti)
Kothinti_JHU_task4_3 | Kothinti2019 | 29.0 | 520000 | | 2h (1 RTX 2080 Ti)
Kothinti_JHU_task4_4 | Kothinti2019 | 29.4 | 1200000 | 3 | 2h (1 RTX 2080 Ti)
Kothinti_JHU_task4_1 | Kothinti2019 | 30.7 | 520000 | | 2h (1 RTX 2080 Ti)
Shi_FRDC_task4_2 | Shi2019 | 42.0 | 6878340 | 9 | 24h (1 TITAN Xp) | |
Shi_FRDC_task4_3 | Shi2019 | 40.9 | 4585560 | 6 | 24h (1 TITAN Xp) | |
Shi_FRDC_task4_4 | Shi2019 | 41.5 | 18342240 | 24 | 24h (1 TITAN Xp) | |
Shi_FRDC_task4_1 | Shi2019 | 37.0 | 6878340 | 9 | 24h (1 TITAN Xp) | |
ZYL_UESTC_task4_1 | Zhang2019 | 29.4 | 298122 | | 2.5h (1 GTX 1080 Ti)
ZYL_UESTC_task4_2 | Zhang2019 | 30.8 | 298698 | | 4h (1 GTX 1080 Ti)
Wang_YSU_task4_1 | Yang2019 | 6.5 | 126090 | | 3h (1 GTX 1080 Ti)
Wang_YSU_task4_2 | Yang2019 | 6.2 | 126090 | | 3h (1 GTX 1080 Ti)
Wang_YSU_task4_3 | Yang2019 | 6.7 | 126090 | | 3h (1 GTX 1080 Ti)
Yan_USTC_task4_1 | Yan2019 | 35.8 | 7068540 | 5 | 3.5h (1 GTX 1080 Ti) | |
Yan_USTC_task4_3 | Yan2019 | 35.6 | 7068540 | 5 | 3.5h (1 GTX 1080 Ti) | |
Yan_USTC_task4_4 | Yan2019 | 33.5 | 7068540 | 5 | 3.5h (1 GTX 1080 Ti) | |
Yan_USTC_task4_2 | Yan2019 | 36.2 | 7068540 | 5 | 3.5h (1 GTX 1080 Ti) | |
Lee_KNU_task4_2 | Lee2019 | 25.8 | 3425776 | | 6h (1 Titan V)
Lee_KNU_task4_4 | Lee2019 | 24.6 | 3425776 | | 6h (1 Titan V)
Lee_KNU_task4_3 | Lee2019 | 26.7 | 3425776 | | 6h (1 Titan V)
Lee_KNU_task4_1 | Lee2019 | 26.4 | 3425776 | | 6h (1 Titan V)
Rakowski_SRPOL_task4_1 | Rakowski2019 | 24.2 | 4691274 | | 6h (1 Tesla P40)
Lim_ETRI_task4_1 | Lim2019 | 32.6 | 10572448 | 4 | ||
Lim_ETRI_task4_2 | Lim2019 | 33.2 | 42289792 | 16 | ||
Lim_ETRI_task4_3 | Lim2019 | 32.5 | 10572448 | 4 | ||
Lim_ETRI_task4_4 | Lim2019 | 34.4 | 42289792 | 16 |
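The Model complexity column reports the total number of trainable parameters. For a PyTorch model this can be computed as below (shown on a toy model, not any submitted architecture).

```python
# The "Model complexity" column counts trainable parameters; in PyTorch this
# is a one-liner (demonstrated on a toy model, not any submitted system).
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 16, 3), nn.ReLU(), nn.Linear(16, 10))
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)
```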
Technical reports
VIRTUAL ADVERSARIAL TRAINING SYSTEM FOR DCASE 2019 TASK 4
Agnone, Anthony and Altaf, Umair
Pindrop, Audio Research department, Atlanta, United States
Agnone_PDL_task4_1
Abstract
This paper describes the approach used for Task 4 of the DCASE 2019 Challenge. This task challenges systems to learn from a combination of labeled and unlabeled data. Furthermore, the labeled data is itself a combination of weakly-informed, coarse time-based real data and strongly-informed, fine time-based synthetic data. The baseline system builds on the winning solution from last year and adds the synthetic data, which was not provided in that iteration of the challenge. Our solution uses the semi-supervised virtual adversarial training method, in addition to the Mean Teacher consistency loss, to encourage generalization from weakly-labeled and unlabeled data. The chosen system parametrization achieves a 59.57% macro F1 score.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | VAT |
Features | log-mel energies |
Classifier | CRNN, mean-teacher student + VAT |
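As a companion to the abstract above, here is a minimal sketch of a virtual adversarial training (VAT) loss in PyTorch, using the usual one-step power-iteration approximation; it assumes a generic classifier `model` and is not the authors' exact implementation.

```python
# Hedged sketch of a virtual adversarial training (VAT) loss in PyTorch,
# following the usual one-step power-iteration approximation; this is not
# the authors' exact implementation.
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=1.0):
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)            # predictions at the clean point
    d = torch.randn_like(x)
    d = xi * d / (d.norm(p=2) + 1e-12)             # small random direction
    d.requires_grad_()
    p_hat = F.log_softmax(model(x + d), dim=-1)
    adv_dist = F.kl_div(p_hat, p, reduction='batchmean')
    grad = torch.autograd.grad(adv_dist, d)[0]     # direction of steepest change
    r_adv = eps * grad / (grad.norm(p=2) + 1e-12)  # virtual adversarial perturbation
    p_hat = F.log_softmax(model(x + r_adv.detach()), dim=-1)
    return F.kl_div(p_hat, p, reduction='batchmean')
```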
CLASS WISE FUSION SYSTEM FOR DCASE 2019 TASK4
Wang, Bolun and Wu, Hao and Bai, Jisheng and Chen, Chen and Wang, Mou and Wang, Rui and Fu, Zhonghua and Chen, Jianfeng and Rahardja, Susanto and Zhang, Xiaolei
Northwestern Polytechnical University, School of Computer Science, Xi'an, China
bolun_NWPU_task4_1 bolun_NWPU_task4_2 bolun_NWPU_task4_3 bolun_NWPU_task4_4
Abstract
In this report, we introduce our system for Task 4 of the DCASE 2019 challenge (sound event detection in domestic environments). The goal of the task is to evaluate systems for the detection of sound events using real data, either weakly labeled or unlabeled, along with strongly labeled simulated data. With the aim of improving performance using a large amount of unlabeled data and a small labeled training set, we focus on three parts: data augmentation, the loss function, and network fusion.
System characteristics
Input | mono |
Sampling rate | 32kHz |
Data augmentation | event adding |
Features | log-mel energies |
Classifier | CNN, mean-teacher student |
MULTI TASK LEARNING AND POST PROCESSING OPTIMIZATION FOR SOUND EVENT DETECTION
Cances, Léo and Pellegrini, Thomas and Guyot, Patrice
IRIT, Université de Toulouse, CNRS, Toulouse, France
Abstract
In this paper, we report our experiments on sound event detection in domestic environments in the framework of the DCASE 2019 Task 4 challenge. The novelty this year lies in the availability of three different subsets for development: a weakly annotated dataset, a strongly annotated synthetic subset, and an unlabeled subset. The weak annotations, unlike the strong ones, provide tags for audio events but not their temporal boundaries. The task objective is twofold: detecting audio events (multi-label classification at recording level) and localizing the events precisely within the recordings. First, we explore multi-task training to take advantage of the synthetic and unlabeled in-domain subsets. Then, we apply various temporal segmentation methods using optimization algorithms to obtain the best performing segmentation parameters. For the multi-task models themselves, we explored two strategies based on convolutional recurrent neural networks (CRNN): 1) a single-branch model with two outputs, and 2) multi-branch models with two or three outputs. These approaches outperform the baseline of 23.7% in F-measure by a large margin, with values of 39.9% and 33.8% for the first and second strategies, respectively, on the official validation subset comprised of 1103 recordings.
System characteristics
Input | mono |
Sampling rate | 22kHz |
Data augmentation | pitch_shifting, time stretching, level, noise |
Features | log-mel energies |
Classifier | CRNN |
NON-NEGATIVE MATRIX FACTORIZATION-CONVOLUTION NEURAL NETWORK (NMF-CNN) FOR SOUND EVENT DETECTION
Chan, Teck Kai and Chin, Cheng Siong and Li, Ye
Newcastle University, Singapore
CTK_NU_task4_1 CTK_NU_task4_2 CTK_NU_task4_3 CTK_NU_task4_4
Abstract
The main scientific question of this year's DCASE challenge Task 4 (Sound Event Detection in Domestic Environments) is to investigate which types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in-domain data) are required to achieve the best performing system. In this paper, we propose a deep learning model that integrates a Convolutional Neural Network (CNN) with Non-Negative Matrix Factorization (NMF). The best performing model achieves an event-based F1-score of 30.39%, compared to the baseline system's F1-score of 23.7% on the validation dataset. Based on the results, even though the synthetic data is strongly labeled, it cannot be used as the sole source of training data and resulted in the worst performance. Although a combination of weakly and strongly labeled data achieved the highest F1-score, the increment was not significant, and it may not be worthwhile to include synthetic data in the training set. The results also suggest that the quality of the labels assigned to the unlabeled in-domain data is essential: inaccurate labeling can harm accuracy rather than improve model performance.
System characteristics
Input | mono |
Sampling rate | 32kHz |
Features | log-mel energies |
Classifier | NMF, CNN |
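As a rough illustration of the NMF component described above, the sketch below factorizes a magnitude spectrogram with scikit-learn; how the resulting templates and activations are combined with the CNN in the actual system is more involved than shown here.

```python
# Sketch of non-negative matrix factorization on a magnitude spectrogram with
# scikit-learn; how the NMF output feeds the CNN in the actual system is more
# involved than shown here.
import numpy as np
from sklearn.decomposition import NMF

spec = np.random.rand(64, 500)          # stand-in magnitude spectrogram (bins x frames)
nmf = NMF(n_components=8, init='nndsvd', max_iter=400)
W = nmf.fit_transform(spec)             # spectral templates, shape (64, 8)
H = nmf.components_                     # per-frame activations, shape (8, 500)
```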
MEAN TEACHER WITH DATA AUGMENTATION FOR DCASE 2019 TASK 4
Delphin-Poulat, Lionel and Plapous, Cyril
Orange Labs, Lannion, France
Delphin_OL_task4_1 Delphin_OL_task4_2
Abstract
In this paper, we present our neural network for the DCASE 2019 challenge’s Task 4 (Sound event detection in domestic environments) [1]. The goal of the task is to evaluate systems for the detection of sound events using real data either weakly labeled or unlabeled and simulated data that is strongly labeled. We propose a mean-teacher model with convolutional neural network (CNN) and recurrent neural network (RNN) together with data augmentation and a median window tuned for each class based on prior knowledge.
System characteristics
Input | mono |
Sampling rate | 22.05kHz |
Data augmentation | time shifting, frequency shifting |
Features | log-mel energies |
Classifier | CRNN, mean-teacher student |
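Since the mean-teacher scheme recurs throughout this task, here is a minimal PyTorch sketch of its two ingredients: an exponential-moving-average (EMA) teacher and a consistency loss between student and teacher predictions. The decay `alpha` and the MSE-on-sigmoid consistency are common choices, not necessarily this system's exact settings.

```python
# Hedged sketch of the mean-teacher scheme used by many submissions: the
# teacher is an exponential moving average (EMA) of the student's weights,
# and a consistency loss ties their predictions on unlabeled clips.
# The decay alpha and the consistency form are illustrative.
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)        # teacher is never updated by gradients
    return teacher

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)   # EMA of student weights

def consistency_loss(student_logits, teacher_logits):
    # MSE between sigmoid outputs; the teacher side is detached.
    return F.mse_loss(torch.sigmoid(student_logits),
                      torch.sigmoid(teacher_logits).detach())
```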
SOUND EVENT DETECTION WITH RESNET AND SELF-MASK MODULE FOR DCASE 2019 TASK 4
Kiyokawa, Yu and Mishima, Sakiko and Toizumi, Takahiro and Sagi, Kazutoshi and Kondo, Reishi and Nomura, Toshiyuki
Data Science Research Laboratories, NEC Corporation, Japan
Kiyokawa_NEC_task4_1 Kiyokawa_NEC_task4_2 Kiyokawa_NEC_task4_3 Kiyokawa_NEC_task4_4
Abstract
In this technical report, we propose a sound event detection system using a residual network (ResNet) with a self-mask module for Task 4 of the Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019) challenge. Our system is constructed with a convolutional neural network based on a ResNet. We introduce a self-mask module as a region proposal network in order to detect event time boundaries. The self-mask module constrains the time duration of silent segments and sound events by proposing candidates for the sound event regions. These constraints improve the detection accuracy of the sound event regions. Evaluation results show that our system obtains an event-based F1-score of 36.09% for sound event detection on the validation dataset of Task 4.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies |
Classifier | ResNet, SENet |
CROSS-TASK LEARNING FOR AUDIO TAGGING, SOUND EVENT DETECTION AND SPATIAL LOCALIZATION: DCASE 2019 BASELINE SYSTEMS
Kong, Qiuqiang and Cao, Yin and Iqbal, Turab and Wang, Wenwu and Plumbley, Mark D.
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Kong_SURREY_task4_1
Abstract
The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 consists of five tasks: 1) acoustic scene classification, 2) audio tagging with noisy labels and minimal supervision, 3) sound event localisation and detection, 4) sound event detection in domestic environments, and 5) urban sound tagging. In this paper, we propose generic cross-task baseline systems based on convolutional neural networks (CNNs). The motivation is to investigate the performance of a variety of models across several audio recognition tasks without exploiting the specific characteristics of the tasks. We looked at CNNs with 5, 9, and 13 layers, and found that the optimal architecture is task-dependent. For the systems we considered, we found that the 9-layer CNN with average pooling after convolutional layers is a good model for a majority of the DCASE 2019 tasks.
System characteristics
Input | mono |
Sampling rate | 32kHz |
Features | log-mel energies |
Classifier | CNN |
INTEGRATED BOTTOM-UP AND TOP-DOWN INFERENCE FOR SOUND EVENT DETECTION
Kothinti, Sandeep and Sell, Gregory and Watanabe, Shinji and Elhilali, Mounya
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
Kothinti_JHU_task4_1 Kothinti_JHU_task4_2 Kothinti_JHU_task4_3 Kothinti_JHU_task4_4
Abstract
While supervised methods have been highly effective at defining boundaries of sound events, the characteristics of the acoustic scene itself can provide complementary information about the changing profile of the scene and presence of new events. This work explores an integrated supervised and unsupervised approach to weakly labeled sound event detection by complementing a class-based inference system with a bottom-up, salience-based analysis. The two systems work conjointly in two ways: 1) Class information from the supervised model is used to tune the parameters of the bottom-up salience detection; and 2) Salience-based boundaries are leveraged to create pseudo-labels for weakly labeled data to generate more samples of strongly annotated data. These operations reflect the interplay between stimulus driven analysis and semantic driven analysis. The proposed method gives an absolute improvement of 11% on macro-averaged F-score on the development set.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | log-mel energies, auditory spectrogram |
Classifier | CRNN, RBM, CRBM, PCA, mean-teacher student |
END-TO-END DEEP CONVOLUTIONAL NEURAL NETWORK WITH MULTI-SCALE STRUCTURE FOR WEAKLY LABELED SOUND EVENT DETECTION
Lee, Seokjin and Kim, Minhan and Jeong, Youngho
Kyungpook National University, School of Electronics Engineering, Daegu, Republic of Korea
Lee_KNU_task4_1 Lee_KNU_task4_2 Lee_KNU_task4_3 Lee_KNU_task4_4
Abstract
In this paper, we propose an end-to-end sound event detection algorithm that detects and classifies sound events from the waveform itself. The proposed model consists of multi-scale time frames and networks to handle both short and long signal characteristics; each frame slides by 0.1 s to provide sufficiently fine resolution. The element network for each time-frame input consists of several one-dimensional convolutional neural networks (1D-CNNs) with a deeply stacked structure. The outputs of the element networks are averaged and gated by sound activity detection. The decision is made by double thresholding, and the results are enhanced by class-wise minimum gap/length compensation. To evaluate the proposed network, simulations were performed with development data from DCASE 2019 Task 4; the results show that the proposed algorithm achieves a macro-averaged F1-score of 31.7% on the development dataset of DCASE 2019 and 30.2% on the evaluation dataset of DCASE 2018.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | waveform |
Classifier | CNN, pseudo-labelling, mean-teacher student |
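Double thresholding, used above for decision making, can be read as a hysteresis rule: an event is triggered wherever the probability exceeds a high threshold, then extended in both directions while it stays above a lower one. The sketch below illustrates this; both threshold values are illustrative.

```python
# Sketch of double thresholding as a hysteresis rule: an event is triggered
# where probabilities exceed the high threshold and extended in both
# directions while they stay above the low threshold. Thresholds illustrative.
import numpy as np

def double_threshold(probs, high=0.75, low=0.2):
    active = np.zeros(len(probs), dtype=bool)
    for seed in np.flatnonzero(probs > high):
        if active[seed]:
            continue
        lo = seed
        while lo > 0 and probs[lo - 1] > low:
            lo -= 1
        hi = seed
        while hi < len(probs) - 1 and probs[hi + 1] > low:
            hi += 1
        active[lo:hi + 1] = True
    return active

print(double_threshold(np.array([0.1, 0.3, 0.8, 0.5, 0.1])))  # [F T T T F]
```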
SOUND EVENT DETECTION IN DOMESTIC ENVIRONMENTS USING ENSEMBLE OF CONVOLUTIONAL RECURRENT NEURAL NETWORKS
Lim, Wootaek and Suh, Sangwon and Park, Sooyoung and Jeong, Youngho
Realistic AV Research Group, Electronics and Telecommunications Research Institute, Daejeon, Korea
Lim_ETRI_task4_1 Lim_ETRI_task4_2 Lim_ETRI_task4_3 Lim_ETRI_task4_4
Abstract
In this paper, we present a method to detect sound events in domestic environments using small weakly labeled data, large unlabeled data, and strongly labeled synthetic data, as proposed in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge Task 4. To solve the problem, we use a convolutional recurrent neural network (CRNN), which stacks convolutional neural networks (CNNs) and bi-directional gated recurrent units (Bi-GRUs). Moreover, we propose various methods such as data augmentation, event activity detection, multi-median filtering, a mean-teacher student model, and an ensemble of neural networks to improve performance. By combining the proposed methods, sound event detection performance can be enhanced compared with the baseline algorithm. Performance evaluation shows that the proposed method provides detection results of 40.89% for event-based metrics and 66.17% for segment-based metrics.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | SpecAugment |
Features | log-mel energies |
Classifier | CRNN |
Decision making | mean probabilities, thresholding |
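SpecAugment, listed as the data augmentation above, masks random bands of mel bins and spans of frames in the input spectrogram. A minimal numpy sketch follows; the mask sizes are illustrative.

```python
# Sketch of SpecAugment-style masking on a log-mel spectrogram: zero out a
# random band of mel bins and a random span of frames. Mask sizes illustrative.
import numpy as np

def spec_augment(log_mel, max_freq_mask=8, max_time_mask=20, rng=np.random):
    out = log_mel.copy()                       # (mel_bins, frames)
    f = rng.randint(0, max_freq_mask + 1)
    f0 = rng.randint(0, out.shape[0] - f + 1)
    out[f0:f0 + f, :] = 0.0                    # frequency mask
    t = rng.randint(0, max_time_mask + 1)
    t0 = rng.randint(0, out.shape[1] - t + 1)
    out[:, t0:t0 + t] = 0.0                    # time mask
    return out

augmented = spec_augment(np.random.randn(64, 500))
```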
GUIDED LEARNING CONVOLUTION SYSTEM FOR DCASE 2019 TASK 4
Lin, Liwei and Wang, Xiangdong
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Abstract
In this technical report, we describe in detail the system we submitted to DCASE 2019 Task 4: sound event detection (SED) in domestic environments. We approach SED as a multiple instance learning (MIL) problem and employ a convolutional neural network (CNN) with a class-wise attention pooling (cATP) module to solve it. Considering the interference caused by the co-occurrence of multiple events in the unbalanced dataset, we combine the cATP-MIL framework with the disentangled feature. To take advantage of the unlabeled data, we adopt guided learning with a more professional teacher for semi-supervised learning. A group of median filters with adaptive window sizes is utilized in post-processing. We also analyze the effect of the synthetic data on the performance of the model and finally achieve an F-measure of 45.43% on the validation set.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN |
TRAINING METHOD USING CLASS-FRAME PSEUDO LABEL FOR WEAKLY LABELED DATASET ON DCASE2019
Mishima, Sakiko and Kiyokawa, Yu and Toizumi, Takahiro and Sagi, Kazutoshi and Kondo, Reishi and Nomura, Toshiyuki
Data Science Research Laboratories, NEC Corporation, Japan
Mishima_NEC_task4_1 Mishima_NEC_task4_2 Mishima_NEC_task4_3 Mishima_NEC_task4_4
Abstract
We propose a training method using class-frame pseudo labels for the weakly labeled datasets provided in the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE2019) Task 4. Our model is based on a residual network (ResNet) and trained on datasets with strong and weak labels. A strong label contains the event classes and their presence at each frame, while a weak label contains only the event classes. In order to train the model effectively, we propose class-frame pseudo labels for the weakly labeled datasets. The class-frame pseudo label improves event presence prediction at each frame by avoiding overfitting to the strongly labeled datasets. Results show that the F1-scores of our proposed method are 25.9% and 62.0% in the event-based and segment-based evaluations, respectively.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies |
Classifier | ResNet |
REGULARIZED CNN FOR SOUND EVENT DETECTION IN DOMESTIC ENVIRONMENTS
Rakowski, Alexander
Samsung R&D Poland, Audio Intelligence Dept., Warsaw, Poland
Rakowski_SRPOL_task4_1
Abstract
This report describes a system used for Task 4 of the DCASE 2019 Challenge - Sound Event Detection in Domestic Environments. The system consists of a 9-layer convolutional neural network which yields frame-level predictions. These are then aggregated using a voice activity detection algorithm in order to extract sound events. To prevent the system from overfitting, two techniques are applied. The first consists of training the model with channel- and pixel-wise dropout. The second removes information from a randomly selected subset of frames.
System characteristics
Input | mono |
Sampling rate | 32kHz |
Data augmentation | occlusions |
Features | log-mel energies |
Classifier | CNN |
HODGEPODGE: SOUND EVENT DETECTION BASED ON ENSEMBLE OF SEMI-SUPERVISED LEARNING METHODS
Shi, Ziqiang
Fujitsu Research and Development Center, Beijing, China
Shi_FRDC_task4_1 Shi_FRDC_task4_2 Shi_FRDC_task4_3 Shi_FRDC_task4_4
HODGEPODGE: SOUND EVENT DETECTION BASED ON ENSEMBLE OF SEMI-SUPERVISED LEARNING METHODS
Shi, Ziqiang
Fujitsu Research and Development Center, Beijing, China
Abstract
In this technical report, we present the techniques and models applied to our submission for DCASE 2019 task 4: sound event detection in domestic environments. We focus primarily on how to apply semi-supervised learning methods efficiently to the large amount of unlabeled in-domain data. Three semi-supervised learning principles are used in our system: 1) consistency regularization, which applies data augmentation; 2) a MixUp regularizer, which requires that the prediction for an interpolation of two inputs be close to the interpolation of the predictions for each individual input; 3) MixUp regularization applied to interpolations between data augmentations. We also tried an ensemble of various models trained using different semi-supervised learning principles.
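A minimal sketch of the MixUp regularizer in principle 2: inputs and targets are interpolated with a Beta-distributed mixing coefficient, so the model is encouraged to interpolate its predictions as well. The alpha value and clip-level one-hot targets are assumptions.

import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Returns a convex combination of two (input, target) pairs.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

x_a, x_b = np.random.randn(64, 500), np.random.randn(64, 500)  # two log-mel clips
y_a, y_b = np.eye(10)[3], np.eye(10)[7]                        # their clip-level targets
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b)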
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | Gaussian noise |
Features | log-mel energies |
Classifier | CRNN, mean-teacher student |
Sound event detection in domestic environments with weakly labeled data and soundscape synthesis
Turpault, Nicolas and Serizel, Romain and Parag Shah, Ankit and Salamon, Justin
Université de Lorraine, CNRS, Inria, Loria, France
Abstract
This paper presents DCASE 2019 task 4 and proposes a first analysis of the results. The task is the follow-up to DCASE 2018 task 4 and evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries). The paper focuses in particular on the additional synthetic, strongly labeled dataset provided this year.
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CRNN, mean-teacher student |
SOUND EVENT DETECTION USING WEAKLY LABELED AND UNLABELED DATA WITH SELF-ADAPTIVE EVENT THRESHOLD
Wang, Dezhi and Zhang, Lilun and Bao, Changchun and Wang, Yongxian and Xu, Kele and Zhu, Boqing
National University of Defense Technology, College of Meteorology and Oceanography, Changsha, China
Wang_NUDT_task4_1 Wang_NUDT_task4_2 Wang_NUDT_task4_3 Wang_NUDT_task4_4
SOUND EVENT DETECTION USING WEAKLY LABELED AND UNLABELED DATA WITH SELF-ADAPTIVE EVENT THRESHOLD
Wang, Dezhi and Zhang, Lilun and Bao, Changchun and Wang, Yongxian and Xu, Kele and Zhu, Boqing
National University of Defense Technology, College of Meteorology and Oceanography, Changsha, China
Abstract
The details of our method submitted to task 4 of the DCASE 2019 challenge are described in this technical report. This task evaluates systems for the detection of sound events in domestic environments using large-scale weakly labeled data. An architecture based on the convolutional recurrent neural network (CRNN) framework is used to detect the timestamps of all events in given audio clips, where the training audio files have only clip-level labels. To take advantage of the large-scale unlabeled in-domain training data, an audio tagging system based on a deep residual network (ResNeXt) is first employed to predict weak labels for the unlabeled data before sound event detection. In addition, a self-adaptive search for the best per-event detection thresholds is applied at test time, which is believed to benefit model performance and generalization. Finally, the system achieves a class-wise average F1-score of 23.79% for sound event detection on the provided test dataset.
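A hedged sketch of what a self-adaptive threshold search could look like: for each event class, sweep candidate detection thresholds and keep the one that maximizes frame-level F1 on held-out predictions. The threshold grid, frame-level metric, and scikit-learn usage are assumptions, not the authors' exact procedure.

import numpy as np
from sklearn.metrics import f1_score

def search_thresholds(frame_probs, frame_labels, grid=np.linspace(0.1, 0.9, 17)):
    # frame_probs, frame_labels: (frames, classes). Returns (classes,) thresholds.
    n_classes = frame_probs.shape[1]
    best = np.full(n_classes, 0.5)
    for c in range(n_classes):
        scores = [f1_score(frame_labels[:, c], frame_probs[:, c] >= t, zero_division=0)
                  for t in grid]
        best[c] = grid[int(np.argmax(scores))]
    return best

probs = np.random.rand(1000, 10)                       # held-out frame predictions
labels = (np.random.rand(1000, 10) > 0.8).astype(int)  # held-out frame labels
thresholds = search_thresholds(probs, labels)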
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | mixup |
Features | log-mel energies, delta feature |
Classifier | CRNN |
Decision making | mean probability |
WEAKLY LABELED SOUND EVENT DETECTION WITH RESIDUAL CRNN USING SEMI-SUPERVISED METHOD
Yan, Jie and Song, Yan
University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
Yan_USTC_task4_1 Yan_USTC_task4_2 Yan_USTC_task4_3 Yan_USTC_task4_4
WEAKLY LABELED SOUND EVENT DETECTION WITH RESIDUAL CRNN USING SEMI-SUPERVISED METHOD
Yan, Jie and Song, Yan
University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
Abstract
In this report, we present our system for task 4 of the DCASE 2019 challenge (sound event detection in domestic environments). The goal of the task is to evaluate systems using real data that is either weakly labeled or unlabeled and simulated data that is strongly labeled. To perform this task, we propose a residual CRNN as our system. We also use a mean-teacher model based on confidence thresholding and a smooth embedding method. In addition, we apply SpecAugment to address the labeled-data shortage problem. Finally, we achieve better performance than the DCASE2019 baseline system.
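An illustrative NumPy sketch of SpecAugment-style masking on a log-mel spectrogram (one frequency mask and one time mask, without time warping); the mask widths are assumptions.

import numpy as np

def spec_augment(spec, freq_mask=8, time_mask=30, rng=None):
    # spec: (mels, frames); applies one frequency mask and one time mask.
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    f = int(rng.integers(0, freq_mask + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    spec[f0:f0 + f, :] = 0.0                 # frequency mask
    t = int(rng.integers(0, time_mask + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    spec[:, t0:t0 + t] = 0.0                 # time mask
    return spec

augmented = spec_augment(np.random.randn(64, 500))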
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Data augmentation | specaugment |
Features | log-mel energies |
Classifier | CRNN, mean-teacher student |
MEAN TEACHER MODEL BASED ON CMRANN NETWORK FOR SOUND EVENT DETECTION
Yang, Qian and Xia, Jing and Wang, Jinjia
College of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Wang_YSU_task4_1 Wang_YSU_task4_2 Wang_YSU_task4_3
MEAN TEACHER MODEL BASED ON CMRANN NETWORK FOR SOUND EVENT DETECTION
Yang, Qian and Xia, Jing and Wang, Jinjia
College of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Abstract
This paper proposes an improved mean-teacher model for sound event detection tasks in a domestic environment. The model consists of a CNN, an ML-LoBCoD network, an RNN, and an attention mechanism. To evaluate our method, we tested on the DCASE 2019 Challenge task 4 dataset. The results show that the average F1-score on the 2018 evaluation dataset is 22.7%, and the F1-score on the 2019 validation dataset is 23.4%.
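A minimal PyTorch sketch of the mean-teacher weight update common to these systems: the teacher's weights are an exponential moving average (EMA) of the student's. The decay value is an assumption.

import torch
import torch.nn as nn

@torch.no_grad()
def update_teacher(student: nn.Module, teacher: nn.Module, decay: float = 0.999):
    # teacher <- decay * teacher + (1 - decay) * student, parameter-wise.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

student = nn.Linear(64, 10)
teacher = nn.Linear(64, 10)
teacher.load_state_dict(student.state_dict())  # start from the same weights
update_teacher(student, teacher)               # call once per training step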
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CMRANN-MT, mean-teacher student |
AN IMPROVED SYSTEM FOR DCASE 2019 CHALLENGE TASK 4
Zhang, Zhenyuan and Yang, Mingxue and Liu, Li
University of Electronic Science and Technology of China, School of Information and Communication Engineering, Chengdu, China
ZYL_UESTC_task4_1 ZYL_UESTC_task4_2
AN IMPROVED SYSTEM FOR DCASE 2019 CHALLENGE TASK 4
Zhang, Zhenyuan and Yang, Mingxue and Liu, Li
University of Electronic Science and Technology of China, School of Information and Communication Engineering, Chengdu, China
Abstract
In this technical report, we present an improved system for DCASE2019 challenge task 4, whose goal is to evaluate systems for the detection of sound events using real data that is either weakly labeled or unlabeled and simulated data that is strongly labeled. We use multi-scale Mel-spectra as features and perform detection with a 3-layer convolutional neural network (CNN) and a 2-layer recurrent neural network (RNN); after each CNN layer, we apply a ResNet (residual neural network) block to increase learning depth. To make use of data without labels or with weak labels, we apply the mean-teacher model for sound event detection.
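A hedged PyTorch sketch of a CRNN of the kind described: convolutional layers with residual blocks over a log-mel input, followed by a 2-layer recurrent network over the time axis. All channel counts and sizes are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class SmallCRNN(nn.Module):
    def __init__(self, n_mels=64, num_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), ResBlock(32),
            nn.MaxPool2d((2, 1)),            # pool frequency, keep time resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), ResBlock(64),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), ResBlock(64),
            nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.GRU(64 * (n_mels // 8), 64, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        # x: (batch, 1, mels, frames)
        h = self.cnn(x)                       # (batch, 64, mels/8, frames)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.head(h))    # frame-level event probabilities

model = SmallCRNN()
probs = model(torch.randn(2, 1, 64, 500))     # (2, 500, 10)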
System characteristics
Input | mono |
Sampling rate | 44.1kHz |
Features | log-mel energies |
Classifier | CNN, ResNet, RNN, mean-teacher student |