Task description
The task evaluates systems for the detection of sound events using weakly labeled data (without timestamps). Systems must provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. The challenge remains to exploit a large amount of unbalanced and unlabeled training data, together with a small weakly annotated training set, to improve system performance. Isolated sound events, background sound files, and scripts to design a training set with strongly annotated synthetic data are provided. The labels in all annotated subsets are verified and can be considered reliable.
A more detailed task description can be found on the task description page.
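A note on the ranking score reported in the tables below: the numbers are consistent with averaging each system's PSDS 1 and PSDS 2 after normalizing by the baseline's scores on the same dataset, so the baseline lands at 1.00 by construction. A minimal Python sketch under that assumption (the function name is illustrative; the constants are the baseline's evaluation-set scores from the systems table):

```python
# Sketch of the ranking score, assuming it is the mean of a system's PSDS 1
# and PSDS 2, each normalized by the baseline's score on the same dataset
# (the baseline then scores exactly 1.00).
BASELINE_PSDS1 = 0.315  # Baseline, PSDS 1 (Evaluation dataset)
BASELINE_PSDS2 = 0.543  # Baseline, PSDS 2 (Evaluation dataset)

def ranking_score(psds1: float, psds2: float) -> float:
    """Average of baseline-normalized PSDS 1 and PSDS 2."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)

# Reproduces the table, e.g. Zhang_UCAS_task4_2 with PSDS 0.484 / 0.697:
print(round(ranking_score(0.484, 0.697), 2))  # 1.41
print(round(ranking_score(0.315, 0.543), 2))  # 1.0 (baseline)
```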
Systems ranking
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Xiao2022 | 1.41 | 0.484 | 0.697 | 0.481 | 0.694 | |
Zhang_UCAS_task4_1 | DCASE2022 pretrained system 1 | Xiao2022 | 1.39 | 0.472 | 0.700 | 0.475 | 0.688 | |
Zhang_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 1.21 | 0.420 | 0.599 | 0.431 | 0.645 | |
Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 0.79 | 0.049 | 0.784 | 0.051 | 0.826 | |
Liu_NSYSU_task4_2 | DCASE2022 PANNs SED 2 | Liu2022 | 0.06 | 0.000 | 0.063 | 0.451 | 0.734 | |
Liu_NSYSU_task4_3 | DCASE2022 PANNs SED 3 | Liu2022 | 0.29 | 0.070 | 0.194 | 0.457 | 0.767 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 0.434 | 0.650 | 0.437 | 0.680 | |
Liu_NSYSU_task4_4 | DCASE2022 PANNs SED 4 | Liu2022 | 0.21 | 0.046 | 0.151 | 0.465 | 0.760 | |
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 1.22 | 0.393 | 0.650 | | |
Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 0.81 | 0.062 | 0.774 | 0.063 | 0.814 | |
Suh_ReturnZero_task4_2 | rtzr_strong-real | Suh2022 | 1.39 | 0.458 | 0.721 | 0.473 | 0.723 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh2022 | 1.42 | 0.478 | 0.719 | 0.445 | 0.704 | |
Cheng_CHT_task4_2 | DCASE2022_CRNN_ADJ | Cheng2022 | 0.93 | 0.276 | 0.543 | 0.356 | 0.601 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 0.314 | 0.582 | 0.362 | 0.635 | |
Liu_SRCN_task4_2 | DCASE2022 task4 Pre-Trained 2 | Liu2022 | 0.90 | 0.129 | 0.758 | 0.177 | 0.801 | |
Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 0.79 | 0.051 | 0.777 | 0.067 | 0.827 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.24 | 0.025 | 0.219 | 0.037 | 0.244 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu2022 | 1.25 | 0.425 | 0.634 | 0.443 | 0.660 | |
Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 1.34 | 0.444 | 0.697 | 0.473 | 0.693 | |
Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 0.81 | 0.062 | 0.781 | 0.068 | 0.830 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim2022a | 1.17 | 0.305 | 0.750 | 0.354 | 0.756 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim2022a | 1.34 | 0.444 | 0.695 | 0.473 | 0.695 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | 0.269 | 0.446 | |
Ryu_Deeply_task4_2 | SKATTN_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | 0.161 | 0.452 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | 0.046 | 0.165 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | 0.129 | 0.241 | |
Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 0.82 | 0.062 | 0.787 | 0.075 | 0.852 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi2022 | 1.26 | 0.439 | 0.611 | 0.449 | 0.662 | |
Mizobuchi_PCO_task4_3 | PCO_task4_SED_C | Mizobuchi2022 | 0.88 | 0.197 | 0.620 | 0.231 | 0.714 | |
Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | 0.425 | 0.625 | |
KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 0.421 | 0.664 | 0.422 | 0.667 | |
KIM_HYU_task4_4 | single2 | Sojeong2022 | 1.27 | 0.423 | 0.651 | 0.480 | 0.726 | |
KIM_HYU_task4_1 | train_ensemble1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | 0.434 | 0.675 | |
KIM_HYU_task4_3 | train_ensemble2 | Sojeong2022 | 1.24 | 0.415 | 0.634 | 0.494 | 0.748 | |
Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 0.315 | 0.543 | 0.342 | 0.527 | |
Dinkel_XiaoRice_task4_1 | SCRATCH | Dinkel2022 | 1.29 | 0.422 | 0.679 | 0.456 | 0.713 | |
Dinkel_XiaoRice_task4_2 | SMALL | Dinkel2022 | 1.15 | 0.373 | 0.613 | 0.395 | 0.631 | |
Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 0.92 | 0.104 | 0.824 | 0.126 | 0.877 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel2022 | 1.38 | 0.451 | 0.727 | 0.482 | 0.757 | |
Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.78 | 0.078 | 0.723 | 0.448 | 0.700 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.24 | 0.425 | 0.615 | 0.448 | 0.700 | |
Hao_UNISOC_task4_3 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.09 | 0.373 | 0.547 | 0.448 | 0.700 | |
Khandelwal_FMSG-NTU_task4_1 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | 0.088 | 0.837 | |
Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | 0.102 | 0.840 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | 0.472 | 0.721 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | 0.474 | 0.730 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 0.432 | 0.649 | 0.428 | 0.655 | |
deBenito_AUDIAS_task4_1 | 10-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.400 | 0.646 | 0.410 | 0.665 | |
deBenito_AUDIAS_task4_2 | 10-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.08 | 0.310 | 0.642 | 0.347 | 0.663 | |
deBenito_AUDIAS_task4_3 | 7-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.407 | 0.643 | 0.422 | 0.656 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 0.486 | 0.694 | 0.477 | 0.734 | |
Li_WU_task4_2 | ATST-RCT SED system ATST small | Shao2022 | 1.36 | 0.476 | 0.666 | 0.460 | 0.698 | |
Li_WU_task4_3 | ATST-RCT SED system ATST base | Shao2022 | 1.40 | 0.482 | 0.693 | 0.468 | 0.702 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 1.13 | 0.368 | 0.594 | 0.398 | 0.611 | |
Kim_GIST_task4_3 | Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | 0.452 | 0.682 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | 0.458 | 0.688 | |
Kim_GIST_task4_2 | Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | 0.456 | 0.685 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | 0.459 | 0.744 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 1.49 | 0.509 | 0.742 | 0.492 | 0.721 | |
Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 0.83 | 0.047 | 0.824 | 0.080 | 0.868 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers2022 | 1.59 | 0.552 | 0.786 | 0.512 | 0.772 | |
Ebbers_UPB_task4_3 | tag-conditioned CRNN ensemble | Ebbers2022 | 1.46 | 0.527 | 0.679 | 0.483 | 0.713 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu2022 | 1.41 | 0.482 | 0.702 | 0.485 | 0.725 | |
Xu_SRCB-BIT_task4_1 | PANNs-FDY-CRNN-wrTCL system 1 | Xu2022 | 1.32 | 0.452 | 0.662 | 0.481 | 0.710 | |
Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 0.79 | 0.054 | 0.774 | 0.065 | 0.835 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.75 | 0.049 | 0.738 | 0.058 | 0.813 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | 0.470 | 0.700 | |
Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | 0.061 | 0.822 | |
Nam_KAIST_task4_SED_4 | SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | 0.058 | 0.820 | |
Nam_KAIST_task4_SED_1 | SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | 0.470 | 0.687 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | 0.341 | 0.596 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | 0.374 | 0.583 | |
Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | 0.070 | 0.794 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li2022b | 1.41 | 0.480 | 0.713 | 0.479 | 0.735 | |
Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 1.34 | 0.429 | 0.723 | 0.436 | 0.778 | |
Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.39 | 0.451 | 0.740 | 0.462 | 0.785 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li2022b | 1.35 | 0.450 | 0.699 | 0.456 | 0.726 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 0.318 | 0.520 | 0.356 | 0.554 | |
He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 0.82 | 0.053 | 0.810 | 0.071 | 0.857 | |
He_BYTEDANCE_task4_2 | DCASE2022 SED mean teacher system 2 | He2022 | 1.48 | 0.503 | 0.749 | 0.521 | 0.771 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He2022 | 1.52 | 0.525 | 0.748 | 0.533 | 0.762 | |
He_BYTEDANCE_task4_1 | DCASE2022 SED mean teacher system 1 | He2022 | 1.36 | 0.454 | 0.696 | 0.474 | 0.692 | |
Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.79 | 0.090 | 0.709 | 0.115 | 0.816 | |
Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.75 | 0.075 | 0.692 | 0.099 | 0.783 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.26 | 0.439 | 0.612 | 0.449 | 0.645 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.20 | 0.411 | 0.597 | 0.420 | 0.618 | |
Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 0.83 | 0.062 | 0.800 | 0.072 | 0.856 | |
Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie2022 | 1.06 | 0.300 | 0.641 | 0.360 | 0.674 | |
Xie_UESTC_task4_1 | CBAM-T CRNN 1 | Xie2022 | 1.36 | 0.418 | 0.757 | 0.460 | 0.768 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie2022 | 1.38 | 0.426 | 0.766 | 0.460 | 0.768 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | 0.342 | 0.527 | |
Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.02 | 0.317 | 0.565 | 0.372 | 0.592 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim2022c | 1.04 | 0.340 | 0.544 | 0.377 | 0.585 | |
Kim_CAUET_task4_3 | DCASE2022 SED system3 | Kim2022c | 1.04 | 0.338 | 0.554 | 0.373 | 0.571 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li2022c | 1.10 | 0.364 | 0.570 | 0.408 | 0.607 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li2022c | 1.17 | 0.371 | 0.635 | 0.398 | 0.640 | |
Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 0.93 | 0.195 | 0.683 | 0.215 | 0.735 | |
Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 0.75 | 0.086 | 0.671 | 0.095 | 0.754 | |
Castorena_UV_task4_3 | Strong and Max-Weak balanced | Castorena2022 | 0.91 | 0.267 | 0.531 | 0.305 | 0.587 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena2022 | 1.01 | 0.334 | 0.524 | 0.343 | 0.538 | |
Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 0.63 | 0.072 | 0.559 | 0.067 | 0.641 |
Supplementary metrics
Submission code | Submission name | Technical Report | PSDS 1 (Evaluation dataset) | PSDS 1 (Public evaluation) | PSDS 1 (Vimeo dataset) | PSDS 2 (Evaluation dataset) | PSDS 2 (Public evaluation) | PSDS 2 (Vimeo dataset) | F-score (Evaluation dataset) | F-score (Public evaluation) | F-score (Vimeo dataset)
---|---|---|---|---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Xiao2022 | 0.484 | 0.525 | 0.396 | 0.697 | 0.725 | 0.612 | 56.5 | 60.2 | 47.4 | |
Zhang_UCAS_task4_1 | DCASE2022 pretrained system 1 | Xiao2022 | 0.472 | 0.519 | 0.384 | 0.700 | 0.748 | 0.577 | 56.2 | 61.3 | 44.0 | |
Zhang_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 0.420 | 0.468 | 0.304 | 0.599 | 0.649 | 0.470 | 51.3 | 55.4 | 40.1 | |
Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 0.049 | 0.057 | 0.019 | 0.784 | 0.836 | 0.651 | 15.0 | 17.0 | 10.7 | |
Liu_NSYSU_task4_2 | DCASE2022 PANNs SED 2 | Liu2022 | 0.000 | 0.003 | 0.000 | 0.063 | 0.077 | 0.024 | 10.5 | 11.8 | 6.8 | |
Liu_NSYSU_task4_3 | DCASE2022 PANNs SED 3 | Liu2022 | 0.070 | 0.095 | 0.013 | 0.194 | 0.237 | 0.087 | 8.3 | 9.2 | 5.9 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 0.434 | 0.483 | 0.324 | 0.650 | 0.702 | 0.521 | 47.6 | 50.7 | 39.3 | |
Liu_NSYSU_task4_4 | DCASE2022 PANNs SED 4 | Liu2022 | 0.046 | 0.069 | 0.003 | 0.151 | 0.180 | 0.070 | 7.5 | 8.3 | 5.1 | |
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 0.393 | 0.432 | 0.324 | 0.650 | 0.686 | 0.560 | 46.8 | 50.0 | 38.7 | |
Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 0.062 | 0.072 | 0.026 | 0.774 | 0.807 | 0.674 | 12.9 | 13.9 | 10.6 | |
Suh_ReturnZero_task4_2 | rtzr_strong-real | Suh2022 | 0.458 | 0.495 | 0.370 | 0.721 | 0.768 | 0.612 | 53.1 | 57.6 | 42.1 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh2022 | 0.478 | 0.512 | 0.390 | 0.719 | 0.772 | 0.592 | 53.8 | 57.7 | 44.1 | |
Cheng_CHT_task4_2 | DCASE2022_CRNN_ADJ | Cheng2022 | 0.276 | 0.308 | 0.212 | 0.543 | 0.568 | 0.470 | 40.9 | 43.5 | 34.3 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 0.314 | 0.361 | 0.223 | 0.582 | 0.611 | 0.497 | 43.2 | 46.7 | 34.5 | |
Liu_SRCN_task4_2 | DCASE2022 task4 Pre-Trained 2 | Liu2022 | 0.129 | 0.139 | 0.100 | 0.758 | 0.791 | 0.682 | 19.3 | 20.0 | 17.9 | |
Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 0.051 | 0.063 | 0.015 | 0.777 | 0.803 | 0.696 | 13.6 | 14.3 | 12.0 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.025 | 0.023 | 0.011 | 0.219 | 0.224 | 0.183 | 5.2 | 5.8 | 3.6 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu2022 | 0.425 | 0.471 | 0.319 | 0.634 | 0.674 | 0.512 | 49.3 | 52.2 | 41.6 | |
Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 0.444 | 0.503 | 0.323 | 0.697 | 0.740 | 0.588 | 51.0 | 54.8 | 41.0 | |
Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 0.062 | 0.069 | 0.030 | 0.781 | 0.809 | 0.691 | 12.8 | 13.5 | 11.4 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim2022a | 0.305 | 0.333 | 0.234 | 0.750 | 0.778 | 0.683 | 27.4 | 28.5 | 25.1 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim2022a | 0.444 | 0.502 | 0.334 | 0.695 | 0.738 | 0.585 | 51.1 | 55.0 | 41.0 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.257 | 0.280 | 0.207 | 0.461 | 0.514 | 0.345 | 30.5 | 32.8 | 25.1 | |
Ryu_Deeply_task4_2 | SKATTN_2 | Ryu2022 | 0.156 | 0.171 | 0.129 | 0.449 | 0.477 | 0.356 | 19.3 | 20.0 | 18.3 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.029 | 0.033 | 0.015 | 0.184 | 0.214 | 0.102 | 9.4 | 10.2 | 7.2 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.104 | 0.121 | 0.048 | 0.196 | 0.216 | 0.130 | 26.5 | 29.4 | 19.0 | |
Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 0.062 | 0.071 | 0.029 | 0.787 | 0.818 | 0.693 | 13.7 | 14.5 | 11.5 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi2022 | 0.439 | 0.489 | 0.324 | 0.611 | 0.656 | 0.498 | 49.7 | 53.0 | 40.4 | |
Mizobuchi_PCO_task4_3 | PCO_task4_SED_C | Mizobuchi2022 | 0.197 | 0.218 | 0.164 | 0.620 | 0.660 | 0.517 | 21.8 | 24.7 | 15.3 | |
Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi2022 | 0.398 | 0.450 | 0.285 | 0.571 | 0.617 | 0.452 | 47.6 | 50.4 | 39.5 | |
KIM_HYU_task4_2 | single1 | Sojeong2022 | 0.421 | 0.470 | 0.314 | 0.664 | 0.724 | 0.524 | 49.6 | 53.4 | 39.8 | |
KIM_HYU_task4_4 | single2 | Sojeong2022 | 0.423 | 0.476 | 0.308 | 0.651 | 0.707 | 0.509 | 50.4 | 55.2 | 38.0 | |
KIM_HYU_task4_1 | train_ensemble1 | Sojeong2022 | 0.390 | 0.437 | 0.284 | 0.620 | 0.678 | 0.488 | 48.1 | 52.5 | 37.3 | |
KIM_HYU_task4_3 | train_ensemble2 | Sojeong2022 | 0.415 | 0.467 | 0.299 | 0.634 | 0.698 | 0.486 | 48.1 | 52.8 | 36.2 | |
Baseline | DCASE2022 SED baseline system | Turpault2022 | 0.315 | 0.360 | 0.222 | 0.543 | 0.591 | 0.403 | 37.3 | 40.8 | 29.7 | |
Dinkel_XiaoRice_task4_1 | SCRATCH | Dinkel2022 | 0.422 | 0.480 | 0.298 | 0.679 | 0.737 | 0.528 | 45.6 | 49.2 | 36.1 | |
Dinkel_XiaoRice_task4_2 | SMALL | Dinkel2022 | 0.373 | 0.421 | 0.250 | 0.613 | 0.663 | 0.459 | 39.3 | 42.9 | 29.6 | |
Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 0.104 | 0.119 | 0.086 | 0.824 | 0.855 | 0.736 | 14.2 | 14.9 | 12.5 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel2022 | 0.451 | 0.505 | 0.325 | 0.727 | 0.773 | 0.605 | 47.5 | 51.0 | 38.3 | |
Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.078 | 0.091 | 0.028 | 0.723 | 0.772 | 0.603 | 10.8 | 11.5 | 9.5 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.425 | 0.475 | 0.322 | 0.615 | 0.669 | 0.490 | 47.1 | 50.9 | 36.8 | |
Hao_UNISOC_task4_3 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.373 | 0.426 | 0.249 | 0.547 | 0.606 | 0.400 | 45.3 | 48.7 | 36.4 | |
Khandelwal_FMSG-NTU_task4_1 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.158 | 0.182 | 0.126 | 0.633 | 0.678 | 0.521 | 20.3 | 21.7 | 17.1 | |
Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.082 | 0.093 | 0.033 | 0.731 | 0.762 | 0.645 | 13.1 | 13.8 | 11.7 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.410 | 0.457 | 0.310 | 0.664 | 0.718 | 0.531 | 50.3 | 54.6 | 39.4 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.386 | 0.428 | 0.305 | 0.643 | 0.686 | 0.531 | 44.7 | 48.5 | 35.0 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 0.432 | 0.480 | 0.324 | 0.649 | 0.691 | 0.537 | 46.5 | 51.0 | 35.6 | |
deBenito_AUDIAS_task4_1 | 10-Resolution CRNN+Conformer | deBenito2022 | 0.400 | 0.447 | 0.299 | 0.646 | 0.694 | 0.528 | 45.0 | 49.3 | 34.5 | |
deBenito_AUDIAS_task4_2 | 10-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 0.310 | 0.350 | 0.237 | 0.642 | 0.689 | 0.525 | 37.7 | 41.5 | 28.5 | |
deBenito_AUDIAS_task4_3 | 7-Resolution CRNN+Conformer | deBenito2022 | 0.407 | 0.454 | 0.303 | 0.643 | 0.686 | 0.528 | 46.5 | 50.6 | 36.4 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 0.486 | 0.535 | 0.378 | 0.694 | 0.740 | 0.589 | 51.8 | 55.1 | 43.8 | |
Li_WU_task4_2 | ATST-RCT SED system ATST small | Shao2022 | 0.476 | 0.524 | 0.377 | 0.666 | 0.713 | 0.555 | 51.6 | 56.1 | 41.0 | |
Li_WU_task4_3 | ATST-RCT SED system ATST base | Shao2022 | 0.482 | 0.533 | 0.372 | 0.693 | 0.740 | 0.584 | 51.8 | 55.1 | 43.8 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 0.368 | 0.409 | 0.283 | 0.594 | 0.644 | 0.474 | 45.0 | 49.0 | 35.5 | |
Kim_GIST_task4_3 | Kim_GIST_task4_3 | Kim2022b | 0.500 | 0.551 | 0.383 | 0.695 | 0.738 | 0.582 | 55.3 | 57.6 | 49.3 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 0.514 | 0.559 | 0.406 | 0.713 | 0.756 | 0.598 | 55.9 | 59.0 | 47.5 | |
Kim_GIST_task4_2 | Kim_GIST_task4_2 | Kim2022b | 0.510 | 0.555 | 0.399 | 0.711 | 0.752 | 0.599 | 55.5 | 58.8 | 46.9 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.215 | 0.239 | 0.135 | 0.335 | 0.358 | 0.254 | 31.6 | 34.7 | 23.1 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 0.509 | 0.552 | 0.413 | 0.742 | 0.797 | 0.626 | 57.6 | 61.5 | 47.9 | |
Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 0.047 | 0.055 | 0.025 | 0.824 | 0.866 | 0.734 | 11.8 | 12.4 | 10.7 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers2022 | 0.552 | 0.593 | 0.474 | 0.786 | 0.844 | 0.664 | 59.8 | 62.6 | 53.5 | |
Ebbers_UPB_task4_3 | tag-conditioned CRNN ensemble | Ebbers2022 | 0.527 | 0.568 | 0.444 | 0.679 | 0.729 | 0.566 | 65.9 | 70.1 | 56.3 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu2022 | 0.482 | 0.533 | 0.354 | 0.702 | 0.756 | 0.582 | 55.0 | 58.4 | 46.2 | |
Xu_SRCB-BIT_task4_1 | PANNs-FDY-CRNN-wrTCL system 1 | Xu2022 | 0.452 | 0.500 | 0.338 | 0.662 | 0.702 | 0.552 | 51.7 | 54.7 | 43.5 | |
Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 0.054 | 0.064 | 0.022 | 0.774 | 0.799 | 0.699 | 13.1 | 14.2 | 10.7 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.049 | 0.057 | 0.018 | 0.738 | 0.771 | 0.637 | 12.8 | 13.5 | 11.4 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam2022 | 0.409 | 0.450 | 0.329 | 0.656 | 0.695 | 0.554 | 48.9 | 51.4 | 42.3 | |
Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 0.057 | 0.068 | 0.021 | 0.747 | 0.770 | 0.668 | 12.5 | 13.6 | 10.3 | |
Nam_KAIST_task4_SED_4 | SED_4 | Nam2022 | 0.055 | 0.066 | 0.016 | 0.747 | 0.770 | 0.673 | 12.7 | 13.6 | 10.9 | |
Nam_KAIST_task4_SED_1 | SED_1 | Nam2022 | 0.404 | 0.446 | 0.317 | 0.653 | 0.686 | 0.558 | 49.8 | 52.7 | 42.4 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.293 | 0.337 | 0.200 | 0.527 | 0.590 | 0.391 | 37.9 | 41.8 | 28.3 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala2022 | 0.365 | 0.395 | 0.289 | 0.584 | 0.621 | 0.494 | 39.5 | 43.2 | 30.5 | |
Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 0.069 | 0.084 | 0.030 | 0.728 | 0.765 | 0.645 | 13.9 | 14.5 | 12.6 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li2022b | 0.480 | 0.541 | 0.347 | 0.713 | 0.760 | 0.585 | 55.1 | 59.9 | 42.8 | |
Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 0.429 | 0.487 | 0.305 | 0.723 | 0.763 | 0.614 | 52.4 | 56.7 | 41.5 | |
Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 0.451 | 0.514 | 0.320 | 0.740 | 0.776 | 0.634 | 53.8 | 58.2 | 42.8 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li2022b | 0.450 | 0.507 | 0.329 | 0.699 | 0.745 | 0.576 | 53.1 | 58.0 | 40.7 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.318 | 0.352 | 0.244 | 0.520 | 0.563 | 0.406 | 37.7 | 40.4 | 31.3 | |
He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 0.053 | 0.063 | 0.024 | 0.810 | 0.839 | 0.729 | 14.3 | 14.8 | 13.2 | |
He_BYTEDANCE_task4_2 | DCASE2022 SED mean teacher system 2 | He2022 | 0.503 | 0.551 | 0.392 | 0.749 | 0.798 | 0.639 | 54.5 | 58.5 | 44.4 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He2022 | 0.525 | 0.578 | 0.401 | 0.748 | 0.795 | 0.634 | 55.7 | 59.7 | 45.6 | |
He_BYTEDANCE_task4_1 | DCASE2022 SED mean teacher system 1 | He2022 | 0.454 | 0.503 | 0.338 | 0.696 | 0.744 | 0.596 | 53.6 | 58.0 | 42.2 | |
Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.090 | 0.095 | 0.062 | 0.709 | 0.747 | 0.581 | 9.4 | 10.3 | 7.2 | |
Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.075 | 0.085 | 0.044 | 0.692 | 0.731 | 0.570 | 9.0 | 10.3 | 5.4 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.439 | 0.486 | 0.321 | 0.612 | 0.649 | 0.508 | 29.3 | 32.0 | 20.9 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.411 | 0.453 | 0.312 | 0.597 | 0.635 | 0.488 | 34.6 | 38.5 | 24.0 | |
Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 0.062 | 0.074 | 0.021 | 0.800 | 0.825 | 0.719 | 13.7 | 14.2 | 12.6 | |
Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie2022 | 0.300 | 0.335 | 0.207 | 0.641 | 0.695 | 0.502 | 38.3 | 41.8 | 29.1 | |
Xie_UESTC_task4_1 | CBAM-T CRNN 1 | Xie2022 | 0.418 | 0.463 | 0.323 | 0.757 | 0.815 | 0.626 | 52.7 | 57.3 | 41.1 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie2022 | 0.426 | 0.474 | 0.333 | 0.766 | 0.829 | 0.630 | 54.7 | 58.8 | 44.3 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 0.345 | 0.387 | 0.254 | 0.540 | 0.592 | 0.414 | 41.1 | 44.5 | 32.5 | |
Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 0.317 | 0.361 | 0.217 | 0.565 | 0.619 | 0.425 | 42.4 | 46.5 | 33.0 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim2022c | 0.340 | 0.388 | 0.230 | 0.544 | 0.604 | 0.400 | 41.1 | 45.5 | 31.3 | |
Kim_CAUET_task4_3 | DCASE2022 SED system3 | Kim2022c | 0.338 | 0.381 | 0.245 | 0.554 | 0.603 | 0.426 | 42.4 | 46.4 | 32.7 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li2022c | 0.364 | 0.411 | 0.265 | 0.570 | 0.623 | 0.444 | 44.9 | 48.7 | 35.4 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li2022c | 0.371 | 0.408 | 0.280 | 0.635 | 0.688 | 0.521 | 47.8 | 51.9 | 37.8 | |
Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 0.195 | 0.222 | 0.158 | 0.683 | 0.740 | 0.537 | 27.8 | 29.8 | 23.3 | |
Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 0.086 | 0.101 | 0.060 | 0.671 | 0.713 | 0.561 | 15.0 | 15.4 | 14.6 | |
Castorena_UV_task4_3 | Strong and Max-Weak balanced | Castorena2022 | 0.267 | 0.299 | 0.184 | 0.531 | 0.577 | 0.405 | 32.8 | 35.7 | 25.0 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena2022 | 0.334 | 0.365 | 0.256 | 0.524 | 0.558 | 0.420 | 39.2 | 43.2 | 29.0 | |
Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 0.072 | 0.073 | 0.076 | 0.559 | 0.588 | 0.460 | 11.2 | 12.1 | 9.2 |
Without external resources
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
---|---|---|---|---|---|---|---
Xiao_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 1.21 | 0.420 | 0.599 | 0.431 | 0.645 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 0.434 | 0.650 | 0.437 | 0.680 | |
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 1.22 | 0.393 | 0.650 | | |
Cheng_CHT_task4_2 | DCASE2022_CRNN_ADJ | Cheng2022 | 0.93 | 0.276 | 0.543 | 0.356 | 0.601 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 0.314 | 0.582 | 0.362 | 0.635 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.24 | 0.025 | 0.219 | 0.037 | 0.244 | |
Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 1.34 | 0.444 | 0.697 | 0.473 | 0.693 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim2022a | 1.34 | 0.444 | 0.695 | 0.473 | 0.695 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | 0.046 | 0.165 | |
Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | 0.425 | 0.625 | |
KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 0.421 | 0.664 | 0.422 | 0.667 | |
KIM_HYU_task4_4 | single2 | Sojeong2022 | 1.27 | 0.423 | 0.651 | 0.480 | 0.726 | |
KIM_HYU_task4_1 | train_ensemble1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | 0.434 | 0.675 | |
KIM_HYU_task4_3 | train_ensemble2 | Sojeong2022 | 1.24 | 0.415 | 0.634 | 0.494 | 0.748 | |
Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 0.315 | 0.543 | 0.342 | 0.527 | |
Dinkel_XiaoRice_task4_1 | SCRATCH | Dinkel2022 | 1.29 | 0.422 | 0.679 | 0.456 | 0.713 | |
Dinkel_XiaoRice_task4_2 | SMALL | Dinkel2022 | 1.15 | 0.373 | 0.613 | 0.395 | 0.631 | |
Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 0.92 | 0.104 | 0.824 | 0.126 | 0.877 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel2022 | 1.38 | 0.451 | 0.727 | 0.482 | 0.757 | |
Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.78 | 0.078 | 0.723 | 0.448 | 0.700 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.24 | 0.425 | 0.615 | 0.448 | 0.700 | |
Hao_UNISOC_task4_3 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.09 | 0.373 | 0.547 | 0.448 | 0.700 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | 0.474 | 0.730 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 0.432 | 0.649 | 0.428 | 0.655 | |
deBenito_AUDIAS_task4_1 | 10-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.400 | 0.646 | 0.410 | 0.665 | |
deBenito_AUDIAS_task4_2 | 10-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.08 | 0.310 | 0.642 | 0.347 | 0.663 | |
deBenito_AUDIAS_task4_3 | 7-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.407 | 0.643 | 0.422 | 0.656 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 1.13 | 0.368 | 0.594 | 0.398 | 0.611 | |
Kim_GIST_task4_3 | Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | 0.452 | 0.682 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | 0.458 | 0.688 | |
Kim_GIST_task4_2 | Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | 0.456 | 0.685 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 1.49 | 0.509 | 0.742 | 0.492 | 0.721 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.75 | 0.049 | 0.738 | 0.058 | 0.813 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | 0.470 | 0.700 | |
Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | 0.061 | 0.822 | |
Nam_KAIST_task4_SED_4 | SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | 0.058 | 0.820 | |
Nam_KAIST_task4_SED_1 | SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | 0.470 | 0.687 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | 0.341 | 0.596 | |
Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 1.34 | 0.429 | 0.723 | 0.436 | 0.778 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li2022b | 1.35 | 0.450 | 0.699 | 0.456 | 0.726 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 0.318 | 0.520 | 0.356 | 0.554 | |
Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie2022 | 1.06 | 0.300 | 0.641 | 0.360 | 0.674 | |
Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.02 | 0.317 | 0.565 | 0.372 | 0.592 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim2022c | 1.04 | 0.340 | 0.544 | 0.377 | 0.585 | |
Kim_CAUET_task4_3 | DCASE2022 SED system3 | Kim2022c | 1.04 | 0.338 | 0.554 | 0.373 | 0.571 | |
Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 0.75 | 0.086 | 0.671 | 0.095 | 0.754 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li2022c | 1.10 | 0.364 | 0.570 | 0.408 | 0.607 | |
Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.75 | 0.075 | 0.692 | 0.099 | 0.783 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.20 | 0.411 | 0.597 | 0.420 | 0.618 | |
Castorena_UV_task4_3 | Strong and Max-Weak balanced | Castorena2022 | 0.91 | 0.267 | 0.531 | 0.305 | 0.587 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena2022 | 1.01 | 0.334 | 0.524 | 0.343 | 0.538 | |
Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 0.63 | 0.072 | 0.559 | 0.067 | 0.641 |
With external resources
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Xiao2022 | 1.41 | 0.484 | 0.697 | 0.481 | 0.694 | |
Zhang_UCAS_task4_1 | DCASE2022 pretrained system 1 | Xiao2022 | 1.39 | 0.472 | 0.700 | 0.475 | 0.688 | |
Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 0.79 | 0.049 | 0.784 | 0.051 | 0.826 | |
Liu_NSYSU_task4_2 | DCASE2022 PANNs SED 2 | Liu2022 | 0.06 | 0.000 | 0.063 | 0.451 | 0.734 | |
Liu_NSYSU_task4_3 | DCASE2022 PANNs SED 3 | Liu2022 | 0.29 | 0.070 | 0.194 | 0.457 | 0.767 | |
Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 0.81 | 0.062 | 0.774 | 0.063 | 0.814 | |
Suh_ReturnZero_task4_2 | rtzr_strong-real | Suh2022 | 1.39 | 0.458 | 0.721 | 0.473 | 0.723 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh2022 | 1.42 | 0.478 | 0.719 | 0.445 | 0.704 | |
Liu_SRCN_task4_2 | DCASE2022 task4 Pre-Trained 2 | Liu2022 | 0.90 | 0.129 | 0.758 | 0.177 | 0.801 | |
Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 0.79 | 0.051 | 0.777 | 0.067 | 0.827 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu2022 | 1.25 | 0.425 | 0.634 | 0.443 | 0.660 | |
Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 0.81 | 0.062 | 0.781 | 0.068 | 0.830 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim2022a | 1.17 | 0.305 | 0.750 | 0.354 | 0.756 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | 0.269 | 0.446 | |
Ryu_Deeply_task4_2 | SKATTN_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | 0.161 | 0.452 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | 0.129 | 0.241 | |
Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 0.82 | 0.062 | 0.787 | 0.075 | 0.852 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi2022 | 1.26 | 0.439 | 0.611 | 0.449 | 0.662 | |
Mizobuchi_PCO_task4_3 | PCO_task4_SED_C | Mizobuchi2022 | 0.88 | 0.197 | 0.620 | 0.231 | 0.714 | |
Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 0.92 | 0.104 | 0.824 | 0.126 | 0.877 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel2022 | 1.38 | 0.451 | 0.727 | 0.482 | 0.757 | |
Khandelwal_FMSG-NTU_task4_1 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | 0.088 | 0.837 | |
Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | 0.102 | 0.840 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | 0.472 | 0.721 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 0.486 | 0.694 | 0.477 | 0.734 | |
Li_WU_task4_2 | ATST-RCT SED system ATST small | Shao2022 | 1.36 | 0.476 | 0.666 | 0.460 | 0.698 | |
Li_WU_task4_3 | ATST-RCT SED system ATST base | Shao2022 | 1.40 | 0.482 | 0.693 | 0.468 | 0.702 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | 0.459 | 0.744 | |
Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 0.83 | 0.047 | 0.824 | 0.080 | 0.868 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers2022 | 1.59 | 0.552 | 0.786 | 0.512 | 0.772 | |
Ebbers_UPB_task4_3 | tag-conditioned CRNN ensemble | Ebbers2022 | 1.46 | 0.527 | 0.679 | 0.483 | 0.713 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu2022 | 1.41 | 0.482 | 0.702 | 0.485 | 0.725 | |
Xu_SRCB-BIT_task4_1 | PANNs-FDY-CRNN-wrTCL system 1 | Xu2022 | 1.32 | 0.452 | 0.662 | 0.481 | 0.710 | |
Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 0.79 | 0.054 | 0.774 | 0.065 | 0.835 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | 0.374 | 0.583 | |
Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | 0.070 | 0.794 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li2022b | 1.41 | 0.480 | 0.713 | 0.479 | 0.735 | |
Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.39 | 0.451 | 0.740 | 0.462 | 0.785 | |
He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 0.82 | 0.053 | 0.810 | 0.071 | 0.857 | |
He_BYTEDANCE_task4_2 | DCASE2022 SED mean teacher system 2 | He2022 | 1.48 | 0.503 | 0.749 | 0.521 | 0.771 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He2022 | 1.52 | 0.525 | 0.748 | 0.533 | 0.762 | |
He_BYTEDANCE_task4_1 | DCASE2022 SED mean teacher system 1 | He2022 | 1.36 | 0.454 | 0.696 | 0.474 | 0.692 | |
Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.79 | 0.090 | 0.709 | 0.115 | 0.816 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.26 | 0.439 | 0.612 | 0.449 | 0.645 | |
Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 0.83 | 0.062 | 0.800 | 0.072 | 0.856 | |
Xie_UESTC_task4_1 | CBAM-T CRNN 1 | Xie2022 | 1.36 | 0.418 | 0.757 | 0.460 | 0.768 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie2022 | 1.38 | 0.426 | 0.766 | 0.460 | 0.768 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | 0.342 | 0.527 | |
Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 0.93 | 0.195 | 0.683 | 0.215 | 0.735 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li2022c | 1.17 | 0.371 | 0.635 | 0.398 | 0.640 |
Teams ranking
This table includes only the best ranking score per submitting team.
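The team score pairs each team's best PSDS 1 submission with its best PSDS 2 submission, which may be two different systems (hence the paired submission codes in the table). A sketch of that aggregation, under the same normalization assumption as the sketch in the task description:

```python
# Sketch of the team ranking score, assuming the best PSDS 1 and best PSDS 2
# across a team's submissions (possibly from different systems) are each
# normalized by the baseline and averaged.
BASELINE_PSDS1, BASELINE_PSDS2 = 0.315, 0.543

def team_ranking_score(submissions):
    """submissions: iterable of (psds1, psds2) pairs for one team."""
    best1 = max(p1 for p1, _ in submissions)
    best2 = max(p2 for _, p2 in submissions)
    return 0.5 * (best1 / BASELINE_PSDS1 + best2 / BASELINE_PSDS2)

# Zhang_UCAS: best PSDS 1 from system 2 (0.484), best PSDS 2 from system 4 (0.784)
zhang = [(0.484, 0.697), (0.472, 0.700), (0.420, 0.599), (0.049, 0.784)]
print(round(team_ranking_score(zhang), 2))  # 1.49, matching the row below
```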
Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 1.49 | 0.484 | 0.784 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 0.434 | 0.650 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 1.47 | 0.478 | 0.774 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 0.314 | 0.582 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 1.38 | 0.425 | 0.777 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 1.42 | 0.444 | 0.781 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 1.42 | 0.439 | 0.787 | |
KIM_HYU_task4_4 | single2 | KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 0.423 | 0.664 | |
Baseline | DCASE2022 SED baseline system | Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 0.315 | 0.543 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 1.47 | 0.451 | 0.824 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.34 | 0.425 | 0.723 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.32 | 0.410 | 0.731 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 0.432 | 0.649 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 0.486 | 0.694 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 1.63 | 0.552 | 0.824 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 1.47 | 0.482 | 0.774 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 1.33 | 0.409 | 0.747 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 1.25 | 0.365 | 0.728 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.44 | 0.480 | 0.740 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 0.318 | 0.520 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 1.57 | 0.525 | 0.810 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.35 | 0.439 | 0.709 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 1.41 | 0.426 | 0.800 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.06 | 0.340 | 0.565 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 1.21 | 0.371 | 0.683 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 1.04 | 0.334 | 0.559 |
Supplementary metrics
Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | Ranking score (Public evaluation) | Ranking score (Vimeo dataset)
---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 1.49 | 1.43 | 1.69 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 1.26 | 1.37 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 1.47 | 1.39 | 1.71 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 1.01 | 1.11 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 1.38 | 1.33 | 1.57 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 1.42 | 1.38 | 1.60 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.82 | 0.89 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.35 | 0.27 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 1.42 | 1.37 | 1.58 | |
KIM_HYU_task4_4 | single2 | KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 1.27 | 1.34 | |
Baseline | DCASE2022 SED baseline system | Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 1.00 | 1.00 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 1.47 | 1.42 | 1.64 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.34 | 1.31 | 1.47 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.32 | 1.28 | 1.49 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 1.25 | 1.39 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 1.37 | 1.58 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 1.41 | 1.65 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 1.63 | 1.55 | 1.97 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 1.47 | 1.41 | 1.66 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 1.33 | 1.27 | 1.56 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 1.25 | 1.19 | 1.45 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.44 | 1.40 | 1.56 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 0.96 | 1.05 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 1.57 | 1.51 | 1.80 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.35 | 1.30 | 1.44 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 1.41 | 1.35 | 1.64 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 1.04 | 1.08 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.06 | 1.06 | 1.04 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 1.21 | 1.19 | 1.29 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 1.04 | 1.00 | 1.14 |
Without external resources
Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
---|---|---|---|---|---|---|---
Xiao_UCAS_task4_3 | DCASE2022 base system | Xiao_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 1.21 | 0.420 | 0.599 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 0.434 | 0.650 | |
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 1.22 | 0.393 | 0.650 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 0.314 | 0.582 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.24 | 0.025 | 0.219 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 1.34 | 0.444 | 0.697 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | |
Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | |
KIM_HYU_task4_4 | single2 | KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 0.423 | 0.664 | |
Baseline | DCASE2022 SED baseline system | Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 0.315 | 0.543 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 1.47 | 0.451 | 0.824 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.34 | 0.425 | 0.723 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 0.432 | 0.649 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 1.13 | 0.368 | 0.594 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 1.49 | 0.509 | 0.742 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.75 | 0.049 | 0.738 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 1.33 | 0.409 | 0.747 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 1.38 | 0.450 | 0.723 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 0.318 | 0.520 | |
Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie2022 | 1.06 | 0.300 | 0.641 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.06 | 0.340 | 0.565 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 1.19 | 0.364 | 0.671 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.29 | 0.411 | 0.692 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 1.04 | 0.334 | 0.559 |
With external resources
Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 1.49 | 0.484 | 0.784 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 1.47 | 0.478 | 0.774 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 1.38 | 0.425 | 0.777 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 1.20 | 0.305 | 0.781 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 1.42 | 0.439 | 0.787 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 1.47 | 0.451 | 0.824 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.32 | 0.410 | 0.731 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 0.486 | 0.694 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 1.63 | 0.552 | 0.824 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 1.47 | 0.482 | 0.774 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 1.25 | 0.365 | 0.728 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.44 | 0.480 | 0.740 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 1.57 | 0.525 | 0.810 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.35 | 0.439 | 0.709 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 1.41 | 0.426 | 0.800 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 1.21 | 0.371 | 0.683 |
Class-wise performance
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | Alarm Bell Ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Xiao2022 | 1.41 | 56.4 | 66.1 | 72.2 | 40.8 | 45.4 | 58.2 | 54.6 | 39.9 | 62.4 | 69.2 | |
Zhang_UCAS_task4_1 | DCASE2022 pretrained system 1 | Xiao2022 | 1.39 | 58.8 | 61.9 | 72.8 | 42.9 | 47.2 | 61.3 | 49.8 | 35.9 | 65.2 | 66.4 | |
Zhang_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 1.21 | 45.5 | 54.5 | 70.3 | 39.3 | 49.8 | 48.8 | 43.3 | 36.4 | 62.4 | 62.5 | |
Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 0.79 | 6.6 | 6.1 | 0.9 | 0.0 | 0.3 | 15.2 | 52.1 | 29.8 | 0.3 | 38.7 | |
Liu_NSYSU_task4_2 | DCASE2022 PANNs SED 2 | Liu2022 | 0.06 | 13.1 | 1.6 | 11.0 | 0.0 | 29.5 | 1.0 | 49.1 | 0.0 | |||
Liu_NSYSU_task4_3 | DCASE2022 PANNs SED 3 | Liu2022 | 0.29 | 11.3 | 1.6 | 17.5 | 0.5 | 14.1 | 0.0 | 0.0 | 38.4 | 0.0 | ||
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 45.1 | 48.9 | 65.5 | 35.8 | 47.8 | 54.7 | 41.3 | 32.5 | 70.3 | 34.3 | |
Liu_NSYSU_task4_4 | DCASE2022 PANNs SED 4 | Liu2022 | 0.21 | 8.5 | 1.6 | 15.0 | 0.0 | 11.2 | 0.0 | 37.5 | 1.5 | |||
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 1.22 | 26.0 | 53.5 | 71.7 | 40.3 | 45.1 | 39.8 | 46.0 | 33.6 | 52.6 | 59.6 | |
Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 0.81 | 5.8 | 2.8 | 0.5 | 0.0 | 0.3 | 14.7 | 49.2 | 18.5 | 0.2 | 37.0 | |
Suh_ReturnZero_task4_2 | rtzr_strong-real | Suh2022 | 1.39 | 37.8 | 65.3 | 77.9 | 44.5 | 45.6 | 53.4 | 56.4 | 33.8 | 56.4 | 60.1 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh2022 | 1.42 | 39.7 | 62.9 | 77.8 | 47.1 | 46.0 | 52.5 | 63.0 | 32.8 | 55.1 | 61.1 | |
Cheng_CHT_task4_2 | DCASE2022_CRNN_ADJ | Cheng2022 | 0.93 | 31.2 | 39.8 | 67.4 | 32.8 | 32.5 | 41.2 | 46.2 | 29.5 | 53.8 | 34.4 | |
Cheng_CHT_task4_1 | DCASE2022_CRNN_IMP | Cheng2022 | 1.03 | 31.6 | 48.9 | 65.6 | 28.6 | 24.0 | 45.6 | 45.8 | 32.4 | 51.0 | 59.0 | |
Liu_SRCN_task4_2 | DCASE2022 task4 Pre-Trained 2 | Liu2022 | 0.90 | 10.9 | 20.8 | 2.3 | 0.7 | 1.7 | 19.7 | 58.3 | 27.1 | 4.2 | 47.8 | |
Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 0.79 | 4.8 | 4.5 | 0.9 | 0.0 | 0.3 | 15.5 | 49.2 | 21.8 | 0.2 | 38.9 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.24 | 2.7 | 2.2 | 0.0 | 29.6 | 0.2 | 17.7 | |||||
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu2022 | 1.25 | 36.3 | 58.7 | 69.8 | 40.6 | 48.4 | 36.4 | 49.6 | 27.3 | 69.2 | 56.9 | |
Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 1.34 | 36.8 | 52.5 | 73.3 | 46.0 | 45.1 | 38.6 | 50.8 | 30.3 | 70.2 | 66.7 | |
Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 0.81 | 3.3 | 2.9 | 0.5 | 0.0 | 0.3 | 11.8 | 50.2 | 22.5 | 0.2 | 36.5 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim2022a | 1.17 | 12.2 | 34.1 | 12.0 | 8.8 | 4.3 | 17.9 | 50.2 | 28.1 | 47.6 | 58.8 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim2022a | 1.34 | 36.8 | 52.8 | 73.3 | 46.1 | 45.1 | 39.3 | 50.8 | 30.2 | 70.2 | 66.9 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 23.7 | 30.1 | 39.9 | 1.1 | 15.1 | 36.2 | 46.3 | 29.1 | 47.9 | 36.1 | |
Ryu_Deeply_task4_2 | SKATTN_2 | Ryu2022 | 0.66 | 11.3 | 4.4 | 18.4 | 10.1 | 5.7 | 16.8 | 38.9 | 18.1 | 36.8 | 33.0 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.21 | 4.5 | 11.2 | 6.5 | 2.5 | 2.0 | 8.5 | 17.4 | 9.6 | 3.9 | 27.9 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 22.9 | 35.1 | 29.2 | 19.5 | 11.5 | 21.0 | 27.9 | 15.9 | 45.5 | 37.0 | |
Mizobuchi_PCO_task4_4 | PCO_task4_SED_D | Mizobuchi2022 | 0.82 | 3.9 | 5.5 | 0.9 | 0.0 | 0.3 | 14.5 | 47.5 | 23.9 | 0.2 | 39.8 | |
Mizobuchi_PCO_task4_2 | PCO_task4_SED_B | Mizobuchi2022 | 1.26 | 46.5 | 44.4 | 71.4 | 40.8 | 43.5 | 44.8 | 45.4 | 37.0 | 64.7 | 58.5 | |
Mizobuchi_PCO_task4_3 | PCO_task4_SED_C | Mizobuchi2022 | 0.88 | 11.8 | 34.8 | 21.4 | 1.6 | 2.5 | 34.8 | 27.8 | 30.4 | 12.8 | 39.8 | |
Mizobuchi_PCO_task4_1 | PCO_task4_SED_A | Mizobuchi2022 | 1.15 | 34.6 | 47.5 | 69.4 | 36.5 | 48.3 | 40.5 | 49.4 | 38.8 | 61.0 | 50.2 | |
KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 43.1 | 53.5 | 70.8 | 33.1 | 44.3 | 42.9 | 50.4 | 35.3 | 62.9 | 59.6 | |
KIM_HYU_task4_4 | single2 | Sojeong2022 | 1.27 | 42.7 | 58.2 | 68.7 | 31.2 | 43.1 | 55.3 | 48.9 | 32.6 | 61.3 | 62.2 | |
KIM_HYU_task4_1 | train_ensemble1 | Sojeong2022 | 1.19 | 42.5 | 53.6 | 69.6 | 29.8 | 44.1 | 43.2 | 42.7 | 37.0 | 61.2 | 57.5 | |
KIM_HYU_task4_3 | train_ensemble2 | Sojeong2022 | 1.24 | 39.9 | 58.5 | 68.6 | 32.0 | 39.9 | 48.4 | 49.4 | 32.0 | 59.1 | 53.3 | |
Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 32.2 | 39.0 | 62.4 | 28.6 | 34.5 | 21.1 | 37.2 | 26.4 | 49.7 | 42.0 | |
Dinkel_XiaoRice_task4_1 | SCRATCH | Dinkel2022 | 1.29 | 36.7 | 51.9 | 61.1 | 30.9 | 40.8 | 47.9 | 50.4 | 29.8 | 60.2 | 46.6 | |
Dinkel_XiaoRice_task4_2 | SMALL | Dinkel2022 | 1.15 | 36.8 | 37.6 | 57.3 | 28.2 | 39.2 | 29.1 | 46.1 | 25.6 | 58.3 | 34.4 | |
Dinkel_XiaoRice_task4_4 | TAG | Dinkel2022 | 0.92 | 4.4 | 4.6 | 0.5 | 0.0 | 0.3 | 13.5 | 53.1 | 24.2 | 0.4 | 41.0 | |
Dinkel_XiaoRice_task4_3 | PRECISE | Dinkel2022 | 1.38 | 36.5 | 55.6 | 65.0 | 35.0 | 41.7 | 48.2 | 56.0 | 33.6 | 51.4 | 52.0 | |
Hao_UNISOC_task4_2 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 0.78 | 3.5 | 4.4 | 0.5 | 0.0 | 0.3 | 15.2 | 29.6 | 18.3 | 0.3 | 35.8 | |
Hao_UNISOC_task4_1 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.24 | 49.8 | 46.2 | 72.1 | 28.7 | 47.4 | 49.6 | 25.8 | 27.6 | 65.1 | 58.7 | |
Hao_UNISOC_task4_3 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.09 | 41.6 | 46.2 | 71.6 | 29.7 | 45.5 | 42.9 | 25.7 | 27.5 | 64.5 | 58.4 | |
Khandelwal_FMSG-NTU_task4_1 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.83 | 18.8 | 14.9 | 7.4 | 2.5 | 2.7 | 32.0 | 48.0 | 23.5 | 12.9 | 39.7 | |
Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.80 | 3.9 | 3.0 | 0.5 | 0.0 | 0.3 | 14.8 | 45.2 | 24.5 | 0.2 | 38.6 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.26 | 41.5 | 54.0 | 69.4 | 45.8 | 41.4 | 47.8 | 51.0 | 40.2 | 51.6 | 60.5 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.20 | 27.7 | 46.4 | 67.2 | 40.9 | 30.3 | 39.3 | 49.1 | 40.9 | 48.7 | 56.1 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 38.6 | 50.4 | 65.7 | 32.6 | 42.3 | 45.2 | 49.8 | 30.5 | 51.5 | 58.2 | |
deBenito_AUDIAS_task4_1 | 10-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 39.8 | 55.0 | 66.5 | 26.0 | 34.1 | 44.6 | 42.2 | 34.5 | 52.4 | 54.7 | |
deBenito_AUDIAS_task4_2 | 10-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.08 | 40.3 | 49.3 | 54.3 | 3.5 | 8.1 | 45.1 | 44.3 | 34.1 | 42.3 | 55.6 | |
deBenito_AUDIAS_task4_3 | 7-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 39.3 | 55.9 | 69.1 | 29.0 | 36.5 | 45.1 | 47.0 | 31.6 | 54.1 | 57.3 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 39.6 | 47.5 | 73.3 | 43.3 | 57.0 | 51.5 | 47.3 | 38.3 | 61.6 | 58.3 | |
Li_WU_task4_2 | ATST-RCT SED system ATST small | Shao2022 | 1.36 | 45.1 | 44.9 | 77.1 | 47.6 | 55.3 | 43.0 | 60.8 | 36.2 | 66.7 | 39.7 | |
Li_WU_task4_3 | ATST-RCT SED system ATST base | Shao2022 | 1.40 | 39.6 | 47.5 | 73.3 | 43.3 | 57.0 | 51.5 | 47.3 | 38.3 | 61.6 | 58.3 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 1.13 | 33.1 | 48.3 | 70.2 | 32.7 | 26.1 | 46.8 | 44.5 | 37.6 | 50.9 | 60.1 | |
Kim_GIST_task4_3 | Kim_GIST_task4_3 | Kim2022b | 1.43 | 48.4 | 59.0 | 71.4 | 30.8 | 45.2 | 56.9 | 64.0 | 38.6 | 71.9 | 66.7 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 49.3 | 58.2 | 74.0 | 38.0 | 46.5 | 56.1 | 61.2 | 39.7 | 71.5 | 64.2 | |
Kim_GIST_task4_2 | Kim_GIST_task4_2 | Kim2022b | 1.46 | 45.4 | 61.1 | 74.1 | 39.8 | 46.5 | 54.3 | 63.5 | 38.2 | 71.7 | 60.3 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.65 | 31.5 | 22.2 | 59.5 | 17.8 | 30.3 | 35.4 | 9.4 | 19.6 | 57.8 | 32.0 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 1.49 | 40.7 | 61.9 | 75.5 | 38.8 | 53.6 | 63.1 | 65.9 | 41.7 | 56.7 | 78.2 | |
Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 0.83 | 4.8 | 4.4 | 0.9 | 0.0 | 0.3 | 10.3 | 43.7 | 17.5 | 0.2 | 36.0 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers2022 | 1.59 | 52.7 | 64.8 | 78.1 | 41.2 | 51.2 | 60.6 | 70.0 | 40.4 | 60.3 | 78.9 | |
Ebbers_UPB_task4_3 | tag-conditioned CRNN ensemble | Ebbers2022 | 1.46 | 55.8 | 73.2 | 80.7 | 48.9 | 49.9 | 72.7 | 72.6 | 48.5 | 72.5 | 84.6 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu2022 | 1.41 | 45.2 | 60.8 | 74.3 | 46.4 | 50.7 | 44.3 | 53.9 | 30.3 | 74.5 | 69.8 | |
Xu_SRCB-BIT_task4_1 | PANNs-FDY-CRNN-wrTCL system 1 | Xu2022 | 1.32 | 43.8 | 48.1 | 72.0 | 43.7 | 47.7 | 43.4 | 56.2 | 33.3 | 73.3 | 55.9 | |
Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 0.79 | 4.4 | 4.2 | 0.5 | 0.0 | 0.3 | 13.2 | 44.8 | 23.4 | 0.3 | 40.2 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.75 | 3.8 | 3.4 | 0.5 | 0.0 | 0.3 | 13.0 | 49.2 | 20.2 | 0.2 | 37.3 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam2022 | 1.25 | 31.8 | 58.6 | 73.1 | 43.2 | 41.8 | 40.2 | 44.6 | 31.4 | 64.9 | 59.6 | |
Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 0.77 | 3.9 | 3.6 | 0.5 | 0.0 | 0.3 | 13.3 | 44.4 | 21.3 | 0.2 | 37.7 | |
Nam_KAIST_task4_SED_4 | SED_4 | Nam2022 | 0.77 | 3.9 | 3.6 | 0.5 | 0.0 | 0.3 | 14.1 | 43.9 | 22.3 | 0.2 | 38.5 | |
Nam_KAIST_task4_SED_1 | SED_1 | Nam2022 | 1.24 | 29.1 | 59.4 | 71.6 | 43.9 | 44.3 | 45.4 | 44.3 | 33.2 | 64.5 | 62.2 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 33.3 | 43.4 | 58.3 | 18.4 | 27.8 | 41.2 | 44.3 | 21.7 | 50.3 | 40.4 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 29.9 | 44.8 | 58.0 | 29.0 | 36.4 | 26.8 | 42.6 | 24.3 | 54.4 | 48.7 | |
Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 5.2 | 4.4 | 1.3 | 0.0 | 0.8 | 17.0 | 47.4 | 22.1 | 4.5 | 36.2 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li2022b | 1.41 | 43.9 | 46.5 | 75.7 | 37.8 | 48.2 | 61.1 | 61.7 | 42.9 | 65.3 | 68.0 | |
Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 1.34 | 33.3 | 44.9 | 72.1 | 36.3 | 47.6 | 59.2 | 60.1 | 36.1 | 65.7 | 68.9 | |
Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.39 | 41.7 | 43.2 | 74.2 | 36.6 | 48.6 | 59.1 | 60.8 | 39.4 | 65.5 | 69.2 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li2022b | 1.35 | 33.0 | 47.2 | 74.1 | 38.5 | 47.9 | 59.5 | 58.3 | 37.2 | 65.2 | 70.0 | |
Bertola_UPF_task4_1 | DCASE2022 baseline system | Bertola2022 | 0.98 | 30.7 | 45.1 | 59.8 | 18.4 | 38.4 | 24.6 | 34.3 | 21.6 | 55.3 | 48.4 | |
He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 0.82 | 6.3 | 3.8 | 0.9 | 0.0 | 0.3 | 15.9 | 48.3 | 24.4 | 0.1 | 42.6 | |
He_BYTEDANCE_task4_2 | DCASE2022 SED mean teacher system 2 | He2022 | 1.48 | 48.5 | 62.8 | 71.5 | 34.1 | 43.4 | 65.0 | 45.9 | 36.7 | 70.0 | 67.2 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He2022 | 1.52 | 55.6 | 61.9 | 71.8 | 42.4 | 52.0 | 53.5 | 47.8 | 34.7 | 72.2 | 65.5 | |
He_BYTEDANCE_task4_1 | DCASE2022 SED mean teacher system 1 | He2022 | 1.36 | 32.4 | 64.0 | 71.1 | 39.1 | 44.5 | 57.9 | 54.5 | 39.1 | 67.4 | 65.8 | |
Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.79 | 5.0 | 2.5 | 0.6 | 0.0 | 0.3 | 5.6 | 39.8 | 15.1 | 0.0 | 25.4 | |
Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.75 | 5.7 | 1.3 | 0.0 | 0.0 | 0.0 | 3.5 | 40.6 | 12.8 | 0.0 | 25.7 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.26 | 27.2 | 14.5 | 38.9 | 46.0 | 39.0 | 28.0 | 15.2 | 10.8 | 48.4 | 25.0 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.20 | 31.6 | 22.1 | 48.0 | 44.0 | 38.6 | 30.6 | 45.9 | 11.0 | 38.9 | 35.3 | |
Xie_UESTC_task4_2 | CNN14 FC | Xie2022 | 0.83 | 6.2 | 4.7 | 0.9 | 0.0 | 0.3 | 13.5 | 48.9 | 25.3 | 0.3 | 36.7 | |
Xie_UESTC_task4_3 | CBAM-T CRNN scratch | Xie2022 | 1.06 | 33.7 | 36.6 | 64.9 | 19.9 | 17.0 | 38.8 | 35.3 | 34.0 | 53.9 | 49.0 | |
Xie_UESTC_task4_1 | CBAM-T CRNN 1 | Xie2022 | 1.36 | 40.5 | 62.3 | 71.3 | 33.3 | 33.0 | 59.0 | 58.6 | 44.2 | 58.2 | 66.4 | |
Xie_UESTC_task4_4 | CBAM-T CRNN 2 | Xie2022 | 1.38 | 43.0 | 65.3 | 71.3 | 32.6 | 36.9 | 63.4 | 60.2 | 42.8 | 57.3 | 74.5 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 41.3 | 42.2 | 60.4 | 22.3 | 40.7 | 25.3 | 45.6 | 28.5 | 56.2 | 48.5 | |
Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.02 | 37.1 | 50.2 | 64.8 | 27.5 | 15.1 | 44.4 | 36.6 | 37.5 | 49.2 | 62.2 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim2022c | 1.04 | 39.1 | 54.5 | 68.9 | 36.1 | 31.0 | 31.4 | 27.4 | 36.4 | 38.8 | 47.9 | |
Kim_CAUET_task4_3 | DCASE2022 SED system3 | Kim2022c | 1.04 | 31.9 | 48.7 | 62.1 | 31.0 | 36.6 | 48.5 | 30.6 | 33.2 | 57.3 | 44.4 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li2022c | 1.10 | 28.6 | 48.7 | 68.6 | 33.9 | 38.3 | 43.3 | 49.0 | 34.5 | 59.9 | 44.5 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li2022c | 1.17 | 43.5 | 50.0 | 68.7 | 30.9 | 32.4 | 49.8 | 50.2 | 39.7 | 55.8 | 57.4 | |
Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 0.93 | 25.3 | 34.5 | 7.4 | 11.4 | 4.2 | 38.3 | 51.6 | 26.5 | 43.1 | 35.8 | |
Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 0.75 | 6.8 | 9.6 | 3.1 | 0.9 | 1.4 | 22.0 | 44.6 | 20.3 | 3.2 | 38.4 | |
Castorena_UV_task4_3 | Strong and Max-Weak balanced | Castorena2022 | 0.91 | 35.2 | 39.3 | 44.7 | 21.0 | 14.9 | 37.4 | 40.2 | 24.9 | 22.0 | 48.1 | |
Castorena_UV_task4_1 | Max-Weak balanced | Castorena2022 | 1.01 | 33.5 | 41.6 | 58.7 | 19.2 | 37.5 | 45.7 | 34.5 | 26.5 | 50.6 | 44.5 | |
Castorena_UV_task4_2 | Avg-Weak balanced | Castorena2022 | 0.63 | 4.8 | 3.2 | 3.4 | 0.2 | 1.2 | 22.3 | 30.9 | 17.4 | 3.4 | 24.9 |
Energy Consumption
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Energy (kWh) (training) | Energy (kWh) (test) | EW-PSDS 1 (training energy) | EW-PSDS 2 (training energy) | EW-PSDS 1 (test energy) | EW-PSDS 2 (test energy) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Zhang_UCAS_task4_2 | DCASE2022 pretrained system 2 | Xiao2022 | 1.41 | 0.484 | 0.697 | 4.800 | 0.060 | 0.173 | 0.249 | 0.242 | 0.348 | |
Zhang_UCAS_task4_1 | DCASE2022 pretrained system 1 | Xiao2022 | 1.39 | 0.472 | 0.700 | 4.800 | 0.060 | 0.169 | 0.250 | 0.236 | 0.350 | |
Zhang_UCAS_task4_3 | DCASE2022 base system | Xiao2022 | 1.21 | 0.420 | 0.599 | 2.700 | 0.040 | 0.267 | 0.381 | 0.315 | 0.449 | |
Zhang_UCAS_task4_4 | DCASE2022 weak_pred system | Xiao2022 | 0.79 | 0.049 | 0.784 | 2.100 | 0.032 | 0.040 | 0.641 | 0.046 | 0.735 | |
Liu_NSYSU_task4_2 | DCASE2022 PANNs SED 2 | Liu2022 | 0.06 | 0.000 | 0.063 | 1.593 | 0.002 | 0.000 | 0.068 | 0.003 | 0.943 | |
Liu_NSYSU_task4_3 | DCASE2022 PANNs SED 3 | Liu2022 | 0.29 | 0.070 | 0.194 | 7.846 | 0.004 | 0.015 | 0.042 | 0.525 | 1.456 | |
Huang_NSYSU_task4_1 | DCASE2022 KDmt SED | Huang2022 | 1.28 | 0.434 | 0.650 | 9.563 | 0.008 | 0.078 | 0.117 | 1.629 | 2.436 | |
Liu_NSYSU_task4_4 | DCASE2022 PANNs SED 4 | Liu2022 | 0.21 | 0.046 | 0.151 | 6.372 | 0.006 | 0.012 | 0.041 | 0.231 | 0.754 | |
Suh_ReturnZero_task4_1 | rtzr_dev-only | Suh2022 | 1.22 | 0.393 | 0.650 | 21.694 | 0.031 | 0.051 | ||||
Suh_ReturnZero_task4_4 | rtzr_weak-SED | Suh2022 | 0.81 | 0.062 | 0.774 | 0.011 | 0.169 | 2.110 | ||||
Suh_ReturnZero_task4_2 | rtzr_strong-real | Suh2022 | 1.39 | 0.458 | 0.721 | 22.986 | 0.010 | 0.034 | 0.054 | 1.379 | 2.171 | |
Suh_ReturnZero_task4_3 | rtzr_audioset | Suh2022 | 1.42 | 0.478 | 0.719 | 46.891 | 0.074 | 0.017 | 0.026 | 0.194 | 0.292 | |
Liu_SRCN_task4_2 | DCASE2022 task4 Pre-Trained 2 | Liu2022 | 0.90 | 0.129 | 0.758 | 6.751 | 0.004 | 0.033 | 0.193 | 0.871 | 5.130 | |
Liu_SRCN_task4_1 | DCASE2022 task4 Pre-Trained 1 | Liu2022 | 0.79 | 0.051 | 0.777 | 6.751 | 0.004 | 0.013 | 0.198 | 0.345 | 5.259 | |
Liu_SRCN_task4_4 | DCASE2022 task4 without external data | Liu2022 | 0.24 | 0.025 | 0.219 | 10.012 | 0.048 | 0.004 | 0.038 | 0.016 | 0.138 | |
Liu_SRCN_task4_3 | DCASE2022 task4 AudioSet strong | Liu2022 | 1.25 | 0.425 | 0.634 | 0.733 | 0.004 | 0.996 | 1.486 | 3.275 | 4.888 | |
Kim_LGE_task4_1 | DCASE2022 Kim system 1 | Kim2022a | 1.34 | 0.444 | 0.697 | 17.000 | 0.300 | 0.045 | 0.070 | 0.044 | 0.070 | |
Kim_LGE_task4_3 | DCASE2022 Kim system 3 | Kim2022a | 0.81 | 0.062 | 0.781 | 17.000 | 0.300 | 0.006 | 0.079 | 0.006 | 0.078 | |
Kim_LGE_task4_4 | DCASE2022 Kim system 4 | Kim2022a | 1.17 | 0.305 | 0.750 | 17.000 | 0.300 | 0.031 | 0.076 | 0.030 | 0.075 | |
Kim_LGE_task4_2 | DCASE2022 Kim system 2 | Kim2022a | 1.34 | 0.444 | 0.695 | 17.000 | 0.300 | 0.045 | 0.070 | 0.044 | 0.069 | |
Ryu_Deeply_task4_1 | SKATTN_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | 29.850 | 0.040 | 0.015 | 0.027 | 0.193 | 0.346 | |
Ryu_Deeply_task4_2 | SKATTN_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | 18.780 | 0.040 | 0.014 | 0.041 | 0.117 | 0.337 | |
Giannakopoulos_UNIPI_task4_2 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | 1.717 | 0.030 | 0.029 | 0.184 | 0.029 | 0.184 | |
Giannakopoulos_UNIPI_task4_1 | Multi-Task Learning using Variational AutoEncoders | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | 1.717 | 0.030 | 0.104 | 0.196 | 0.104 | 0.196 | |
KIM_HYU_task4_2 | single1 | Sojeong2022 | 1.28 | 0.421 | 0.664 | 1.780 | 0.010 | 0.406 | 0.640 | 1.264 | 1.991 | |
KIM_HYU_task4_4 | single2 | Sojeong2022 | 1.27 | 0.423 | 0.651 | 1.800 | 0.004 | 0.403 | 0.621 | 3.172 | 4.885 | |
KIM_HYU_task4_1 | train_ensemble1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | 1.910 | 0.010 | 0.350 | 0.557 | 1.169 | 1.860 | |
KIM_HYU_task4_3 | train_ensemble2 | Sojeong2022 | 1.24 | 0.415 | 0.634 | 1.800 | 0.005 | 0.396 | 0.605 | 2.492 | 3.804 | |
Baseline | DCASE2022 SED baseline system | Turpault2022 | 1.00 | 0.315 | 0.543 | 1.717 | 0.030 | 0.315 | 0.543 | 0.315 | 0.543 | |
Dinkel_XiaoRice_task4_2 | SMALL | Dinkel2022 | 1.15 | 0.373 | 0.613 | 1.717 | 0.025 | 0.373 | 0.613 | 0.448 | 0.736 | |
Hao_UNISOC_task4_3 | SUBMISSION FOR DCASE2022 TASK4 | Hao2022 | 1.09 | 0.373 | 0.547 | 1.717 | 0.030 | 0.373 | 0.547 | 0.373 | 0.547 | |
Khandelwal_FMSG-NTU_task4_1 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | 1.820 | 0.005 | 0.149 | 0.597 | 0.968 | 3.876 | |
Khandelwal_FMSG-NTU_task4_2 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | 6.100 | 0.005 | 0.023 | 0.206 | 0.445 | 3.987 | |
Khandelwal_FMSG-NTU_task4_3 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | 3.250 | 0.005 | 0.217 | 0.351 | 2.676 | 4.332 | |
Khandelwal_FMSG-NTU_task4_4 | FMSG-NTU DCASE2022 SED Model-1 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | 3.630 | 0.005 | 0.183 | 0.304 | 2.413 | 4.018 | |
deBenito_AUDIAS_task4_4 | 7-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.28 | 0.432 | 0.649 | 12.872 | 0.045 | 0.058 | 0.087 | 0.288 | 0.433 |
deBenito_AUDIAS_task4_1 | 10-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.400 | 0.646 | 18.162 | 0.056 | 0.038 | 0.061 | 0.214 | 0.346 | |
deBenito_AUDIAS_task4_2 | 10-Resolution CRNN+Conformer with class-wise median filtering | deBenito2022 | 1.08 | 0.310 | 0.642 | 18.162 | 0.056 | 0.029 | 0.061 | 0.166 | 0.344 | |
deBenito_AUDIAS_task4_3 | 7-Resolution CRNN+Conformer | deBenito2022 | 1.23 | 0.407 | 0.643 | 12.872 | 0.045 | 0.054 | 0.086 | 0.271 | 0.429 | |
Li_WU_task4_4 | ATST-RCT SED system ATST ensemble | Shao2022 | 1.41 | 0.486 | 0.694 | 23.900 | 1.772 | 0.035 | 0.050 | 0.008 | 0.012 | |
Li_WU_task4_2 | ATST-RCT SED system ATST small | Shao2022 | 1.36 | 0.476 | 0.666 | 3.500 | 0.624 | 0.234 | 0.327 | 0.023 | 0.032 | |
Li_WU_task4_3 | ATST-RCT SED system ATST base | Shao2022 | 1.40 | 0.482 | 0.693 | 4.800 | 0.626 | 0.172 | 0.248 | 0.023 | 0.033 | |
Li_WU_task4_1 | ATST-RCT SED system CRNN with RCT | Shao2022 | 1.13 | 0.368 | 0.594 | 2.210 | 0.450 | 0.286 | 0.462 | 0.025 | 0.040 | |
Kim_GIST_task4_3 | Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | 151.415 | 1.190 | 0.006 | 0.008 | 0.013 | 0.018 | |
Kim_GIST_task4_1 | Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | 151.415 | 1.190 | 0.006 | 0.008 | 0.013 | 0.018 | |
Kim_GIST_task4_2 | Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | 151.415 | 1.190 | 0.006 | 0.008 | 0.013 | 0.018 | |
Kim_GIST_task4_4 | Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | 3.768 | 0.246 | 0.098 | 0.153 | 0.026 | 0.041 | |
Ebbers_UPB_task4_4 | CRNN ensemble w/o external data | Ebbers2022 | 1.49 | 0.509 | 0.742 | 27.200 | 0.020 | 0.032 | 0.047 | 0.764 | 1.113 | |
Ebbers_UPB_task4_2 | FBCRNN ensemble | Ebbers2022 | 0.83 | 0.047 | 0.824 | 36.000 | 0.020 | 0.002 | 0.039 | 0.070 | 1.236 | |
Ebbers_UPB_task4_1 | CRNN ensemble | Ebbers2022 | 1.59 | 0.552 | 0.786 | 50.000 | 0.020 | 0.019 | 0.027 | 0.828 | 1.179 | |
Ebbers_UPB_task4_3 | tag-conditioned CRNN ensemble | Ebbers2022 | 1.46 | 0.527 | 0.679 | 50.000 | 0.020 | 0.018 | 0.023 | 0.791 | 1.019 | |
Xu_SRCB-BIT_task4_2 | PANNs-FDY-CRNN-wrTCL system 2 | Xu2022 | 1.41 | 0.482 | 0.702 | 1.823 | 0.027 | 0.454 | 0.662 | 0.535 | 0.781 | |
Xu_SRCB-BIT_task4_1 | PANNs-FDY-CRNN-wrTCL system 1 | Xu2022 | 1.32 | 0.452 | 0.662 | 1.823 | 0.027 | 0.426 | 0.624 | 0.502 | 0.736 | |
Xu_SRCB-BIT_task4_3 | PANNs-FDY-CRNN-weak train | Xu2022 | 0.79 | 0.054 | 0.774 | 1.514 | 0.027 | 0.061 | 0.878 | 0.060 | 0.861 | |
Xu_SRCB-BIT_task4_4 | FDY-CRNN-weak train | Xu2022 | 0.75 | 0.049 | 0.738 | 1.446 | 0.027 | 0.058 | 0.876 | 0.054 | 0.820 | |
Nam_KAIST_task4_SED_2 | SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | 1.327 | 0.077 | 0.529 | 0.849 | 0.159 | 0.256 | |
Nam_KAIST_task4_SED_3 | SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | 1.327 | 0.077 | 0.074 | 0.966 | 0.022 | 0.291 | |
Nam_KAIST_task4_SED_4 | SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | 1.327 | 0.077 | 0.071 | 0.966 | 0.021 | 0.291 | |
Nam_KAIST_task4_SED_1 | SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | 1.327 | 0.077 | 0.522 | 0.845 | 0.157 | 0.255 | |
Blakala_SRPOL_task4_3 | Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | 2.757 | 0.037 | 0.182 | 0.328 | 0.234 | 0.422 | |
Blakala_SRPOL_task4_1 | Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | 2.755 | 0.037 | 0.228 | 0.364 | 0.300 | 0.479 | |
Blakala_SRPOL_task4_2 | Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | 25.295 | 0.056 | 0.005 | 0.049 | 0.037 | 0.391 | |
Li_USTC_task4_SED_1 | Mean teacher Pseudo labeling system 1 | Li2022b | 1.41 | 0.480 | 0.713 | 11.880 | 0.014 | 0.069 | 0.103 | 1.028 | 1.528 | |
Li_USTC_task4_SED_4 | Mean teacher Pseudo labeling system 4 | Li2022b | 1.34 | 0.429 | 0.723 | 3.564 | 0.009 | 0.207 | 0.348 | 1.445 | 2.437 | |
Li_USTC_task4_SED_2 | Mean teacher Pseudo labeling system 2 | Li2022b | 1.39 | 0.451 | 0.740 | 11.880 | 0.014 | 0.065 | 0.107 | 0.966 | 1.585 | |
Li_USTC_task4_SED_3 | Mean teacher Pseudo labeling system 3 | Li2022b | 1.35 | 0.450 | 0.699 | 3.564 | 0.009 | 0.217 | 0.337 | 1.517 | 2.355 | |
He_BYTEDANCE_task4_4 | DCASE2022 SED mean teacher system 4 | He2022 | 0.82 | 0.053 | 0.810 | 28.066 | 0.424 | 0.003 | 0.050 | 0.004 | 0.057 | |
He_BYTEDANCE_task4_2 | DCASE2022 SED mean teacher system 2 | He2022 | 1.48 | 0.503 | 0.749 | 28.066 | 0.424 | 0.031 | 0.046 | 0.036 | 0.053 | |
He_BYTEDANCE_task4_3 | DCASE2022 SED mean teacher system 3 | He2022 | 1.52 | 0.525 | 0.748 | 28.066 | 0.424 | 0.032 | 0.046 | 0.037 | 0.053 | |
He_BYTEDANCE_task4_1 | DCASE2022 SED mean teacher system 1 | He2022 | 1.36 | 0.454 | 0.696 | 6.067 | 0.410 | 0.129 | 0.197 | 0.033 | 0.051 | |
Li_ICT-TOSHIBA_task4_2 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.79 | 0.090 | 0.709 | 47.417 | 0.030 | 0.003 | 0.026 | 0.090 | 0.709 | |
Li_ICT-TOSHIBA_task4_4 | Hybrid system of SEDT and frame-wise model | Li2022d | 0.75 | 0.075 | 0.692 | 23.850 | 0.024 | 0.005 | 0.050 | 0.094 | 0.865 | |
Li_ICT-TOSHIBA_task4_1 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.26 | 0.439 | 0.612 | 47.417 | 0.030 | 0.016 | 0.022 | 0.439 | 0.612 | |
Li_ICT-TOSHIBA_task4_3 | Hybrid system of SEDT and frame-wise model | Li2022d | 1.20 | 0.411 | 0.597 | 23.850 | 0.024 | 0.030 | 0.043 | 0.514 | 0.746 | |
Baseline (AudioSet) | DCASE2022 SED baseline system (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | 2.418 | 0.027 | 0.245 | 0.383 | 0.383 | 0.600 | |
Kim_CAUET_task4_1 | DCASE2022 SED system1 | Kim2022c | 1.02 | 0.317 | 0.565 | 1.201 | 0.021 | 0.453 | 0.807 | 0.450 | 0.803 | |
Kim_CAUET_task4_2 | DCASE2022 SED system2 | Kim2022c | 1.04 | 0.340 | 0.544 | 1.114 | 0.021 | 0.525 | 0.839 | 0.484 | 0.774 | |
Kim_CAUET_task4_3 | DCASE2022 SED system3 | Kim2022c | 1.04 | 0.338 | 0.554 | 0.748 | 0.020 | 0.776 | 1.272 | 0.505 | 0.827 | |
Li_XJU_task4_1 | DCASE2022 SED system 1 | Li2022c | 1.10 | 0.364 | 0.570 | 2.718 | 0.017 | 0.230 | 0.360 | 0.643 | 1.007 | |
Li_XJU_task4_3 | DCASE2022 SED system 3 | Li2022c | 1.17 | 0.371 | 0.635 | 3.791 | 0.010 | 0.168 | 0.287 | 1.112 | 1.904 | |
Li_XJU_task4_4 | DCASE2022 SED system 4 | Li2022c | 0.93 | 0.195 | 0.683 | 3.317 | 0.015 | 0.101 | 0.354 | 0.390 | 1.367 | |
Li_XJU_task4_2 | DCASE2022 SED system 2 | Li2022c | 0.75 | 0.086 | 0.671 | 3.771 | 0.006 | 0.039 | 0.305 | 0.432 | 3.353 |
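The EW-PSDS columns above are consistent with scaling each PSDS value by the ratio between the baseline's energy consumption and the submission's: for example, Zhang_UCAS_task4_2 gives 0.484 × (1.717 / 4.800) ≈ 0.173 against the baseline's training energy and 0.484 × (0.030 / 0.060) = 0.242 against its test energy. A minimal sketch under that assumed definition:

```python
def ew_psds(psds: float, system_kwh: float, baseline_kwh: float) -> float:
    """Energy-weighted PSDS: PSDS scaled by the baseline/system energy ratio.

    Assumed definition, inferred from the table above: a system consuming
    more energy than the baseline has its PSDS scaled down proportionally.
    """
    return psds * (baseline_kwh / system_kwh)

# Baseline (Turpault2022): 1.717 kWh for training, 0.030 kWh for test.
BASELINE_TRAIN_KWH, BASELINE_TEST_KWH = 1.717, 0.030

# Zhang_UCAS_task4_2: PSDS 1 = 0.484, 4.800 kWh training, 0.060 kWh test.
print(round(ew_psds(0.484, 4.800, BASELINE_TRAIN_KWH), 3))  # 0.173 = EW-PSDS 1 (training energy)
print(round(ew_psds(0.484, 0.060, BASELINE_TEST_KWH), 3))   # 0.242 = EW-PSDS 1 (test energy)
```

Under this weighting, a system that consumes exactly the baseline's energy keeps its raw PSDS, which is why the two baseline rows repeat their PSDS values unchanged.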
System characteristics
General characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Data augmentation | Features |
---|---|---|---|---|---|---|---|
Zhang_UCAS_task4_2 | Xiao2022 | 1.41 | 0.484 | 0.697 | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_UCAS_task4_1 | Xiao2022 | 1.39 | 0.472 | 0.700 | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_UCAS_task4_3 | Xiao2022 | 1.21 | 0.420 | 0.599 | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_UCAS_task4_4 | Xiao2022 | 0.79 | 0.049 | 0.784 | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Liu_NSYSU_task4_2 | Liu2022 | 0.06 | 0.000 | 0.063 | mix-up | log-mel energies | |
Liu_NSYSU_task4_3 | Liu2022 | 0.29 | 0.070 | 0.194 | mix-up | log-mel energies | |
Huang_NSYSU_task4_1 | Huang2022 | 1.28 | 0.434 | 0.650 | mixup, frame shifting | log-mel energies | |
Liu_NSYSU_task4_4 | Liu2022 | 0.21 | 0.046 | 0.151 | mix-up | log-mel energies | |
Suh_ReturnZero_task4_1 | Suh2022 | 1.22 | 0.393 | 0.650 | time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment | log-mel energies | |
Suh_ReturnZero_task4_4 | Suh2022 | 0.81 | 0.062 | 0.774 | time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment | log-mel energies |
Suh_ReturnZero_task4_2 | Suh2022 | 1.39 | 0.458 | 0.721 | time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment | log-mel energies | |
Suh_ReturnZero_task4_3 | Suh2022 | 1.42 | 0.478 | 0.719 | time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment | log-mel energies |
Cheng_CHT_task4_2 | Cheng2022 | 0.93 | 0.276 | 0.543 | mixup, time shift | MelSpectrogram | |
Cheng_CHT_task4_1 | Cheng2022 | 1.03 | 0.314 | 0.582 | mixup, FilterAugment algorithm | MelSpectrogram | |
Liu_SRCN_task4_2 | Liu2022 | 0.90 | 0.129 | 0.758 | mixup | log-mel energies | |
Liu_SRCN_task4_1 | Liu2022 | 0.79 | 0.051 | 0.777 | mixup | log-mel energies | |
Liu_SRCN_task4_4 | Liu2022 | 0.24 | 0.025 | 0.219 | mixup | log-mel energies | |
Liu_SRCN_task4_3 | Liu2022 | 1.25 | 0.425 | 0.634 | frame shift, mixup, spec augment, filter augment | log-mel energies | |
Kim_LGE_task4_1 | Kim2022a | 1.34 | 0.444 | 0.697 | frame shifting, time masking, frequency masking, mix-up, filter augment | log-mel energies |
Kim_LGE_task4_3 | Kim2022a | 0.81 | 0.062 | 0.781 | frame shifting, time masking, frequency masking, mix-up, filter augment | log-mel energies |
Kim_LGE_task4_4 | Kim2022a | 1.17 | 0.305 | 0.750 | frame shifting, time masking, frequency masking, mix-up, filter augment | log-mel energies |
Kim_LGE_task4_2 | Kim2022a | 1.34 | 0.444 | 0.695 | frame shifting, time masking, frequency masking, mix-up, filter augment | log-mel energies |
Ryu_Deeply_task4_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | log-mel energies | ||
Ryu_Deeply_task4_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | log-mel energies | ||
Giannakopoulos_UNIPI_task4_2 | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | log-mel energies | ||
Giannakopoulos_UNIPI_task4_1 | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | log-mel energies | ||
Mizobuchi_PCO_task4_4 | Mizobuchi2022 | 0.82 | 0.062 | 0.787 | filter augmentation, MixUp, Frame shift, Time mask | log-mel energies | |
Mizobuchi_PCO_task4_2 | Mizobuchi2022 | 1.26 | 0.439 | 0.611 | filter augmentation, MixUp, Frame shift, Time mask | log-mel energies | |
Mizobuchi_PCO_task4_3 | Mizobuchi2022 | 0.88 | 0.197 | 0.620 | filter augmentation, MixUp, Frame shift, Time mask | log-mel energies | |
Mizobuchi_PCO_task4_1 | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | filter augmentation, MixUp, Frame shift, Time mask | log-mel energies | |
KIM_HYU_task4_2 | Sojeong2022 | 1.28 | 0.421 | 0.664 | time shifting, mix up, frequency masking | log-mel energies | |
KIM_HYU_task4_4 | Sojeong2022 | 1.27 | 0.423 | 0.651 | time shifting, mix up, frequency masking | log-mel energies | |
KIM_HYU_task4_1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | time shifting, mix up, frequency masking | log-mel energies | |
KIM_HYU_task4_3 | Sojeong2022 | 1.24 | 0.415 | 0.634 | time shifting, mix up, frequency masking | log-mel energies | |
Baseline | Turpault2022 | 1.00 | 0.315 | 0.543 | mixup | log-mel energies | |
Dinkel_XiaoRice_task4_1 | Dinkel2022 | 1.29 | 0.422 | 0.679 | specaugment, mixup | log-mel energies | |
Dinkel_XiaoRice_task4_2 | Dinkel2022 | 1.15 | 0.373 | 0.613 | specaugment, mixup | log-mel energies | |
Dinkel_XiaoRice_task4_4 | Dinkel2022 | 0.92 | 0.104 | 0.824 | specaugment, mixup | log-mel energies | |
Dinkel_XiaoRice_task4_3 | Dinkel2022 | 1.38 | 0.451 | 0.727 | specaugment, mixup | log-mel energies | |
Hao_UNISOC_task4_2 | Hao2022 | 0.78 | 0.078 | 0.723 | noise | log-mel energies | |
Hao_UNISOC_task4_1 | Hao2022 | 1.24 | 0.425 | 0.615 | noise | log-mel energies | |
Hao_UNISOC_task4_3 | Hao2022 | 1.09 | 0.373 | 0.547 | noise | log-mel energies | |
Khandelwal_FMSG-NTU_task4_1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | time-masking, frame-shifting, mixup, filter-augmentation | log-mel energies | |
Khandelwal_FMSG-NTU_task4_2 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | time-masking, frame-shifting, mixup, Gaussian noise | log-mel energies | |
Khandelwal_FMSG-NTU_task4_3 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | time-masking, frame-shifting, mixup, filter-augmentation | log-mel energies | |
Khandelwal_FMSG-NTU_task4_4 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | time-masking, frame-shifting, mixup, filter-augmentation, Gaussian noise | log-mel energies | |
deBenito_AUDIAS_task4_4 | deBenito2022 | 1.28 | 0.432 | 0.649 | mixup, time shifting | log-mel energies | |
deBenito_AUDIAS_task4_1 | deBenito2022 | 1.23 | 0.400 | 0.646 | mixup, time shifting | log-mel energies | |
deBenito_AUDIAS_task4_2 | deBenito2022 | 1.08 | 0.310 | 0.642 | mixup, time shifting | log-mel energies | |
deBenito_AUDIAS_task4_3 | deBenito2022 | 1.23 | 0.407 | 0.643 | mixup, time shifting | log-mel energies | |
Li_WU_task4_4 | Shao2022 | 1.41 | 0.486 | 0.694 | hard mixup, time masking, filter augmentation, time shifting, frequency masking | log-mel energies | |
Li_WU_task4_2 | Shao2022 | 1.36 | 0.476 | 0.666 | hard mixup, time masking, frequency masking, time shifting | log-mel energies | |
Li_WU_task4_3 | Shao2022 | 1.40 | 0.482 | 0.693 | hard mixup, time masking, filter augmentation, time shifting | log-mel energies | |
Li_WU_task4_1 | Shao2022 | 1.13 | 0.368 | 0.594 | hard mixup, time masking, frequency masking, filter augmentation, time shifting | log-mel energies | |
Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | mix-up, specaugment, time-frequency shifting | log-mel energies | |
Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | mix-up, specaugment, time-frequency shifting | log-mel energies | |
Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | mix-up, specaugment, time-frequency shifting | log-mel energies | |
Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | mixup, time masking, filter augment, gaussian noise | log-mel energies | |
Ebbers_UPB_task4_4 | Ebbers2022 | 1.49 | 0.509 | 0.742 | time-/frequency warping, time-/frequency-masking, superposition, random noise | log-mel energies | |
Ebbers_UPB_task4_2 | Ebbers2022 | 0.83 | 0.047 | 0.824 | time-/frequency warping, time-/frequency-masking, superposition, random noise | log-mel energies | |
Ebbers_UPB_task4_1 | Ebbers2022 | 1.59 | 0.552 | 0.786 | time-/frequency warping, time-/frequency-masking, superposition, random noise | log-mel energies | |
Ebbers_UPB_task4_3 | Ebbers2022 | 1.46 | 0.527 | 0.679 | time-/frequency warping, time-/frequency-masking, superposition, random noise | log-mel energies | |
Xu_SRCB-BIT_task4_2 | Xu2022 | 1.41 | 0.482 | 0.702 | specaugment, mixup, frame-shift, Filteraugment | log-mel energies | |
Xu_SRCB-BIT_task4_1 | Xu2022 | 1.32 | 0.452 | 0.662 | specaugment, mixup, frame-shift, Filteraugment | log-mel energies | |
Xu_SRCB-BIT_task4_3 | Xu2022 | 0.79 | 0.054 | 0.774 | specaugment, mixup, frame-shift, Filteraugment | log-mel energies | |
Xu_SRCB-BIT_task4_4 | Xu2022 | 0.75 | 0.049 | 0.738 | specaugment, mixup, frame-shift, Filteraugment | log-mel energies | |
Nam_KAIST_task4_SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | time shifting, mixup, time masking, FilterAugment | log-mel energies |
Nam_KAIST_task4_SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | time shifting, mixup, time masking, FilterAugment | log-mel energies |
Nam_KAIST_task4_SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | time shifting, mixup, time masking, FilterAugment | log-mel energies |
Nam_KAIST_task4_SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | time shifting, mixup, time masking, FilterAugment | log-mel energies |
Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | time warping, Brownian noise | log-mel energies | |
Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | pitch shifting | log-mel energies | |
Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | time warping, Brownian noise | log-mel energies | |
Li_USTC_task4_SED_1 | Li2022b | 1.41 | 0.480 | 0.713 | spec-augment, time-shifting | log-mel energies | |
Li_USTC_task4_SED_4 | Li2022b | 1.34 | 0.429 | 0.723 | spec-augment, time-shifting | log-mel energies | |
Li_USTC_task4_SED_2 | Li2022b | 1.39 | 0.451 | 0.740 | spec-augment, time-shifting | log-mel energies | |
Li_USTC_task4_SED_3 | Li2022b | 1.35 | 0.450 | 0.699 | spec-augment, time-shifting | log-mel energies | |
Bertola_UPF_task4_1 | Bertola2022 | 0.98 | 0.318 | 0.520 | mixup, time-masking, frequency-masking | log-mel energies | |
He_BYTEDANCE_task4_4 | He2022 | 0.82 | 0.053 | 0.810 | time mask, frame shift, mixup, ict, sct, FilterAugment | log-mel energies | |
He_BYTEDANCE_task4_2 | He2022 | 1.48 | 0.503 | 0.749 | time mask, frame shift, mixup, ict, sct, FilterAugment | log-mel energies | |
He_BYTEDANCE_task4_3 | He2022 | 1.52 | 0.525 | 0.748 | time mask, frame shift, mixup, ict, sct, FilterAugment | log-mel energies | |
He_BYTEDANCE_task4_1 | He2022 | 1.36 | 0.454 | 0.696 | time mask, frame shift, mixup, ict, sct, FilterAugment | log-mel energies | |
Li_ICT-TOSHIBA_task4_2 | Li2022d | 0.79 | 0.090 | 0.709 | mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) | log-mel energies (frame-wise model), log-mel spectrogram (SEDT) | |
Li_ICT-TOSHIBA_task4_4 | Li2022d | 0.75 | 0.075 | 0.692 | mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) | log-mel energies (frame-wise model), log-mel spectrogram (SEDT) | |
Li_ICT-TOSHIBA_task4_1 | Li2022d | 1.26 | 0.439 | 0.612 | mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) | log-mel energies (frame-wise model), log-mel spectrogram (SEDT) | |
Li_ICT-TOSHIBA_task4_3 | Li2022d | 1.20 | 0.411 | 0.597 | mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) | log-mel energies (frame-wise model), log-mel spectrogram (SEDT) | |
Xie_UESTC_task4_2 | Xie2022 | 0.83 | 0.062 | 0.800 | mixup, SpecAug | log-mel energies | |
Xie_UESTC_task4_3 | Xie2022 | 1.06 | 0.300 | 0.641 | mixup, SpecAug | log-mel energies | |
Xie_UESTC_task4_1 | Xie2022 | 1.36 | 0.418 | 0.757 | mixup, SpecAug | log-mel energies | |
Xie_UESTC_task4_4 | Xie2022 | 1.38 | 0.426 | 0.766 | mixup, SpecAug | log-mel energies | |
Baseline (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | mixup | log-mel energies | |
Kim_CAUET_task4_1 | Kim2022c | 1.02 | 0.317 | 0.565 | frame shift, mixup, time mask, filter augmentation | log-mel energies | |
Kim_CAUET_task4_2 | Kim2022c | 1.04 | 0.340 | 0.544 | frame shift, mixup, time mask, filter augmentation | log-mel energies | |
Kim_CAUET_task4_3 | Kim2022c | 1.04 | 0.338 | 0.554 | time shift, mixup, time mask, filter augmentation | log-mel energies | |
Li_XJU_task4_1 | Li2022c | 1.10 | 0.364 | 0.570 | mixup, filteraugment, cutout | log-mel energies |
Li_XJU_task4_3 | Li2022c | 1.17 | 0.371 | 0.635 | mixup, filteraugment, cutout | log-mel energies |
Li_XJU_task4_4 | Li2022c | 0.93 | 0.195 | 0.683 | mixup, filteraugment, cutout | log-mel energies |
Li_XJU_task4_2 | Li2022c | 0.75 | 0.086 | 0.671 | mixup, filteraugment, cutout | log-mel energies |
Castorena_UV_task4_3 | Castorena2022 | 0.91 | 0.267 | 0.531 | log-mel energies | ||
Castorena_UV_task4_1 | Castorena2022 | 1.01 | 0.334 | 0.524 | log-mel energies | ||
Castorena_UV_task4_2 | Castorena2022 | 0.63 | 0.072 | 0.559 | log-mel energies |
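Almost every system in the table above uses mixup on log-mel energies. As a point of reference, the sketch below shows batch-level mixup for weakly labeled clips; the function name, array shapes, and the Beta parameter are illustrative and not taken from any submission.

```python
import numpy as np

def mixup(features: np.ndarray, labels: np.ndarray, alpha: float = 0.2):
    """Blend each example with a randomly paired one from the same batch.

    features: (batch, frames, mels) log-mel energies
    labels:   (batch, classes) multi-hot weak labels
    alpha:    illustrative Beta-distribution parameter, not a value
              reported by any of the submissions above.
    """
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(len(features))
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y
```

Variants differ in whether the mixing weight is drawn per batch or per example and in how the labels are combined (the "hard mixup" in the Shao2022 rows presumably refers to such a variant), but the core operation is this convex combination of paired inputs and labels.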
Machine learning characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Classifier | Semi-supervised approach | Post-processing | Segmentation method | Decision making |
---|---|---|---|---|---|---|---|---|---|---|
Zhang_UCAS_task4_2 | Xiao2022 | 1.41 | 0.484 | 0.697 | CRNN,CNN | mean-teacher student | classwise median filtering | mean | ||
Zhang_UCAS_task4_1 | Xiao2022 | 1.39 | 0.472 | 0.700 | CRNN,CNN | mean-teacher student | classwise median filtering | mean | ||
Zhang_UCAS_task4_3 | Xiao2022 | 1.21 | 0.420 | 0.599 | CRNN,CNN | mean-teacher student | classwise median filtering | mean | ||
Zhang_UCAS_task4_4 | Xiao2022 | 0.79 | 0.049 | 0.784 | CRNN,CNN | mean-teacher student | classwise median filtering | mean | ||
Liu_NSYSU_task4_2 | Liu2022 | 0.06 | 0.000 | 0.063 | CRNN | mean-teacher student | median filtering (93ms) | |||
Liu_NSYSU_task4_3 | Liu2022 | 0.29 | 0.070 | 0.194 | CRNN | mean-teacher student | median filtering (93ms) | |||
Huang_NSYSU_task4_1 | Huang2022 | 1.28 | 0.434 | 0.650 | CRNN, ensemble | mean-teacher student, knowledge distillation | median filtering (93ms) | average | ||
Liu_NSYSU_task4_4 | Liu2022 | 0.21 | 0.046 | 0.151 | CRNN | mean-teacher student | median filtering (93ms) | |||
Suh_ReturnZero_task4_1 | Suh2022 | 1.22 | 0.393 | 0.650 | CRNN | mean-teacher student | median filtering | averaging | ||
Suh_ReturnZero_task4_4 | Suh2022 | 0.81 | 0.062 | 0.774 | CRNN | mean-teacher student | weak SED | averaging | ||
Suh_ReturnZero_task4_2 | Suh2022 | 1.39 | 0.458 | 0.721 | CRNN | mean-teacher student | median filtering | averaging | ||
Suh_ReturnZero_task4_3 | Suh2022 | 1.42 | 0.478 | 0.719 | CRNN | mean-teacher student | median filtering | averaging | ||
Cheng_CHT_task4_2 | Cheng2022 | 0.93 | 0.276 | 0.543 | CRNN, Multiscale CNN | mean-teacher student | median filtering (0.45s) | attention layers | ||
Cheng_CHT_task4_1 | Cheng2022 | 1.03 | 0.314 | 0.582 | CRNN, Multiscale CNN | mean-teacher student | median filtering (0.45s) | attention layers | ||
Liu_SRCN_task4_2 | Liu2022 | 0.90 | 0.129 | 0.758 | Transformer, RNN | mean-teacher student | median filtering | attention layers | mean | |
Liu_SRCN_task4_1 | Liu2022 | 0.79 | 0.051 | 0.777 | Transformer, RNN | mean-teacher student | median filtering | attention layers | mean | |
Liu_SRCN_task4_4 | Liu2022 | 0.24 | 0.025 | 0.219 | CNN | mean-teacher student | median filtering | attention layers | mean | |
Liu_SRCN_task4_3 | Liu2022 | 1.25 | 0.425 | 0.634 | CRNN | mean-teacher student | median filtering | mean | ||
Kim_LGE_task4_1 | Kim2022a | 1.34 | 0.444 | 0.697 | FDY-CRNN | mean-teacher student, ICT, FixMatch | median filtering (329ms) | mean | ||
Kim_LGE_task4_3 | Kim2022a | 0.81 | 0.062 | 0.781 | FDY-CRNN | mean-teacher student, ICT, FixMatch | median filtering (329ms) | mean | ||
Kim_LGE_task4_4 | Kim2022a | 1.17 | 0.305 | 0.750 | FDY-CRNN | mean-teacher student, ICT, FixMatch | median filtering (329ms) | mean | ||
Kim_LGE_task4_2 | Kim2022a | 1.34 | 0.444 | 0.695 | FDY-CRNN | mean-teacher student, ICT, FixMatch | median filtering (329ms) | mean | ||
Ryu_Deeply_task4_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | SKATTN | mean-teacher student | median filtering (93ms) | |||
Ryu_Deeply_task4_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | SKATTN | mean-teacher student | median filtering (93ms) | |||
Giannakopoulos_UNIPI_task4_2 | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | RNN | multi-task learning | median filtering (456ms) | |||
Giannakopoulos_UNIPI_task4_1 | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | RNN | multi-task learning | median filtering (456ms) | |||
Mizobuchi_PCO_task4_4 | Mizobuchi2022 | 0.82 | 0.062 | 0.787 | CRNN | mean-teacher student | median filtering, probability correction | |||
Mizobuchi_PCO_task4_2 | Mizobuchi2022 | 1.26 | 0.439 | 0.611 | CRNN | mean-teacher student | median filtering, probability correction | |||
Mizobuchi_PCO_task4_3 | Mizobuchi2022 | 0.88 | 0.197 | 0.620 | CRNN | mean-teacher student | median filtering, probability correction | |||
Mizobuchi_PCO_task4_1 | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | CRNN | mean-teacher student | median filtering, probability correction | |||
KIM_HYU_task4_2 | Sojeong2022 | 1.28 | 0.421 | 0.664 | CRNN | mean-teacher student | median filtering (93ms) | patch attention layers | ||
KIM_HYU_task4_4 | Sojeong2022 | 1.27 | 0.423 | 0.651 | CRNN | mean-teacher student | median filtering (93ms) | patch attention layers | ||
KIM_HYU_task4_1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | CRNN | mean-teacher student | median filtering (93ms) | patch attention layers | ||
KIM_HYU_task4_3 | Sojeong2022 | 1.24 | 0.415 | 0.634 | CRNN | mean-teacher student | median filtering (93ms) | patch attention layers | ||
Baseline | Turpault2022 | 1.00 | 0.315 | 0.543 | CRNN | mean-teacher student | ||||
Dinkel_XiaoRice_task4_1 | Dinkel2022 | 1.29 | 0.422 | 0.679 | CRNN, RCRNN | uda, mean-teacher student | median filtering (443ms) | avg | ||
Dinkel_XiaoRice_task4_2 | Dinkel2022 | 1.15 | 0.373 | 0.613 | CRNN, RCRNN | mean-teacher student | median filtering (443ms) | |||
Dinkel_XiaoRice_task4_4 | Dinkel2022 | 0.92 | 0.104 | 0.824 | CNN, Transformer | uda, mean-teacher student | median filtering (443ms) | avg | ||
Dinkel_XiaoRice_task4_3 | Dinkel2022 | 1.38 | 0.451 | 0.727 | CRNN, RCRNN, Transformer | uda, mean-teacher student, noisystudent | median filtering (443ms) | avg | ||
Hao_UNISOC_task4_2 | Hao2022 | 0.78 | 0.078 | 0.723 | CRNN | domain adaptation | median filtering with adaptive window size | mean | ||
Hao_UNISOC_task4_1 | Hao2022 | 1.24 | 0.425 | 0.615 | CRNN | domain adaptation | median filtering with adaptive window size | mean | ||
Hao_UNISOC_task4_3 | Hao2022 | 1.09 | 0.373 | 0.547 | CRNN | domain adaptation | median filtering with adaptive window size | mean | ||
Khandelwal_FMSG-NTU_task4_1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | CRNN | mean-teacher student, pseudo-labelling, interpolation consistency training | class-wise median filtering | mean | ||
Khandelwal_FMSG-NTU_task4_2 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | CRNN | mean-teacher student, interpolation consistency training | class-wise median filtering | mean | ||
Khandelwal_FMSG-NTU_task4_3 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | CRNN | mean-teacher student, pseudo-labelling, interpolation consistency training | class-wise median filtering | mean | ||
Khandelwal_FMSG-NTU_task4_4 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | CRNN | mean-teacher student, pseudo-labelling, interpolation consistency training | class-wise median filtering | mean | ||
deBenito_AUDIAS_task4_4 | deBenito2022 | 1.28 | 0.432 | 0.649 | CRNN, conformer | mean-teacher student | median filtering (class dependent) | averaging | ||
deBenito_AUDIAS_task4_1 | deBenito2022 | 1.23 | 0.400 | 0.646 | CRNN, conformer | mean-teacher student | median filtering (450ms) | averaging | ||
deBenito_AUDIAS_task4_2 | deBenito2022 | 1.08 | 0.310 | 0.642 | CRNN, conformer | mean-teacher student | median filtering (class dependent) | averaging | ||
deBenito_AUDIAS_task4_3 | deBenito2022 | 1.23 | 0.407 | 0.643 | CRNN, conformer | mean-teacher student | median filtering (450ms) | averaging | ||
Li_WU_task4_4 | Shao2022 | 1.41 | 0.486 | 0.694 | CRNN, ATST | mean-teacher student, RCT | temperature, median filter | averaging | ||
Li_WU_task4_2 | Shao2022 | 1.36 | 0.476 | 0.666 | CRNN, ATST | mean-teacher student, RCT | temperature, median filter | |||
Li_WU_task4_3 | Shao2022 | 1.40 | 0.482 | 0.693 | CRNN, ATST | mean-teacher student, RCT | temperature | |||
Li_WU_task4_1 | Shao2022 | 1.13 | 0.368 | 0.594 | CRNN | mean-teacher student, RCT | median filtering (112ms) | |||
Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | RCRNN | mean-teacher student, noisy student | classwise median filtering | average | ||
Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | RCRNN | mean-teacher student, noisy student | classwise median filtering | average | ||
Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | RCRNN | mean-teacher student, noisy student | classwise median filtering | average | ||
Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | RCRNN | mean-teacher student | classwise median filtering | |||
Ebbers_UPB_task4_4 | Ebbers2022 | 1.49 | 0.509 | 0.742 | CRNN | self-training | median filtering (event-specific lengths) | MIL | average | |
Ebbers_UPB_task4_2 | Ebbers2022 | 0.83 | 0.047 | 0.824 | FBCRNN | self-training | median filtering (event-specific lengths) | MIL | average | |
Ebbers_UPB_task4_1 | Ebbers2022 | 1.59 | 0.552 | 0.786 | CRNN | self-training | median filtering (event-specific lengths) | MIL | average | |
Ebbers_UPB_task4_3 | Ebbers2022 | 1.46 | 0.527 | 0.679 | CRNN | self-training | median filtering (event-specific lengths) | MIL | average | |
Xu_SRCB-BIT_task4_2 | Xu2022 | 1.41 | 0.482 | 0.702 | FDY-CRNN | mean-teacher student | classwise median filtering | averaging | ||
Xu_SRCB-BIT_task4_1 | Xu2022 | 1.32 | 0.452 | 0.662 | FDY-CRNN | mean-teacher student | median filtering (93ms) | averaging | ||
Xu_SRCB-BIT_task4_3 | Xu2022 | 0.79 | 0.054 | 0.774 | FDY-CRNN | mean-teacher student | median filtering | mean | ||
Xu_SRCB-BIT_task4_4 | Xu2022 | 0.75 | 0.049 | 0.738 | FDY-CRNN | mean-teacher student | median filtering | mean | ||
Nam_KAIST_task4_SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | CRNN, ensemble | mean-teacher student | class-wise median filtering, weak prediction masking | mean | ||
Nam_KAIST_task4_SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | CRNN, ensemble | mean-teacher student | class-wise median filtering, weak prediction masking | mean | ||
Nam_KAIST_task4_SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | CRNN, ensemble | mean-teacher student | class-wise median filtering, weak prediction masking | mean | ||
Nam_KAIST_task4_SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | CRNN, ensemble | mean-teacher student | class-wise median filtering, weak prediction masking | mean | ||
Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | CRNN | mean-teacher student | median filtering (160ms) | |||
Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | CRNN | mean-teacher student | median filtering (160ms) | |||
Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | CRNN | mean-teacher student | median filtering (160ms) | |||
Li_USTC_task4_SED_1 | Li2022b | 1.41 | 0.480 | 0.713 | CRNN | mean-teacher student, pseudo-labelling | median filtering (340ms) | averaging | ||
Li_USTC_task4_SED_4 | Li2022b | 1.34 | 0.429 | 0.723 | CRNN | mean-teacher student, pseudo-labelling | median filtering (340ms) | averaging | ||
Li_USTC_task4_SED_2 | Li2022b | 1.39 | 0.451 | 0.740 | CRNN | mean-teacher student, pseudo-labelling | median filtering (340ms) | averaging | ||
Li_USTC_task4_SED_3 | Li2022b | 1.35 | 0.450 | 0.699 | CRNN | mean-teacher student, pseudo-labelling | median filtering (340ms) | averaging | ||
Bertola_UPF_task4_1 | Bertola2022 | 0.98 | 0.318 | 0.520 | CRNN | mean-teacher student | median filtering (93ms) | |||
He_BYTEDANCE_task4_4 | He2022 | 0.82 | 0.053 | 0.810 | SK-CRNN, FDY-CRNN | mean-teacher student | median filtering | MIL | averaging | |
He_BYTEDANCE_task4_2 | He2022 | 1.48 | 0.503 | 0.749 | SK-CRNN, FDY-CRNN | mean-teacher student | median filtering | MIL | averaging | |
He_BYTEDANCE_task4_3 | He2022 | 1.52 | 0.525 | 0.748 | SK-CRNN, FDY-CRNN | mean-teacher student | classwise median filtering | MIL | averaging | |
He_BYTEDANCE_task4_1 | He2022 | 1.36 | 0.454 | 0.696 | SK-CRNN, FDY-CRNN | mean-teacher student | median filtering | MIL | averaging | |
Li_ICT-TOSHIBA_task4_2 | Li2022d | 0.79 | 0.090 | 0.709 | transformer (SEDT), CNN (frame-wise model), ensemble | mean-teacher student (frame-wise model), pseudo-labelling (SEDT) | median filtering with adaptive window size (only frame-wise model) | attention layers (only frame-wise model) | majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model) | |
Li_ICT-TOSHIBA_task4_4 | Li2022d | 0.75 | 0.075 | 0.692 | transformer (SEDT), CNN (frame-wise model), ensemble | mean-teacher student (frame-wise model), pseudo-labelling (SEDT) | median filtering with adaptive window size (only frame-wise model) | attention layers (only frame-wise model) | majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model) | |
Li_ICT-TOSHIBA_task4_1 | Li2022d | 1.26 | 0.439 | 0.612 | transformer (SEDT), CNN (frame-wise model), ensemble | mean-teacher student (frame-wise model), pseudo-labelling (SEDT) | median filtering with adaptive window size (only frame-wise model) | attention layers (only frame-wise model) | majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model) | |
Li_ICT-TOSHIBA_task4_3 | Li2022d | 1.20 | 0.411 | 0.597 | transformer (SEDT), CNN (frame-wise model), ensemble | mean-teacher student (frame-wise model), pseudo-labelling (SEDT) | median filtering with adaptive window size (only frame-wise model) | attention layers (only frame-wise model) | majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model) | |
Xie_UESTC_task4_2 | Xie2022 | 0.83 | 0.062 | 0.800 | CRNN | average | ||||
Xie_UESTC_task4_3 | Xie2022 | 1.06 | 0.300 | 0.641 | CRNN | mean-teacher student | median filtering (560ms) | average | ||
Xie_UESTC_task4_1 | Xie2022 | 1.36 | 0.418 | 0.757 | CRNN | mean-teacher student | median filtering (560ms) | average | ||
Xie_UESTC_task4_4 | Xie2022 | 1.38 | 0.426 | 0.766 | CRNN | mean-teacher student | median filtering (560ms) | average | ||
Baseline (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | CRNN | mean-teacher student | ||||
Kim_CAUET_task4_1 | Kim2022c | 1.02 | 0.317 | 0.565 | RCRNN | mean-teacher student | median filtering | attention layers | ||
Kim_CAUET_task4_2 | Kim2022c | 1.04 | 0.340 | 0.544 | CRNN with CBAM attention | mean-teacher student | median filtering | attention layers | |
Kim_CAUET_task4_3 | Kim2022c | 1.04 | 0.338 | 0.554 | CRNN | mean-teacher student | median filtering | attention layers | ||
Li_XJU_task4_1 | Li2022c | 1.10 | 0.364 | 0.570 | CRNN | mean-teacher student | median filtering (93ms) | linearsoftmax layer, attention layer | ||
Li_XJU_task4_3 | Li2022c | 1.17 | 0.371 | 0.635 | CRNN | mean-teacher student | median filtering (93ms) | linearsoftmax layer, attention layer | ||
Li_XJU_task4_4 | Li2022c | 0.93 | 0.195 | 0.683 | CRNN | mean-teacher student | median filtering | linearsoftmax layer, mean | ||
Li_XJU_task4_2 | Li2022c | 0.75 | 0.086 | 0.671 | CRNN | mean-teacher student | median filtering | linearsoftmax layer, mean | ||
Castorena_UV_task4_3 | Castorena2022 | 0.91 | 0.267 | 0.531 | CRNN | mean-teacher student | median filtering (93ms) | |||
Castorena_UV_task4_1 | Castorena2022 | 1.01 | 0.334 | 0.524 | CRNN | mean-teacher student | median filtering (93ms) | |||
Castorena_UV_task4_2 | Castorena2022 | 0.63 | 0.072 | 0.559 | CRNN | mean-teacher student | median filtering (93ms) |
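Median filtering of the frame-level posteriors dominates the post-processing column above, with window lengths ranging from 93 ms to 560 ms and, in several systems, chosen per class. A minimal sketch of class-wise median filtering; the class count and window lengths are illustrative, and the millisecond values in the table translate to frames via each system's frame hop.

```python
import numpy as np
from scipy.ndimage import median_filter

def classwise_median_filter(posteriors: np.ndarray, win_frames) -> np.ndarray:
    """Smooth frame-level class posteriors with a per-class median window.

    posteriors: (frames, classes) per-frame class probabilities
    win_frames: one window length in frames per class (illustrative values
                below; real systems derive them from the ms windows above)
    """
    smoothed = np.empty_like(posteriors)
    for c, win in enumerate(win_frames):
        smoothed[:, c] = median_filter(posteriors[:, c], size=win, mode="nearest")
    return smoothed

# Illustrative usage: shorter windows for impulsive classes, longer for
# stationary ones, then threshold and decode onset/offset times.
# probs = model(features)                                   # (frames, 10)
# probs = classwise_median_filter(probs, [7] * 5 + [27] * 5)
# events = probs > 0.5
```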
Complexity
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Model complexity | Ensemble subsystems | Training time |
---|---|---|---|---|---|---|---|---|
Zhang_UCAS_task4_2 | Xiao2022 | 1.41 | 0.484 | 0.697 | 11325746 | 5 | 18h (1 Tesla P100 ) | |
Zhang_UCAS_task4_1 | Xiao2022 | 1.39 | 0.472 | 0.700 | 11325746 | 5 | 18h (1 Tesla P100 ) | |
Zhang_UCAS_task4_3 | Xiao2022 | 1.21 | 0.420 | 0.599 | 4282496 | 5 | 12h (1 Tesla P100 ) | |
Zhang_UCAS_task4_4 | Xiao2022 | 0.79 | 0.049 | 0.784 | 2359672 | 5 | 8h (1 Tesla P100 ) | |
Liu_NSYSU_task4_2 | Liu2022 | 0.06 | 0.000 | 0.063 | 3251508 | 7h (1 GTX 1080 Ti) | ||
Liu_NSYSU_task4_3 | Liu2022 | 0.29 | 0.070 | 0.194 | 16257540 | 5 | 35h (1 GTX 1080 Ti) | |
Huang_NSYSU_task4_1 | Huang2022 | 1.28 | 0.434 | 0.650 | 14973876 | 6 | 10h * 6 (3060ti) | |
Liu_NSYSU_task4_4 | Liu2022 | 0.21 | 0.046 | 0.151 | 13006032 | 4 | 28h (1 GTX 1080 Ti) | |
Suh_ReturnZero_task4_1 | Suh2022 | 1.22 | 0.393 | 0.650 | 116400000 | 12 | 10h 8m 25s (1 NVIDIA A100-SXM4-80GB) | |
Suh_ReturnZero_task4_4 | Suh2022 | 0.81 | 0.062 | 0.774 | 116400000 | 12 | (1 NVIDIA A100-SXM4-80GB) | |
Suh_ReturnZero_task4_2 | Suh2022 | 1.39 | 0.458 | 0.721 | 116400000 | 12 | 10h 52m 16s (1 NVIDIA A100-SXM4-80GB) |
Suh_ReturnZero_task4_3 | Suh2022 | 1.42 | 0.478 | 0.719 | 116400000 | 12 | 13h 46m 27s (1 NVIDIA A100-SXM4-80GB) | |
Cheng_CHT_task4_2 | Cheng2022 | 0.93 | 0.276 | 0.543 | 4721326 | 18h (nvidia A100) | ||
Cheng_CHT_task4_1 | Cheng2022 | 1.03 | 0.314 | 0.582 | 4729921 | 18h (nvidia A100) | ||
Liu_SRCN_task4_2 | Liu2022 | 0.90 | 0.129 | 0.758 | 89500000 | 36h (1 NVIDIA A100 40Gb) | ||
Liu_SRCN_task4_1 | Liu2022 | 0.79 | 0.051 | 0.777 | 89500000 | 36h (1 NVIDIA A100 40Gb) | ||
Liu_SRCN_task4_4 | Liu2022 | 0.24 | 0.025 | 0.219 | 79700000 | 11h (1 RTX 2080 Ti) | ||
Liu_SRCN_task4_3 | Liu2022 | 1.25 | 0.425 | 0.634 | 11061000 | 6h (1 NVIDIA A100 40Gb) | ||
Kim_LGE_task4_1 | Kim2022a | 1.34 | 0.444 | 0.697 | 11061000 | 8h (1 RTX A5000) | ||
Kim_LGE_task4_3 | Kim2022a | 0.81 | 0.062 | 0.781 | 11061000 | 8h (1 RTX A5000) | ||
Kim_LGE_task4_4 | Kim2022a | 1.17 | 0.305 | 0.750 | 11061000 | 8h (1 RTX A5000) | ||
Kim_LGE_task4_2 | Kim2022a | 1.34 | 0.444 | 0.695 | 11061000 | 8h (1 RTX A5000) | ||
Ryu_Deeply_task4_1 | Ryu2022 | 0.83 | 0.257 | 0.461 | 625K | 25h (4 A100 GPUs) | ||
Ryu_Deeply_task4_2 | Ryu2022 | 0.66 | 0.156 | 0.449 | 625K | 16.9h (4 A100 GPUs) | ||
Giannakopoulos_UNIPI_task4_2 | Giannakopoulos2022 | 0.21 | 0.029 | 0.184 | 4213258 | 6h (1 GTX 2080 Ti) | ||
Giannakopoulos_UNIPI_task4_1 | Giannakopoulos2022 | 0.35 | 0.104 | 0.196 | 4213258 | 6h (1 GTX 2080 Ti) | ||
Mizobuchi_PCO_task4_4 | Mizobuchi2022 | 0.82 | 0.062 | 0.787 | 52793884 | 11 | 77h (1 NVIDIA Tesla V100 SXM2) | |
Mizobuchi_PCO_task4_2 | Mizobuchi2022 | 1.26 | 0.439 | 0.611 | 70847296 | 16 | 48h (1 NVIDIA Tesla V100 SXM2) | |
Mizobuchi_PCO_task4_3 | Mizobuchi2022 | 0.88 | 0.197 | 0.620 | 44279560 | 10 | 30h (1 NVIDIA Tesla V100 SXM2) | |
Mizobuchi_PCO_task4_1 | Mizobuchi2022 | 1.15 | 0.398 | 0.571 | 35423648 | 8 | 24h (1 NVIDIA Tesla V100 SXM2) | |
KIM_HYU_task4_2 | Sojeong2022 | 1.28 | 0.421 | 0.664 | 1112420 | 5 | 6h (1 GTX 2080 Ti) | |
KIM_HYU_task4_4 | Sojeong2022 | 1.27 | 0.423 | 0.651 | 1112420 | 5 | 6h (1 GTX 2080 Ti) | |
KIM_HYU_task4_1 | Sojeong2022 | 1.19 | 0.390 | 0.620 | 1112420 | 3 | 6h (1 GTX 2080 Ti) | |
KIM_HYU_task4_3 | Sojeong2022 | 1.24 | 0.415 | 0.634 | 1112420 | 2 | 4h (1 GTX 3090 Ti) | |
Baseline | Turpault2022 | 1.00 | 0.315 | 0.543 | 2200000 | 6h (1 GTX 1080 Ti) | ||
Dinkel_XiaoRice_task4_1 | Dinkel2022 | 1.29 | 0.422 | 0.679 | 8430844 | 9 | 3 h | |
Dinkel_XiaoRice_task4_2 | Dinkel2022 | 1.15 | 0.373 | 0.613 | 148852 | 3 h | ||
Dinkel_XiaoRice_task4_4 | Dinkel2022 | 0.92 | 0.104 | 0.824 | 27992026 | 6 | 24 h | |
Dinkel_XiaoRice_task4_3 | Dinkel2022 | 1.38 | 0.451 | 0.727 | 37451786 | 11 | 24 h | |
Hao_UNISOC_task4_2 | Hao2022 | 0.78 | 0.078 | 0.723 | 4590228 | 3 | 36h (1 RTX 6000) | |
Hao_UNISOC_task4_1 | Hao2022 | 1.24 | 0.425 | 0.615 | 4590228 | 3 | 36h (1 RTX 6000) | |
Hao_UNISOC_task4_3 | Hao2022 | 1.09 | 0.373 | 0.547 | 4590228 | 36h (1 RTX 6000) | ||
Khandelwal_FMSG-NTU_task4_1 | Khandelwal2022 | 0.83 | 0.158 | 0.633 | 2770884 | 20h (1 NVIDIA Quadro RTX 5000) | ||
Khandelwal_FMSG-NTU_task4_2 | Khandelwal2022 | 0.80 | 0.082 | 0.731 | 118567907 | 24h (1 NVIDIA Quadro RTX 5000) | ||
Khandelwal_FMSG-NTU_task4_3 | Khandelwal2022 | 1.26 | 0.410 | 0.664 | 2770884 | 20h (1 NVIDIA Quadro RTX 5000) | ||
Khandelwal_FMSG-NTU_task4_4 | Khandelwal2022 | 1.20 | 0.386 | 0.643 | 2770884 | 20h (1 NVIDIA Quadro RTX 5000) | ||
deBenito_AUDIAS_task4_4 | deBenito2022 | 1.28 | 0.432 | 0.649 | 10659182 | 7 | 77h (1 GeForce RTX 2080 Ti) | |
deBenito_AUDIAS_task4_1 | deBenito2022 | 1.23 | 0.400 | 0.646 | 15911270 | 10 | 111h (1 GeForce RTX 2080 Ti) | |
deBenito_AUDIAS_task4_2 | deBenito2022 | 1.08 | 0.310 | 0.642 | 15911270 | 10 | 111h (1 GeForce RTX 2080 Ti) | |
deBenito_AUDIAS_task4_3 | deBenito2022 | 1.23 | 0.407 | 0.643 | 10659182 | 7 | 77h (1 GeForce RTX 2080 Ti) | |
Li_WU_task4_4 | Shao2022 | 1.41 | 0.486 | 0.694 | 475547380 | 5 | 8.3h (1 A100-SXM4-80GB) | |
Li_WU_task4_2 | Shao2022 | 1.36 | 0.476 | 0.666 | 29986148 | 6.6h (1 A100-SXM4-80GB) | ||
Li_WU_task4_3 | Shao2022 | 1.40 | 0.482 | 0.693 | 95109476 | 8.3h (1 A100-SXM4-80GB) | ||
Li_WU_task4_1 | Shao2022 | 1.13 | 0.368 | 0.594 | 1112420 | 4h (1 A100-SXM4-80GB) | ||
Kim_GIST_task4_3 | Kim2022b | 1.43 | 0.500 | 0.695 | 1691694 | 10 | 74h (5 RTX 2080ti) | |
Kim_GIST_task4_1 | Kim2022b | 1.47 | 0.514 | 0.713 | 1691694 | 10 | 74h (5 RTX 2080ti) | |
Kim_GIST_task4_2 | Kim2022b | 1.46 | 0.510 | 0.711 | 1691694 | 10 | 74h (5 RTX 2080ti) | |
Kim_GIST_task4_4 | Kim2022b | 0.65 | 0.215 | 0.335 | 792228 | 18h (1 RTX A6000) | ||
Ebbers_UPB_task4_4 | Ebbers2022 | 1.49 | 0.509 | 0.742 | 134119060 | 30 | 2d (10 A100) | |
Ebbers_UPB_task4_2 | Ebbers2022 | 0.83 | 0.047 | 0.824 | 499812480 | 40 | 5d (10 A100) | |
Ebbers_UPB_task4_1 | Ebbers2022 | 1.59 | 0.552 | 0.786 | 779623240 | 60 | 5d (10 A100) | |
Ebbers_UPB_task4_3 | Ebbers2022 | 1.46 | 0.527 | 0.679 | 780237640 | 60 | 5d (10 A100) | |
Xu_SRCB-BIT_task4_2 | Xu2022 | 1.41 | 0.482 | 0.702 | 11066748 | 10 | 4h (1 RTX 3090) | |
Xu_SRCB-BIT_task4_1 | Xu2022 | 1.32 | 0.452 | 0.662 | 11066748 | 5 | 4h (1 RTX 3090) | |
Xu_SRCB-BIT_task4_3 | Xu2022 | 0.79 | 0.054 | 0.774 | 11117798 | 5 | 4h (1 RTX 3090) | |
Xu_SRCB-BIT_task4_4 | Xu2022 | 0.75 | 0.049 | 0.738 | 11081958 | 2 | 4h (1 RTX 3090) | |
Nam_KAIST_task4_SED_2 | Nam2022 | 1.25 | 0.409 | 0.656 | 11061468 | 12 | 6h (1 RTX Titan) | |
Nam_KAIST_task4_SED_3 | Nam2022 | 0.77 | 0.057 | 0.747 | 11061468 | 53 | 6h (1 RTX Titan) | |
Nam_KAIST_task4_SED_4 | Nam2022 | 0.77 | 0.055 | 0.747 | 11061468 | 150 | 6h (1 RTX Titan) | |
Nam_KAIST_task4_SED_1 | Nam2022 | 1.24 | 0.404 | 0.653 | 11061468 | 31 | 6h (1 RTX Titan) | |
Blakala_SRPOL_task4_3 | Blakala2022 | 0.95 | 0.293 | 0.527 | 1.2M | 4.5h (1 RTX 2080) | ||
Blakala_SRPOL_task4_1 | Blakala2022 | 1.11 | 0.365 | 0.584 | 1177663 | 8h (1 RTX 2080) | ||
Blakala_SRPOL_task4_2 | Blakala2022 | 0.78 | 0.069 | 0.728 | 5.3M | 29h (1 RTX 2080) | ||
Li_USTC_task4_SED_1 | Li2022b | 1.41 | 0.480 | 0.713 | 26842020 | 10 | 20h (2 GTX 3090) | |
Li_USTC_task4_SED_4 | Li2022b | 1.34 | 0.429 | 0.723 | 8052606 | 10 | 6h (2 GTX 3090) | |
Li_USTC_task4_SED_2 | Li2022b | 1.39 | 0.451 | 0.740 | 26842020 | 10 | 20h (2 GTX 3090) | |
Li_USTC_task4_SED_3 | Li2022b | 1.35 | 0.450 | 0.699 | 8052606 | 10 | 6h (2 GTX 3090) | |
Bertola_UPF_task4_1 | Bertola2022 | 0.98 | 0.318 | 0.520 | 1112420 | 3h (1 GTX 1080 Ti) | ||
He_BYTEDANCE_task4_4 | He2022 | 0.82 | 0.053 | 0.810 | 15919068 | 40 | 8h (1 A100) | |
He_BYTEDANCE_task4_2 | He2022 | 1.48 | 0.503 | 0.749 | 15919068 | 40 | 8h (1 A100) | |
He_BYTEDANCE_task4_3 | He2022 | 1.52 | 0.525 | 0.748 | 15919068 | 16 | 8h (1 A100) | |
He_BYTEDANCE_task4_1 | He2022 | 1.36 | 0.454 | 0.696 | 11061468 | 8 | 3h (1 A100) | |
Li_ICT-TOSHIBA_task4_2 | Li2022d | 0.79 | 0.090 | 0.709 | 224997445 | 10 (5 SEDT, 5 frame-wise model) | 186 h (1 RTX A4000) + 35 h (3 RTX 2080 Ti) | |
Li_ICT-TOSHIBA_task4_4 | Li2022d | 0.75 | 0.075 | 0.692 | 188469803 | 9 (4 SEDT, 5 frame-wise model) | 53h (1 RTX A4000) + 30 h (3 RTX 2080 Ti) | |
Li_ICT-TOSHIBA_task4_1 | Li2022d | 1.26 | 0.439 | 0.612 | 224997445 | 10 (5 SEDT, 5 frame-wise model) | 186 h (1 RTX A4000) + 35 h (3 RTX 2080 Ti) | |
Li_ICT-TOSHIBA_task4_3 | Li2022d | 1.20 | 0.411 | 0.597 | 188469803 | 9 (4 SEDT, 5 frame-wise model) | 53h (1 RTX A4000) + 30 h (3 RTX 2080 Ti) | |
Xie_UESTC_task4_2 | Xie2022 | 0.83 | 0.062 | 0.800 | 166283314 | 8 | 20min (1 GTX 3080 Ti) | |
Xie_UESTC_task4_3 | Xie2022 | 1.06 | 0.300 | 0.641 | 25054503 | 2 | 3h (1 GTX 3080 Ti) | |
Xie_UESTC_task4_1 | Xie2022 | 1.36 | 0.418 | 0.757 | 225490527 | 8 | 2h (1 GTX 3080 Ti) | |
Xie_UESTC_task4_4 | Xie2022 | 1.38 | 0.426 | 0.766 | 225490527 | 8 | 2h (1 GTX 3080 Ti) | |
Baseline (AudioSet) | Ronchini2022 | 1.04 | 0.345 | 0.540 | 2200000 | 6h (1 GTX 1080 Ti) | ||
Kim_CAUET_task4_1 | Kim2022c | 1.02 | 0.317 | 0.565 | Trainable 1.7M, non-trainable 1.7M | 10h (1 RTX 2080 Ti) | |
Kim_CAUET_task4_2 | Kim2022c | 1.04 | 0.340 | 0.544 | Trainable 1.1M, non-trainable 1.1M | 9h (1 RTX 2080 Ti) | |
Kim_CAUET_task4_3 | Kim2022c | 1.04 | 0.338 | 0.554 | Trainable 1.1M, non-trainable 1.1M | 9h (1 RTX 2080 Ti) | |
Li_XJU_task4_1 | Li2022c | 1.10 | 0.364 | 0.570 | 4.2MB | 7h (1 Titan RTX) | ||
Li_XJU_task4_3 | Li2022c | 1.17 | 0.371 | 0.635 | 4.2MB | 7h (1 Titan RTX) | ||
Li_XJU_task4_4 | Li2022c | 0.93 | 0.195 | 0.683 | 4.2MB | 7h (1 Titan RTX) | ||
Li_XJU_task4_2 | Li2022c | 0.75 | 0.086 | 0.671 | 4.2MB | 7h (1 Titan RTX) | ||
Castorena_UV_task4_3 | Castorena2022 | 0.91 | 0.267 | 0.531 | 1100000 | 4h (1 GTX 3060 Ti) | ||
Castorena_UV_task4_1 | Castorena2022 | 1.01 | 0.334 | 0.524 | 1100000 | 4h (1 GTX 3060 Ti) | ||
Castorena_UV_task4_2 | Castorena2022 | 0.63 | 0.072 | 0.559 | 1100000 | 4h (1 GTX 3060 Ti) |
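The ranking scores above are consistent with normalizing each PSDS by a fixed baseline value and averaging the two ratios. A minimal Python sketch of that reading; the baseline values 0.315 and 0.543 are inferred from the rows above and should be treated as assumptions rather than official constants:

# Hypothetical reconstruction of the table's ranking score: each PSDS is
# divided by the corresponding baseline PSDS, then the two are averaged.
# Baseline values are inferred from the table rows and are assumptions.
BASELINE_PSDS1 = 0.315
BASELINE_PSDS2 = 0.543

def ranking_score(psds1, psds2):
    """Average of baseline-normalized PSDS1 and PSDS2."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)

# Ebbers_UPB_task4_1 (PSDS1=0.552, PSDS2=0.786) -> ~1.60 (table: 1.59)
print(round(ranking_score(0.552, 0.786), 2))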
Technical reports
Data Augmentation Methods Exploration For Sound Event Detection
Bertola, Marco
Universitat Pompeu Fabra, Barcelona, Spain
Bertola_UPF_task4_1
Abstract
This technical report describes the submission of a system for DCASE 2022 Task 4: Sound Event Detection in Domestic Environments [1]. Sound Event Detection (SED) systems have gained great attention in the past few years, motivated by emerging applications in several different fields such as smart homes, autonomous cars, and healthcare. Their performance can depend heavily on the availability of a large amount of strongly labeled data, which is often difficult and costly to generate or retrieve. The aim of this work is to explore, combine and compare different data augmentation techniques to compensate for the lack of strongly labeled data. The best-performing configuration is submitted to the DCASE 2022 Task 4 challenge.
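As a generic illustration of one augmentation family the report explores (not necessarily the author's exact configuration), mixup on log-mel features and weak labels can be sketched as follows; shapes and the alpha value are illustrative assumptions:

import numpy as np

def mixup(features, labels, alpha=0.2):
    """Mix random pairs of examples within a batch (illustrative sketch)."""
    lam = np.random.beta(alpha, alpha)            # mixing coefficient
    perm = np.random.permutation(len(features))   # random partner per clip
    mixed_x = lam * features + (1 - lam) * features[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_x, mixed_y

x = np.random.randn(8, 128, 626)                    # batch of log-mel clips
y = np.random.randint(0, 2, (8, 10)).astype(float)  # weak labels, 10 classes
print(mixup(x, y)[0].shape)                         # (8, 128, 626)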
System characteristics
Dcase 2022 Task 4 Technical Report
Błakała, Kornel and Sikorski, Olaf
Samsung R&D Institute Poland, Warsaw, Poland
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_3
Abstract
This paper describes our solution for Task 4 of the 2022 edition of the Detection and Classification of Acoustic Scenes and Events competition. Our solution practically consists of two specialised systems that excel in either of the two scenarios in the challenge. Both utilise the CRNN model architecture and mean-teacher training setup proposed in the baseline solution. The modifications that they share are the replacement of the CNN extractor with a ResNet-18 architecture and the reduction of the FFT window from 2048 to 1024 samples. The systems diverge in the set of augmentations selected and in whether they use any additional techniques during training. For Scenario 1 we observed improvement when using pitch shift, while all other data augmentation methods resulted in lower PSDS. On the other hand, Scenario 2 benefited greatly from spectrogram time warping and adding brown noise. Further improvement on Scenario 2 was achieved by replacing attention with mean aggregation for weak predictions, incorporating per-frame embeddings from the Audio Spectrogram Transformer (AST) and injecting Gaussian noise between teacher and student during consistency loss calculation. Curiously, these modifications diminished performance on Scenario 1. The system specialising in Scenario 1 scored [0.3743, 0.5826] and the system specialising in Scenario 2 scored [0.0701, 0.7938] in [PSDS1, PSDS2] respectively.
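The mean-aggregation change described for Scenario 2 replaces an attention-weighted sum over frames with a plain temporal mean when forming weak (clip-level) predictions. A minimal sketch of both pooling variants, with assumed tensor shapes:

import torch

def weak_from_frames(frame_probs, att_logits=None):
    """Aggregate frame-level probabilities (batch, time, classes) into a
    clip-level prediction. With att_logits: attention pooling; without:
    the mean aggregation the report found better for Scenario 2."""
    if att_logits is None:
        return frame_probs.mean(dim=1)           # mean aggregation
    w = torch.softmax(att_logits, dim=1)         # normalize over time
    return (w * frame_probs).sum(dim=1)          # attention pooling

frame_probs = torch.rand(4, 156, 10)
print(weak_from_frames(frame_probs).shape)       # torch.Size([4, 10])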
System characteristics
Sound Event Detection System With Multiscale Channel Attention And Multiple Consistency Training For Dcase 2022 Task 4
Cheng, Yu-Han and Lu, Chung-Li and Chan, Bo-Cheng and Chuang, Hsiang-Feng
Chunghwa Telecom Laboratories, Taiwan
Cheng_CHT_task4_1 Cheng_CHT_task4_2
Abstract
In this technical report, we describe our submission system for DCASE 2022 Task 4: sound event detection and separation in domestic environments. The proposed system is based on the mean-teacher framework for semi-supervised learning and a CRNN architecture. We employ interpolation (ICT), shift (SCT), and clip-level (CCT) consistency training to enhance generalization and representation. A multiscale CNN block is applied to extract various features and mitigate the influence of event-length diversity on the network. An efficient channel attention network (ECA-Net) and attention pooling enable the model to obtain definite sound event predictions. To further improve performance, we use data augmentation including mixup, time shift, and filter augmentation. Our best system achieves a PSDS-scenario1 of 36.20% and a PSDS-scenario2 of 63.45% on the validation set, significantly outperforming the baseline scores of 32.93% and 53.22%, respectively.
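Interpolation consistency training (ICT), one of the three consistency objectives listed, constrains the student's prediction on a mixed input to match the mix of the teacher's predictions on the unmixed inputs. A minimal sketch; the toy model and the alpha parameter are assumptions:

import torch

def ict_loss(student, teacher, u1, u2, alpha=0.5):
    """Interpolation consistency on unlabeled clips u1, u2: the student's
    prediction on a mixed input should match the mix of the teacher's
    predictions (simplified sketch)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed_input = lam * u1 + (1 - lam) * u2
    with torch.no_grad():
        target = lam * teacher(u1) + (1 - lam) * teacher(u2)
    return torch.nn.functional.mse_loss(student(mixed_input), target)

net = torch.nn.Sequential(torch.nn.Flatten(),
                          torch.nn.Linear(128 * 64, 10), torch.nn.Sigmoid())
u1, u2 = torch.rand(4, 128, 64), torch.rand(4, 128, 64)
print(ict_loss(net, net, u1, u2).item())  # teacher==student only for the demo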
System characteristics
Multi-Resolution Combination Of CRNN And Conformers For Dcase 2022 Task 4
de Benito-Gorron, Diego and Barahona, Sara and Segovia, Sergio and Ramos, Daniel and Toledano, Doroteo
AUDIAS Research Group, Universidad Autónoma de Madrid, Madrid, Spain
deBenito_AUDIAS_task4_1 deBenito_AUDIAS_task4_2 deBenito_AUDIAS_task4_3 deBenito_AUDIAS_task4_4
Abstract
This technical report describes our submission to DCASE 2022 Task 4: Sound event detection in domestic environments. We follow a multi-resolution approach consisting of a late fusion of systems that are trained with different feature extraction parameters, aiming to leverage the characteristics of different event categories in time and frequency. Our systems are built upon the Convolutional-Recurrent Neural Network (CRNN) proposed by the baseline system and the Conformer structure proposed by the winners of the 2020 challenge.
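One way to realize such a multi-resolution late fusion is to resample each system's frame-level posteriors onto a common frame grid and average them; the sketch below illustrates this under assumed grid sizes, not the authors' exact settings:

import numpy as np

def fuse_multiresolution(prob_maps, target_frames=156):
    """Late-fuse frame-level probabilities from systems trained with
    different time resolutions: resample each (time, classes) map onto a
    common frame grid by linear interpolation, then average."""
    fused = []
    for p in prob_maps:
        src = np.linspace(0.0, 1.0, num=p.shape[0])
        dst = np.linspace(0.0, 1.0, num=target_frames)
        fused.append(np.stack([np.interp(dst, src, p[:, c])
                               for c in range(p.shape[1])], axis=1))
    return np.mean(fused, axis=0)

maps = [np.random.rand(156, 10), np.random.rand(312, 10), np.random.rand(78, 10)]
print(fuse_multiresolution(maps).shape)  # (156, 10)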
System characteristics
A Large Multi-Modal Ensemble For Sound Event Detection
Dinkel, Heinrich and Yan, Zhiyong and Wang, Yongqing and Song, Meixu and Zhang, Junbo and Wang, Wang
Xiaomi Corporation, Beijing, China
Dinkel_XiaoRice_task4_1 Dinkel_XiaoRice_task4_2 Dinkel_XiaoRice_task4_3 Dinkel_XiaoRice_task4_4
Abstract
This paper is a system description of the XiaoRice team submission to the DCASE 2022 Task 4 challenge. Our method focuses on merging commonly used convolutional neural networks (CNNs) with transformer-based methods and recurrent neural networks (RNNs). We deliberately divide our efforts into optimizing the two evaluation metrics for the challenge: the onset- and offset-sensitive PSDS-1 score and the clip-level PSDS-2 score. This work shows that a large ensemble of differently trained architectures and frameworks can lead to significant gains. Our PSDS-1 optimized system consists of an 11-way fusion of convolutional recurrent neural networks (CRNNs) and Vision Transformers (ViTs), and achieves a PSDS-1 score of 48.19. Further, our PSDS-2 system, comprising a 6-way CNN and ViT fusion, achieved a PSDS-2 score of 87.70 on the development dataset.
System characteristics
Pre-Training And Self-Training For Sound Event Detection In Domestic Environments
Ebbers, Janek and Haeb-Umbach, Reinhold
Paderborn University, Paderborn, Germany
Abstract
In this report we present our system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. As in previous editions of the Challenge, we use forward-backward convolutional recurrent neural networks (FBCRNNs) [1, 2] for weakly labeled and semi-supervised sound event detection (SED) and eventually generate strong pseudo labels for weakly labeled and unlabeled data. Then, (tag-conditioned) bidirectional CRNNs (Bi-CRNNs) [1, 2] are trained in a strongly supervised manner as our final SED models. In each of the training stages we use multiple iterations of self-training. Compared to previous editions, we improved our system performance by 1) some tweaks regarding data augmentation, pseudo labeling and inference, 2) using weakly labeled AudioSet data [3] for pretraining larger networks and 3) augmenting the DESED data [4] with strongly labeled AudioSet data [5] for finetuning of the networks. Source code is publicly available at https://github.com/fgnt/pb_sed.
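The strong pseudo labels mentioned above can be obtained by thresholding a trained model's frame-level posteriors and merging contiguous active frames into events. A minimal sketch; the threshold and hop size are assumptions:

import numpy as np

def strong_pseudo_labels(frame_probs, threshold=0.5, hop_seconds=0.064):
    """Turn a (time, classes) probability map into strong pseudo labels
    (onset, offset, class) by thresholding and merging contiguous
    active frames."""
    events = []
    active = frame_probs > threshold
    for c in range(active.shape[1]):
        onset = None
        for t in range(active.shape[0]):
            if active[t, c] and onset is None:
                onset = t
            elif not active[t, c] and onset is not None:
                events.append((onset * hop_seconds, t * hop_seconds, c))
                onset = None
        if onset is not None:
            events.append((onset * hop_seconds, active.shape[0] * hop_seconds, c))
    return events

print(strong_pseudo_labels(np.random.rand(100, 10))[:3])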
System characteristics
Semi-Supervised Sound Event Detection Based On Mean Teacher With Selective Kernel Multiscale Convolution And Resident Cam Clustering
Qiao, Ziling and Gan, Yanggang and Wu, Juan and Cai, Xichang and Wu, Menglong and Dong, Hongxia and Zhang, Lin and Liu, Zihan
North China University of Technology, Beijing, China
Gan_NCUT_task4_1 Gan_NCUT_task4_2
Abstract
In this technical report, we present our submission system for DCASE 2022 Task 4: sound event detection in domestic environments. The proposed system is based on the mean-teacher framework for semi-supervised learning and a selective kernel convolution network. We use multi-scale convolution to extract richer features of sound events. To improve the localization ability of the system, we use a dynamically selected attention mechanism called the SK unit in the CNN, which allows each neuron to adaptively adjust the size of its receptive field according to multiple scales of input information. Our system finally achieves a PSDS-scenario1 of 39.0% and a PSDS-scenario2 of 58.50% on the validation set. In terms of innovative methods, this technical report also provides a technical description of system 2 submitted by the NCUT team. In system 2, the team selected a sound event detection method based on Grad-CAM clustering. This method uses a PANNs-based transfer-learning network to generate Grad-CAM class activation maps to localize events in time. The adaptability of several different network models is evaluated, and the models with higher scores and better adaptability are fused at the probability level to obtain the final event predictions. The CAM-clustering-based system 2 achieves a PSDS-scenario1 of 9.963% and a PSDS-scenario2 of 69.877% on the development dataset.
System characteristics
Multi-Task Learning For Sound Event Detection Using Variational Autoencoders
Giannakopoulos, Petros1 and Pikrakis, Aggelos2
1National and Kapodistrian University of Athens, Athens, Greece 2University of Piraeus, Piraeus, Greece
Giannakopoulos_UNIPI_task4_1 Giannakopoulos_UNIPI_task4_2
Abstract
This technical report presents a multi-task learning model based on recurrent variational autoencoders (VAEs). The proposed method employs recurrent VAEs with shared parameters to simultaneously learn the tasks of strong labeling, weak labeling and feature sequence reconstruction. During the training stage, the model receives as input strongly labeled, weakly labeled, and unlabeled data, and it simultaneously optimizes frame-based and file-based cross-entropy losses for the strongly labeled and weakly labeled data, respectively, as well as the reconstruction loss for the unlabeled data. Using a shared posterior among all task branches, the model projects the input data for each task into a common latent space. The decoding of latents sampled from this common latent space, in combination with the shared parameters among task branches, acts as a regularizer that prevents the model from overfitting to the individual tasks. The proposed method is evaluated on the DCASE 2022 Task 4 dataset, on which it achieves an event-based macro F1 score of 32.5% on the validation set and 31.8% on the public evaluation set.
System characteristics
Dcase 2022 Task4 Challenge Technical Report
Hao, Junyong and Ye, Shunzhou and Lu, Cheng and Dong, Fei and Liu, Jingang
UNISOC, Chongqing, China
Hao_UNISOC_task4_1 Hao_UNISOC_task4_2 Hao_UNISOC_task4_3
Abstract
This report proposes a polyphonic sound event detection (SED) method for the DCASE 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. We use the DESED dataset to train our model, which contains strongly labeled synthetic data, a large amount of unlabeled data, weakly labeled data and strongly labeled real data. To perform this task, we propose a DACRNN network for joint learning of SED and domain adaptation (DA). To address the impact of the distribution within a single sound on the generalization performance of the model, we mitigate the influence of complex background noise on event detection and apply self-correlation consistency regularization to clip-level sound event classification, which smooths the intra-domain distribution of a single sound; for cross-domain adaptation, we apply adversarial learning through the feature extraction network with a weighted frame-level domain discriminator. Experiments on the DCASE 2022 Task 4 validation dataset and public-evaluation dataset demonstrate the effectiveness of the techniques used in our system. Specifically, PSDS1 scores of 0.448 and PSDS2 scores of 0.853 are achieved on the validation dataset, and PSDS1 scores of 0.553 and PSDS2 scores of 0.836 on the public-evaluation dataset.
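Adversarial learning between a feature extractor and a domain discriminator is commonly implemented with a gradient-reversal layer. The sketch below shows that generic building block only; the report's frame-level weighting scheme is not reproduced:

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the
    backward pass -- the standard gradient-reversal layer used for
    adversarial domain adaptation."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

features = torch.rand(4, 64, requires_grad=True)
domain_logit = torch.nn.Linear(64, 1)(GradReverse.apply(features, 1.0))
domain_logit.sum().backward()   # gradients w.r.t. features are reversed
print(features.grad.abs().sum() > 0)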
System characteristics
Semi-Supervised Sound Event Detection System For Dcase 2022 Task 4
He, Kexin and Shu, Xin and Jia, Shaoyong and He, Yi
Bytedance AI Lab, Beijing, China
He_BYTEDANCE_task4_1 He_BYTEDANCE_task4_2 He_BYTEDANCE_task4_3 He_BYTEDANCE_task4_4
Abstract
In this report, we describe our submissions for Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge: Sound Event Detection in Domestic Environments. Our methods are mainly based on two types of deep learning models: a Convolutional Recurrent Neural Network with selective kernel convolution (SK-CRNN) and one with frequency dynamic convolution (FDY-CRNN). In order to prevent overfitting, we adopt data augmentation using the mixup strategy, FilterAugment, Interpolation Consistency Training (ICT) and Shift Consistency Training (SCT). Besides, we utilize external data and a pretrained model to further improve performance, and try an ensemble of multiple subsystems to enhance the generalization capability of our system. Our final systems achieve PSDS1/PSDS2 scores of 0.5331/0.8569 on the development dataset.
System characteristics
Cht+Nsysu Sound Event Detection System With Different Kinds Of Pretrained Models For Dcase 2022 Task 4
Huang, Sung-Jen1 and Liu, Chia-Chuan1 and Chen, Chia-Ping1 and Lu, Chung-Li2 and Chan, Bo-Cheng2 and Cheng, Yu-Han2 and Chuang, Hsiang-Feng2
1National Sun Yat-Sen University, Taiwan 2Chunghwa Telecom Laboratories, Taiwan
Huang_NSYSU_task4_1 Liu_NSYSU_task4_2 Liu_NSYSU_task4_3 Liu_NSYSU_task4_4
Abstract
In this technical report, we describe our submission system for DCASE 2022 Task 4: sound event detection in domestic environments. We propose two kinds of systems. One is trained by combining the mean-teacher framework and knowledge distillation (one student model and two teacher models) without external data. While training this system, we first train a mean-teacher model to serve as a pretrained model. The next step is to select the better of the teacher and student models as the trained model for knowledge distillation. Afterward, we train another mean-teacher model with a different architecture using knowledge distillation. Finally, we repeat the model-selection step and knowledge distillation several times. The mean-teacher model in the final round is composed of a VGG block, selective kernels and a clip-level consistency branch. Compared to the PSDS-scenario1 of 35.1% and PSDS-scenario2 of 55.2% of the baseline system trained without external data, the ensemble of this kind of system achieves 43.7% and 68.0%, respectively. The other system can be separated into two parts. The first part is the top three layers of pretrained PANNs, while the second part is a system similar to the baseline with only three convolution blocks. We then train the whole system (including PANNs) with DESED data. With ensembling, this system's PSDS-scenario1 and 2 of 46.5% and 76.7% outperform the baseline system (trained with AST embeddings) at 31.3% and 72.2%.
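Knowledge distillation from a trained (mean-teacher) model into a new student can be written as a weighted sum of a supervised term and a teacher-matching term. A minimal sketch; the 50/50 weighting and the MSE distillation term are assumptions, not the authors' exact setting:

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_probs, weak_labels, alpha=0.5):
    """Blend supervised BCE on weak labels with a distillation term that
    pulls the student towards the (frozen) teacher's posteriors."""
    student_probs = torch.sigmoid(student_logits)
    supervised = F.binary_cross_entropy(student_probs, weak_labels)
    distill = F.mse_loss(student_probs, teacher_probs)
    return alpha * supervised + (1 - alpha) * distill

logits = torch.randn(4, 10)
teacher_probs = torch.rand(4, 10)
labels = torch.randint(0, 2, (4, 10)).float()
print(kd_loss(logits, teacher_probs, labels).item())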
System characteristics
Fmsg-Ntu Submission For Dcase 2022 Task 4 On Sound Event Detection In Domestic Environments
Khandelwal, Tanmay1,2 and Das, Rohan Kumar1 and Koh, Andrew2 and Chng, Eng Siong2
1Fortemedia Singapore, Singapore 2Nanyang Technological University (NTU), Singapore
Khandelwal_FMSG-NTU_task4_1 Khandelwal_FMSG-NTU_task4_2 Khandelwal_FMSG-NTU_task4_3 Khandelwal_FMSG-NTU_task4_4
Abstract
In this work, we describe the systems jointly submitted by Fortemedia Singapore (FMSG) and Nanyang Technological University (NTU) for DCASE 2022 Task 4: sound event detection in domestic environments. The proposed framework is divided into two stages: Stage-1 focuses on the audio-tagging system, which assists the sound event detection system in Stage-2. We train Stage-1 using a strongly labeled set converted into weak predictions, a weakly labeled set, and an unlabeled set to develop an effective audio-tagging system. This audio-tagging system is then used to infer on the unlabeled set to generate reliable pseudo-weak labels, which are used together with the strongly labeled and weakly labeled sets to train the sound event detection system at Stage-2. In Stage-1, we used two different networks for our developed systems: a frequency dynamic (FDY) convolutional recurrent neural network (CRNN) and the convolutional neural network (CNN)-14 from pretrained audio neural networks (PANNs). The system at Stage-2 is based on the FDY-CRNN for all the systems submitted to the challenge. The systems at both stages employ data augmentation to reduce the risk of overfitting, and apply adaptive post-processing techniques to further enhance performance. On the DESED real validation dataset, we obtain the highest PSDS1 and PSDS2 of 0.474 and 0.840, respectively.
System characteristics
Sound Event Detection System Using Fixmatch For Dcase 2022 Challenge Task 4
Kim, Changmin and Yang, Siyoung
LG Electronics, Seoul, South Korea
Kim_LGE_task4_1 Kim_LGE_task4_2 Kim_LGE_task4_3 Kim_LGE_task4_4
Abstract
This technical report proposes a sound event detection (SED) system in domestic environments for DCASE 2022 Challenge Task 4. In this system, the training method consists of two stages. In stage 1, mean teacher (MT) and interpolation consistency training (ICT) are used. In stage 2, FixMatch is additionally applied. We adopted the frequency dynamic convolution recurrent neural network (FDY-CRNN) structure as our model. In order to further improve the performance of polyphonic sound detection score (PSDS) scenario 2, three techniques were used. First, we applied a temperature parameter to the sigmoid function to obtain soft confidence values. Second, we used weak SED, a method that uses only weak predictions and sets the timestamp equal to the total duration of the audio clip. Third, the FSD50K dataset was added to the weakly labeled dataset, which helped PSDS scenario 2. As a result, we obtained the best PSDS scenario 1 of 0.473, and the best PSDS scenario 2 of 0.695 on the domestic environment SED real validation dataset.
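Two of the listed techniques, the temperature-scaled sigmoid and FixMatch-style confident-frame selection, can be sketched as follows; the temperature and threshold values are illustrative assumptions:

import torch

def soft_confidence(logits, temperature=2.0):
    """Temperature-scaled sigmoid: T > 1 softens confidence values."""
    return torch.sigmoid(logits / temperature)

def fixmatch_mask(weak_aug_probs, threshold=0.9):
    """FixMatch-style selection: only frames whose prediction on the weakly
    augmented view is confident (either way) contribute to the
    unsupervised loss."""
    pseudo = (weak_aug_probs > 0.5).float()             # hard pseudo label
    mask = ((weak_aug_probs > threshold) |
            (weak_aug_probs < 1 - threshold)).float()   # confidence gate
    return pseudo, mask

probs = soft_confidence(torch.randn(4, 156, 10))
pseudo, mask = fixmatch_mask(probs)
print(mask.mean().item())   # fraction of frames kept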
System characteristics
Semi-Supervised Learning-Based Sound Event Detection Using Frequency-Channel-Wise Selective Kernel For Dcase Challenge 2022 Task 4
Kim, Ji Won1 and Lee, Geon Woo1 and Kim, Hong Kook1,2 and Seo, Yeon Sik3 and Song, Il Hoon3
1AI Graduate School, Gwangju, Korea 2Gwangju Institute of Science and Technology, Gwangju, Korea 3I Lab., R&D Center, Hanwha Techwin, Gyeonggi-do, Korea
Kim_GIST_task4_1 Kim_GIST_task4_2 Kim_GIST_task4_3 Kim_GIST_task4_4
Abstract
In this report, we propose a mean-teacher-based sound event detection (SED) model that uses semi-supervised learning to address the labeled-data deficiency problem for the DCASE 2022 Challenge Task 4. The mean-teacher model of the proposed SED model is based on a residual convolutional recurrent neural network (RCRNN) architecture, and the residual convolutional blocks in the RCRNN are modified to include frequency-wise and/or channel-wise selective kernel attention (SKA), hereafter referred to as SKA-RCRNN. This enables the RCRNN to have an adaptive receptive field for different lengths of audio. The proposed SKA-RCRNN-based SED model is first trained on the training dataset, during which it generates pseudo labels for the weakly labeled and unlabeled data. Next, a noisy student model, also based on SKA-RCRNN, is optimized in the second stage via semi-supervised learning using strongly labeled and pseudo-labeled data. Finally, several ensemble models are obtained from fivefold cross-validation SED models with various hyper-parameters, and those showing higher F1 and polyphonic sound detection scores on the validation dataset of the DCASE 2022 Challenge Task 4 are selected for submission.
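A selective-kernel block in its generic form fuses branches with different receptive fields through softmax gates computed from global context; the frequency-wise and channel-wise variants in the report compute the gates along specific axes instead. A minimal generic sketch:

import torch
import torch.nn as nn

class SelectiveKernel2d(nn.Module):
    """Minimal selective-kernel block (sketch): two conv branches with
    different kernel sizes, fused by per-channel softmax gates derived
    from globally pooled context."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        hidden = max(channels // reduction, 4)
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                nn.Linear(hidden, 2 * channels))

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))                    # global context
        gates = self.fc(s).view(-1, 2, u3.shape[1], 1, 1).softmax(dim=1)
        return gates[:, 0] * u3 + gates[:, 1] * u5

x = torch.randn(2, 16, 128, 156)
print(SelectiveKernel2d(16)(x).shape)   # torch.Size([2, 16, 128, 156])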
System characteristics
The Cau-Et For Dcase 2022 Challenge Technical Reports
Kim, Narin and Lee, Sumi and Kwak, Il youp
Chung-Ang University, Department of Applied Statistics, Seoul, South Korea
Kim_CAUET_task4_1 Kim_CAUET_task4_2 Kim_CAUET_task4_3
Abstract
In this technical report, we present a semi-supervised learning method using an RCRNN for DCASE 2022 Challenge Task 4. We applied three main methods to improve the performance of sound event detection (SED). The first is a semi-supervised network using an RCRNN based on the mean-teacher model. The CNN part consists of residual convolution blocks with a CBAM [1] self-attention module, stacked in five layers, and classification is performed by the RNN part. The second is the application of different data augmentations to features with different types of labels. Mixup, frame shift, time shift, time masking, and filter augmentation were applied to the features; mixup was applied differently to the strongly and weakly labeled data, and time masking was applied only to the strongly labeled data. The third is to feed features with different noise to the student and teacher models through data augmentation. The student model's weights were shared with the teacher model while different feature noise was injected, so that the model could converge to the global optimum faster through the consistency loss.
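The mean-teacher scheme underlying the first method maintains the teacher as an exponential moving average (EMA) of the student's weights. A minimal sketch; the momentum value is an assumption:

import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher update: the teacher follows an exponential moving
    average of the student's weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

student = torch.nn.Linear(64, 10)
teacher = torch.nn.Linear(64, 10)
teacher.load_state_dict(student.state_dict())  # start from the same weights
ema_update(teacher, student)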
System characteristics
A Two-Stage Training Method For Dcase 2022 Challenge Task4
Li, Kang and Zheng, Xu and Song, Yan
University of Science and Technology of China, Hefei, China
Li_USTC_task4_SED_1 Li_USTC_task4_SED_2 Li_USTC_task4_SED_3 Li_USTC_task4_SED_4
Abstract
The goal of DCASE 2022 Challenge Task 4 is to evaluate systems for the detection of sound events using real data that is either weakly labeled or unlabeled, strongly labeled simulated data, and external data. In this technical report, we present a method based on a two-stage learning strategy to explore synthetic strong data and real strong data (from AudioSet). Specifically, a CRNN model is used as the baseline SED system for this year's challenge. According to the different supervisory signals from weakly-labeled and strongly-labeled data, frame-level and clip-level tasks (i.e. SED and Audio Tagging (AT)) are designed. In the first stage, the model is trained on weakly labeled, unlabeled and synthetic data with strong labels under the semi-supervised learning framework of Mean Teacher (MT). There are two types of MT, frame-level MT and clip-level MT, corresponding to the subsets with different supervisory signals. In the second stage, a new model is trained using a pseudo-labeling scheme, in which the pre-trained teacher model provides pseudo labels for the real weakly labeled and unlabeled data. Furthermore, we explore the strongly labeled real data as external data in both stages. Results on the DCASE 2022 Task 4 validation set verify the effectiveness of our proposed method, with PSDS1 and PSDS2 of 0.479 and 0.785, outperforming the baseline results of 0.351 and 0.552 respectively.
System characteristics
An Effective Consistency Regularization Training Based Mean Teacher Method For Sound Event Detection
Li, Yunlong1,2 and Hu, Ying1,2 and Zhu, Xiujuan1,2 and Xie, Yin1,2 and Hou, Shijing1,2 and Wang, Liusong1,2 and Chen, Zihao1,2 and Wang, Mingyu1,2 and Fang, Wenjie1,2
1Xinjiang University, Urumqi, China 2Key Laboratory of Signal Detection and Processing in Xinjiang, Urumqi, China
Li_XJU_task4_1 Li_XJU_task4_2 Li_XJU_task4_3 Li_XJU_task4_4
Abstract
This technical report describes the system we submitted to DCASE 2022 Task 4: Sound Event Detection in Domestic Environments. Specifically, we apply three main techniques to improve the performance of the official baseline system. Firstly, to improve the detection and classification ability of the CRNN model, we propose to add an auxiliary branch to the CRNN network; the consistency loss of the mean-teacher method is improved by this auxiliary branch. Secondly, we propose to add an MDTC module to the CRNN network so that the receptive fields of the network can be adjusted according to short-term and long-term correlations. Thirdly, several data augmentation strategies are adopted to improve the generalization capability of the network. Experiments on the DCASE 2022 Task 4 validation dataset demonstrate the effectiveness of the techniques used in our system. As a result, the best PSDS1 is 0.408 and the best PSDS2 is 0.754.
System characteristics
A Hybrid System Of Sound Event Detection Transformer And Frame-Wise Model For Dcase 2022 Task 4
Li, Yiming1,2 and Guo, Zhifang1,2 and Ye, Zhirong1,2 and Wang, Xiangdong1,2 and Liu, Hong1 and Qian, Yueliang1 and Tao, Rui3 and Yan, Long3 and Ouchi, Kazushige3
1Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 2University of Chinese Academy of Sciences, Beijing, China 3Toshiba China R&D Center, Beijing, China
Li_ICT-TOSHIBA_task4_1 Li_ICT-TOSHIBA_task4_2 Li_ICT-TOSHIBA_task4_3 Li_ICT-TOSHIBA_task4_4
Judges’ award
Abstract
In this technical report, we describe in detail our system for DCASE 2022 Task 4. The system combines two considerably different models: an end-to-end Sound Event Detection Transformer (SEDT) and a frame-wise model (MLFL-CNN). The former is an event-wise model which learns event-level representations and predicts sound event categories and boundaries directly, while the latter is based on the widely adopted frame-classification scheme, under which each frame is classified into event categories and event boundaries are obtained by post-processing such as thresholding and smoothing. For SEDT, self-supervised pre-training using unlabeled data is applied, and semi-supervised learning is adopted by using an online teacher, which is updated from the student model using the EMA strategy and generates reliable pseudo labels for weakly-labeled and unlabeled data. For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is used, which incorporates techniques such as focal loss and metric learning into a CNN model to form the MLFL-CNN model, adopts mean-teacher for semi-supervised learning, and uses a tag-conditioned CNN model to predict final results using the output of MLFL-CNN. Experimental results show that the hybrid system considerably outperforms either individual model, and achieves PSDS1 of 0.420 and PSDS2 of 0.783 on the validation set without external data. The code is available at https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
Awards: Judges’ award
System characteristics
Dcase 2022 Challenge Task4 Technical Report
Chen, Minjun1 and Wang, Tian1 and Shao, Jun1 and Tang, Yiqi1 and Liu, Yangyang1 and Peng, Bo1 and Chen, Jie1 and Shao, Xi2
1Samsung Research China-Nanjing, Nanjing, China 2College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
Liu_SRCN_task4_1 Liu_SRCN_task4_2 Liu_SRCN_task4_3 Liu_SRCN_task4_4
Abstract
We describe our submitted systems for DCASE 2022 Task 4: Sound Event Detection in Domestic Environments in this technical report. We propose three models to solve this problem. In the first model, we try to utilize all the training data provided. Specifically, we first employ a joint model for both event classification and localization, trained on strongly labeled and weakly labeled data, to propagate the clip-level annotations to the unlabeled dataset, producing the so-called pseudo-label dataset. To link the frame-level strong annotations with the weak annotations, we introduce a weighted average pooling scheme. Finally, the joint model trained on strongly labeled data, weakly labeled data and pseudo-label data is employed to solve the Task 4 problem. To utilize the external dataset and a pre-trained model, we propose a second system that uses the pre-trained model to extract embeddings and trains an RNN decoder to generate the final predictions. The third system applies some data augmentation methods to the baseline CRNN. Our proposed systems achieve polyphonic sound detection scores (PSDS) of 0.4428 (PSDS1) and 0.8266 (PSDS2) respectively on the development dataset.
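Weighted average pooling that links frame-level and clip-level predictions can take several forms; one common instantiation lets each frame's own probability act as its pooling weight, sketched below (the report's exact scheme may differ):

import torch

def weighted_average_pool(frame_probs, eps=1e-7):
    """Derive a clip-level prediction from frame-level ones by letting each
    frame's own probability act as its pooling weight (one common form of
    weighted average pooling; illustrative only)."""
    weights = frame_probs
    return (weights * frame_probs).sum(dim=1) / (weights.sum(dim=1) + eps)

frame_probs = torch.rand(4, 156, 10)             # (batch, time, classes)
print(weighted_average_pool(frame_probs).shape)  # torch.Size([4, 10])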
System characteristics
Mizobuchi Pco Team’s Submission For Dcase2022 Task4 -- Sound Event Detection Using External Resources
Mizobuchi, Shohei and Ohashi, Hiromasa and Izumi, Akitoshi and Kodama, Nobutaka
Advanced Research Lab., R&D Division, Panasonic Connect Co., Ltd., Fukuoka, Japan
Mizobuchi_PCO_task4_1 Mizobuchi_PCO_task4_2 Mizobuchi_PCO_task4_3 Mizobuchi_PCO_task4_4
Abstract
In this technical report, we describe an overview and the performance of the systems we submitted for DCASE 2022 Task 4. We submitted the following four systems. System 1 aims to improve the performance of PSDS1 under the condition that external resources are not used. System 2 adds AudioSet as an additional training dataset to System 1. System 3 extends System 1 with additional training data, including not only AudioSet but also a synthetic dataset we generated ourselves, and changes the training conditions to improve the performance of PSDS2. System 4 adds a PANNs pretrained model to System 3. The highest performance evaluated on the development dataset among these systems is 0.4489 for PSDS1 and 0.8519 for PSDS2. Details are described below.
System characteristics
Frequency Dependent Sound Event Detection For Dcase 2022 Challenge Task 4
Nam, Hyeonuk and Kim, Seong-Hu and Min, Deokki and Ko, Byeong-Yun and Choi, Seung-Deok and Park, Yong-Hwa
Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Nam_KAIST_task4_1 Nam_KAIST_task4_2 Nam_KAIST_task4_3 Nam_KAIST_task4_4
Abstract
While many deep learning methods from other domains have been applied to sound event detection (SED), the differences between the methods' original domains and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) as input, a thorough comprehension of these two dimensions is essential for applying methods from other domains to SED. Previous works proved that methods that address the frequency dimension are especially powerful in SED. By applying FilterAugment and frequency dynamic convolution, two frequency-dependent methods proposed to enhance SED performance, our submitted models achieved a best PSDS 1 of 0.4704 and a best PSDS 2 of 0.8224.
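FilterAugment, one of the two frequency-dependent methods, applies random gains to random frequency bands of the log-mel input. A rough sketch; the band-count range and dB range are assumptions, and the authors' report defines the exact algorithm:

import torch

def filter_augment(log_mel, db_range=(-6.0, 6.0), n_band_range=(3, 6)):
    """FilterAugment-style band gains on a (batch, freq, time) log-mel
    batch: split the frequency axis into a few random bands and add a
    different random dB weight per band (additive in the log domain)."""
    batch, n_freq, _ = log_mel.shape
    n_bands = int(torch.randint(n_band_range[0], n_band_range[1], (1,)))
    cuts = torch.sort(torch.randint(1, n_freq, (n_bands - 1,))).values
    edges = [0] + [int(c) for c in cuts] + [n_freq]
    gains = torch.empty(batch, n_bands).uniform_(*db_range)  # dB per band
    out = log_mel.clone()
    for b in range(n_bands):
        out[:, edges[b]:edges[b + 1], :] += gains[:, b].view(-1, 1, 1)
    return out

print(filter_augment(torch.randn(4, 128, 626)).shape)  # torch.Size([4, 128, 626])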
System characteristics
SKATTN team’s submission for DCASE 2022 Task 4 -- Sound Event Detection in Domestic Environments
Ryu, Myeonghoon and Byun, Jeunghyun and Oh, Hongseok and Lee, Suji and Park, Han
Deeply Inc., Seoul, South Korea
skattn_task4_1 skattn_task4_2
Abstract
In this technical report, we present our submitted system for DCASE 2022 Task 4: Sound Event Detection in Domestic Environments. There are two main aspects we considered to improve the performance of the official baseline system: (1) the use of external datasets and (2) the design of a novel model, SKATTN. Our newly proposed SKATTN model combines the Selective Kernel Network (SKNet) with the self-attention blocks from the Transformer model. Motivated by SKNet's successful applications in the computer vision and audio domains, we adopted SKNet as a feature extractor for processing the input mel-spectrogram. We used self-attention blocks to process the spectro-temporal features since they are flexible in modeling short- and long-range dependencies while being less susceptible to the vanishing gradients that commonly occur in RNNs. Experiments on the DCASE 2022 Task 4 validation dataset demonstrate that our system achieves PSDS1 + PSDS2 = 1.372, outperforming the baseline system's 0.872.
System characteristics
Atst Self-Supervised Plus Rct Semi-Supervised Sound Event Detection: Submission To Dcase 2022 Challenge Task 4
Shao, Nian and Li, Xian and Li, Xiaofei
Westlake University & Westlake Institute for Advanced Study, Hangzhou, China
RCT-ATST_Westlake_task4_1 RCT-ATST_Westlake_task4_2 RCT-ATST_Westlake_task4_3 RCT-ATST_Westlake_task4_4
Abstract
In this report, we present the methods we proposed for participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. The proposed methods integrate a semi-supervised sound event detection model (called random consistency training, RCT) trained with the relatively small official dataset of the challenge, and a self-supervised model (called audio teacher-student transformer, ATST) trained with the very large AudioSet. RCT uses the baseline convolutional recurrent neural network (CRNN) of the challenge, and adopts a newly proposed semi-supervised learning scheme based on random data augmentation and a self-consistency loss. To integrate ATST into RCT, the feature extracted by ATST is concatenated with the feature extracted by the convolutional layers of RCT, and then fed to the RNN layers of RCT. We find that these two types of features are complementary, and performance can be largely improved by combining them. In development, RCT individually achieves PSDS1 and PSDS2 of 39.80% and 61.12%, respectively, which are improved to 45.99% and 70.65% by integrating the ATST feature, and further to 47.71% and 73.44% by ensembling five models with different training configurations.
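The ATST-feature integration described, concatenating pretrained frame-level embeddings with the CRNN's convolutional features before the RNN, can be sketched as follows; all dimensions, including the 768-dimensional embedding, are illustrative assumptions:

import torch

def concat_pretrained_features(cnn_feats, pretrained_feats):
    """Concatenate CRNN convolutional features (batch, time, d1) with
    frame-level features from a pretrained encoder (batch, time2, d2),
    resampling the latter to the CRNN frame rate before the RNN."""
    t = cnn_feats.shape[1]
    resampled = torch.nn.functional.interpolate(
        pretrained_feats.transpose(1, 2), size=t, mode="linear",
        align_corners=False).transpose(1, 2)
    return torch.cat([cnn_feats, resampled], dim=-1)

cnn = torch.rand(4, 156, 128)    # CRNN conv output
emb = torch.rand(4, 101, 768)    # hypothetical pretrained embedding
print(concat_pretrained_features(cnn, emb).shape)  # torch.Size([4, 156, 896])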
System characteristics
Hyu Submission For Dcase 2022 Task 4 -- Pa-Net: Patch-Based Attention For Sound Event Detection
Kim, Sojeong
Hanyang University, Seoul, Korea
KIM_HYU_task4_1 KIM_HYU_task4_2 KIM_HYU_task4_3 KIM_HYU_task4_4
Abstract
In this paper, we describe the details of our submitted systems for DCASE 2022 Challenge Task 4: sound event detection in domestic environments. We focus on how to effectively use a spectrogram as input for an SED model, since it has distinct time-frequency characteristics. Frequencies have various characteristics for reasons such as the recording device and the type of sound event. Likewise, each time frame has different features, owing to uncertainty about whether a sound event occurs in an audio clip and which type of event it is. Therefore, we propose a patch attention (PA) mechanism that captures patch-range dependencies across input sequences, so that the model can learn from important local information. We use PA with efficient channel attention for learning important channels in the feature maps. In addition, we adopt a strategy called subspectral normalization (SSN), which splits the input frequencies into multiple sub-groups and normalizes each group to bring out specific features. Experimental results on the DESED 2022 validation dataset show that our proposed model outperforms the baseline system; in particular, it achieves PSDS scores of 0.4438 and 0.683 on scenario 1 and scenario 2, respectively.
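Subspectral normalization splits the frequency axis into sub-bands and normalizes each separately so that band-specific statistics stand out. A minimal PyTorch sketch; the group count is an assumption and the frequency size must be divisible by it:

import torch

class SubSpectralNorm(torch.nn.Module):
    """Subspectral normalization (sketch): split the frequency axis of a
    (batch, channels, freq, time) input into groups and batch-normalize
    each group separately."""
    def __init__(self, channels, freq_groups=4):
        super().__init__()
        self.groups = freq_groups
        self.bn = torch.nn.BatchNorm2d(channels * freq_groups)

    def forward(self, x):
        b, c, f, t = x.shape
        x = x.view(b, c * self.groups, f // self.groups, t)
        return self.bn(x).view(b, c, f, t)

x = torch.randn(8, 16, 128, 156)
print(SubSpectralNorm(16)(x).shape)  # torch.Size([8, 16, 128, 156])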
System characteristics
Data Engineering For Noisy Student Model In Sound Event Detection
Suh, Sangwon and Lee, Dong Youn
ReturnZero, Seoul, Korea
Suh_ReturnZero_task4_1 Suh_ReturnZero_task4_2 Suh_ReturnZero_task4_3 Suh_ReturnZero_task4_4
Abstract
This report describes our Sound Event Detection (SED) system for DCASE 2022 Task 4. We focused on combining data augmentation techniques for the SED mean-teacher system and on selecting trainable samples from AudioSet. The neural architecture follows the baseline CRNN model, but a frequency dynamic convolution replaces each convolution layer except the first one. The cost function is also constructed identically to the baseline, but an asymmetric focal loss is used instead of binary cross-entropy for training on AudioSet. The best metrics on the validation set in our experiments were 0.473 and 0.723 for PSDS 1 and 2, and 56.9% for the collar-based F1 score.
System characteristics
Pretrained Models In Sound Event Detection For Dcase 2022 Challenge Task4
Xiao, Shengchang
University of Chinese Academy of Sciences, Department of Electronic Engineering, Beijing, China
Xiao_UCAS_task4_1 Xiao_UCAS_task4_2 Xiao_UCAS_task4_3 Xiao_UCAS_task4_4
Abstract
In this technical report, we describe our submitted systems for DCASE 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. Specifically, we submit two different systems, for PSDS1 and PSDS2 respectively. As PSDS2 focuses on avoiding confusion between classes rather than on the localization of sound events, we only predict weak labels of clips to improve PSDS2. Moreover, we apply pretrained neural networks, including PANNs and SSAST, in our systems to improve the generalization and robustness of our models. These pretrained models, trained on large-scale datasets such as AudioSet, can effectively alleviate the lack of real training data. We fuse multiple pretrained models to make full use of the information in external data, which significantly improves the performance of our systems. In addition, we use various data augmentation techniques to expand the provided data. According to the characteristics of each sound event, we use class-wise median filters and further classify some confusable events. As a result, we achieve a best PSDS1 of 0.481 and a best PSDS2 of 0.826 on the DESED real validation dataset.
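Class-wise median filtering smooths each event class's posterior track with a window length suited to that class's typical duration. A minimal sketch using scipy; the window lengths are illustrative assumptions:

import numpy as np
from scipy.ndimage import median_filter

def classwise_median_filter(frame_probs, win_lengths):
    """Smooth each class with its own median-filter length, so short events
    keep sharp boundaries while long events are smoothed more."""
    out = np.empty_like(frame_probs)
    for c, win in enumerate(win_lengths):
        out[:, c] = median_filter(frame_probs[:, c], size=win)
    return out

probs = np.random.rand(626, 10)
windows = [5, 5, 9, 9, 21, 21, 41, 41, 81, 81]   # per-class, illustrative
print(classwise_median_filter(probs, windows).shape)  # (626, 10)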
System characteristics
Semi-Supervised Sound Event Detection Using Pretrained Model
Xie, Rong and Shi, Chuang and Zhang, Le and Li, Huiyong
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China.
RongXie_UESTC_task4_1 RongXie_UESTC_task4_2 RongXie_UESTC_task4_3 RongXie_UESTC_task4_4
Abstract
In this technical report, we describe our submitted systems for DCASE 2022 Task 4. Early output embeddings of CNN14 from PANNs are combined with a CRNN to achieve good performance on PSDS-scenario 1. The fully connected (FC) layer of CNN14 is replaced to output the 10 target categories for PSDS-scenario 2. The submitted systems achieve an overall PSDS score of 1.31 (0.460 for PSDS scenario 1 and 0.856 for PSDS scenario 2) on the test set.
System characteristics
Srcb-Bit Team’s Submission For Dcase2022 Task4
Xu, Liang1,2 and Wang, Lizhong2 and Bi, Sijun1 and Liu, Hanyue1 and Wang, Jing1 and Zhao, Shenghui1 and Zheng, Yuxing2
1School of Information and Electronics, Beijing Institute of Technology, Beijing, China 2Samsung Research China-Beijing (SRC-B), Beijing, China
Xu_SRCB-BIT_task4_1 Xu_SRCB-BIT_task4_2 Xu_SRCB-BIT_task4_3 Xu_SRCB-BIT_task4_4
Abstract
In this technical report, we present our submitted system for DCASE 2022 Task 4: Sound Event Detection in Domestic Environments. We propose three main ways to improve the performance of the network. First, we use frequency dynamic convolution (FDY), which applies kernels that adapt to the frequency components of the input, to address the physical inconsistency of 2D convolution in sound event detection (SED). Second, we propose coherence learning based on a weight-raised temporal contrastive loss to improve the continuity of event predictions and the switching efficiency at event boundaries. Third, we use the pre-trained PANNs model in this task and propose two methods to fuse the features from PANNs and our model, which improve the PSDS1 and PSDS2 scores respectively. The submitted system is based on the mean-teacher architecture, and its PSDS1 and PSDS2 scores on the development dataset reach 0.482 and 0.835 respectively.