Sound Event Detection in Domestic Environments


Challenge results

Task description

The task evaluates systems for the detection of sound events using weakly labeled data (without timestamps). Systems must provide not only the event class but also the event time boundaries, given that multiple events can be present in an audio recording. The challenge remains to explore whether a large amount of unbalanced and unlabeled training data can be exploited together with a small weakly annotated training set to improve system performance. Isolated sound events, background sound files and scripts to design a training set with strongly annotated synthetic data are provided. The labels in all the annotated subsets are verified and can be considered reliable.

A more detailed task description can be found on the task description page.
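To make the difference between weak and strong annotations concrete, here is a small Python sketch. The class labels, timings, and the names `StrongEvent` and `to_weak` are invented for illustration and are not taken from the challenge data or tooling.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class StrongEvent:
    """One strongly annotated event: class label plus time boundaries in seconds."""
    label: str
    onset: float
    offset: float


def to_weak(events: List[StrongEvent]) -> Set[str]:
    """Collapse strong annotations to a weak (clip-level) label set by dropping timestamps."""
    return {event.label for event in events}


# Hypothetical clip with overlapping events (illustrative values only).
clip_events = [
    StrongEvent("Speech", onset=0.5, offset=4.2),
    StrongEvent("Dog", onset=3.0, offset=3.8),
    StrongEvent("Speech", onset=6.1, offset=9.0),
]

weak_labels = to_weak(clip_events)
print(sorted(weak_labels))
```

A system trained mostly on clip-level label sets such as `weak_labels` must still output entries of the `StrongEvent` form, with onsets and offsets, at evaluation time.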

Systems ranking

Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Xiao2022 1.41 0.484 0.697 0.481 0.694
Zhang_UCAS_task4_1 DCASE2022 pretrained system 1 Xiao2022 1.39 0.472 0.700 0.475 0.688
Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 1.21 0.420 0.599 0.431 0.645
Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 0.79 0.049 0.784 0.051 0.826
Liu_NSYSU_task4_2 DCASE2022 PANNs SED 2 Liu2022 0.06 0.000 0.063 0.451 0.734
Liu_NSYSU_task4_3 DCASE2022 PANNs SED 3 Liu2022 0.29 0.070 0.194 0.457 0.767
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 0.434 0.650 0.437 0.680
Liu_NSYSU_task4_4 DCASE2022 PANNs SED 4 Liu2022 0.21 0.046 0.151 0.465 0.760
Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 1.22 0.393 0.650
Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 0.81 0.062 0.774 0.063 0.814
Suh_ReturnZero_task4_2 rtzr_strong-real Suh2022 1.39 0.458 0.721 0.473 0.723
Suh_ReturnZero_task4_3 rtzr_audioset Suh2022 1.42 0.478 0.719 0.445 0.704
Cheng_CHT_task4_2 DCASE2022_CRNN_ADJ Cheng2022 0.93 0.276 0.543 0.356 0.601
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 0.314 0.582 0.362 0.635
Liu_SRCN_task4_2 DCASE2022 task4 Pre-Trained 2 Liu2022 0.90 0.129 0.758 0.177 0.801
Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 0.79 0.051 0.777 0.067 0.827
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.24 0.025 0.219 0.037 0.244
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu2022 1.25 0.425 0.634 0.443 0.660
Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 1.34 0.444 0.697 0.473 0.693
Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 0.81 0.062 0.781 0.068 0.830
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim2022a 1.17 0.305 0.750 0.354 0.756
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim2022a 1.34 0.444 0.695 0.473 0.695
Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.257 0.461 0.269 0.446
Ryu_Deeply_task4_2 SKATTN_2 Ryu2022 0.66 0.156 0.449 0.161 0.452
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.21 0.029 0.184 0.046 0.165
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.104 0.196 0.129 0.241
Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 0.82 0.062 0.787 0.075 0.852
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi2022 1.26 0.439 0.611 0.449 0.662
Mizobuchi_PCO_task4_3 PCO_task4_SED_C Mizobuchi2022 0.88 0.197 0.620 0.231 0.714
Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi2022 1.15 0.398 0.571 0.425 0.625
KIM_HYU_task4_2 single1 Sojeong2022 1.28 0.421 0.664 0.422 0.667
KIM_HYU_task4_4 single2 Sojeong2022 1.27 0.423 0.651 0.480 0.726
KIM_HYU_task4_1 train_ensemble1 Sojeong2022 1.19 0.390 0.620 0.434 0.675
KIM_HYU_task4_3 train_ensemble2 Sojeong2022 1.24 0.415 0.634 0.494 0.748
Baseline DCASE2022 SED baseline system Turpault2022 1.00 0.315 0.543 0.342 0.527
Dinkel_XiaoRice_task4_1 SCRATCH Dinkel2022 1.29 0.422 0.679 0.456 0.713
Dinkel_XiaoRice_task4_2 SMALL Dinkel2022 1.15 0.373 0.613 0.395 0.631
Dinkel_XiaoRice_task4_4 TAG Dinkel2022 0.92 0.104 0.824 0.126 0.877
Dinkel_XiaoRice_task4_3 PRECISE Dinkel2022 1.38 0.451 0.727 0.482 0.757
Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.78 0.078 0.723 0.448 0.700
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.24 0.425 0.615 0.448 0.700
Hao_UNISOC_task4_3 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.09 0.373 0.547 0.448 0.700
Khandelwal_FMSG-NTU_task4_1 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.83 0.158 0.633 0.088 0.837
Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.80 0.082 0.731 0.102 0.840
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.26 0.410 0.664 0.472 0.721
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.20 0.386 0.643 0.474 0.730
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 0.432 0.649 0.428 0.655
deBenito_AUDIAS_task4_1 10-Resolution CRNN+Conformer deBenito2022 1.23 0.400 0.646 0.410 0.665
deBenito_AUDIAS_task4_2 10-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.08 0.310 0.642 0.347 0.663
deBenito_AUDIAS_task4_3 7-Resolution CRNN+Conformer deBenito2022 1.23 0.407 0.643 0.422 0.656
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 0.486 0.694 0.477 0.734
Li_WU_task4_2 ATST-RCT SED system ATST small Shao2022 1.36 0.476 0.666 0.460 0.698
Li_WU_task4_3 ATST-RCT SED system ATST base Shao2022 1.40 0.482 0.693 0.468 0.702
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 1.13 0.368 0.594 0.398 0.611
Kim_GIST_task4_3 Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 0.452 0.682
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 0.458 0.688
Kim_GIST_task4_2 Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 0.456 0.685
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 0.459 0.744
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 1.49 0.509 0.742 0.492 0.721
Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 0.83 0.047 0.824 0.080 0.868
Ebbers_UPB_task4_1 CRNN ensemble Ebbers2022 1.59 0.552 0.786 0.512 0.772
Ebbers_UPB_task4_3 tag-conditioned CRNN ensemble Ebbers2022 1.46 0.527 0.679 0.483 0.713
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu2022 1.41 0.482 0.702 0.485 0.725
Xu_SRCB-BIT_task4_1 PANNs-FDY-CRNN-wrTCL system 1 Xu2022 1.32 0.452 0.662 0.481 0.710
Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 0.79 0.054 0.774 0.065 0.835
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.75 0.049 0.738 0.058 0.813
Nam_KAIST_task4_SED_2 SED_2 Nam2022 1.25 0.409 0.656 0.470 0.700
Nam_KAIST_task4_SED_3 SED_3 Nam2022 0.77 0.057 0.747 0.061 0.822
Nam_KAIST_task4_SED_4 SED_4 Nam2022 0.77 0.055 0.747 0.058 0.820
Nam_KAIST_task4_SED_1 SED_1 Nam2022 1.24 0.404 0.653 0.470 0.687
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 0.341 0.596
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 0.374 0.583
Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 0.070 0.794
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li2022b 1.41 0.480 0.713 0.479 0.735
Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 1.34 0.429 0.723 0.436 0.778
Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.39 0.451 0.740 0.462 0.785
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li2022b 1.35 0.450 0.699 0.456 0.726
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 0.318 0.520 0.356 0.554
He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 0.82 0.053 0.810 0.071 0.857
He_BYTEDANCE_task4_2 DCASE2022 SED mean teacher system 2 He2022 1.48 0.503 0.749 0.521 0.771
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He2022 1.52 0.525 0.748 0.533 0.762
He_BYTEDANCE_task4_1 DCASE2022 SED mean teacher system 1 He2022 1.36 0.454 0.696 0.474 0.692
Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 0.79 0.090 0.709 0.115 0.816
Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 0.75 0.075 0.692 0.099 0.783
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li2022d 1.26 0.439 0.612 0.449 0.645
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li2022d 1.20 0.411 0.597 0.420 0.618
Xie_UESTC_task4_2 CNN14 FC Xie2022 0.83 0.062 0.800 0.072 0.856
Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie2022 1.06 0.300 0.641 0.360 0.674
Xie_UESTC_task4_1 CBAM-T CRNN 1 Xie2022 1.36 0.418 0.757 0.460 0.768
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie2022 1.38 0.426 0.766 0.460 0.768
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 0.345 0.540 0.342 0.527
Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.02 0.317 0.565 0.372 0.592
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim2022c 1.04 0.340 0.544 0.377 0.585
Kim_CAUET_task4_3 DCASE2022 SED system3 Kim2022c 1.04 0.338 0.554 0.373 0.571
Li_XJU_task4_1 DCASE2022 SED system 1 Li2022c 1.10 0.364 0.570 0.408 0.607
Li_XJU_task4_3 DCASE2022 SED system 3 Li2022c 1.17 0.371 0.635 0.398 0.640
Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 0.93 0.195 0.683 0.215 0.735
Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 0.75 0.086 0.671 0.095 0.754
Castorena_UV_task4_3 Strong and Max-Weak balanced Castorena2022 0.91 0.267 0.531 0.305 0.587
Castorena_UV_task4_1 Max-Weak balanced Castorena2022 1.01 0.334 0.524 0.343 0.538
Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 0.63 0.072 0.559 0.067 0.641
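
The ranking score column is not defined on this page. The values above appear consistent with averaging PSDS 1 and PSDS 2 after dividing each by the corresponding baseline value on the evaluation dataset (0.315 and 0.543), which gives the baseline a score of 1.00. The Python sketch below works under that assumption; the function name `ranking_score` is illustrative and not part of any challenge tooling.

```python
# Baseline PSDS values on the evaluation dataset, taken from the "Baseline" row.
BASELINE_PSDS1 = 0.315
BASELINE_PSDS2 = 0.543


def ranking_score(psds1: float, psds2: float) -> float:
    """Mean of the two PSDS values, each normalised by the baseline (assumed definition)."""
    return (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2) / 2


# Rows copied from the table: (PSDS 1, PSDS 2) on the evaluation dataset.
systems = {
    "Baseline": (0.315, 0.543),
    "Zhang_UCAS_task4_2": (0.484, 0.697),
    "Ebbers_UPB_task4_1": (0.552, 0.786),
}

for code, (psds1, psds2) in systems.items():
    print(f"{code}: {ranking_score(psds1, psds2):.3f}")
```

Because the table reports PSDS values rounded to three decimals, scores recomputed this way can differ from the published ones in the last digit.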

Supplementary metrics

Rank | Submission code | Submission name | Technical Report | PSDS 1 (Evaluation dataset) | PSDS 1 (Public evaluation) | PSDS 1 (Vimeo dataset) | PSDS 2 (Evaluation dataset) | PSDS 2 (Public evaluation) | PSDS 2 (Vimeo dataset) | F-score (Evaluation dataset) | F-score (Public evaluation) | F-score (Vimeo dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Xiao2022 0.484 0.525 0.396 0.697 0.725 0.612 56.5 60.2 47.4
Zhang_UCAS_task4_1 DCASE2022 pretrained system 1 Xiao2022 0.472 0.519 0.384 0.700 0.748 0.577 56.2 61.3 44.0
Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 0.420 0.468 0.304 0.599 0.649 0.470 51.3 55.4 40.1
Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 0.049 0.057 0.019 0.784 0.836 0.651 15.0 17.0 10.7
Liu_NSYSU_task4_2 DCASE2022 PANNs SED 2 Liu2022 0.000 0.003 0.000 0.063 0.077 0.024 10.5 11.8 6.8
Liu_NSYSU_task4_3 DCASE2022 PANNs SED 3 Liu2022 0.070 0.095 0.013 0.194 0.237 0.087 8.3 9.2 5.9
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 0.434 0.483 0.324 0.650 0.702 0.521 47.6 50.7 39.3
Liu_NSYSU_task4_4 DCASE2022 PANNs SED 4 Liu2022 0.046 0.069 0.003 0.151 0.180 0.070 7.5 8.3 5.1
Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 0.393 0.432 0.324 0.650 0.686 0.560 46.8 50.0 38.7
Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 0.062 0.072 0.026 0.774 0.807 0.674 12.9 13.9 10.6
Suh_ReturnZero_task4_2 rtzr_strong-real Suh2022 0.458 0.495 0.370 0.721 0.768 0.612 53.1 57.6 42.1
Suh_ReturnZero_task4_3 rtzr_audioset Suh2022 0.478 0.512 0.390 0.719 0.772 0.592 53.8 57.7 44.1
Cheng_CHT_task4_2 DCASE2022_CRNN_ADJ Cheng2022 0.276 0.308 0.212 0.543 0.568 0.470 40.9 43.5 34.3
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 0.314 0.361 0.223 0.582 0.611 0.497 43.2 46.7 34.5
Liu_SRCN_task4_2 DCASE2022 task4 Pre-Trained 2 Liu2022 0.129 0.139 0.100 0.758 0.791 0.682 19.3 20.0 17.9
Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 0.051 0.063 0.015 0.777 0.803 0.696 13.6 14.3 12.0
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.025 0.023 0.011 0.219 0.224 0.183 5.2 5.8 3.6
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu2022 0.425 0.471 0.319 0.634 0.674 0.512 49.3 52.2 41.6
Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 0.444 0.503 0.323 0.697 0.740 0.588 51.0 54.8 41.0
Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 0.062 0.069 0.030 0.781 0.809 0.691 12.8 13.5 11.4
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim2022a 0.305 0.333 0.234 0.750 0.778 0.683 27.4 28.5 25.1
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim2022a 0.444 0.502 0.334 0.695 0.738 0.585 51.1 55.0 41.0
Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.257 0.280 0.207 0.461 0.514 0.345 30.5 32.8 25.1
Ryu_Deeply_task4_2 SKATTN_2 Ryu2022 0.156 0.171 0.129 0.449 0.477 0.356 19.3 20.0 18.3
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.029 0.033 0.015 0.184 0.214 0.102 9.4 10.2 7.2
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.104 0.121 0.048 0.196 0.216 0.130 26.5 29.4 19.0
Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 0.062 0.071 0.029 0.787 0.818 0.693 13.7 14.5 11.5
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi2022 0.439 0.489 0.324 0.611 0.656 0.498 49.7 53.0 40.4
Mizobuchi_PCO_task4_3 PCO_task4_SED_C Mizobuchi2022 0.197 0.218 0.164 0.620 0.660 0.517 21.8 24.7 15.3
Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi2022 0.398 0.450 0.285 0.571 0.617 0.452 47.6 50.4 39.5
KIM_HYU_task4_2 single1 Sojeong2022 0.421 0.470 0.314 0.664 0.724 0.524 49.6 53.4 39.8
KIM_HYU_task4_4 single2 Sojeong2022 0.423 0.476 0.308 0.651 0.707 0.509 50.4 55.2 38.0
KIM_HYU_task4_1 train_ensemble1 Sojeong2022 0.390 0.437 0.284 0.620 0.678 0.488 48.1 52.5 37.3
KIM_HYU_task4_3 train_ensemble2 Sojeong2022 0.415 0.467 0.299 0.634 0.698 0.486 48.1 52.8 36.2
Baseline DCASE2022 SED baseline system Turpault2022 0.315 0.360 0.222 0.543 0.591 0.403 37.3 40.8 29.7
Dinkel_XiaoRice_task4_1 SCRATCH Dinkel2022 0.422 0.480 0.298 0.679 0.737 0.528 45.6 49.2 36.1
Dinkel_XiaoRice_task4_2 SMALL Dinkel2022 0.373 0.421 0.250 0.613 0.663 0.459 39.3 42.9 29.6
Dinkel_XiaoRice_task4_4 TAG Dinkel2022 0.104 0.119 0.086 0.824 0.855 0.736 14.2 14.9 12.5
Dinkel_XiaoRice_task4_3 PRECISE Dinkel2022 0.451 0.505 0.325 0.727 0.773 0.605 47.5 51.0 38.3
Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.078 0.091 0.028 0.723 0.772 0.603 10.8 11.5 9.5
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.425 0.475 0.322 0.615 0.669 0.490 47.1 50.9 36.8
Hao_UNISOC_task4_3 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.373 0.426 0.249 0.547 0.606 0.400 45.3 48.7 36.4
Khandelwal_FMSG-NTU_task4_1 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.158 0.182 0.126 0.633 0.678 0.521 20.3 21.7 17.1
Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.082 0.093 0.033 0.731 0.762 0.645 13.1 13.8 11.7
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.410 0.457 0.310 0.664 0.718 0.531 50.3 54.6 39.4
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.386 0.428 0.305 0.643 0.686 0.531 44.7 48.5 35.0
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 0.432 0.480 0.324 0.649 0.691 0.537 46.5 51.0 35.6
deBenito_AUDIAS_task4_1 10-Resolution CRNN+Conformer deBenito2022 0.400 0.447 0.299 0.646 0.694 0.528 45.0 49.3 34.5
deBenito_AUDIAS_task4_2 10-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 0.310 0.350 0.237 0.642 0.689 0.525 37.7 41.5 28.5
deBenito_AUDIAS_task4_3 7-Resolution CRNN+Conformer deBenito2022 0.407 0.454 0.303 0.643 0.686 0.528 46.5 50.6 36.4
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 0.486 0.535 0.378 0.694 0.740 0.589 51.8 55.1 43.8
Li_WU_task4_2 ATST-RCT SED system ATST small Shao2022 0.476 0.524 0.377 0.666 0.713 0.555 51.6 56.1 41.0
Li_WU_task4_3 ATST-RCT SED system ATST base Shao2022 0.482 0.533 0.372 0.693 0.740 0.584 51.8 55.1 43.8
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 0.368 0.409 0.283 0.594 0.644 0.474 45.0 49.0 35.5
Kim_GIST_task4_3 Kim_GIST_task4_3 Kim2022b 0.500 0.551 0.383 0.695 0.738 0.582 55.3 57.6 49.3
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 0.514 0.559 0.406 0.713 0.756 0.598 55.9 59.0 47.5
Kim_GIST_task4_2 Kim_GIST_task4_2 Kim2022b 0.510 0.555 0.399 0.711 0.752 0.599 55.5 58.8 46.9
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.215 0.239 0.135 0.335 0.358 0.254 31.6 34.7 23.1
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 0.509 0.552 0.413 0.742 0.797 0.626 57.6 61.5 47.9
Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 0.047 0.055 0.025 0.824 0.866 0.734 11.8 12.4 10.7
Ebbers_UPB_task4_1 CRNN ensemble Ebbers2022 0.552 0.593 0.474 0.786 0.844 0.664 59.8 62.6 53.5
Ebbers_UPB_task4_3 tag-conditioned CRNN ensemble Ebbers2022 0.527 0.568 0.444 0.679 0.729 0.566 65.9 70.1 56.3
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu2022 0.482 0.533 0.354 0.702 0.756 0.582 55.0 58.4 46.2
Xu_SRCB-BIT_task4_1 PANNs-FDY-CRNN-wrTCL system 1 Xu2022 0.452 0.500 0.338 0.662 0.702 0.552 51.7 54.7 43.5
Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 0.054 0.064 0.022 0.774 0.799 0.699 13.1 14.2 10.7
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.049 0.057 0.018 0.738 0.771 0.637 12.8 13.5 11.4
Nam_KAIST_task4_SED_2 SED_2 Nam2022 0.409 0.450 0.329 0.656 0.695 0.554 48.9 51.4 42.3
Nam_KAIST_task4_SED_3 SED_3 Nam2022 0.057 0.068 0.021 0.747 0.770 0.668 12.5 13.6 10.3
Nam_KAIST_task4_SED_4 SED_4 Nam2022 0.055 0.066 0.016 0.747 0.770 0.673 12.7 13.6 10.9
Nam_KAIST_task4_SED_1 SED_1 Nam2022 0.404 0.446 0.317 0.653 0.686 0.558 49.8 52.7 42.4
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.293 0.337 0.200 0.527 0.590 0.391 37.9 41.8 28.3
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala2022 0.365 0.395 0.289 0.584 0.621 0.494 39.5 43.2 30.5
Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 0.069 0.084 0.030 0.728 0.765 0.645 13.9 14.5 12.6
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li2022b 0.480 0.541 0.347 0.713 0.760 0.585 55.1 59.9 42.8
Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 0.429 0.487 0.305 0.723 0.763 0.614 52.4 56.7 41.5
Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 0.451 0.514 0.320 0.740 0.776 0.634 53.8 58.2 42.8
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li2022b 0.450 0.507 0.329 0.699 0.745 0.576 53.1 58.0 40.7
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.318 0.352 0.244 0.520 0.563 0.406 37.7 40.4 31.3
He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 0.053 0.063 0.024 0.810 0.839 0.729 14.3 14.8 13.2
He_BYTEDANCE_task4_2 DCASE2022 SED mean teacher system 2 He2022 0.503 0.551 0.392 0.749 0.798 0.639 54.5 58.5 44.4
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He2022 0.525 0.578 0.401 0.748 0.795 0.634 55.7 59.7 45.6
He_BYTEDANCE_task4_1 DCASE2022 SED mean teacher system 1 He2022 0.454 0.503 0.338 0.696 0.744 0.596 53.6 58.0 42.2
Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 0.090 0.095 0.062 0.709 0.747 0.581 9.4 10.3 7.2
Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 0.075 0.085 0.044 0.692 0.731 0.570 9.0 10.3 5.4
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li2022d 0.439 0.486 0.321 0.612 0.649 0.508 29.3 32.0 20.9
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li2022d 0.411 0.453 0.312 0.597 0.635 0.488 34.6 38.5 24.0
Xie_UESTC_task4_2 CNN14 FC Xie2022 0.062 0.074 0.021 0.800 0.825 0.719 13.7 14.2 12.6
Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie2022 0.300 0.335 0.207 0.641 0.695 0.502 38.3 41.8 29.1
Xie_UESTC_task4_1 CBAM-T CRNN 1 Xie2022 0.418 0.463 0.323 0.757 0.815 0.626 52.7 57.3 41.1
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie2022 0.426 0.474 0.333 0.766 0.829 0.630 54.7 58.8 44.3
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 0.345 0.387 0.254 0.540 0.592 0.414 41.1 44.5 32.5
Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 0.317 0.361 0.217 0.565 0.619 0.425 42.4 46.5 33.0
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim2022c 0.340 0.388 0.230 0.544 0.604 0.400 41.1 45.5 31.3
Kim_CAUET_task4_3 DCASE2022 SED system3 Kim2022c 0.338 0.381 0.245 0.554 0.603 0.426 42.4 46.4 32.7
Li_XJU_task4_1 DCASE2022 SED system 1 Li2022c 0.364 0.411 0.265 0.570 0.623 0.444 44.9 48.7 35.4
Li_XJU_task4_3 DCASE2022 SED system 3 Li2022c 0.371 0.408 0.280 0.635 0.688 0.521 47.8 51.9 37.8
Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 0.195 0.222 0.158 0.683 0.740 0.537 27.8 29.8 23.3
Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 0.086 0.101 0.060 0.671 0.713 0.561 15.0 15.4 14.6
Castorena_UV_task4_3 Strong and Max-Weak balanced Castorena2022 0.267 0.299 0.184 0.531 0.577 0.405 32.8 35.7 25.0
Castorena_UV_task4_1 Max-Weak balanced Castorena2022 0.334 0.365 0.256 0.524 0.558 0.420 39.2 43.2 29.0
Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 0.072 0.073 0.076 0.559 0.588 0.460 11.2 12.1 9.2

Without external resources

Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 1.21 0.420 0.599 0.431 0.645
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 0.434 0.650 0.437 0.680
Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 1.22 0.393 0.650
Cheng_CHT_task4_2 DCASE2022_CRNN_ADJ Cheng2022 0.93 0.276 0.543 0.356 0.601
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 0.314 0.582 0.362 0.635
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.24 0.025 0.219 0.037 0.244
Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 1.34 0.444 0.697 0.473 0.693
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim2022a 1.34 0.444 0.695 0.473 0.695
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.21 0.029 0.184 0.046 0.165
Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi2022 1.15 0.398 0.571 0.425 0.625
KIM_HYU_task4_2 single1 Sojeong2022 1.28 0.421 0.664 0.422 0.667
KIM_HYU_task4_4 single2 Sojeong2022 1.27 0.423 0.651 0.480 0.726
KIM_HYU_task4_1 train_ensemble1 Sojeong2022 1.19 0.390 0.620 0.434 0.675
KIM_HYU_task4_3 train_ensemble2 Sojeong2022 1.24 0.415 0.634 0.494 0.748
Baseline DCASE2022 SED baseline system Turpault2022 1.00 0.315 0.543 0.342 0.527
Dinkel_XiaoRice_task4_1 SCRATCH Dinkel2022 1.29 0.422 0.679 0.456 0.713
Dinkel_XiaoRice_task4_2 SMALL Dinkel2022 1.15 0.373 0.613 0.395 0.631
Dinkel_XiaoRice_task4_4 TAG Dinkel2022 0.92 0.104 0.824 0.126 0.877
Dinkel_XiaoRice_task4_3 PRECISE Dinkel2022 1.38 0.451 0.727 0.482 0.757
Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.78 0.078 0.723 0.448 0.700
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.24 0.425 0.615 0.448 0.700
Hao_UNISOC_task4_3 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.09 0.373 0.547 0.448 0.700
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.20 0.386 0.643 0.474 0.730
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 0.432 0.649 0.428 0.655
deBenito_AUDIAS_task4_1 10-Resolution CRNN+Conformer deBenito2022 1.23 0.400 0.646 0.410 0.665
deBenito_AUDIAS_task4_2 10-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.08 0.310 0.642 0.347 0.663
deBenito_AUDIAS_task4_3 7-Resolution CRNN+Conformer deBenito2022 1.23 0.407 0.643 0.422 0.656
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 1.13 0.368 0.594 0.398 0.611
Kim_GIST_task4_3 Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 0.452 0.682
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 0.458 0.688
Kim_GIST_task4_2 Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 0.456 0.685
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 1.49 0.509 0.742 0.492 0.721
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.75 0.049 0.738 0.058 0.813
Nam_KAIST_task4_SED_2 SED_2 Nam2022 1.25 0.409 0.656 0.470 0.700
Nam_KAIST_task4_SED_3 SED_3 Nam2022 0.77 0.057 0.747 0.061 0.822
Nam_KAIST_task4_SED_4 SED_4 Nam2022 0.77 0.055 0.747 0.058 0.820
Nam_KAIST_task4_SED_1 SED_1 Nam2022 1.24 0.404 0.653 0.470 0.687
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 0.341 0.596
Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 1.34 0.429 0.723 0.436 0.778
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li2022b 1.35 0.450 0.699 0.456 0.726
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 0.318 0.520 0.356 0.554
Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie2022 1.06 0.300 0.641 0.360 0.674
Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.02 0.317 0.565 0.372 0.592
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim2022c 1.04 0.340 0.544 0.377 0.585
Kim_CAUET_task4_3 DCASE2022 SED system3 Kim2022c 1.04 0.338 0.554 0.373 0.571
Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 0.75 0.086 0.671 0.095 0.754
Li_XJU_task4_1 DCASE2022 SED system 1 Li2022c 1.10 0.364 0.570 0.408 0.607
Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 0.75 0.075 0.692 0.099 0.783
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li2022d 1.20 0.411 0.597 0.420 0.618
Castorena_UV_task4_3 Strong and Max-Weak balanced Castorena2022 0.91 0.267 0.531 0.305 0.587
Castorena_UV_task4_1 Max-Weak balanced Castorena2022 1.01 0.334 0.524 0.343 0.538
Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 0.63 0.072 0.559 0.067 0.641

With external resources

Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Xiao2022 1.41 0.484 0.697 0.481 0.694
Zhang_UCAS_task4_1 DCASE2022 pretrained system 1 Xiao2022 1.39 0.472 0.700 0.475 0.688
Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 0.79 0.049 0.784 0.051 0.826
Liu_NSYSU_task4_2 DCASE2022 PANNs SED 2 Liu2022 0.06 0.000 0.063 0.451 0.734
Liu_NSYSU_task4_3 DCASE2022 PANNs SED 3 Liu2022 0.29 0.070 0.194 0.457 0.767
Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 0.81 0.062 0.774 0.063 0.814
Suh_ReturnZero_task4_2 rtzr_strong-real Suh2022 1.39 0.458 0.721 0.473 0.723
Suh_ReturnZero_task4_3 rtzr_audioset Suh2022 1.42 0.478 0.719 0.445 0.704
Liu_SRCN_task4_2 DCASE2022 task4 Pre-Trained 2 Liu2022 0.90 0.129 0.758 0.177 0.801
Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 0.79 0.051 0.777 0.067 0.827
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu2022 1.25 0.425 0.634 0.443 0.660
Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 0.81 0.062 0.781 0.068 0.830
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim2022a 1.17 0.305 0.750 0.354 0.756
Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.257 0.461 0.269 0.446
Ryu_Deeply_task4_2 SKATTN_2 Ryu2022 0.66 0.156 0.449 0.161 0.452
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.104 0.196 0.129 0.241
Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 0.82 0.062 0.787 0.075 0.852
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi2022 1.26 0.439 0.611 0.449 0.662
Mizobuchi_PCO_task4_3 PCO_task4_SED_C Mizobuchi2022 0.88 0.197 0.620 0.231 0.714
Dinkel_XiaoRice_task4_4 TAG Dinkel2022 0.92 0.104 0.824 0.126 0.877
Dinkel_XiaoRice_task4_3 PRECISE Dinkel2022 1.38 0.451 0.727 0.482 0.757
Khandelwal_FMSG-NTU_task4_1 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.83 0.158 0.633 0.088 0.837
Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.80 0.082 0.731 0.102 0.840
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.26 0.410 0.664 0.472 0.721
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 0.486 0.694 0.477 0.734
Li_WU_task4_2 ATST-RCT SED system ATST small Shao2022 1.36 0.476 0.666 0.460 0.698
Li_WU_task4_3 ATST-RCT SED system ATST base Shao2022 1.40 0.482 0.693 0.468 0.702
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 0.459 0.744
Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 0.83 0.047 0.824 0.080 0.868
Ebbers_UPB_task4_1 CRNN ensemble Ebbers2022 1.59 0.552 0.786 0.512 0.772
Ebbers_UPB_task4_3 tag-conditioned CRNN ensemble Ebbers2022 1.46 0.527 0.679 0.483 0.713
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu2022 1.41 0.482 0.702 0.485 0.725
Xu_SRCB-BIT_task4_1 PANNs-FDY-CRNN-wrTCL system 1 Xu2022 1.32 0.452 0.662 0.481 0.710
Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 0.79 0.054 0.774 0.065 0.835
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 0.374 0.583
Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 0.070 0.794
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li2022b 1.41 0.480 0.713 0.479 0.735
Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.39 0.451 0.740 0.462 0.785
He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 0.82 0.053 0.810 0.071 0.857
He_BYTEDANCE_task4_2 DCASE2022 SED mean teacher system 2 He2022 1.48 0.503 0.749 0.521 0.771
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He2022 1.52 0.525 0.748 0.533 0.762
He_BYTEDANCE_task4_1 DCASE2022 SED mean teacher system 1 He2022 1.36 0.454 0.696 0.474 0.692
Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 0.79 0.090 0.709 0.115 0.816
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li2022d 1.26 0.439 0.612 0.449 0.645
Xie_UESTC_task4_2 CNN14 FC Xie2022 0.83 0.062 0.800 0.072 0.856
Xie_UESTC_task4_1 CBAM-T CRNN 1 Xie2022 1.36 0.418 0.757 0.460 0.768
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie2022 1.38 0.426 0.766 0.460 0.768
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 0.345 0.540 0.342 0.527
Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 0.93 0.195 0.683 0.215 0.735
Li_XJU_task4_3 DCASE2022 SED system 3 Li2022c 1.17 0.371 0.635 0.398 0.640

Teams ranking

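Table including only the best ranking score per submitting team. The PSDS 1 and PSDS 2 columns list each team's best submission for that metric, which may be two different systems.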

Rank | Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 1.49 0.484 0.784
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 0.434 0.650
Suh_ReturnZero_task4_3 rtzr_audioset Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 1.47 0.478 0.774
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 0.314 0.582
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 1.38 0.425 0.777
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 1.42 0.444 0.781
Ryu_Deeply_task4_1 SKATTN_1 Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.257 0.461
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.104 0.196
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 1.42 0.439 0.787
KIM_HYU_task4_4 single2 KIM_HYU_task4_2 single1 Sojeong2022 1.28 0.423 0.664
Baseline DCASE2022 SED baseline system Baseline DCASE2022 SED baseline system Turpault2022 1.00 0.315 0.543
Dinkel_XiaoRice_task4_3 PRECISE Dinkel_XiaoRice_task4_4 TAG Dinkel2022 1.47 0.451 0.824
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.34 0.425 0.723
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.32 0.410 0.731
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 0.432 0.649
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 0.486 0.694
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713
Ebbers_UPB_task4_1 CRNN ensemble Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 1.63 0.552 0.824
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 1.47 0.482 0.774
Nam_KAIST_task4_SED_2 SED_2 Nam_KAIST_task4_SED_3 SED_3 Nam2022 1.33 0.409 0.747
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 1.25 0.365 0.728
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.44 0.480 0.740
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 0.318 0.520
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 1.57 0.525 0.810
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 1.35 0.439 0.709
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie_UESTC_task4_2 CNN14 FC Xie2022 1.41 0.426 0.800
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 0.345 0.540
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.06 0.340 0.565
Li_XJU_task4_3 DCASE2022 SED system 3 Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 1.21 0.371 0.683
Castorena_UV_task4_1 Max-Weak balanced Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 1.04 0.334 0.559
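
The ranking scores in this table appear consistent with averaging PSDS 1 and PSDS 2 after dividing each by the corresponding baseline value (0.315 and 0.543 in the Baseline row). The Python sketch below reproduces that relationship. The function name `ranking_score` and the formula itself are assumptions inferred from the tabulated numbers, not definitions quoted from this page. A few rows differ by 0.01, presumably because the published scores were computed from unrounded PSDS values.

```python
# Baseline PSDS values, copied from the "Baseline" row of the table above.
BASELINE_PSDS1 = 0.315
BASELINE_PSDS2 = 0.543


def ranking_score(psds1, psds2, base1=BASELINE_PSDS1, base2=BASELINE_PSDS2):
    """Assumed ranking score: mean of the two baseline-normalized PSDS values."""
    return 0.5 * (psds1 / base1 + psds2 / base2)


# Ebbers_UPB row: PSDS 1 = 0.552, PSDS 2 = 0.824
print(round(ranking_score(0.552, 0.824), 2))  # 1.63
```

By construction the baseline scores exactly 1.00 under this formula, which matches its row.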

Supplementary metrics

Rank | Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | Ranking score (Public evaluation) | Ranking score (Vimeo dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 1.49 1.43 1.69
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 1.26 1.37
Suh_ReturnZero_task4_3 rtzr_audioset Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 1.47 1.39 1.71
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 1.01 1.11
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 1.38 1.33 1.57
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 1.42 1.38 1.60
Ryu_Deeply_task4_1 SKATTN_1 Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.82 0.89
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.35 0.27
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 1.42 1.37 1.58
KIM_HYU_task4_4 single2 KIM_HYU_task4_2 single1 Sojeong2022 1.28 1.27 1.34
Baseline DCASE2022 SED baseline system Baseline DCASE2022 SED baseline system Turpault2022 1.00 1.00 1.00
Dinkel_XiaoRice_task4_3 PRECISE Dinkel_XiaoRice_task4_4 TAG Dinkel2022 1.47 1.42 1.64
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.34 1.31 1.47
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.32 1.28 1.49
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 1.25 1.39
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 1.37 1.58
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 1.41 1.65
Ebbers_UPB_task4_1 CRNN ensemble Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 1.63 1.55 1.97
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 1.47 1.41 1.66
Nam_KAIST_task4_SED_2 SED_2 Nam_KAIST_task4_SED_3 SED_3 Nam2022 1.33 1.27 1.56
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 1.25 1.19 1.45
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.44 1.40 1.56
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 0.96 1.05
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 1.57 1.51 1.80
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 1.35 1.30 1.44
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie_UESTC_task4_2 CNN14 FC Xie2022 1.41 1.35 1.64
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 1.04 1.08
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.06 1.06 1.04
Li_XJU_task4_3 DCASE2022 SED system 3 Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 1.21 1.19 1.29
Castorena_UV_task4_1 Max-Weak balanced Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 1.04 1.00 1.14

Without external resources

Rank | Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
Zhang_UCAS_task4_3 DCASE2022 base system Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 1.21 0.420 0.599
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 0.434 0.650
Suh_ReturnZero_task4_1 rtzr_dev-only Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 1.22 0.393 0.650
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 0.314 0.582
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.24 0.025 0.219
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 1.34 0.444 0.697
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.21 0.029 0.184
Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi2022 1.15 0.398 0.571
KIM_HYU_task4_4 single2 KIM_HYU_task4_2 single1 Sojeong2022 1.28 0.423 0.664
Baseline DCASE2022 SED baseline system Baseline DCASE2022 SED baseline system Turpault2022 1.00 0.315 0.543
Dinkel_XiaoRice_task4_3 PRECISE Dinkel_XiaoRice_task4_4 TAG Dinkel2022 1.47 0.451 0.824
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.34 0.425 0.723
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.20 0.386 0.643
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 0.432 0.649
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 1.13 0.368 0.594
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 1.49 0.509 0.742
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.75 0.049 0.738
Nam_KAIST_task4_SED_2 SED_2 Nam_KAIST_task4_SED_3 SED_3 Nam2022 1.33 0.409 0.747
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 1.38 0.450 0.723
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 0.318 0.520
Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie2022 1.06 0.300 0.641
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.06 0.340 0.565
Li_XJU_task4_1 DCASE2022 SED system 1 Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 1.19 0.364 0.671
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 1.29 0.411 0.692
Castorena_UV_task4_1 Max-Weak balanced Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 1.04 0.334 0.559

With external resources

Rank | Submission code (PSDS 1) | Submission name (PSDS 1) | Submission code (PSDS 2) | Submission name (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 1.49 0.484 0.784
Suh_ReturnZero_task4_3 rtzr_audioset Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 1.47 0.478 0.774
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 1.38 0.425 0.777
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 1.20 0.305 0.781
Ryu_Deeply_task4_1 SKATTN_1 Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.257 0.461
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.104 0.196
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 1.42 0.439 0.787
Dinkel_XiaoRice_task4_3 PRECISE Dinkel_XiaoRice_task4_4 TAG Dinkel2022 1.47 0.451 0.824
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.32 0.410 0.731
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 0.486 0.694
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335
Ebbers_UPB_task4_1 CRNN ensemble Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 1.63 0.552 0.824
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 1.47 0.482 0.774
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 1.25 0.365 0.728
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.44 0.480 0.740
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 1.57 0.525 0.810
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 1.35 0.439 0.709
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie_UESTC_task4_2 CNN14 FC Xie2022 1.41 0.426 0.800
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 0.345 0.540
Li_XJU_task4_3 DCASE2022 SED system 3 Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 1.21 0.371 0.683

Class-wise performance

Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | Alarm/Bell/Ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Xiao2022 1.41 56.4 66.1 72.2 40.8 45.4 58.2 54.6 39.9 62.4 69.2
Zhang_UCAS_task4_1 DCASE2022 pretrained system 1 Xiao2022 1.39 58.8 61.9 72.8 42.9 47.2 61.3 49.8 35.9 65.2 66.4
Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 1.21 45.5 54.5 70.3 39.3 49.8 48.8 43.3 36.4 62.4 62.5
Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 0.79 6.6 6.1 0.9 0.0 0.3 15.2 52.1 29.8 0.3 38.7
Liu_NSYSU_task4_2 DCASE2022 PANNs SED 2 Liu2022 0.06 13.1 1.6 11.0 0.0 29.5 1.0 49.1 0.0
Liu_NSYSU_task4_3 DCASE2022 PANNs SED 3 Liu2022 0.29 11.3 1.6 17.5 0.5 14.1 0.0 0.0 38.4 0.0
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 45.1 48.9 65.5 35.8 47.8 54.7 41.3 32.5 70.3 34.3
Liu_NSYSU_task4_4 DCASE2022 PANNs SED 4 Liu2022 0.21 8.5 1.6 15.0 0.0 11.2 0.0 37.5 1.5
Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 1.22 26.0 53.5 71.7 40.3 45.1 39.8 46.0 33.6 52.6 59.6
Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 0.81 5.8 2.8 0.5 0.0 0.3 14.7 49.2 18.5 0.2 37.0
Suh_ReturnZero_task4_2 rtzr_strong-real Suh2022 1.39 37.8 65.3 77.9 44.5 45.6 53.4 56.4 33.8 56.4 60.1
Suh_ReturnZero_task4_3 rtzr_audioset Suh2022 1.42 39.7 62.9 77.8 47.1 46.0 52.5 63.0 32.8 55.1 61.1
Cheng_CHT_task4_2 DCASE2022_CRNN_ADJ Cheng2022 0.93 31.2 39.8 67.4 32.8 32.5 41.2 46.2 29.5 53.8 34.4
Cheng_CHT_task4_1 DCASE2022_CRNN_IMP Cheng2022 1.03 31.6 48.9 65.6 28.6 24.0 45.6 45.8 32.4 51.0 59.0
Liu_SRCN_task4_2 DCASE2022 task4 Pre-Trained 2 Liu2022 0.90 10.9 20.8 2.3 0.7 1.7 19.7 58.3 27.1 4.2 47.8
Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 0.79 4.8 4.5 0.9 0.0 0.3 15.5 49.2 21.8 0.2 38.9
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.24 2.7 2.2 0.0 29.6 0.2 17.7
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu2022 1.25 36.3 58.7 69.8 40.6 48.4 36.4 49.6 27.3 69.2 56.9
Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 1.34 36.8 52.5 73.3 46.0 45.1 38.6 50.8 30.3 70.2 66.7
Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 0.81 3.3 2.9 0.5 0.0 0.3 11.8 50.2 22.5 0.2 36.5
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim2022a 1.17 12.2 34.1 12.0 8.8 4.3 17.9 50.2 28.1 47.6 58.8
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim2022a 1.34 36.8 52.8 73.3 46.1 45.1 39.3 50.8 30.2 70.2 66.9
Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 23.7 30.1 39.9 1.1 15.1 36.2 46.3 29.1 47.9 36.1
Ryu_Deeply_task4_2 SKATTN_2 Ryu2022 0.66 11.3 4.4 18.4 10.1 5.7 16.8 38.9 18.1 36.8 33.0
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.21 4.5 11.2 6.5 2.5 2.0 8.5 17.4 9.6 3.9 27.9
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 22.9 35.1 29.2 19.5 11.5 21.0 27.9 15.9 45.5 37.0
Mizobuchi_PCO_task4_4 PCO_task4_SED_D Mizobuchi2022 0.82 3.9 5.5 0.9 0.0 0.3 14.5 47.5 23.9 0.2 39.8
Mizobuchi_PCO_task4_2 PCO_task4_SED_B Mizobuchi2022 1.26 46.5 44.4 71.4 40.8 43.5 44.8 45.4 37.0 64.7 58.5
Mizobuchi_PCO_task4_3 PCO_task4_SED_C Mizobuchi2022 0.88 11.8 34.8 21.4 1.6 2.5 34.8 27.8 30.4 12.8 39.8
Mizobuchi_PCO_task4_1 PCO_task4_SED_A Mizobuchi2022 1.15 34.6 47.5 69.4 36.5 48.3 40.5 49.4 38.8 61.0 50.2
KIM_HYU_task4_2 single1 Sojeong2022 1.28 43.1 53.5 70.8 33.1 44.3 42.9 50.4 35.3 62.9 59.6
KIM_HYU_task4_4 single2 Sojeong2022 1.27 42.7 58.2 68.7 31.2 43.1 55.3 48.9 32.6 61.3 62.2
KIM_HYU_task4_1 train_ensemble1 Sojeong2022 1.19 42.5 53.6 69.6 29.8 44.1 43.2 42.7 37.0 61.2 57.5
KIM_HYU_task4_3 train_ensemble2 Sojeong2022 1.24 39.9 58.5 68.6 32.0 39.9 48.4 49.4 32.0 59.1 53.3
Baseline DCASE2022 SED baseline system Turpault2022 1.00 32.2 39.0 62.4 28.6 34.5 21.1 37.2 26.4 49.7 42.0
Dinkel_XiaoRice_task4_1 SCRATCH Dinkel2022 1.29 36.7 51.9 61.1 30.9 40.8 47.9 50.4 29.8 60.2 46.6
Dinkel_XiaoRice_task4_2 SMALL Dinkel2022 1.15 36.8 37.6 57.3 28.2 39.2 29.1 46.1 25.6 58.3 34.4
Dinkel_XiaoRice_task4_4 TAG Dinkel2022 0.92 4.4 4.6 0.5 0.0 0.3 13.5 53.1 24.2 0.4 41.0
Dinkel_XiaoRice_task4_3 PRECISE Dinkel2022 1.38 36.5 55.6 65.0 35.0 41.7 48.2 56.0 33.6 51.4 52.0
Hao_UNISOC_task4_2 SUBMISSION FOR DCASE2022 TASK4 Hao2022 0.78 3.5 4.4 0.5 0.0 0.3 15.2 29.6 18.3 0.3 35.8
Hao_UNISOC_task4_1 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.24 49.8 46.2 72.1 28.7 47.4 49.6 25.8 27.6 65.1 58.7
Hao_UNISOC_task4_3 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.09 41.6 46.2 71.6 29.7 45.5 42.9 25.7 27.5 64.5 58.4
Khandelwal_FMSG-NTU_task4_1 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.83 18.8 14.9 7.4 2.5 2.7 32.0 48.0 23.5 12.9 39.7
Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.80 3.9 3.0 0.5 0.0 0.3 14.8 45.2 24.5 0.2 38.6
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.26 41.5 54.0 69.4 45.8 41.4 47.8 51.0 40.2 51.6 60.5
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.20 27.7 46.4 67.2 40.9 30.3 39.3 49.1 40.9 48.7 56.1
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 38.6 50.4 65.7 32.6 42.3 45.2 49.8 30.5 51.5 58.2
deBenito_AUDIAS_task4_1 10-Resolution CRNN+Conformer deBenito2022 1.23 39.8 55.0 66.5 26.0 34.1 44.6 42.2 34.5 52.4 54.7
deBenito_AUDIAS_task4_2 10-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.08 40.3 49.3 54.3 3.5 8.1 45.1 44.3 34.1 42.3 55.6
deBenito_AUDIAS_task4_3 7-Resolution CRNN+Conformer deBenito2022 1.23 39.3 55.9 69.1 29.0 36.5 45.1 47.0 31.6 54.1 57.3
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 39.6 47.5 73.3 43.3 57.0 51.5 47.3 38.3 61.6 58.3
Li_WU_task4_2 ATST-RCT SED system ATST small Shao2022 1.36 45.1 44.9 77.1 47.6 55.3 43.0 60.8 36.2 66.7 39.7
Li_WU_task4_3 ATST-RCT SED system ATST base Shao2022 1.40 39.6 47.5 73.3 43.3 57.0 51.5 47.3 38.3 61.6 58.3
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 1.13 33.1 48.3 70.2 32.7 26.1 46.8 44.5 37.6 50.9 60.1
Kim_GIST_task4_3 Kim_GIST_task4_3 Kim2022b 1.43 48.4 59.0 71.4 30.8 45.2 56.9 64.0 38.6 71.9 66.7
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 49.3 58.2 74.0 38.0 46.5 56.1 61.2 39.7 71.5 64.2
Kim_GIST_task4_2 Kim_GIST_task4_2 Kim2022b 1.46 45.4 61.1 74.1 39.8 46.5 54.3 63.5 38.2 71.7 60.3
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.65 31.5 22.2 59.5 17.8 30.3 35.4 9.4 19.6 57.8 32.0
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 1.49 40.7 61.9 75.5 38.8 53.6 63.1 65.9 41.7 56.7 78.2
Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 0.83 4.8 4.4 0.9 0.0 0.3 10.3 43.7 17.5 0.2 36.0
Ebbers_UPB_task4_1 CRNN ensemble Ebbers2022 1.59 52.7 64.8 78.1 41.2 51.2 60.6 70.0 40.4 60.3 78.9
Ebbers_UPB_task4_3 tag-conditioned CRNN ensemble Ebbers2022 1.46 55.8 73.2 80.7 48.9 49.9 72.7 72.6 48.5 72.5 84.6
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu2022 1.41 45.2 60.8 74.3 46.4 50.7 44.3 53.9 30.3 74.5 69.8
Xu_SRCB-BIT_task4_1 PANNs-FDY-CRNN-wrTCL system 1 Xu2022 1.32 43.8 48.1 72.0 43.7 47.7 43.4 56.2 33.3 73.3 55.9
Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 0.79 4.4 4.2 0.5 0.0 0.3 13.2 44.8 23.4 0.3 40.2
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.75 3.8 3.4 0.5 0.0 0.3 13.0 49.2 20.2 0.2 37.3
Nam_KAIST_task4_SED_2 SED_2 Nam2022 1.25 31.8 58.6 73.1 43.2 41.8 40.2 44.6 31.4 64.9 59.6
Nam_KAIST_task4_SED_3 SED_3 Nam2022 0.77 3.9 3.6 0.5 0.0 0.3 13.3 44.4 21.3 0.2 37.7
Nam_KAIST_task4_SED_4 SED_4 Nam2022 0.77 3.9 3.6 0.5 0.0 0.3 14.1 43.9 22.3 0.2 38.5
Nam_KAIST_task4_SED_1 SED_1 Nam2022 1.24 29.1 59.4 71.6 43.9 44.3 45.4 44.3 33.2 64.5 62.2
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.95 33.3 43.4 58.3 18.4 27.8 41.2 44.3 21.7 50.3 40.4
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala2022 1.11 29.9 44.8 58.0 29.0 36.4 26.8 42.6 24.3 54.4 48.7
Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 0.78 5.2 4.4 1.3 0.0 0.8 17.0 47.4 22.1 4.5 36.2
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li2022b 1.41 43.9 46.5 75.7 37.8 48.2 61.1 61.7 42.9 65.3 68.0
Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 1.34 33.3 44.9 72.1 36.3 47.6 59.2 60.1 36.1 65.7 68.9
Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.39 41.7 43.2 74.2 36.6 48.6 59.1 60.8 39.4 65.5 69.2
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li2022b 1.35 33.0 47.2 74.1 38.5 47.9 59.5 58.3 37.2 65.2 70.0
Bertola_UPF_task4_1 DCASE2022 baseline system Bertola2022 0.98 30.7 45.1 59.8 18.4 38.4 24.6 34.3 21.6 55.3 48.4
He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 0.82 6.3 3.8 0.9 0.0 0.3 15.9 48.3 24.4 0.1 42.6
He_BYTEDANCE_task4_2 DCASE2022 SED mean teacher system 2 He2022 1.48 48.5 62.8 71.5 34.1 43.4 65.0 45.9 36.7 70.0 67.2
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He2022 1.52 55.6 61.9 71.8 42.4 52.0 53.5 47.8 34.7 72.2 65.5
He_BYTEDANCE_task4_1 DCASE2022 SED mean teacher system 1 He2022 1.36 32.4 64.0 71.1 39.1 44.5 57.9 54.5 39.1 67.4 65.8
Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 0.79 5.0 2.5 0.6 0.0 0.3 5.6 39.8 15.1 0.0 25.4
Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 0.75 5.7 1.3 0.0 0.0 0.0 3.5 40.6 12.8 0.0 25.7
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li2022d 1.26 27.2 14.5 38.9 46.0 39.0 28.0 15.2 10.8 48.4 25.0
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li2022d 1.20 31.6 22.1 48.0 44.0 38.6 30.6 45.9 11.0 38.9 35.3
Xie_UESTC_task4_2 CNN14 FC Xie2022 0.83 6.2 4.7 0.9 0.0 0.3 13.5 48.9 25.3 0.3 36.7
Xie_UESTC_task4_3 CBAM-T CRNN scratch Xie2022 1.06 33.7 36.6 64.9 19.9 17.0 38.8 35.3 34.0 53.9 49.0
Xie_UESTC_task4_1 CBAM-T CRNN 1 Xie2022 1.36 40.5 62.3 71.3 33.3 33.0 59.0 58.6 44.2 58.2 66.4
Xie_UESTC_task4_4 CBAM-T CRNN 2 Xie2022 1.38 43.0 65.3 71.3 32.6 36.9 63.4 60.2 42.8 57.3 74.5
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 41.3 42.2 60.4 22.3 40.7 25.3 45.6 28.5 56.2 48.5
Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.02 37.1 50.2 64.8 27.5 15.1 44.4 36.6 37.5 49.2 62.2
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim2022c 1.04 39.1 54.5 68.9 36.1 31.0 31.4 27.4 36.4 38.8 47.9
Kim_CAUET_task4_3 DCASE2022 SED system3 Kim2022c 1.04 31.9 48.7 62.1 31.0 36.6 48.5 30.6 33.2 57.3 44.4
Li_XJU_task4_1 DCASE2022 SED system 1 Li2022c 1.10 28.6 48.7 68.6 33.9 38.3 43.3 49.0 34.5 59.9 44.5
Li_XJU_task4_3 DCASE2022 SED system 3 Li2022c 1.17 43.5 50.0 68.7 30.9 32.4 49.8 50.2 39.7 55.8 57.4
Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 0.93 25.3 34.5 7.4 11.4 4.2 38.3 51.6 26.5 43.1 35.8
Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 0.75 6.8 9.6 3.1 0.9 1.4 22.0 44.6 20.3 3.2 38.4
Castorena_UV_task4_3 Strong and Max-Weak balanced Castorena2022 0.91 35.2 39.3 44.7 21.0 14.9 37.4 40.2 24.9 22.0 48.1
Castorena_UV_task4_1 Max-Weak balanced Castorena2022 1.01 33.5 41.6 58.7 19.2 37.5 45.7 34.5 26.5 50.6 44.5
Castorena_UV_task4_2 Avg-Weak balanced Castorena2022 0.63 4.8 3.2 3.4 0.2 1.2 22.3 30.9 17.4 3.4 24.9

Energy Consumption

Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Energy (kWh) (training) | Energy (kWh) (test) | EW-PSDS 1 (training energy) | EW-PSDS 2 (training energy) | EW-PSDS 1 (test energy) | EW-PSDS 2 (test energy)
Zhang_UCAS_task4_2 DCASE2022 pretrained system 2 Xiao2022 1.41 0.484 0.697 4.800 0.060 0.173 0.249 0.242 0.348
Zhang_UCAS_task4_1 DCASE2022 pretrained system 1 Xiao2022 1.39 0.472 0.700 4.800 0.060 0.169 0.250 0.236 0.350
Zhang_UCAS_task4_3 DCASE2022 base system Xiao2022 1.21 0.420 0.599 2.700 0.040 0.267 0.381 0.315 0.449
Zhang_UCAS_task4_4 DCASE2022 weak_pred system Xiao2022 0.79 0.049 0.784 2.100 0.032 0.040 0.641 0.046 0.735
Liu_NSYSU_task4_2 DCASE2022 PANNs SED 2 Liu2022 0.06 0.000 0.063 1.593 0.002 0.000 0.068 0.003 0.943
Liu_NSYSU_task4_3 DCASE2022 PANNs SED 3 Liu2022 0.29 0.070 0.194 7.846 0.004 0.015 0.042 0.525 1.456
Huang_NSYSU_task4_1 DCASE2022 KDmt SED Huang2022 1.28 0.434 0.650 9.563 0.008 0.078 0.117 1.629 2.436
Liu_NSYSU_task4_4 DCASE2022 PANNs SED 4 Liu2022 0.21 0.046 0.151 6.372 0.006 0.012 0.041 0.231 0.754
Suh_ReturnZero_task4_1 rtzr_dev-only Suh2022 1.22 0.393 0.650 21.694 0.031 0.051
Suh_ReturnZero_task4_4 rtzr_weak-SED Suh2022 0.81 0.062 0.774 0.011 0.169 2.110
Suh_ReturnZero_task4_2 rtzr_strong-real Suh2022 1.39 0.458 0.721 22.986 0.010 0.034 0.054 1.379 2.171
Suh_ReturnZero_task4_3 rtzr_audioset Suh2022 1.42 0.478 0.719 46.891 0.074 0.017 0.026 0.194 0.292
Liu_SRCN_task4_2 DCASE2022 task4 Pre-Trained 2 Liu2022 0.90 0.129 0.758 6.751 0.004 0.033 0.193 0.871 5.130
Liu_SRCN_task4_1 DCASE2022 task4 Pre-Trained 1 Liu2022 0.79 0.051 0.777 6.751 0.004 0.013 0.198 0.345 5.259
Liu_SRCN_task4_4 DCASE2022 task4 without external data Liu2022 0.24 0.025 0.219 10.012 0.048 0.004 0.038 0.016 0.138
Liu_SRCN_task4_3 DCASE2022 task4 AudioSet strong Liu2022 1.25 0.425 0.634 0.733 0.004 0.996 1.486 3.275 4.888
Kim_LGE_task4_1 DCASE2022 Kim system 1 Kim2022a 1.34 0.444 0.697 17.000 0.300 0.045 0.070 0.044 0.070
Kim_LGE_task4_3 DCASE2022 Kim system 3 Kim2022a 0.81 0.062 0.781 17.000 0.300 0.006 0.079 0.006 0.078
Kim_LGE_task4_4 DCASE2022 Kim system 4 Kim2022a 1.17 0.305 0.750 17.000 0.300 0.031 0.076 0.030 0.075
Kim_LGE_task4_2 DCASE2022 Kim system 2 Kim2022a 1.34 0.444 0.695 17.000 0.300 0.045 0.070 0.044 0.069
Ryu_Deeply_task4_1 SKATTN_1 Ryu2022 0.83 0.257 0.461 29.850 0.040 0.015 0.027 0.193 0.346
Ryu_Deeply_task4_2 SKATTN_2 Ryu2022 0.66 0.156 0.449 18.780 0.040 0.014 0.041 0.117 0.337
Giannakopoulos_UNIPI_task4_2 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.21 0.029 0.184 1.717 0.030 0.029 0.184 0.029 0.184
Giannakopoulos_UNIPI_task4_1 Multi-Task Learning using Variational AutoEncoders Giannakopoulos2022 0.35 0.104 0.196 1.717 0.030 0.104 0.196 0.104 0.196
KIM_HYU_task4_2 single1 Sojeong2022 1.28 0.421 0.664 1.780 0.010 0.406 0.640 1.264 1.991
KIM_HYU_task4_4 single2 Sojeong2022 1.27 0.423 0.651 1.800 0.004 0.403 0.621 3.172 4.885
KIM_HYU_task4_1 train_ensemble1 Sojeong2022 1.19 0.390 0.620 1.910 0.010 0.350 0.557 1.169 1.860
KIM_HYU_task4_3 train_ensemble2 Sojeong2022 1.24 0.415 0.634 1.800 0.005 0.396 0.605 2.492 3.804
Baseline DCASE2022 SED baseline system Turpault2022 1.00 0.315 0.543 1.717 0.030 0.315 0.543 0.315 0.543
Dinkel_XiaoRice_task4_2 SMALL Dinkel2022 1.15 0.373 0.613 1.717 0.025 0.373 0.613 0.448 0.736
Hao_UNISOC_task4_3 SUBMISSION FOR DCASE2022 TASK4 Hao2022 1.09 0.373 0.547 1.717 0.030 0.373 0.547 0.373 0.547
Khandelwal_FMSG-NTU_task4_1 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.83 0.158 0.633 1.820 0.005 0.149 0.597 0.968 3.876
Khandelwal_FMSG-NTU_task4_2 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 0.80 0.082 0.731 6.100 0.005 0.023 0.206 0.445 3.987
Khandelwal_FMSG-NTU_task4_3 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.26 0.410 0.664 3.250 0.005 0.217 0.351 2.676 4.332
Khandelwal_FMSG-NTU_task4_4 FMSG-NTU DCASE2022 SED Model-1 Khandelwal2022 1.20 0.386 0.643 3.630 0.005 0.183 0.304 2.413 4.018
deBenito_AUDIAS_task4_4 7-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.28 0.432 0.649 12.872 0.045 0.058 0.087 0.288 0.433
deBenito_AUDIAS_task4_1 10-Resolution CRNN+Conformer deBenito2022 1.23 0.400 0.646 18.162 0.056 0.038 0.061 0.214 0.346
deBenito_AUDIAS_task4_2 10-Resolution CRNN+Conformer with class-wise median filtering deBenito2022 1.08 0.310 0.642 18.162 0.056 0.029 0.061 0.166 0.344
deBenito_AUDIAS_task4_3 7-Resolution CRNN+Conformer deBenito2022 1.23 0.407 0.643 12.872 0.045 0.054 0.086 0.271 0.429
Li_WU_task4_4 ATST-RCT SED system ATST ensemble Shao2022 1.41 0.486 0.694 23.900 1.772 0.035 0.050 0.008 0.012
Li_WU_task4_2 ATST-RCT SED system ATST small Shao2022 1.36 0.476 0.666 3.500 0.624 0.234 0.327 0.023 0.032
Li_WU_task4_3 ATST-RCT SED system ATST base Shao2022 1.40 0.482 0.693 4.800 0.626 0.172 0.248 0.023 0.033
Li_WU_task4_1 ATST-RCT SED system CRNN with RCT Shao2022 1.13 0.368 0.594 2.210 0.450 0.286 0.462 0.025 0.040
Kim_GIST_task4_3 Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 151.415 1.190 0.006 0.008 0.013 0.018
Kim_GIST_task4_1 Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 151.415 1.190 0.006 0.008 0.013 0.018
Kim_GIST_task4_2 Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 151.415 1.190 0.006 0.008 0.013 0.018
Kim_GIST_task4_4 Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 3.768 0.246 0.098 0.153 0.026 0.041
Ebbers_UPB_task4_4 CRNN ensemble w/o external data Ebbers2022 1.49 0.509 0.742 27.200 0.020 0.032 0.047 0.764 1.113
Ebbers_UPB_task4_2 FBCRNN ensemble Ebbers2022 0.83 0.047 0.824 36.000 0.020 0.002 0.039 0.070 1.236
Ebbers_UPB_task4_1 CRNN ensemble Ebbers2022 1.59 0.552 0.786 50.000 0.020 0.019 0.027 0.828 1.179
Ebbers_UPB_task4_3 tag-conditioned CRNN ensemble Ebbers2022 1.46 0.527 0.679 50.000 0.020 0.018 0.023 0.791 1.019
Xu_SRCB-BIT_task4_2 PANNs-FDY-CRNN-wrTCL system 2 Xu2022 1.41 0.482 0.702 1.823 0.027 0.454 0.662 0.535 0.781
Xu_SRCB-BIT_task4_1 PANNs-FDY-CRNN-wrTCL system 1 Xu2022 1.32 0.452 0.662 1.823 0.027 0.426 0.624 0.502 0.736
Xu_SRCB-BIT_task4_3 PANNs-FDY-CRNN-weak train Xu2022 0.79 0.054 0.774 1.514 0.027 0.061 0.878 0.060 0.861
Xu_SRCB-BIT_task4_4 FDY-CRNN-weak train Xu2022 0.75 0.049 0.738 1.446 0.027 0.058 0.876 0.054 0.820
Nam_KAIST_task4_SED_2 SED_2 Nam2022 1.25 0.409 0.656 1.327 0.077 0.529 0.849 0.159 0.256
Nam_KAIST_task4_SED_3 SED_3 Nam2022 0.77 0.057 0.747 1.327 0.077 0.074 0.966 0.022 0.291
Nam_KAIST_task4_SED_4 SED_4 Nam2022 0.77 0.055 0.747 1.327 0.077 0.071 0.966 0.021 0.291
Nam_KAIST_task4_SED_1 SED_1 Nam2022 1.24 0.404 0.653 1.327 0.077 0.522 0.845 0.157 0.255
Blakala_SRPOL_task4_3 Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 2.757 0.037 0.182 0.328 0.234 0.422
Blakala_SRPOL_task4_1 Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 2.755 0.037 0.228 0.364 0.300 0.479
Blakala_SRPOL_task4_2 Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 25.295 0.056 0.005 0.049 0.037 0.391
Li_USTC_task4_SED_1 Mean teacher Pseudo labeling system 1 Li2022b 1.41 0.480 0.713 11.880 0.014 0.069 0.103 1.028 1.528
Li_USTC_task4_SED_4 Mean teacher Pseudo labeling system 4 Li2022b 1.34 0.429 0.723 3.564 0.009 0.207 0.348 1.445 2.437
Li_USTC_task4_SED_2 Mean teacher Pseudo labeling system 2 Li2022b 1.39 0.451 0.740 11.880 0.014 0.065 0.107 0.966 1.585
Li_USTC_task4_SED_3 Mean teacher Pseudo labeling system 3 Li2022b 1.35 0.450 0.699 3.564 0.009 0.217 0.337 1.517 2.355
He_BYTEDANCE_task4_4 DCASE2022 SED mean teacher system 4 He2022 0.82 0.053 0.810 28.066 0.424 0.003 0.050 0.004 0.057
He_BYTEDANCE_task4_2 DCASE2022 SED mean teacher system 2 He2022 1.48 0.503 0.749 28.066 0.424 0.031 0.046 0.036 0.053
He_BYTEDANCE_task4_3 DCASE2022 SED mean teacher system 3 He2022 1.52 0.525 0.748 28.066 0.424 0.032 0.046 0.037 0.053
He_BYTEDANCE_task4_1 DCASE2022 SED mean teacher system 1 He2022 1.36 0.454 0.696 6.067 0.410 0.129 0.197 0.033 0.051
Li_ICT-TOSHIBA_task4_2 Hybrid system of SEDT and frame-wise model Li2022d 0.79 0.090 0.709 47.417 0.030 0.003 0.026 0.090 0.709
Li_ICT-TOSHIBA_task4_4 Hybrid system of SEDT and frame-wise model Li2022d 0.75 0.075 0.692 23.850 0.024 0.005 0.050 0.094 0.865
Li_ICT-TOSHIBA_task4_1 Hybrid system of SEDT and frame-wise model Li2022d 1.26 0.439 0.612 47.417 0.030 0.016 0.022 0.439 0.612
Li_ICT-TOSHIBA_task4_3 Hybrid system of SEDT and frame-wise model Li2022d 1.20 0.411 0.597 23.850 0.024 0.030 0.043 0.514 0.746
Baseline (AudioSet) DCASE2022 SED baseline system (AudioSet) Ronchini2022 1.04 0.345 0.540 2.418 0.027 0.245 0.383 0.383 0.600
Kim_CAUET_task4_1 DCASE2022 SED system1 Kim2022c 1.02 0.317 0.565 1.201 0.021 0.453 0.807 0.450 0.803
Kim_CAUET_task4_2 DCASE2022 SED system2 Kim2022c 1.04 0.340 0.544 1.114 0.021 0.525 0.839 0.484 0.774
Kim_CAUET_task4_3 DCASE2022 SED system3 Kim2022c 1.04 0.338 0.554 0.748 0.020 0.776 1.272 0.505 0.827
Li_XJU_task4_1 DCASE2022 SED system 1 Li2022c 1.10 0.364 0.570 2.718 0.017 0.230 0.360 0.643 1.007
Li_XJU_task4_3 DCASE2022 SED system 3 Li2022c 1.17 0.371 0.635 3.791 0.010 0.168 0.287 1.112 1.904
Li_XJU_task4_4 DCASE2022 SED system 4 Li2022c 0.93 0.195 0.683 3.317 0.015 0.101 0.354 0.390 1.367
Li_XJU_task4_2 DCASE2022 SED system 2 Li2022c 0.75 0.086 0.671 3.771 0.006 0.039 0.305 0.432 3.353
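
The energy-weighted columns above appear consistent with scaling each PSDS value by the ratio of baseline energy to system energy, using the Baseline row (1.717 kWh for training, 0.030 kWh for test) as the reference. The Python sketch below illustrates this. The function name `energy_weighted_psds` and the formula are assumptions inferred from the tabulated values, not a definition quoted from this page. Some rows differ in the last digit, presumably because the published energies are rounded.

```python
# Baseline energy figures, copied from the "Baseline" row of the table above.
BASELINE_TRAIN_KWH = 1.717
BASELINE_TEST_KWH = 0.030


def energy_weighted_psds(psds, system_kwh, baseline_kwh):
    """Assumed EW-PSDS: PSDS scaled by (baseline energy / system energy)."""
    if system_kwh <= 0:
        raise ValueError("system energy must be positive")
    return psds * baseline_kwh / system_kwh


# Zhang_UCAS_task4_2 row: PSDS 1 = 0.484, 4.800 kWh training, 0.060 kWh test
ew_train = energy_weighted_psds(0.484, 4.800, BASELINE_TRAIN_KWH)
ew_test = energy_weighted_psds(0.484, 0.060, BASELINE_TEST_KWH)
print(round(ew_train, 3), round(ew_test, 3))  # 0.173 0.242
```

Under this reading, a system that uses more energy than the baseline has its PSDS scaled down, and one that uses less has it scaled up, which is why some EW-PSDS values exceed 1.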

System characteristics

General characteristics

Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Data augmentation | Features
Zhang_UCAS_task4_2 Xiao2022 1.41 0.484 0.697 specaugment, mixup, frame_shift, FilterAug log-mel energies
Zhang_UCAS_task4_1 Xiao2022 1.39 0.472 0.700 specaugment, mixup, frame_shift, FilterAug log-mel energies
Zhang_UCAS_task4_3 Xiao2022 1.21 0.420 0.599 specaugment, mixup, frame_shift, FilterAug log-mel energies
Zhang_UCAS_task4_4 Xiao2022 0.79 0.049 0.784 specaugment, mixup, frame_shift, FilterAug log-mel energies
Liu_NSYSU_task4_2 Liu2022 0.06 0.000 0.063 mix-up log-mel energies
Liu_NSYSU_task4_3 Liu2022 0.29 0.070 0.194 mix-up log-mel energies
Huang_NSYSU_task4_1 Huang2022 1.28 0.434 0.650 mixup, frame shifting log-mel energies
Liu_NSYSU_task4_4 Liu2022 0.21 0.046 0.151 mix-up log-mel energies
Suh_ReturnZero_task4_1 Suh2022 1.22 0.393 0.650 time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment log-mel energies
Suh_ReturnZero_task4_4 Suh2022 0.81 0.062 0.774 time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment log-mel energies
Suh_ReturnZero_task4_2 Suh2022 1.39 0.458 0.721 time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment log-mel energies
Suh_ReturnZero_task4_3 Suh2022 1.42 0.478 0.719 time shifting, time masking, Mixup, add noise, FilterAugment, SpecAugment log-mel energies
Cheng_CHT_task4_2 Cheng2022 0.93 0.276 0.543 mixup, time shift MelSpectrogram
Cheng_CHT_task4_1 Cheng2022 1.03 0.314 0.582 mixup, FilterAugment algorithm MelSpectrogram
Liu_SRCN_task4_2 Liu2022 0.90 0.129 0.758 mixup log-mel energies
Liu_SRCN_task4_1 Liu2022 0.79 0.051 0.777 mixup log-mel energies
Liu_SRCN_task4_4 Liu2022 0.24 0.025 0.219 mixup log-mel energies
Liu_SRCN_task4_3 Liu2022 1.25 0.425 0.634 frame shift, mixup, spec augment, filter augment log-mel energies
Kim_LGE_task4_1 Kim2022a 1.34 0.444 0.697 frame shifting, time masking, frequency masking, mix-up, filter augment log-mel energies
Kim_LGE_task4_3 Kim2022a 0.81 0.062 0.781 frame shifting, time masking, frequency masking, mix-up, filter augment log-mel energies
Kim_LGE_task4_4 Kim2022a 1.17 0.305 0.750 frame shifting, time masking, frequency masking, mix-up, filter augment log-mel energies
Kim_LGE_task4_2 Kim2022a 1.34 0.444 0.695 frame shifting, time masking, frequency masking, mix-up, filter augment log-mel energies
Ryu_Deeply_task4_1 Ryu2022 0.83 0.257 0.461 log-mel energies
Ryu_Deeply_task4_2 Ryu2022 0.66 0.156 0.449 log-mel energies
Giannakopoulos_UNIPI_task4_2 Giannakopoulos2022 0.21 0.029 0.184 log-mel energies
Giannakopoulos_UNIPI_task4_1 Giannakopoulos2022 0.35 0.104 0.196 log-mel energies
Mizobuchi_PCO_task4_4 Mizobuchi2022 0.82 0.062 0.787 filter augmentation, MixUp, Frame shift, Time mask log-mel energies
Mizobuchi_PCO_task4_2 Mizobuchi2022 1.26 0.439 0.611 filter augmentation, MixUp, Frame shift, Time mask log-mel energies
Mizobuchi_PCO_task4_3 Mizobuchi2022 0.88 0.197 0.620 filter augmentation, MixUp, Frame shift, Time mask log-mel energies
Mizobuchi_PCO_task4_1 Mizobuchi2022 1.15 0.398 0.571 filter augmentation, MixUp, Frame shift, Time mask log-mel energies
KIM_HYU_task4_2 Sojeong2022 1.28 0.421 0.664 time shifting, mix up, frequency masking log-mel energies
KIM_HYU_task4_4 Sojeong2022 1.27 0.423 0.651 time shifting, mix up, frequency masking log-mel energies
KIM_HYU_task4_1 Sojeong2022 1.19 0.390 0.620 time shifting, mix up, frequency masking log-mel energies
KIM_HYU_task4_3 Sojeong2022 1.24 0.415 0.634 time shifting, mix up, frequency masking log-mel energies
Baseline Turpault2022 1.00 0.315 0.543 mixup log-mel energies
Dinkel_XiaoRice_task4_1 Dinkel2022 1.29 0.422 0.679 specaugment, mixup log-mel energies
Dinkel_XiaoRice_task4_2 Dinkel2022 1.15 0.373 0.613 specaugment, mixup log-mel energies
Dinkel_XiaoRice_task4_4 Dinkel2022 0.92 0.104 0.824 specaugment, mixup log-mel energies
Dinkel_XiaoRice_task4_3 Dinkel2022 1.38 0.451 0.727 specaugment, mixup log-mel energies
Hao_UNISOC_task4_2 Hao2022 0.78 0.078 0.723 noise log-mel energies
Hao_UNISOC_task4_1 Hao2022 1.24 0.425 0.615 noise log-mel energies
Hao_UNISOC_task4_3 Hao2022 1.09 0.373 0.547 noise log-mel energies
Khandelwal_FMSG-NTU_task4_1 Khandelwal2022 0.83 0.158 0.633 time-masking, frame-shifting, mixup, filter-augmentation log-mel energies
Khandelwal_FMSG-NTU_task4_2 Khandelwal2022 0.80 0.082 0.731 time-masking, frame-shifting, mixup, Gaussian noise log-mel energies
Khandelwal_FMSG-NTU_task4_3 Khandelwal2022 1.26 0.410 0.664 time-masking, frame-shifting, mixup, filter-augmentation log-mel energies
Khandelwal_FMSG-NTU_task4_4 Khandelwal2022 1.20 0.386 0.643 time-masking, frame-shifting, mixup, filter-augmentation, Gaussian noise log-mel energies
deBenito_AUDIAS_task4_4 deBenito2022 1.28 0.432 0.649 mixup, time shifting log-mel energies
deBenito_AUDIAS_task4_1 deBenito2022 1.23 0.400 0.646 mixup, time shifting log-mel energies
deBenito_AUDIAS_task4_2 deBenito2022 1.08 0.310 0.642 mixup, time shifting log-mel energies
deBenito_AUDIAS_task4_3 deBenito2022 1.23 0.407 0.643 mixup, time shifting log-mel energies
Li_WU_task4_4 Shao2022 1.41 0.486 0.694 hard mixup, time masking, filter augmentation, time shifting, frequency masking log-mel energies
Li_WU_task4_2 Shao2022 1.36 0.476 0.666 hard mixup, time masking, frequency masking, time shifting log-mel energies
Li_WU_task4_3 Shao2022 1.40 0.482 0.693 hard mixup, time masking, filter augmentation, time shifting log-mel energies
Li_WU_task4_1 Shao2022 1.13 0.368 0.594 hard mixup, time masking, frequency masking, filter augmentation, time shifting log-mel energies
Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 mix-up, specaugment, time-frequency shifting log-mel energies
Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 mix-up, specaugment, time-frequency shifting log-mel energies
Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 mix-up, specaugment, time-frequency shifting log-mel energies
Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 mixup, time masking, filter augment, gaussian noise log-mel energies
Ebbers_UPB_task4_4 Ebbers2022 1.49 0.509 0.742 time-/frequency warping, time-/frequency-masking, superposition, random noise log-mel energies
Ebbers_UPB_task4_2 Ebbers2022 0.83 0.047 0.824 time-/frequency warping, time-/frequency-masking, superposition, random noise log-mel energies
Ebbers_UPB_task4_1 Ebbers2022 1.59 0.552 0.786 time-/frequency warping, time-/frequency-masking, superposition, random noise log-mel energies
Ebbers_UPB_task4_3 Ebbers2022 1.46 0.527 0.679 time-/frequency warping, time-/frequency-masking, superposition, random noise log-mel energies
Xu_SRCB-BIT_task4_2 Xu2022 1.41 0.482 0.702 specaugment, mixup, frame-shift, Filteraugment log-mel energies
Xu_SRCB-BIT_task4_1 Xu2022 1.32 0.452 0.662 specaugment, mixup, frame-shift, Filteraugment log-mel energies
Xu_SRCB-BIT_task4_3 Xu2022 0.79 0.054 0.774 specaugment, mixup, frame-shift, Filteraugment log-mel energies
Xu_SRCB-BIT_task4_4 Xu2022 0.75 0.049 0.738 specaugment, mixup, frame-shift, Filteraugment log-mel energies
Nam_KAIST_task4_SED_2 Nam2022 1.25 0.409 0.656 time shifting, mixup, time masking, FilterAugment log-mel energies
Nam_KAIST_task4_SED_3 Nam2022 0.77 0.057 0.747 time shifting, mixup, time masking, FilterAugment log-mel energies
Nam_KAIST_task4_SED_4 Nam2022 0.77 0.055 0.747 time shifting, mixup, time masking, FilterAugment log-mel energies
Nam_KAIST_task4_SED_1 Nam2022 1.24 0.404 0.653 time shifting, mixup, time masking, FilterAugment log-mel energies
Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 time warping, Brownian noise log-mel energies
Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 pitch shifting log-mel energies
Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 time warping, Brownian noise log-mel energies
Li_USTC_task4_SED_1 Li2022b 1.41 0.480 0.713 spec-augment, time-shifting log-mel energies
Li_USTC_task4_SED_4 Li2022b 1.34 0.429 0.723 spec-augment, time-shifting log-mel energies
Li_USTC_task4_SED_2 Li2022b 1.39 0.451 0.740 spec-augment, time-shifting log-mel energies
Li_USTC_task4_SED_3 Li2022b 1.35 0.450 0.699 spec-augment, time-shifting log-mel energies
Bertola_UPF_task4_1 Bertola2022 0.98 0.318 0.520 mixup, time-masking, frequency-masking log-mel energies
He_BYTEDANCE_task4_4 He2022 0.82 0.053 0.810 time mask, frame shift, mixup, ict, sct, FilterAugment log-mel energies
He_BYTEDANCE_task4_2 He2022 1.48 0.503 0.749 time mask, frame shift, mixup, ict, sct, FilterAugment log-mel energies
He_BYTEDANCE_task4_3 He2022 1.52 0.525 0.748 time mask, frame shift, mixup, ict, sct, FilterAugment log-mel energies
He_BYTEDANCE_task4_1 He2022 1.36 0.454 0.696 time mask, frame shift, mixup, ict, sct, FilterAugment log-mel energies
Li_ICT-TOSHIBA_task4_2 Li2022d 0.79 0.090 0.709 mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) log-mel energies (frame-wise model), log-mel spectrogram (SEDT)
Li_ICT-TOSHIBA_task4_4 Li2022d 0.75 0.075 0.692 mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) log-mel energies (frame-wise model), log-mel spectrogram (SEDT)
Li_ICT-TOSHIBA_task4_1 Li2022d 1.26 0.439 0.612 mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) log-mel energies (frame-wise model), log-mel spectrogram (SEDT)
Li_ICT-TOSHIBA_task4_3 Li2022d 1.20 0.411 0.597 mixup, frequency mask (only SEDT), frequency shift (only SEDT), time mask (only SEDT) log-mel energies (frame-wise model), log-mel spectrogram (SEDT)
Xie_UESTC_task4_2 Xie2022 0.83 0.062 0.800 mixup, SpecAug log-mel energies
Xie_UESTC_task4_3 Xie2022 1.06 0.300 0.641 mixup, SpecAug log-mel energies
Xie_UESTC_task4_1 Xie2022 1.36 0.418 0.757 mixup, SpecAug log-mel energies
Xie_UESTC_task4_4 Xie2022 1.38 0.426 0.766 mixup, SpecAug log-mel energies
Baseline (AudioSet) Ronchini2022 1.04 0.345 0.540 mixup log-mel energies
Kim_CAUET_task4_1 Kim2022c 1.02 0.317 0.565 frame shift, mixup, time mask, filter augmentation log-mel energies
Kim_CAUET_task4_2 Kim2022c 1.04 0.340 0.544 frame shift, mixup, time mask, filter augmentation log-mel energies
Kim_CAUET_task4_3 Kim2022c 1.04 0.338 0.554 time shift, mixup, time mask, filter augmentation log-mel energies
Li_XJU_task4_1 Li2022c 1.10 0.364 0.570 mixup, filteraugment, cutout log-mel energies
Li_XJU_task4_3 Li2022c 1.17 0.371 0.635 mixup, filteraugment, cutout log-mel energies
Li_XJU_task4_4 Li2022c 0.93 0.195 0.683 mixup, filteraugment, cutout log-mel energies
Li_XJU_task4_2 Li2022c 0.75 0.086 0.671 mixup, filteraugment, cutout log-mel energies
Castorena_UV_task4_3 Castorena2022 0.91 0.267 0.531 log-mel energies
Castorena_UV_task4_1 Castorena2022 1.01 0.334 0.524 log-mel energies
Castorena_UV_task4_2 Castorena2022 0.63 0.072 0.559 log-mel energies
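
The ranking score column in the table above appears consistent with averaging PSDS 1 and PSDS 2 after normalising each by the corresponding value in the Baseline row (0.315 and 0.543). That formula is not stated in this section, so the Python sketch below treats it as an assumption, and the name `ranking_score` is purely illustrative. Because the listed PSDS values are rounded to three decimals, a recomputed score may differ from the table in the last digit.

```python
# Baseline PSDS values copied from the "Baseline" row of the table above.
BASELINE_PSDS1 = 0.315
BASELINE_PSDS2 = 0.543


def ranking_score(psds1: float, psds2: float) -> float:
    """Assumed formula: mean of the two baseline-normalised PSDS values."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)


if __name__ == "__main__":
    # (PSDS 1, PSDS 2) pairs copied from rows of the table above.
    rows = {
        "Baseline": (0.315, 0.543),
        "Zhang_UCAS_task4_2": (0.484, 0.697),
        "Ebbers_UPB_task4_1": (0.552, 0.786),
    }
    for name, (p1, p2) in rows.items():
        print(f"{name}: {ranking_score(p1, p2):.3f}")
```

Under this assumption, a system that matches the baseline on both scenarios scores exactly 1, and a system that is strong on only one scenario (for example a weak-prediction system tuned for PSDS 2) can still rank below systems that are balanced across both.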



Machine learning characteristics

Rank Code Technical Report Ranking score (Evaluation dataset) PSDS 1 (Evaluation dataset) PSDS 2 (Evaluation dataset) Classifier Semi-supervised approach Post-processing Segmentation method Decision making
Zhang_UCAS_task4_2 Xiao2022 1.41 0.484 0.697 CRNN,CNN mean-teacher student classwise median filtering mean
Zhang_UCAS_task4_1 Xiao2022 1.39 0.472 0.700 CRNN,CNN mean-teacher student classwise median filtering mean
Zhang_UCAS_task4_3 Xiao2022 1.21 0.420 0.599 CRNN,CNN mean-teacher student classwise median filtering mean
Zhang_UCAS_task4_4 Xiao2022 0.79 0.049 0.784 CRNN,CNN mean-teacher student classwise median filtering mean
Liu_NSYSU_task4_2 Liu2022 0.06 0.000 0.063 CRNN mean-teacher student median filtering (93ms)
Liu_NSYSU_task4_3 Liu2022 0.29 0.070 0.194 CRNN mean-teacher student median filtering (93ms)
Huang_NSYSU_task4_1 Huang2022 1.28 0.434 0.650 CRNN, ensemble mean-teacher student, knowledge distillation median filtering (93ms) average
Liu_NSYSU_task4_4 Liu2022 0.21 0.046 0.151 CRNN mean-teacher student median filtering (93ms)
Suh_ReturnZero_task4_1 Suh2022 1.22 0.393 0.650 CRNN mean-teacher student median filtering averaging
Suh_ReturnZero_task4_4 Suh2022 0.81 0.062 0.774 CRNN mean-teacher student weak SED averaging
Suh_ReturnZero_task4_2 Suh2022 1.39 0.458 0.721 CRNN mean-teacher student median filtering averaging
Suh_ReturnZero_task4_3 Suh2022 1.42 0.478 0.719 CRNN mean-teacher student median filtering averaging
Cheng_CHT_task4_2 Cheng2022 0.93 0.276 0.543 CRNN, Multiscale CNN mean-teacher student median filtering (0.45s) attention layers
Cheng_CHT_task4_1 Cheng2022 1.03 0.314 0.582 CRNN, Multiscale CNN mean-teacher student median filtering (0.45s) attention layers
Liu_SRCN_task4_2 Liu2022 0.90 0.129 0.758 Transformer, RNN mean-teacher student median filtering attention layers mean
Liu_SRCN_task4_1 Liu2022 0.79 0.051 0.777 Transformer, RNN mean-teacher student median filtering attention layers mean
Liu_SRCN_task4_4 Liu2022 0.24 0.025 0.219 CNN mean-teacher student median filtering attention layers mean
Liu_SRCN_task4_3 Liu2022 1.25 0.425 0.634 CRNN mean-teacher student median filtering mean
Kim_LGE_task4_1 Kim2022a 1.34 0.444 0.697 FDY-CRNN mean-teacher student, ICT, FixMatch median filtering (329ms) mean
Kim_LGE_task4_3 Kim2022a 0.81 0.062 0.781 FDY-CRNN mean-teacher student, ICT, FixMatch median filtering (329ms) mean
Kim_LGE_task4_4 Kim2022a 1.17 0.305 0.750 FDY-CRNN mean-teacher student, ICT, FixMatch median filtering (329ms) mean
Kim_LGE_task4_2 Kim2022a 1.34 0.444 0.695 FDY-CRNN mean-teacher student, ICT, FixMatch median filtering (329ms) mean
Ryu_Deeply_task4_1 Ryu2022 0.83 0.257 0.461 SKATTN mean-teacher student median filtering (93ms)
Ryu_Deeply_task4_2 Ryu2022 0.66 0.156 0.449 SKATTN mean-teacher student median filtering (93ms)
Giannakopoulos_UNIPI_task4_2 Giannakopoulos2022 0.21 0.029 0.184 RNN multi-task learning median filtering (456ms)
Giannakopoulos_UNIPI_task4_1 Giannakopoulos2022 0.35 0.104 0.196 RNN multi-task learning median filtering (456ms)
Mizobuchi_PCO_task4_4 Mizobuchi2022 0.82 0.062 0.787 CRNN mean-teacher student median filtering, probability correction
Mizobuchi_PCO_task4_2 Mizobuchi2022 1.26 0.439 0.611 CRNN mean-teacher student median filtering, probability correction
Mizobuchi_PCO_task4_3 Mizobuchi2022 0.88 0.197 0.620 CRNN mean-teacher student median filtering, probability correction
Mizobuchi_PCO_task4_1 Mizobuchi2022 1.15 0.398 0.571 CRNN mean-teacher student median filtering, probability correction
KIM_HYU_task4_2 Sojeong2022 1.28 0.421 0.664 CRNN mean-teacher student median filtering (93ms) patch attention layers
KIM_HYU_task4_4 Sojeong2022 1.27 0.423 0.651 CRNN mean-teacher student median filtering (93ms) patch attention layers
KIM_HYU_task4_1 Sojeong2022 1.19 0.390 0.620 CRNN mean-teacher student median filtering (93ms) patch attention layers
KIM_HYU_task4_3 Sojeong2022 1.24 0.415 0.634 CRNN mean-teacher student median filtering (93ms) patch attention layers
Baseline Turpault2022 1.00 0.315 0.543 CRNN mean-teacher student
Dinkel_XiaoRice_task4_1 Dinkel2022 1.29 0.422 0.679 CRNN, RCRNN uda, mean-teacher student median filtering (443ms) avg
Dinkel_XiaoRice_task4_2 Dinkel2022 1.15 0.373 0.613 CRNN, RCRNN mean-teacher student median filtering (443ms)
Dinkel_XiaoRice_task4_4 Dinkel2022 0.92 0.104 0.824 CNN, Transformer uda, mean-teacher student median filtering (443ms) avg
Dinkel_XiaoRice_task4_3 Dinkel2022 1.38 0.451 0.727 CRNN, RCRNN, Transformer uda, mean-teacher student, noisystudent median filtering (443ms) avg
Hao_UNISOC_task4_2 Hao2022 0.78 0.078 0.723 CRNN domain adaptation median filtering with adaptive window size mean
Hao_UNISOC_task4_1 Hao2022 1.24 0.425 0.615 CRNN domain adaptation median filtering with adaptive window size mean
Hao_UNISOC_task4_3 Hao2022 1.09 0.373 0.547 CRNN domain adaptation median filtering with adaptive window size mean
Khandelwal_FMSG-NTU_task4_1 Khandelwal2022 0.83 0.158 0.633 CRNN mean-teacher student, pseudo-labelling, interpolation consistency training class-wise median filtering mean
Khandelwal_FMSG-NTU_task4_2 Khandelwal2022 0.80 0.082 0.731 CRNN mean-teacher student, interpolation consistency training class-wise median filtering mean
Khandelwal_FMSG-NTU_task4_3 Khandelwal2022 1.26 0.410 0.664 CRNN mean-teacher student, pseudo-labelling, interpolation consistency training class-wise median filtering mean
Khandelwal_FMSG-NTU_task4_4 Khandelwal2022 1.20 0.386 0.643 CRNN mean-teacher student, pseudo-labelling, interpolation consistency training class-wise median filtering mean
deBenito_AUDIAS_task4_4 deBenito2022 1.28 0.432 0.649 CRNN, conformer mean-teacher student median filtering (class dependent) averaging
deBenito_AUDIAS_task4_1 deBenito2022 1.23 0.400 0.646 CRNN, conformer mean-teacher student median filtering (450ms) averaging
deBenito_AUDIAS_task4_2 deBenito2022 1.08 0.310 0.642 CRNN, conformer mean-teacher student median filtering (class dependent) averaging
deBenito_AUDIAS_task4_3 deBenito2022 1.23 0.407 0.643 CRNN, conformer mean-teacher student median filtering (450ms) averaging
Li_WU_task4_4 Shao2022 1.41 0.486 0.694 CRNN, ATST mean-teacher student, RCT temperature, median filter averaging
Li_WU_task4_2 Shao2022 1.36 0.476 0.666 CRNN, ATST mean-teacher student, RCT temperature, median filter
Li_WU_task4_3 Shao2022 1.40 0.482 0.693 CRNN, ATST mean-teacher student, RCT temperature
Li_WU_task4_1 Shao2022 1.13 0.368 0.594 CRNN mean-teacher student, RCT median filtering (112ms)
Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 RCRNN mean-teacher student, noisy student classwise median filtering average
Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 RCRNN mean-teacher student, noisy student classwise median filtering average
Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 RCRNN mean-teacher student, noisy student classwise median filtering average
Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 RCRNN mean-teacher student classwise median filtering
Ebbers_UPB_task4_4 Ebbers2022 1.49 0.509 0.742 CRNN self-training median filtering (event-specific lengths) MIL average
Ebbers_UPB_task4_2 Ebbers2022 0.83 0.047 0.824 FBCRNN self-training median filtering (event-specific lengths) MIL average
Ebbers_UPB_task4_1 Ebbers2022 1.59 0.552 0.786 CRNN self-training median filtering (event-specific lengths) MIL average
Ebbers_UPB_task4_3 Ebbers2022 1.46 0.527 0.679 CRNN self-training median filtering (event-specific lengths) MIL average
Xu_SRCB-BIT_task4_2 Xu2022 1.41 0.482 0.702 FDY-CRNN mean-teacher student classwise median filtering averaging
Xu_SRCB-BIT_task4_1 Xu2022 1.32 0.452 0.662 FDY-CRNN mean-teacher student median filtering (93ms) averaging
Xu_SRCB-BIT_task4_3 Xu2022 0.79 0.054 0.774 FDY-CRNN mean-teacher student median filtering mean
Xu_SRCB-BIT_task4_4 Xu2022 0.75 0.049 0.738 FDY-CRNN mean-teacher student median filtering mean
Nam_KAIST_task4_SED_2 Nam2022 1.25 0.409 0.656 CRNN, ensemble mean-teacher student class-wise median filtering, weak prediction masking mean
Nam_KAIST_task4_SED_3 Nam2022 0.77 0.057 0.747 CRNN, ensemble mean-teacher student class-wise median filtering, weak prediction masking mean
Nam_KAIST_task4_SED_4 Nam2022 0.77 0.055 0.747 CRNN, ensemble mean-teacher student class-wise median filtering, weak prediction masking mean
Nam_KAIST_task4_SED_1 Nam2022 1.24 0.404 0.653 CRNN, ensemble mean-teacher student class-wise median filtering, weak prediction masking mean
Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 CRNN mean-teacher student median filtering (160ms)
Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 CRNN mean-teacher student median filtering (160ms)
Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 CRNN mean-teacher student median filtering (160ms)
Li_USTC_task4_SED_1 Li2022b 1.41 0.480 0.713 CRNN mean-teacher student, pseudo-labelling median filtering (340ms) averaging
Li_USTC_task4_SED_4 Li2022b 1.34 0.429 0.723 CRNN mean-teacher student, pseudo-labelling median filtering (340ms) averaging
Li_USTC_task4_SED_2 Li2022b 1.39 0.451 0.740 CRNN mean-teacher student, pseudo-labelling median filtering (340ms) averaging
Li_USTC_task4_SED_3 Li2022b 1.35 0.450 0.699 CRNN mean-teacher student, pseudo-labelling median filtering (340ms) averaging
Bertola_UPF_task4_1 Bertola2022 0.98 0.318 0.520 CRNN mean-teacher student median filtering (93ms)
He_BYTEDANCE_task4_4 He2022 0.82 0.053 0.810 SK-CRNN, FDY-CRNN mean-teacher student median filtering MIL averaging
He_BYTEDANCE_task4_2 He2022 1.48 0.503 0.749 SK-CRNN, FDY-CRNN mean-teacher student median filtering MIL averaging
He_BYTEDANCE_task4_3 He2022 1.52 0.525 0.748 SK-CRNN, FDY-CRNN mean-teacher student classwise median filtering MIL averaging
He_BYTEDANCE_task4_1 He2022 1.36 0.454 0.696 SK-CRNN, FDY-CRNN mean-teacher student median filtering MIL averaging
Li_ICT-TOSHIBA_task4_2 Li2022d 0.79 0.090 0.709 transformer (SEDT), CNN (frame-wise model), ensemble mean-teacher student (frame-wise model), pseudo-labelling (SEDT) median filtering with adaptive window size (only frame-wise model) attention layers (only frame-wise model) majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model)
Li_ICT-TOSHIBA_task4_4 Li2022d 0.75 0.075 0.692 transformer (SEDT), CNN (frame-wise model), ensemble mean-teacher student (frame-wise model), pseudo-labelling (SEDT) median filtering with adaptive window size (only frame-wise model) attention layers (only frame-wise model) majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model)
Li_ICT-TOSHIBA_task4_1 Li2022d 1.26 0.439 0.612 transformer (SEDT), CNN (frame-wise model), ensemble mean-teacher student (frame-wise model), pseudo-labelling (SEDT) median filtering with adaptive window size (only frame-wise model) attention layers (only frame-wise model) majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model)
Li_ICT-TOSHIBA_task4_3 Li2022d 1.20 0.411 0.597 transformer (SEDT), CNN (frame-wise model), ensemble mean-teacher student (frame-wise model), pseudo-labelling (SEDT) median filtering with adaptive window size (only frame-wise model) attention layers (only frame-wise model) majority vote (only SEDT), weighted averaging (frame-wise model and ensemble model)
Xie_UESTC_task4_2 Xie2022 0.83 0.062 0.800 CRNN average
Xie_UESTC_task4_3 Xie2022 1.06 0.300 0.641 CRNN mean-teacher student median filtering (560ms) average
Xie_UESTC_task4_1 Xie2022 1.36 0.418 0.757 CRNN mean-teacher student median filtering (560ms) average
Xie_UESTC_task4_4 Xie2022 1.38 0.426 0.766 CRNN mean-teacher student median filtering (560ms) average
Baseline (AudioSet) Ronchini2022 1.04 0.345 0.540 CRNN mean-teacher student
Kim_CAUET_task4_1 Kim2022c 1.02 0.317 0.565 RCRNN mean-teacher student median filtering attention layers
Kim_CAUET_task4_2 Kim2022c 1.04 0.340 0.544 CRNN with CBAM attention mean-teacher student median filtering attention layers
Kim_CAUET_task4_3 Kim2022c 1.04 0.338 0.554 CRNN mean-teacher student median filtering attention layers
Li_XJU_task4_1 Li2022c 1.10 0.364 0.570 CRNN mean-teacher student median filtering (93ms) linearsoftmax layer, attention layer
Li_XJU_task4_3 Li2022c 1.17 0.371 0.635 CRNN mean-teacher student median filtering (93ms) linearsoftmax layer, attention layer
Li_XJU_task4_4 Li2022c 0.93 0.195 0.683 CRNN mean-teacher student median filtering linearsoftmax layer, mean
Li_XJU_task4_2 Li2022c 0.75 0.086 0.671 CRNN mean-teacher student median filtering linearsoftmax layer, mean
Castorena_UV_task4_3 Castorena2022 0.91 0.267 0.531 CRNN mean-teacher student median filtering (93ms)
Castorena_UV_task4_1 Castorena2022 1.01 0.334 0.524 CRNN mean-teacher student median filtering (93ms)
Castorena_UV_task4_2 Castorena2022 0.63 0.072 0.559 CRNN mean-teacher student median filtering (93ms)
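
Most systems in the table above list median filtering as post-processing, often with a window given in milliseconds. As a minimal sketch of what that step does to the frame-level probabilities of one event class, the Python below smooths a probability sequence with a sliding median. The frame duration `frame_ms`, the edge handling (truncated windows), and all function names are assumptions for illustration and are not taken from any submission.

```python
from statistics import median
from typing import List


def window_frames(window_ms: float, frame_ms: float) -> int:
    """Convert a window duration to an odd number of frames (at least 1)."""
    n = max(1, round(window_ms / frame_ms))
    return n if n % 2 == 1 else n + 1


def median_filter(probs: List[float], size: int) -> List[float]:
    """Sliding median over one class; windows are truncated at the edges."""
    half = size // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(median(probs[lo:hi]))
    return smoothed


if __name__ == "__main__":
    # Hypothetical frame duration of 16 ms; a 93 ms window then spans 7 frames.
    size = window_frames(93, 16)
    noisy = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(size, median_filter(noisy, 3))
```

Class-wise (or event-specific) median filtering, as listed for several submissions, would apply the same function with a different `size` per event class, so that short events such as dog barks are not smoothed away by a window tuned for long events such as a vacuum cleaner.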

Complexity

Rank Code Technical Report Ranking score (Evaluation dataset) PSDS 1 (Evaluation dataset) PSDS 2 (Evaluation dataset) Model complexity Ensemble subsystems Training time
Zhang_UCAS_task4_2 Xiao2022 1.41 0.484 0.697 11325746 5 18h (1 Tesla P100)
Zhang_UCAS_task4_1 Xiao2022 1.39 0.472 0.700 11325746 5 18h (1 Tesla P100)
Zhang_UCAS_task4_3 Xiao2022 1.21 0.420 0.599 4282496 5 12h (1 Tesla P100)
Zhang_UCAS_task4_4 Xiao2022 0.79 0.049 0.784 2359672 5 8h (1 Tesla P100)
Liu_NSYSU_task4_2 Liu2022 0.06 0.000 0.063 3251508 7h (1 GTX 1080 Ti)
Liu_NSYSU_task4_3 Liu2022 0.29 0.070 0.194 16257540 5 35h (1 GTX 1080 Ti)
Huang_NSYSU_task4_1 Huang2022 1.28 0.434 0.650 14973876 6 10h * 6 (3060ti)
Liu_NSYSU_task4_4 Liu2022 0.21 0.046 0.151 13006032 4 28h (1 GTX 1080 Ti)
Suh_ReturnZero_task4_1 Suh2022 1.22 0.393 0.650 116400000 12 10h 8m 25s (1 NVIDIA A100-SXM4-80GB)
Suh_ReturnZero_task4_4 Suh2022 0.81 0.062 0.774 116400000 12 (1 NVIDIA A100-SXM4-80GB)
Suh_ReturnZero_task4_2 Suh2022 1.39 0.458 0.721 116400000 12 10h 52m 16s (1 NVIDIA A100-SXM4-80GB)
Suh_ReturnZero_task4_3 Suh2022 1.42 0.478 0.719 116400000 12 13h 46m 27s (1 NVIDIA A100-SXM4-80GB)
Cheng_CHT_task4_2 Cheng2022 0.93 0.276 0.543 4721326 18h (nvidia A100)
Cheng_CHT_task4_1 Cheng2022 1.03 0.314 0.582 4729921 18h (nvidia A100)
Liu_SRCN_task4_2 Liu2022 0.90 0.129 0.758 89500000 36h (1 NVIDIA A100 40Gb)
Liu_SRCN_task4_1 Liu2022 0.79 0.051 0.777 89500000 36h (1 NVIDIA A100 40Gb)
Liu_SRCN_task4_4 Liu2022 0.24 0.025 0.219 79700000 11h (1 RTX 2080 Ti)
Liu_SRCN_task4_3 Liu2022 1.25 0.425 0.634 11061000 6h (1 NVIDIA A100 40Gb)
Kim_LGE_task4_1 Kim2022a 1.34 0.444 0.697 11061000 8h (1 RTX A5000)
Kim_LGE_task4_3 Kim2022a 0.81 0.062 0.781 11061000 8h (1 RTX A5000)
Kim_LGE_task4_4 Kim2022a 1.17 0.305 0.750 11061000 8h (1 RTX A5000)
Kim_LGE_task4_2 Kim2022a 1.34 0.444 0.695 11061000 8h (1 RTX A5000)
Ryu_Deeply_task4_1 Ryu2022 0.83 0.257 0.461 625K 25h (4 A100 GPUs)
Ryu_Deeply_task4_2 Ryu2022 0.66 0.156 0.449 625K 16.9h (4 A100 GPUs)
Giannakopoulos_UNIPI_task4_2 Giannakopoulos2022 0.21 0.029 0.184 4213258 6h (1 GTX 2080 Ti)
Giannakopoulos_UNIPI_task4_1 Giannakopoulos2022 0.35 0.104 0.196 4213258 6h (1 GTX 2080 Ti)
Mizobuchi_PCO_task4_4 Mizobuchi2022 0.82 0.062 0.787 52793884 11 77h (1 NVIDIA Tesla V100 SXM2)
Mizobuchi_PCO_task4_2 Mizobuchi2022 1.26 0.439 0.611 70847296 16 48h (1 NVIDIA Tesla V100 SXM2)
Mizobuchi_PCO_task4_3 Mizobuchi2022 0.88 0.197 0.620 44279560 10 30h (1 NVIDIA Tesla V100 SXM2)
Mizobuchi_PCO_task4_1 Mizobuchi2022 1.15 0.398 0.571 35423648 8 24h (1 NVIDIA Tesla V100 SXM2)
KIM_HYU_task4_2 Sojeong2022 1.28 0.421 0.664 1112420 5 6h (1 GTX 2080 Ti)
KIM_HYU_task4_4 Sojeong2022 1.27 0.423 0.651 1112420 5 6h (1 GTX 2080 Ti)
KIM_HYU_task4_1 Sojeong2022 1.19 0.390 0.620 1112420 3 6h (1 GTX 2080 Ti)
KIM_HYU_task4_3 Sojeong2022 1.24 0.415 0.634 1112420 2 4h (1 GTX 3090 Ti)
Baseline Turpault2022 1.00 0.315 0.543 2200000 6h (1 GTX 1080 Ti)
Dinkel_XiaoRice_task4_1 Dinkel2022 1.29 0.422 0.679 8430844 9 3 h
Dinkel_XiaoRice_task4_2 Dinkel2022 1.15 0.373 0.613 148852 3 h
Dinkel_XiaoRice_task4_4 Dinkel2022 0.92 0.104 0.824 27992026 6 24 h
Dinkel_XiaoRice_task4_3 Dinkel2022 1.38 0.451 0.727 37451786 11 24 h
Hao_UNISOC_task4_2 Hao2022 0.78 0.078 0.723 4590228 3 36h (1 RTX 6000)
Hao_UNISOC_task4_1 Hao2022 1.24 0.425 0.615 4590228 3 36h (1 RTX 6000)
Hao_UNISOC_task4_3 Hao2022 1.09 0.373 0.547 4590228 36h (1 RTX 6000)
Khandelwal_FMSG-NTU_task4_1 Khandelwal2022 0.83 0.158 0.633 2770884 20h (1 NVIDIA Quadro RTX 5000)
Khandelwal_FMSG-NTU_task4_2 Khandelwal2022 0.80 0.082 0.731 118567907 24h (1 NVIDIA Quadro RTX 5000)
Khandelwal_FMSG-NTU_task4_3 Khandelwal2022 1.26 0.410 0.664 2770884 20h (1 NVIDIA Quadro RTX 5000)
Khandelwal_FMSG-NTU_task4_4 Khandelwal2022 1.20 0.386 0.643 2770884 20h (1 NVIDIA Quadro RTX 5000)
deBenito_AUDIAS_task4_4 deBenito2022 1.28 0.432 0.649 10659182 7 77h (1 GeForce RTX 2080 Ti)
deBenito_AUDIAS_task4_1 deBenito2022 1.23 0.400 0.646 15911270 10 111h (1 GeForce RTX 2080 Ti)
deBenito_AUDIAS_task4_2 deBenito2022 1.08 0.310 0.642 15911270 10 111h (1 GeForce RTX 2080 Ti)
deBenito_AUDIAS_task4_3 deBenito2022 1.23 0.407 0.643 10659182 7 77h (1 GeForce RTX 2080 Ti)
Li_WU_task4_4 Shao2022 1.41 0.486 0.694 475547380 5 8.3h (1 A100-SXM4-80GB)
Li_WU_task4_2 Shao2022 1.36 0.476 0.666 29986148 6.6h (1 A100-SXM4-80GB)
Li_WU_task4_3 Shao2022 1.40 0.482 0.693 95109476 8.3h (1 A100-SXM4-80GB)
Li_WU_task4_1 Shao2022 1.13 0.368 0.594 1112420 4h (1 A100-SXM4-80GB)
Kim_GIST_task4_3 Kim2022b 1.43 0.500 0.695 1691694 10 74h (5 RTX 2080ti)
Kim_GIST_task4_1 Kim2022b 1.47 0.514 0.713 1691694 10 74h (5 RTX 2080ti)
Kim_GIST_task4_2 Kim2022b 1.46 0.510 0.711 1691694 10 74h (5 RTX 2080ti)
Kim_GIST_task4_4 Kim2022b 0.65 0.215 0.335 792228 18h (1 RTX A6000)
Ebbers_UPB_task4_4 Ebbers2022 1.49 0.509 0.742 134119060 30 2d (10 A100)
Ebbers_UPB_task4_2 Ebbers2022 0.83 0.047 0.824 499812480 40 5d (10 A100)
Ebbers_UPB_task4_1 Ebbers2022 1.59 0.552 0.786 779623240 60 5d (10 A100)
Ebbers_UPB_task4_3 Ebbers2022 1.46 0.527 0.679 780237640 60 5d (10 A100)
Xu_SRCB-BIT_task4_2 Xu2022 1.41 0.482 0.702 11066748 10 4h (1 RTX 3090)
Xu_SRCB-BIT_task4_1 Xu2022 1.32 0.452 0.662 11066748 5 4h (1 RTX 3090)
Xu_SRCB-BIT_task4_3 Xu2022 0.79 0.054 0.774 11117798 5 4h (1 RTX 3090)
Xu_SRCB-BIT_task4_4 Xu2022 0.75 0.049 0.738 11081958 2 4h (1 RTX 3090)
Nam_KAIST_task4_SED_2 Nam2022 1.25 0.409 0.656 11061468 12 6h (1 RTX Titan)
Nam_KAIST_task4_SED_3 Nam2022 0.77 0.057 0.747 11061468 53 6h (1 RTX Titan)
Nam_KAIST_task4_SED_4 Nam2022 0.77 0.055 0.747 11061468 150 6h (1 RTX Titan)
Nam_KAIST_task4_SED_1 Nam2022 1.24 0.404 0.653 11061468 31 6h (1 RTX Titan)
Blakala_SRPOL_task4_3 Blakala2022 0.95 0.293 0.527 1.2M 4.5h (1 RTX 2080)
Blakala_SRPOL_task4_1 Blakala2022 1.11 0.365 0.584 1177663 8h (1 RTX 2080)
Blakala_SRPOL_task4_2 Blakala2022 0.78 0.069 0.728 5.3M 29h (1 RTX 2080)
Li_USTC_task4_SED_1 Li2022b 1.41 0.480 0.713 26842020 10 20h (2 GTX 3090)
Li_USTC_task4_SED_4 Li2022b 1.34 0.429 0.723 8052606 10 6h (2 GTX 3090)
Li_USTC_task4_SED_2 Li2022b 1.39 0.451 0.740 26842020 10 20h (2 GTX 3090)
Li_USTC_task4_SED_3 Li2022b 1.35 0.450 0.699 8052606 10 6h (2 GTX 3090)
Bertola_UPF_task4_1 Bertola2022 0.98 0.318 0.520 1112420 3h (1 GTX 1080 Ti)
He_BYTEDANCE_task4_4 He2022 0.82 0.053 0.810 15919068 40 8h (1 A100)
He_BYTEDANCE_task4_2 He2022 1.48 0.503 0.749 15919068 40 8h (1 A100)
He_BYTEDANCE_task4_3 He2022 1.52 0.525 0.748 15919068 16 8h (1 A100)
He_BYTEDANCE_task4_1 He2022 1.36 0.454 0.696 11061468 8 3h (1 A100)
Li_ICT-TOSHIBA_task4_2 Li2022d 0.79 0.090 0.709 224997445 10 (5 SEDT, 5 frame-wise model) 186 h (1 RTX A4000) + 35 h (3 RTX 2080 Ti)
Li_ICT-TOSHIBA_task4_4 Li2022d 0.75 0.075 0.692 188469803 9 (4 SEDT, 5 frame-wise model) 53h (1 RTX A4000) + 30 h (3 RTX 2080 Ti)
Li_ICT-TOSHIBA_task4_1 Li2022d 1.26 0.439 0.612 224997445 10 (5 SEDT, 5 frame-wise model) 186 h (1 RTX A4000) + 35 h (3 RTX 2080 Ti)
Li_ICT-TOSHIBA_task4_3 Li2022d 1.20 0.411 0.597 188469803 9 (4 SEDT, 5 frame-wise model) 53h (1 RTX A4000) + 30 h (3 RTX 2080 Ti)
Xie_UESTC_task4_2 Xie2022 0.83 0.062 0.800 166283314 8 20min (1 GTX 3080 Ti)
Xie_UESTC_task4_3 Xie2022 1.06 0.300 0.641 25054503 2 3h (1 GTX 3080 Ti)
Xie_UESTC_task4_1 Xie2022 1.36 0.418 0.757 225490527 8 2h (1 GTX 3080 Ti)
Xie_UESTC_task4_4 Xie2022 1.38 0.426 0.766 225490527 8 2h (1 GTX 3080 Ti)
Baseline (AudioSet) Ronchini2022 1.04 0.345 0.540 2200000 6h (1 GTX 1080 Ti)
Kim_CAUET_task4_1 Kim2022c 1.02 0.317 0.565 Trainable 1.7 M non-Trainable 1.7M 10h (1 RTX 2080 Ti)
Kim_CAUET_task4_2 Kim2022c 1.04 0.340 0.544 Trainable 1.1 M non-Trainable 1.1M 9h (1 RTX 2080 Ti)
Kim_CAUET_task4_3 Kim2022c 1.04 0.338 0.554 Trainable 1.1 M non-Trainable 1.1M 9h (1 RTX 2080 Ti)
Li_XJU_task4_1 Li2022c 1.10 0.364 0.570 4.2MB 7h (1 Titan RTX)
Li_XJU_task4_3 Li2022c 1.17 0.371 0.635 4.2MB 7h (1 Titan RTX)
Li_XJU_task4_4 Li2022c 0.93 0.195 0.683 4.2MB 7h (1 Titan RTX)
Li_XJU_task4_2 Li2022c 0.75 0.086 0.671 4.2MB 7h (1 Titan RTX)
Castorena_UV_task4_3 Castorena2022 0.91 0.267 0.531 1100000 4h (1 GTX 3060 Ti)
Castorena_UV_task4_1 Castorena2022 1.01 0.334 0.524 1100000 4h (1 GTX 3060 Ti)
Castorena_UV_task4_2 Castorena2022 0.63 0.072 0.559 1100000 4h (1 GTX 3060 Ti)

Technical reports

Data Augmentation Methods Exploration For Sound Event Detection

Bertola, Marco
Universitat Pompeu Fabra, Barcelona, Spain

Abstract

This technical report describes the submission of a system for DCASE2022 Task4: Sound Event Detection in Domestic Environments 2022 [1]. Sound Event Detection (SED) systems have gained great attention in the past few years, motivated by emerging applications in several different fields such as smart homes, autonomous cars, and healthcare. Their performance can depend heavily on the availability of a large amount of strongly labeled data. Generating or retrieving this data is often difficult and costly. The aim of this work is to explore, combine and compare different data augmentation techniques to compensate for the lack of strongly labeled data. In conclusion, the best result is submitted to the DCASE 2022 Task4 challenge.

System characteristics
PDF

DCASE 2022 Task 4 Technical Report

Kornel, Błakała and Sikorski, Olaf
Samsung R&D Institute Poland, Warsaw, Poland

Abstract

This paper describes our solution for Task 4 of the 2022 edition of the Detection and Classification of Acoustic Scenes and Events competition. Our solution practically consists of two specialised systems that excel in either of the two scenarios in the challenge. Both utilise the CRNN model architecture and mean-teacher training setup proposed in the baseline solution. The modifications that they share are the replacement of the CNN extractor with a ResNet-18 architecture and the reduction of the FFT window from 2048 to 1024 samples. The systems diverge in two aspects: the set of augmentations selected and whether they use any additional techniques during training. For Scenario 1 we observed improvement when using pitch shift, while all other data augmentation methods resulted in lower PSDS. On the other hand, Scenario 2 benefited greatly from spectrogram time warping and adding brown noise. Further improvement on Scenario 2 was achieved by replacing attention with mean aggregation for weak predictions, incorporating per-frame embeddings from Audio Spectrogram Transformer (AST) and injecting Gaussian noise between teacher and student during consistency loss calculation. Curiously, these modifications diminished performance on Scenario 1. The system specialising in Scenario 1 scored [0.3743, 0.5826] and the system specialising in Scenario 2 scored [0.0701, 0.7938] in [PSDS1, PSDS2] respectively.
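The difference between attention pooling and mean aggregation for weak (clip-level) predictions can be illustrated with a minimal sketch. The function names, the softmax-over-time form of the attention, and the example values are assumptions made for illustration; they are not taken from the report.

```python
import math

def attention_pool(frame_probs, attn_logits):
    """Clip-level score as an attention-weighted average of frame probabilities.

    The attention weights here are a softmax over time (an assumed form).
    """
    m = max(attn_logits)
    weights = [math.exp(a - m) for a in attn_logits]  # numerically stable softmax numerators
    total = sum(weights)
    return sum(p * w for p, w in zip(frame_probs, weights)) / total

def mean_pool(frame_probs):
    """Clip-level score as the plain mean of frame probabilities."""
    return sum(frame_probs) / len(frame_probs)

# Hypothetical per-frame probabilities and attention logits for one class.
frame_probs = [0.1, 0.9, 0.8, 0.2]
attn_logits = [0.0, 2.0, 2.0, 0.0]

weak_attention = attention_pool(frame_probs, attn_logits)
weak_mean = mean_pool(frame_probs)
```

With these values the attention-weighted score leans toward the frames with large logits, while the mean treats every frame equally.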

System characteristics
PDF

Sound Event Detection System With Multiscale Channel Attention And Multiple Consistency Training For DCASE 2022 Task 4

Cheng, Yu-Han and Lu, Chung-Li and Chan, Bo-Cheng and Chuang, Hsiang-Feng
Chunghwa Telecom Laboratories, Taiwan

Abstract

In this technical report, we describe our submission system for DCASE 2022 Task4: sound event detection and separation in domestic environments. The proposed system is based on mean-teacher framework of semi-supervised learning and neural networks of CRNN. We employ consistency training of interpolation (ICT), shift (SCT), and clip-level (CCT) to enhance the generalization and representation. A multiscale CNN block is applied to extract various features to mitigate the influence of the event length diversity for the network. An efficient channel attention network (ECA-Net) and attention pooling enable the model to obtain definite sound event predictions. To further improve the performance, we use data augmentation including mixup, time shift, and filter augmentation. Our best system achieves the PSDS-scenario1 of 36.20% and PSDS-scenario2 of 63.45% on the validation set, significantly outperforming that of the baseline score of 32.93% and 53.22%, respectively.
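Two of the augmentations named above, mixup and time shift, can be sketched as follows. This is a generic illustration, not the authors' implementation: the function names are hypothetical, features and labels are plain lists, and the time shift is assumed to be circular.

```python
def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their labels with mixing weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def time_shift(frames, shift):
    """Circularly shift a sequence of frames to the right by `shift` steps.

    For strongly labeled data the same shift would be applied to the frame labels.
    """
    if not frames:
        return []
    k = shift % len(frames)
    return frames[-k:] + frames[:-k] if k else list(frames)

mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
shifted = time_shift([1, 2, 3, 4], 1)
```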

System characteristics
PDF

Multi-Resolution Combination Of CRNN And Conformers For DCASE 2022 Task 4

de Benito-Gorron, Diego and Barahona, Sara and Segovia, Sergio and Ramos, Daniel and Toledano, Doroteo
AUDIAS Research Group, Universidad Autónoma de Madrid, Madrid, Spain

Abstract

This technical report describes our submission to DCASE 2022 Task 4: Sound event detection in domestic environments. We follow a multi-resolution approach consisting of a late fusion of systems that are trained with different feature extraction parameters, aiming to leverage the characteristics of different event categories in time and frequency. Our systems are built upon the Convolutional-Recurrent Neural Network (CRNN) proposed by the baseline system and the Conformer structure proposed by the winners of the 2020 challenge.

System characteristics
PDF

A Large Multi-Modal Ensemble For Sound Event Detection

Dinkel, Heinrich and Yan, Zhiyong and Wang, Yongqing and Song, Meixu and Zhang, Junbo and Wang, Wang
Xiaomi Corporation, Beijing, China

Abstract

This paper is a system description of the XiaoRice team submission to the DCASE 2022 Task 4 challenge. Our method focuses on merging commonly used convolutional neural networks (CNNs) with transformer-based methods and recurrent-neural networks (RNNs). We deliberately divide our efforts into optimizing the two evaluation metrics for the challenge: the onset and offset sensitive PSDS-1 score and the clip-level PSDS-2 score. This work shows that a large ensemble of differently trained architectures and frameworks can lead to significant gains. Our PSDS-1 optimized system consists of an 11-way convolutional recurrent neural network (CRNN), Vision transformer (ViT) fusion, and achieves a PSDS-1 score of 48.19. Further, our PSDS-2 system comprised of a 6-way CNN and ViT fusion achieved a PSDS-2 score of 87.70 on the development dataset.

System characteristics
PDF

Pre-Training And Self-Training For Sound Event Detection In Domestic Environments

Ebbers, Janek and Haeb-Umbach, Reinhold
Paderborn University, Paderborn, Germany

Abstract

In this report we present our system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. As in previous editions of the Challenge, we use forward-backward convolutional recurrent neural networks (FBCRNNs) [1, 2] for weakly labeled and semi-supervised sound event detection (SED) and eventually generate strong pseudo labels for weakly labeled and unlabeled data. Then, (tag-conditioned) bidirectional CRNNs (Bi-CRNNs) [1, 2] are trained in a strongly supervised manner as our final SED models. In each of the training stages we use multiple iterations of self-training. Compared to previous editions, we improved our system performance by 1) some tweaks regarding data augmentation, pseudo labeling and inference, 2) using weakly labeled AudioSet data [3] for pretraining larger networks, and 3) augmenting the DESED data [4] with strongly labeled AudioSet data [5] for finetuning of the networks. Source code is publicly available at https://github.com/fgnt/pb_sed.

System characteristics
PDF

Semi-Supervised Sound Event Detection Based On Mean Teacher With Selective Kernel Multiscale Convolution And Resident CAM Clustering

Qiao, Ziling and Gan, Yanggang and Wu, Juan and Cai, Xichang and Wu, Menglong and Dong, Hongxia and Zhang, Lin and Liu, Zihan
North China University of Technology, Beijing, China

Abstract

In this technical report, we present our submission system for DCASE 2022 Task4: sound event detection in domestic environments. The proposed system is based on the mean teacher framework of semi-supervised learning and the Selective Kernel Convolution Network. We use multi-scale convolution to extract more abundant features of sound events. In order to improve the localization ability of the system, we use a dynamically selected attention mechanism called the SK unit in the CNN, which allows each neuron to adaptively adjust the size of its receptive field according to multiple scales of input information. Our system finally achieves a PSDS-scenario1 of 39.0% and a PSDS-scenario2 of 58.50% on the validation set. In terms of innovative methods, this technical report also provides a technical description of system 2 submitted by the NCUT team. In system 2, the team selected an audio event detection method based on Grad-CAM clustering. This method attempts to use a PANNs-based transfer learning network to generate Grad-CAM class activation maps that locate the time points of events. The adaptability of several different network models is then evaluated, and the models with higher scores and better adaptability are fused at the probability level to obtain the event predictions. System 2, based on CAM clustering, achieved 9.963% PSDS-scenario1 and 69.877% PSDS-scenario2 on the development dataset.

System characteristics
PDF

Multi-Task Learning For Sound Event Detection Using Variational Autoencoders

Giannakopoulos, Petros1 and Pikrakis, Aggelos2
1National and Kapodistrian University of Athens, Athens, Greece 2University of Piraeus, Piraeus, Greece

Abstract

This technical report presents a multi-task learning model based on recurrent variational autoencoders (VAEs). The proposed method employs recurrent VAEs with shared parameters to simultaneously learn the tasks of strong labeling, weak labeling and feature sequence reconstruction. During the training stage, the model receives as input strongly labeled, weakly labeled data and unlabeled data and it simultaneously optimizes frame-based and file-based cross-entropy losses for strongly labeled and weakly labeled data, respectively, as well as the reconstruction loss for the unlabeled data. Using a shared posterior among all task branches, the model projects the input data for each task into a common latent space. The decoding of latents sampled from this common latent space, in combination with the shared parameters among task branches act jointly as a regularizer that prevents the model from overfitting to the individual tasks. The proposed method is evaluated on the DCASE-2022 Task4 dataset on which it achieves an event-based macro F1 score of 32.5% on the validation set and 31.8% on the public evaluation set.

System characteristics
PDF

DCASE 2022 Task4 Challenge Technical Report

Hao, Junyong and Ye, Shunzhou and Lu, Cheng and Dong, Fei and Liu, Jingang
UNISOC, Chongqing, China

Abstract

This report proposes a polyphonic sound event detection (SED) method for the DCASE 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. We use the DESED dataset to train our model, which contains strongly labeled synthetic data, a large amount of unlabeled data, weakly labeled data and strongly labeled real data. To perform this task, we propose a DACRNN network for joint learning of SED and domain adaptation (DA). We consider the impact of the distribution within a single sound on the generalization performance of the model by mitigating the impact of complex background noise on event detection and by applying self-correlation consistency regularization to clip-level sound event classification; these make the intra-domain distribution of a single sound smoother. For cross-domain adaptation, we use adversarial learning through the feature extraction network with a weighted frame-level domain discriminator. Experiments on the DCASE 2022 task4 validation dataset and public-evaluation dataset demonstrate the effectiveness of the techniques used in our system. Specifically, a PSDS1 score of 0.448 and a PSDS2 score of 0.853 are achieved on the validation dataset, and a PSDS1 score of 0.553 and a PSDS2 score of 0.836 are achieved on the public-evaluation dataset.

System characteristics
PDF

Semi-Supervised Sound Event Detection System For DCASE 2022 Task 4

He, Kexin and Shu, Xin and Jia, Shaoyong and He, Yi
Bytedance AI Lab, Beijing, China

Abstract

In this report, we describe our submissions for the task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge: Sound Event Detection in Domestic Environments. Our methods are mainly based on two types of deep learning models: Convolutional Recurrent Neural Network with selective kernel convolution (SK-CRNN) and frequency dynamic convolution (FDY-CRNN). In order to prevent overfitting, we adopt data augmentation using mixup strategy, FilterAugment, Interpolation Consistency Training (ICT) and Shift Consistency Training (SCT). Besides, we utilize external data and pretrained model to further improve performance, and try an ensemble of multiple subsystems to enhance the generalization capability of our system. Our final systems achieve a PSDS1/PSDS2 score of 0.5331/0.8569 on development dataset.

System characteristics
PDF

CHT+NSYSU Sound Event Detection System With Different Kinds Of Pretrained Models For DCASE 2022 Task 4

Huang, Sung-Jen1 and Liu, Chia-Chuan1 and Chen, Chia-Ping1 and Lu, Chung-Li2 and Chan, Bo-Cheng2 and Cheng, Yu-Han2 and Chuang, Hsiang-Feng2
1National Sun Yat-Sen University, Taiwan 2Chunghwa Telecom Laboratories, Taiwan

Abstract

In this technical report, we describe our submission system for DCASE 2022 Task4: sound event detection in domestic environments. We proposed two kinds of systems. One is trained by combining the mean teacher framework and knowledge distillation (one student model and two teacher models) without external data. While training this system, we first trained a mean teacher model to be a pretrained model. Our next step is to select the better one, the teacher or student model, to be the trained model for knowledge distillation. Afterward, we trained another mean teacher model with a different architecture using knowledge distillation. Finally, we repeat the model selection step and knowledge distillation several times. The mean teacher model in the final round is composed of a VGG block, selective kernels and a clip-level consistency branch. Compared to the PSDS-scenario1 of 35.1% and PSDS-scenario2 of 55.2% of the baseline system trained without external data, the ensemble of this kind of system achieves 43.7% and 68.0%, respectively. The other system can be separated into two parts. The first part is the top three layers of pretrained PANNs, while the second part is a system similar to the baseline with only three convolution blocks. We then trained the whole system (including PANNs) with DESED data. Ensembling this system yields PSDS-scenario1 and PSDS-scenario2 of 46.5% and 76.7%, outperforming the baseline system (trained with AST embedding) at 31.3% and 72.2%.

System characteristics
PDF

FMSG-NTU Submission For DCASE 2022 Task 4 On Sound Event Detection In Domestic Environments

Khandelwal, Tanmay1,2 and Das, Rohan Kumar1 and Koh, Andrew2 and Chng, Eng Siong2
1Fortemedia Singapore, Singapore 2Nanyang Technological University (NTU), Singapore

Abstract

In this work, we describe the jointly submitted systems by Fortemedia Singapore (FMSG) and Nanyang Technological University (NTU) for DCASE 2022 Task 4: sound event detection in domestic environments. The proposed framework is divided into two stages: Stage-1 focuses on the audio-tagging system, which assists the sound event detection system in Stage-2. We train Stage-1 using a strongly labeled set converted into weak predictions, a weakly labeled set, and an unlabeled set to develop an effective audio-tagging system. This audio-tagging system is then used to infer on the unlabeled set to generate reliable pseudo-weak labels, which are used together with the strongly labeled set and weakly labeled set to train the sound event detection system at Stage-2. In Stage-1, we used two different networks for our developed systems: a frequency dynamic (FDY) convolutional recurrent neural network (CRNN) and convolutional neural network (CNN)-14 based pretrained audio neural networks (PANNs). The system at Stage-2 is based on FDY-CRNN for all the systems submitted to the challenge. The systems at both stages employ data augmentation to reduce the risk of overfitting, and apply adaptive post-processing techniques to further enhance the performance. On the DESED real validation dataset, we obtain the highest PSDS1 and PSDS2 of 0.474 and 0.840, respectively.

System characteristics
PDF

Sound Event Detection System Using FixMatch For DCASE 2022 Challenge Task 4

Kim, Changmin and Yang, Siyoung
LG Electronics, Seoul, South Korea

Abstract

This technical report proposes a sound event detection (SED) system in domestic environments for DCASE 2022 challenge task 4. In this system, the training method consists of two stages. In stage 1, mean teacher (MT) and interpolation consistency training (ICT) are used. In stage 2, FixMatch is additionally applied. We adopted the frequency dynamic convolution recurrent neural network (FDY-CRNN) structure as our model. In order to further improve the polyphonic sound detection score (PSDS) in scenario 2, three techniques were used. First, we applied a temperature parameter to the sigmoid function to obtain soft confidence values. Second, we used weak SED, a method that uses only weak predictions and sets the timestamps to span the total duration of the audio clip. Third, the FSD50K dataset was added to the weakly labeled dataset, which helped PSDS scenario 2. As a result, we obtained the best PSDS scenario 1 of 0.473 and the best PSDS scenario 2 of 0.695 on the domestic environment SED real validation dataset.
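The first two PSDS scenario 2 techniques can be illustrated with a short sketch. It assumes the temperature simply divides the logit before the sigmoid and that weak SED emits one event spanning the whole clip for each class whose clip-level score passes a threshold; the function names, the threshold, and the example values are hypothetical.

```python
import math

def sigmoid_with_temperature(logit, temperature=1.0):
    """Sigmoid of logit / temperature; temperature > 1 gives softer confidences."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def weak_sed_events(clip_probs, clip_duration, threshold=0.5):
    """Turn clip-level (weak) predictions into events covering the full clip.

    Returns (label, onset_seconds, offset_seconds) tuples.
    """
    return [
        (label, 0.0, clip_duration)
        for label, prob in clip_probs.items()
        if prob >= threshold
    ]

sharp = sigmoid_with_temperature(2.0, temperature=1.0)
soft = sigmoid_with_temperature(2.0, temperature=2.0)
events = weak_sed_events({"Speech": 0.9, "Dog": 0.2}, clip_duration=10.0)
```

Because PSDS scenario 2 is more tolerant of imprecise boundaries, full-clip events of this kind can still score well there, at the cost of PSDS scenario 1.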

System characteristics
PDF

Semi-Supervised Learning-Based Sound Event Detection Using Frequency-Channel-Wise Selective Kernel For DCASE Challenge 2022 Task 4

Kim, Ji Won1 and Lee, Geon Woo1 and Kim, Hong Kook1,2 and Seo, Yeon Sik3 and Song, Il Hoon3
1AI Graduate School, Gwangju, Korea 2Gwangju Institute of Science and Technology, Gwangju, Korea 3I Lab., R&D Center, Hanwha Techwin, Gyeonggi-do, Korea

Abstract

In this report, we propose a mean-teacher model-based sound event detection (SED) model that uses semi-supervised learning to address the labeled data deficiency problem for the DCASE 2022 Challenge Task 4. The mean-teacher model of the proposed SED model is based on a residual convolutional recurrent neural network (RCRNN) architecture, and the residual convolutional blocks in the RCRNN are modified to include frequency-wise and/or channel-wise selective kernel attention (SKA); the result is hereafter referred to as SKA-RCRNN. This enables the RCRNN to have an adaptive receptive field for different lengths of audio. In particular, the proposed SKA-RCRNN-based SED model is first trained on the training dataset, during which it generates pseudo-labeled data for weakly labeled and unlabeled data. Next, in the second stage, the noisy student model, which is also based on SKA-RCRNN, is optimized via semi-supervised learning using strongly labeled and pseudo-labeled data. Finally, several ensemble models are obtained from fivefold cross-validation SED models with various hyper-parameters, and those that show higher F1 and polyphonic sound detection scores on the validation dataset of the DCASE 2022 Challenge Task 4 are selected for submission.

System characteristics
PDF

The CAU-ET For DCASE 2022 Challenge Technical Reports

Kim, Narin and Lee, Sumi and Kwak, Il youp
Chung-Ang University, Department of Applied Statistics, Seoul, South Korea

Abstract

In this technical report, we present a semi-supervised learning method using RCRNN for DCASE 2022 challenge Task 4. We applied three main methods to improve the performance of sound event detection (SED). The first is a semi-supervised network using RCRNN based on the mean teacher model. The CNN part consists of residual convolution blocks with a CBAM [1] self-attention module, stacked in five layers, and the classification is performed by the RNN part. The second is the application of different data augmentations to features with different types of labels. Mixup, frame shift, time shift, time masking, and filter augmentation were applied to features; mixup was applied differently to the strong labels and the weak labels, and time masking was applied only to the strongly labeled data. The third is to feed features carrying different noise to the student and teacher models through data augmentation. The weights of the student model were shared with the teacher model while injecting different feature noise, so that the model could converge to the global optimum faster through the consistency loss.

System characteristics
PDF

A Two-Stage Training Method For DCASE 2022 Challenge Task4

Li, Kang and Zheng, Xu and Song, Yan
University of Science and Technology of China, Hefei, China

Abstract

The goal of DCASE 2022 Challenge Task 4 is to evaluate systems for the detection of sound events using real data either weakly labeled or unlabeled, simulated data that is strongly labeled, and external data. In this technical report, we present a method based on a two-stage learning strategy to explore synthetic strong data and real strong data (from AudioSet). Specifically, a CRNN model is used as the baseline SED system for this year’s challenge. According to the different supervisory signals from weakly-labeled and strongly-labeled data, frame-level and clip-level tasks (i.e. SED and Audio Tagging (AT)) are designed. In the first stage, the model is trained on weakly labeled, unlabeled and synthetic data with strong labels under the semi-supervised learning framework, i.e. Mean Teacher (MT). There are two types of MT, frame-level MT and clip-level MT, corresponding to the subsets with different supervisory signals. In the second stage, a new model is trained using a pseudo-labeling scheme, in which the pre-trained teacher model is utilized to provide pseudo-labels for the real weakly labeled and unlabeled data. Furthermore, we explore the strongly labeled real data as external data in both stages. Results on the DCASE2022 Task4 validation set verify the effectiveness of our proposed method with PSDS1 and PSDS2 of 0.479 and 0.785, outperforming the baseline results of 0.351 and 0.552 respectively.

System characteristics
PDF

An Effective Consistency Regularization Training Based Mean Teacher Method For Sound Event Detection

Li, Yunlong1,2 and Hu, Ying1,2 and Zhu, Xiujuan1,2 and Xie, Yin1,2 and Hou, Shijing1,2 and Wang, Liusong1,2 and Chen, Zihao1,2 and Wang, Mingyu1,2 and Fang, Wenjie1,2
1Xinjiang University, Urumqi, China 2Key Laboratory of Signal Detection and Processing in Xinjiang, Urumqi, China

Abstract

This technical report describes the system we submitted to DCASE2022 Task4: Sound Event Detection in Domestic Environments. Specifically, we apply three main techniques to improve the performance of the official baseline system. Firstly, to improve the detection and classification ability of the CRNN model, we propose to add an auxiliary branch to the CRNN network. The consistency loss of the mean teacher method is improved by the auxiliary branch. Secondly, we propose to add an MDTC module to the CRNN network so that the receptive fields of the network can be adjusted according to the short-term and long-term correlation. Thirdly, several data-augmentation strategies are adopted to improve the generalization capability of the network. Experiments on the DCASE2022 Task4 validation dataset demonstrate the effectiveness of the techniques used in our system. As a result, the best PSDS1 is 0.408 and the best PSDS2 is 0.754.

System characteristics
PDF

A Hybrid System Of Sound Event Detection Transformer And Frame-Wise Model For DCASE 2022 Task 4

Li, Yiming1,2 and Guo, Zhifang1,2 and Ye, Zhirong1,2 and Wang, Xiangdong1,2 and Liu, Hong1 and Qian, Yueliang1 and Tao, Rui3 and Yan, Long3 and Ouchi, Kazushige3
1Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 2University of Chinese Academy of Sciences, Beijing, China 3Toshiba China R&D Center, Beijing, China

Abstract

In this technical report, we describe in detail our system for DCASE 2022 Task4. The system combines two considerably different models: an end-to-end Sound Event Detection Transformer (SEDT) and a frame-wise model (MLFL-CNN). The former is an event-wise model which learns event-level representations and predicts sound event categories and boundaries directly, while the latter is based on the widely-adopted frame-classification scheme, under which each frame is classified into event categories and event boundaries are obtained by post-processing such as thresholding and smoothing. For SEDT, self-supervised pre-training using unlabeled data is applied, and semi-supervised learning is adopted by using an online teacher, which is updated from the student model using the EMA strategy and generates reliable pseudo labels for weakly-labeled and unlabeled data. For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is used, which incorporates techniques such as focal loss and metric learning into a CNN model to form the MLFL-CNN model, adopts mean-teacher for semi-supervised learning, and uses a tag-condition CNN model to predict final results using the output of MLFL-CNN. Experimental results show that the hybrid system considerably outperforms either individual model, and achieves psds1 of 0.420 and psds2 of 0.783 on the validation set without external data. The code is available at https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
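The EMA strategy used to update a teacher from its student can be sketched generically. This is not taken from the report: parameters are represented as a plain dictionary of floats, and the function name and the decay value `alpha` are illustrative assumptions.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return teacher parameters moved toward the student by an exponential moving average.

    teacher, student: dicts mapping parameter names to float values.
    alpha: decay factor; values close to 1 make the teacher change slowly.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }

teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, alpha=0.9)
```

In a training loop the update would run after each optimizer step on the student, so the teacher tracks a smoothed version of the student's weights.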

System characteristics
PDF

DCASE 2022 Challenge Task4 Technical Report

Chen, Minjun1 and Wang, Tian1 and Shao, Jun1 and Tang, Yiqi1 and Liu, Yangyang1 and Peng, Bo1 and Chen, Jie1 and Shao, Xi2
1Samsung Research China-Nanjing, Nanjing, China 2College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China

Abstract

In this technical report, we describe our submitted systems for DCASE2022 Task4: Sound Event Detection in Domestic Environments. We propose three models to solve this problem. In the first model, we try to utilize all the training data provided. Specifically, we first employ a joint model for both event classification and localization, based on strongly labeled data and weakly labeled data, to propagate the clip-level annotations to the unlabeled dataset, producing the so-called pseudo-label dataset. In order to link the frame-level strong annotations with the weak annotations, we introduce a weighted average pooling scheme. Finally, the joint model trained on strongly labeled data, weakly labeled data and pseudo-label data is employed to solve the Task 4 problem. To utilize the external dataset and pre-trained model, we propose a second system which uses a pre-trained model to extract embeddings and trains an RNN decoder to generate the final predictions. The third system applies some data augmentation methods to the baseline CRNN. Our proposed systems achieve polyphonic sound detection scores (PSDS) of 0.4428 (PSDS1) and 0.8266 (PSDS2), respectively, on the development dataset.

System characteristics
PDF

Mizobuchi PCO Team’s Submission For DCASE2022 Task4 -- Sound Event Detection Using External Resources

Mizobuchi, Shohei and Ohashi, Hiromasa and Izumi, Akitoshi and Kodama, Nobutaka
Advanced Research Lab., R&D Division, Panasonic Connect Co., Ltd., Fukuoka, Japan

Abstract

In this technical report, we describe an overview and the performance of the system we submitted for DCASE 2022 Task 4. We submitted the following 4 systems. System 1 aims to improve PSDS1 performance under the condition that external resources are not used. System 2 adds AudioSet as an additional training dataset to System 1. System 3 uses System 1 with an additional training dataset that includes not only AudioSet but also a synthetic dataset generated by ourselves, and changes the training conditions to improve PSDS2 performance. System 4 adds a PANNs pretrained model to System 3. The highest performance of these systems evaluated on the “development dataset” is 0.4489 for PSDS1 and 0.8519 for PSDS2. Details will be described below.

System characteristics
PDF

Frequency Dependent Sound Event Detection For DCASE 2022 Challenge Task 4

Nam, Hyeonuk and Kim, Seong-Hu and Min, Deokki and Ko, Byeong-Yun and Choi, Seung-Deok and Park, Yong-Hwa
Korea Advanced Institute of Science and Technology, Daejeon, South Korea

Abstract

While many deep learning methods from other domains have been applied to sound event detection (SED), differences between the original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) as input, a thorough comprehension of these two dimensions is essential for applying methods from other domains to SED. Previous works showed that methods that address the frequency dimension are especially powerful in SED. By applying FilterAugment and frequency dynamic convolution, which are frequency-dependent methods proposed to enhance SED performance, our submitted models achieved a best PSDS 1 of 0.4704 and a best PSDS 2 of 0.8224.

System characteristics
PDF

SKATTN team’s submission for DCASE 2022 Task 4 -- Sound Event Detection in Domestic Environments

Ryu, Myeonghoon and Byun, Jeunghyun and Oh, Hongseok and Lee, Suji and Park, Han
Deeply Inc. Seoul, South Korea

Abstract

In this technical report, we present our submitted system for DCASE 2022 Task4: Sound Event Detection in Domestic Environments. There are two main aspects we considered to improve the performance of the official baseline system: (1) the use of external datasets and (2) the design of a novel model, SKATTN. Our newly proposed SKATTN model combines Selective Kernel Network (SKNet) with the self-attention blocks from the Transformer model. Motivated by SKNet’s successful applications in the Computer Vision and Audio domains, we adopted SKNet as a feature extractor for processing the input mel-spectrogram. We used self-attention blocks to process the spectro-temporal features since they are flexible in modeling short and long-range dependencies while being less susceptible to vanishing gradients, which commonly occur in RNNs. Experiments on the DCASE2022 task 4 validation dataset demonstrate that our system achieves PSDS1 + PSDS2 = 1.372, outperforming the 0.872 of the baseline system.

System characteristics
PDF

ATST Self-Supervised Plus RCT Semi-Supervised Sound Event Detection: Submission To DCASE 2022 Challenge Task 4

Shao, Nian and Li, Xian and Li, Xiaofei
Westlake University & Westlake Institute for Advanced Study, Hangzhou, China

Abstract

In this report, we present the methods we proposed for participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 4: Sound Event Detection in Domestic Environments. The proposed methods integrate a semi-supervised sound event detection model (called random consistency training, RCT) trained with the relatively small official dataset of the challenge, and a self-supervised model (called audio teacher-student transformer, ATST) trained with the very large AudioSet. RCT uses the baseline convolutional recurrent neural network (CRNN) of the challenge, and adopts a newly proposed semi-supervised learning scheme based on random data augmentation and a self-consistency loss. To integrate ATST into RCT, the feature extracted by ATST is concatenated with the feature extracted by the convolutional layers of RCT, and then fed to the RNN layers of RCT. It is found that these two types of feature are complementary and that the performance can be largely improved by combining them. In development, RCT individually achieves 39.80% and 61.12% for PSDS1 and PSDS2, respectively, which are improved to 45.99% and 70.65% by integrating the ATST feature, and further to 47.71% and 73.44% by ensembling five models with different training configurations.

System characteristics
PDF

HYU Submission For DCASE 2022 Task 4 -- PA-Net: Patch-Based Attention For Sound Event Detection

Kim, Sojeong
Hanyang University, Seoul, Korea

Abstract

In this paper, we describe details of our submitted systems for DCASE 2022 challenge task 4: sound event detection in domestic environments. We focus on how to effectively use a spectrogram as input for an SED model, since it has different time-frequency characteristics. Frequencies have various characteristics for reasons such as the recording device and the type of sound event. Specifically, each time frame has different features from the others due to uncertainty about whether a sound event happens in an audio clip and what type of sound event it is. Therefore, we propose a patch attention (PA) mechanism that captures patch-range dependencies across input sequences so that the model can learn from important local information. We use PA with efficient channel attention for learning important channels in feature maps. In addition, we adopt a strategy called subspectral normalization (SSN), which splits the input frequencies into multiple sub-groups and normalizes each group to make specific features stand out. Experimental results on the DESED 2022 validation dataset show that our proposed model outperforms the baseline system. In particular, our model achieves PSDS scores of 0.4438 and 0.683 on scenario1 and scenario2, respectively.
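The idea of splitting the frequency axis into sub-groups and normalizing each group separately can be sketched as follows. This is a simplified illustration only: it standardizes each group with its own mean and variance on a single spectrogram and omits any learned scale or shift, batch statistics, and the details of the authors' SSN layer. The function name and data layout are assumptions.

```python
import math

def subspectral_normalize(spec, num_groups, eps=1e-5):
    """Normalize each group of adjacent frequency bins to zero mean and unit variance.

    spec: list of frequency bins, each a list of values over time.
    num_groups: number of equally sized frequency sub-groups.
    """
    if len(spec) % num_groups != 0:
        raise ValueError("number of frequency bins must be divisible by num_groups")
    size = len(spec) // num_groups
    out = []
    for g in range(num_groups):
        band = spec[g * size:(g + 1) * size]
        values = [v for row in band for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = math.sqrt(var + eps)
        out.extend([[(v - mean) / scale for v in row] for row in band])
    return out

# Two sub-bands with very different energy levels (hypothetical values).
spec = [[1.0, 2.0], [3.0, 4.0], [100.0, 200.0], [300.0, 400.0]]
normalized = subspectral_normalize(spec, num_groups=2)
```

After normalization the low-energy and high-energy bands occupy a comparable value range, which is the effect the abstract attributes to SSN.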


Data Engineering For Noisy Student Model In Sound Event Detection

Suh, Sangwon and Lee, Dong Youn
ReturnZero, Seoul, Korea

Abstract

This report describes our Sound Event Detection (SED) system for DCASE2022 Task4. We focused on combining data augmentation techniques for the SED mean-teacher system and on selecting trainable samples from AudioSet. The neural architecture follows the baseline CRNN model, but a frequency dynamic convolution replaces each convolution layer except the first one. The cost function is also constructed identically to the baseline, except that an asymmetric focal loss is used instead of binary cross-entropy when training on AudioSet. The best metrics on the validation set in our experiments were 0.473 and 0.723 for PSDS 1 and PSDS 2, and 56.9% for the collar-based F1 score.


Pretrained Models In Sound Event Detection For DCASE 2022 Challenge Task4

Xiao, Shengchang
University of Chinese Academy of Sciences, Department of Electronic Engineering, Beijing, China

Abstract

In this technical report, we describe our submitted systems for DCASE 2022 Challenge Task4: Sound Event Detection in Domestic Environments. Specifically, we submit two different systems, one for PSDS1 and one for PSDS2. As PSDS2 focuses on avoiding confusion between classes rather than on the localization of sound events, we predict only the weak labels of clips to improve PSDS2. Moreover, we apply pretrained neural networks, including PANNs and SSAST, in our systems to improve the generalization and robustness of our models. These pretrained models, trained on large-scale datasets such as AudioSet, can effectively alleviate the problem of the lack of real training data. We fuse multiple pretrained models to make full use of the information in external data, which significantly improves the performance of our systems. In addition, we use various data augmentation techniques to expand the provided data. According to the characteristics of each sound event, we use a class-wise median filter and further classify some confusing events. As a result, we achieve a best PSDS1 of 0.481 and a best PSDS2 of 0.826 on the DESED real validation dataset.
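A class-wise median filter of the kind mentioned in this abstract can be sketched as below. The sketch assumes frame-level binary decisions per class and an odd window length per class; the window lengths and toy predictions are hypothetical and are not the values used in the report.

```python
import statistics


def median_filter(seq, window):
    """Median-filter a sequence with an odd window, repeating the edge values as padding."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [statistics.median(padded[i:i + window]) for i in range(len(seq))]


def classwise_median_filter(preds, windows):
    """Smooth each class with its own window length.

    preds maps class name to a list of frame-level 0/1 decisions.
    windows maps class name to an odd window length in frames.
    """
    return {name: median_filter(frames, windows[name]) for name, frames in preds.items()}


# Toy decisions: a short event class gets a short window, a long one a longer window.
preds = {
    "Speech": [0, 1, 0, 0, 1, 1, 1, 0],
    "Vacuum_cleaner": [1, 1, 0, 1, 1, 1, 0, 1],
}
windows = {"Speech": 3, "Vacuum_cleaner": 5}  # hypothetical window lengths
smoothed = classwise_median_filter(preds, windows)
```

A short window preserves brief events such as speech while removing isolated spurious frames; a long window fills gaps in events that are typically continuous.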


Semi-Supervised Sound Event Detection Using Pretrained Model

Xie, Rong and Shi, Chuang and Zhang, Le and Li, Huiyong
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China.

Abstract

In this technical report, the systems submitted for DCASE 2022 Task4 are described. A system that combines early output embeddings of CNN14 in PANNs with a CRNN is designed to achieve good performance on PSDS scenario 1. For PSDS scenario 2, the fully connected (FC) layer of CNN14 is replaced with one that outputs 10 categories. The submitted systems achieve an overall PSDS score of 1.31 (0.460 for PSDS scenario 1 and 0.856 for PSDS scenario 2) on the test set.


SRCB-BIT Team’s Submission For DCASE2022 Task4

Xu, Liang1,2 and Wang, Lizhong2 and Bi, Sijun1 and Liu, Hanyue1 and Wang, Jing1 and Zhao, Shenghui1 and Zheng, Yuxing2
1School of Information and Electronics, Beijing Institute of Technology, Beijing, China 2Samsung Research China-Beijing (SRC-B), Beijing, China

Abstract

In this technical report, we present our submitted system for DCASE2022 Task4: Sound Event Detection in Domestic Environments. We propose three main ways to improve the performance of the network. First, we use frequency dynamic convolution (FDY), which applies kernels that adapt to the frequency components of the input, to address the physical inconsistency of 2D convolution in sound event detection (SED). Second, we propose coherence learning based on a weight-raised temporal contrastive loss to improve the continuity of event predictions and the switching efficiency at event boundaries. Third, we use the pretrained model PANNs and propose two methods to fuse the features from PANNs and our model, which improve the PSDS1 and PSDS2 scores, respectively. The submitted system is based on the mean-teacher architecture, and its PSDS1 and PSDS2 scores on the development dataset reach 0.482 and 0.835, respectively.
