Task description
A more detailed task description can be found on the task description page.
All confidence intervals are computed from the three runs per system and bootstrapping on the evaluation set.
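As a rough illustration of how such an interval can be obtained, the sketch below resamples the evaluation clips with replacement for each of the three runs, recomputes a metric on every resample, and takes percentiles of the pooled scores. The function name `bootstrap_interval`, the number of resamples, the 95% level, the pooling across runs, and the toy accuracy metric standing in for PSDS are all assumptions made for this example; the exact procedure used for the tables is not described here.

```python
import random
from statistics import fmean


def bootstrap_interval(runs, metric, n_resamples=200, level=0.95, seed=0):
    """Percentile bootstrap over evaluation clips, pooled across runs.

    runs   : list of runs, each a list of per-clip results (hypothetical layout)
    metric : callable mapping a list of per-clip results to a single score
    Returns (mean of bootstrap scores, lower bound, upper bound).
    """
    rng = random.Random(seed)
    scores = []
    for clips in runs:
        n = len(clips)
        for _ in range(n_resamples):
            # Draw n clips with replacement and re-score the resampled set.
            resampled = [clips[rng.randrange(n)] for _ in range(n)]
            scores.append(metric(resampled))
    scores.sort()
    tail = (1.0 - level) / 2.0
    lower = scores[int(tail * (len(scores) - 1))]
    upper = scores[int((1.0 - tail) * (len(scores) - 1))]
    return fmean(scores), lower, upper


# Toy data: three runs of 40 clips, 1 = clip scored correctly, 0 = not.
# Mean accuracy is only a stand-in for PSDS, which needs the full detection output.
toy_runs = [
    [1] * 28 + [0] * 12,
    [1] * 30 + [0] * 10,
    [1] * 26 + [0] * 14,
]
center, lower, upper = bootstrap_interval(toy_runs, fmean)
print(f"{center:.3f} ({lower:.3f} - {upper:.3f})")
```

The printed string follows the same `value (lower - upper)` layout as the table cells. Fixing the seed keeps the interval reproducible between calls.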
Team Ranking
These tables include only the best ranking score per submitting team.
Without ensembling
Submission code (PSDS 1) | Submission code (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) |
---|---|---|---|---|---|
Kim_GIST-HanwhaVision_task4a_2 | Kim_GIST-HanwhaVision_task4a_3 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.835 (0.826 - 0.846) | |
Zhang_IOA_task4a_6 | Zhang_IOA_task4a_7 | Zhang2023 | 1.63 | 0.562 (0.552 - 0.575) | 0.830 (0.820 - 0.842) | |
Wenxin_TJU_task4a_6 | Wenxin_TJU_task4a_6 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4 | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_4 | Guan2023 | 1.60 | 0.526 (0.513 - 0.539) | 0.855 (0.844 - 0.867) | |
Chen_CHT_task4a_2 | Chen_CHT_task4a_2 | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | |
Li_USTC_task4a_6 | Li_USTC_task4a_6 | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | |
Liu_NSYSU_task4a_7 | Liu_NSYSU_task4a_7 | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | |
Cheimariotis_DUTH_task4a_1 | Cheimariotis_DUTH_task4a_1 | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | |
Baseline_BEATS | Baseline_BEATS | | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | |
Baseline | Baseline | | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | |
Wang_XiaoRice_task4a_1 | Wang_XiaoRice_task4a_1 | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | |
Lee_CAUET_task4a_1 | Lee_CAUET_task4a_2 | Lee2023 | 1.28 | 0.425 (0.415 - 0.440) | 0.674 (0.661 - 0.690) | |
Liu_SRCN_task4a_4 | Liu_SRCN_task4a_4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | |
Barahona_AUDIAS_task4a_2 | Barahona_AUDIAS_task4a_4 | Barahona2023 | 1.21 | 0.380 (0.361 - 0.406) | 0.673 (0.652 - 0.700) | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | |
Gan_NCUT_task4a_1 | Gan_NCUT_task4a_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) |
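The ranking scores in this table are consistent with averaging the two PSDS values after dividing each by the corresponding score of the plain baseline (0.327 for PSDS 1 and 0.538 for PSDS 2). The sketch below reproduces that arithmetic; the formula is inferred from the reported numbers rather than stated in this section, and the function name `ranking_score` is chosen for this example only.

```python
# Baseline evaluation scores, taken from the "Baseline" row of the table.
BASELINE_PSDS1 = 0.327
BASELINE_PSDS2 = 0.538


def ranking_score(psds1, psds2, base1=BASELINE_PSDS1, base2=BASELINE_PSDS2):
    """Mean of the two baseline-normalized PSDS values (assumed formula)."""
    return 0.5 * (psds1 / base1 + psds2 / base2)


# Kim_GIST-HanwhaVision: PSDS 1 = 0.591, PSDS 2 = 0.835
print(f"{ranking_score(0.591, 0.835):.2f}")  # 1.68
# The baseline itself scores 1.00 by construction.
print(f"{ranking_score(0.327, 0.538):.2f}")  # 1.00
```

Recomputing from the rounded PSDS values can differ from a published score by 0.01 in a few rows, presumably because the published scores were derived from unrounded values.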
With ensembling
Submission code (PSDS 1) | Submission code (PSDS 2) | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) |
---|---|---|---|---|---|
Zhang_IOA_task4a_4 | Zhang_IOA_task4a_2 | Zhang2023 | 1.80 | 0.625 (0.615 - 0.637) | 0.903 (0.895 - 0.911) | |
Kim_GIST-HanwhaVision_task4a_8 | Kim_GIST-HanwhaVision_task4a_5 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.846 (0.838 - 0.855) | |
Liu_SRCN_task4a_1 | Liu_SRCN_task4a_2 | Chen2023a | 1.71 | 0.585 (0.572 - 0.598) | 0.877 (0.867 - 0.885) | |
Chen_CHT_task4a_3 | Chen_CHT_task4a_4 | Chen2023b | 1.67 | 0.596 (0.585 - 0.606) | 0.820 (0.810 - 0.831) | |
Wenxin_TJU_task4a_2 | Wenxin_TJU_task4a_2 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | |
Li_USTC_task4a_2 | Li_USTC_task4a_4 | Li2023 | 1.64 | 0.556 (0.544 - 0.569) | 0.852 (0.843 - 0.863) | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_8 | Xiao2023 | 1.62 | 0.555 (0.545 - 0.567) | 0.834 (0.824 - 0.847) | |
Liu_NSYSU_task4a_6 | Liu_NSYSU_task4a_6 | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_2 | Guan2023 | 1.62 | 0.536 (0.526 - 0.546) | 0.862 (0.852 - 0.872) | |
Gan_NCUT_task4a_2 | Gan_NCUT_task4a_3 | Gan2023 | 1.54 | 0.511 (0.498 - 0.524) | 0.816 (0.805 - 0.828) | |
Wang_XiaoRice_task4a_2 | Wang_XiaoRice_task4a_3 | Wang2023 | 1.53 | 0.497 (0.486 - 0.510) | 0.835 (0.824 - 0.844) | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | |
Cheimariotis_DUTH_task4a_1 | Cheimariotis_DUTH_task4a_1 | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | |
Barahona_AUDIAS_task4a_6 | Barahona_AUDIAS_task4a_8 | Barahona2023 | 1.29 | 0.401 (0.390 - 0.414) | 0.729 (0.710 - 0.752) | |
Lee_CAUET_task4a_1 | Lee_CAUET_task4a_2 | Lee2023 | 1.28 | 0.425 (0.415 - 0.440) | 0.674 (0.661 - 0.690) | |
Baseline_BEATS | Baseline_BEATS | | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | |
Baseline | Baseline | | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | |
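Each team row lists one submission code per PSDS scenario, which suggests that the best PSDS 1 system and the best PSDS 2 system are picked independently among a team's submissions. The sketch below illustrates that selection with the Guan_HIT scores from the systems ranking with ensembling; the selection rule is an inference from the table layout, and the helper name `best_per_scenario` is hypothetical.

```python
# (PSDS 1, PSDS 2) on the evaluation set, copied from the systems ranking table.
guan_systems = {
    "Guan_HIT_task4a_1": (0.536, 0.810),
    "Guan_HIT_task4a_2": (0.082, 0.862),
    "Guan_HIT_task4a_3": (0.526, 0.800),
    "Guan_HIT_task4a_4": (0.082, 0.855),
    "Guan_HIT_task4a_5": (0.488, 0.708),
    "Guan_HIT_task4a_6": (0.088, 0.797),
}


def best_per_scenario(systems):
    """Return ((code, PSDS 1), (code, PSDS 2)) for the best system in each scenario."""
    best1 = max(systems, key=lambda code: systems[code][0])
    best2 = max(systems, key=lambda code: systems[code][1])
    return (best1, systems[best1][0]), (best2, systems[best2][1])


psds1_pick, psds2_pick = best_per_scenario(guan_systems)
print(psds1_pick)  # ('Guan_HIT_task4a_1', 0.536)
print(psds2_pick)  # ('Guan_HIT_task4a_2', 0.862)
```

These picks match the Guan_HIT row above, where the two submission codes differ because a system tuned for one scenario scores poorly on the other.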
Systems ranking
Performance obtained without ensembling.
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset) |
---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | 0.359 | 0.562 | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | 0.491 | 0.787 | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | 0.555 | 0.791 | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | 0.552 | 0.795 | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | 0.451 | 0.662 | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | 0.437 | 0.682 | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | 0.456 | 0.687 | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | 0.492 | 0.800 | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | 0.511 | 0.780 | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | 0.437 | 0.654 | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | 0.070 | 0.734 | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | 0.496 | 0.788 | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | 0.516 | 0.781 | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | 0.424 | 0.633 | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | 0.529 | 0.780 | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Xiao2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | 0.464 | 0.711 | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | 0.543 | 0.801 | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | 0.098 | 0.845 | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | 0.539 | 0.793 | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | 0.517 | 0.782 | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | 0.113 | 0.885 | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | 0.527 | 0.790 | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | 0.498 | 0.746 | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | 0.552 | 0.794 | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | 0.065 | 0.865 | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | 0.429 | 0.644 | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | 0.374 | 0.575 | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | 0.387 | 0.585 | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | 0.224 | 0.696 | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | 0.164 | 0.740 | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | 0.402 | 0.620 | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | 0.436 | 0.675 | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | 0.471 | 0.715 | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | 0.546 | 0.807 | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | 0.543 | 0.806 | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | 0.521 | 0.793 | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | 0.512 | 0.808 | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | 0.460 | 0.699 | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | 0.067 | 0.781 |
Supplementary metrics
Submission code | Submission name | Technical Report | PSDS 1 (Evaluation dataset) | PSDS 1 (Public evaluation) | PSDS 1 (Vimeo dataset) | PSDS 2 (Evaluation dataset) | PSDS 2 (Public evaluation) | PSDS 2 (Vimeo dataset) | F-score (Evaluation dataset) | F-score (Public evaluation) | F-score (Vimeo dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 0.327 (0.317 - 0.339) | 0.366 (0.347 - 0.385) | 0.247 (0.220 - 0.275) | 0.538 (0.515 - 0.566) | 0.580 (0.552 - 0.612) | 0.430 (0.397 - 0.466) | 0.377 (0.351 - 0.402) | 0.408 (0.379 - 0.441) | 0.299 (0.269 - 0.330) | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 0.510 (0.496 - 0.523) | 0.560 (0.541 - 0.579) | 0.414 (0.395 - 0.435) | 0.798 (0.782 - 0.811) | 0.841 (0.829 - 0.853) | 0.697 (0.671 - 0.718) | 0.567 (0.544 - 0.588) | 0.603 (0.571 - 0.629) | 0.480 (0.454 - 0.504) | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 0.531 (0.520 - 0.544) | 0.577 (0.562 - 0.595) | 0.431 (0.409 - 0.457) | 0.762 (0.751 - 0.773) | 0.800 (0.789 - 0.812) | 0.663 (0.637 - 0.688) | 0.599 (0.584 - 0.613) | 0.634 (0.620 - 0.653) | 0.509 (0.477 - 0.543) | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 0.546 (0.529 - 0.562) | 0.593 (0.573 - 0.614) | 0.451 (0.429 - 0.473) | 0.783 (0.771 - 0.796) | 0.810 (0.797 - 0.825) | 0.703 (0.679 - 0.724) | 0.603 (0.589 - 0.615) | 0.635 (0.618 - 0.651) | 0.523 (0.502 - 0.545) | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 0.404 (0.389 - 0.421) | 0.451 (0.431 - 0.473) | 0.303 (0.281 - 0.320) | 0.630 (0.612 - 0.648) | 0.673 (0.654 - 0.693) | 0.502 (0.466 - 0.532) | 0.478 (0.467 - 0.489) | 0.516 (0.502 - 0.530) | 0.384 (0.363 - 0.405) | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 0.434 (0.420 - 0.448) | 0.489 (0.472 - 0.506) | 0.314 (0.292 - 0.334) | 0.646 (0.633 - 0.660) | 0.704 (0.688 - 0.721) | 0.510 (0.485 - 0.531) | 0.468 (0.447 - 0.485) | 0.500 (0.477 - 0.518) | 0.389 (0.359 - 0.414) | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 0.413 (0.394 - 0.438) | 0.461 (0.438 - 0.488) | 0.318 (0.293 - 0.342) | 0.655 (0.638 - 0.673) | 0.715 (0.693 - 0.737) | 0.527 (0.500 - 0.549) | 0.484 (0.470 - 0.497) | 0.518 (0.502 - 0.535) | 0.397 (0.369 - 0.422) | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 0.521 (0.510 - 0.531) | 0.569 (0.555 - 0.586) | 0.424 (0.404 - 0.446) | 0.813 (0.796 - 0.831) | 0.858 (0.839 - 0.876) | 0.717 (0.694 - 0.742) | 0.564 (0.551 - 0.575) | 0.598 (0.586 - 0.611) | 0.481 (0.460 - 0.503) | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 0.515 (0.488 - 0.536) | 0.564 (0.532 - 0.587) | 0.416 (0.390 - 0.440) | 0.805 (0.791 - 0.818) | 0.850 (0.832 - 0.868) | 0.699 (0.676 - 0.719) | 0.553 (0.529 - 0.574) | 0.586 (0.557 - 0.608) | 0.469 (0.446 - 0.498) | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 0.425 (0.415 - 0.440) | 0.475 (0.458 - 0.492) | 0.320 (0.302 - 0.339) | 0.634 (0.618 - 0.648) | 0.683 (0.662 - 0.704) | 0.514 (0.490 - 0.542) | 0.470 (0.459 - 0.481) | 0.513 (0.500 - 0.528) | 0.364 (0.342 - 0.384) | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.104 (0.090 - 0.117) | 0.118 (0.098 - 0.136) | 0.090 (0.075 - 0.105) | 0.674 (0.661 - 0.690) | 0.707 (0.690 - 0.727) | 0.592 (0.560 - 0.622) | 0.137 (0.119 - 0.151) | 0.150 (0.126 - 0.169) | 0.106 (0.091 - 0.119) | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 0.516 (0.504 - 0.529) | 0.573 (0.555 - 0.593) | 0.411 (0.391 - 0.433) | 0.796 (0.784 - 0.808) | 0.841 (0.828 - 0.854) | 0.697 (0.675 - 0.719) | 0.577 (0.566 - 0.588) | 0.615 (0.599 - 0.632) | 0.486 (0.469 - 0.504) | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 0.487 (0.475 - 0.502) | 0.540 (0.521 - 0.560) | 0.389 (0.366 - 0.410) | 0.759 (0.745 - 0.773) | 0.804 (0.785 - 0.823) | 0.656 (0.633 - 0.682) | 0.555 (0.543 - 0.566) | 0.596 (0.580 - 0.611) | 0.454 (0.432 - 0.477) | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 0.441 (0.403 - 0.468) | 0.488 (0.440 - 0.523) | 0.333 (0.289 - 0.370) | 0.620 (0.567 - 0.652) | 0.666 (0.608 - 0.707) | 0.496 (0.428 - 0.548) | 0.504 (0.449 - 0.543) | 0.544 (0.486 - 0.585) | 0.406 (0.351 - 0.447) | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 0.563 (0.550 - 0.574) | 0.621 (0.600 - 0.639) | 0.451 (0.431 - 0.471) | 0.779 (0.768 - 0.792) | 0.821 (0.809 - 0.834) | 0.690 (0.665 - 0.715) | 0.628 (0.615 - 0.641) | 0.669 (0.653 - 0.686) | 0.530 (0.508 - 0.552) | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Xiao2023 | 0.403 (0.392 - 0.417) | 0.455 (0.439 - 0.472) | 0.309 (0.292 - 0.326) | 0.660 (0.646 - 0.672) | 0.705 (0.690 - 0.724) | 0.549 (0.527 - 0.572) | 0.483 (0.472 - 0.493) | 0.522 (0.510 - 0.534) | 0.388 (0.373 - 0.402) | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 0.525 (0.516 - 0.538) | 0.566 (0.549 - 0.584) | 0.438 (0.424 - 0.454) | 0.808 (0.796 - 0.821) | 0.848 (0.837 - 0.862) | 0.705 (0.683 - 0.729) | 0.579 (0.569 - 0.588) | 0.613 (0.601 - 0.627) | 0.498 (0.481 - 0.517) | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.071 (0.062 - 0.080) | 0.084 (0.070 - 0.096) | 0.061 (0.050 - 0.074) | 0.807 (0.796 - 0.818) | 0.845 (0.833 - 0.859) | 0.723 (0.701 - 0.742) | 0.131 (0.124 - 0.137) | 0.138 (0.128 - 0.147) | 0.118 (0.106 - 0.130) | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 0.551 (0.543 - 0.562) | 0.605 (0.591 - 0.621) | 0.451 (0.433 - 0.469) | 0.813 (0.802 - 0.827) | 0.855 (0.844 - 0.868) | 0.718 (0.698 - 0.736) | 0.581 (0.573 - 0.591) | 0.628 (0.616 - 0.641) | 0.467 (0.449 - 0.485) | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 0.526 (0.513 - 0.539) | 0.572 (0.552 - 0.590) | 0.435 (0.418 - 0.450) | 0.800 (0.788 - 0.813) | 0.840 (0.825 - 0.857) | 0.716 (0.695 - 0.735) | 0.548 (0.533 - 0.563) | 0.584 (0.567 - 0.604) | 0.462 (0.444 - 0.482) | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.082 (0.073 - 0.091) | 0.096 (0.083 - 0.107) | 0.057 (0.043 - 0.071) | 0.855 (0.844 - 0.867) | 0.890 (0.871 - 0.903) | 0.775 (0.756 - 0.796) | 0.142 (0.134 - 0.151) | 0.150 (0.139 - 0.160) | 0.126 (0.113 - 0.138) | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 0.494 (0.477 - 0.510) | 0.551 (0.532 - 0.574) | 0.380 (0.362 - 0.402) | 0.801 (0.789 - 0.815) | 0.838 (0.823 - 0.854) | 0.713 (0.686 - 0.742) | 0.487 (0.465 - 0.513) | 0.514 (0.491 - 0.543) | 0.423 (0.396 - 0.451) | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 0.524 (0.513 - 0.537) | 0.565 (0.549 - 0.579) | 0.445 (0.421 - 0.472) | 0.774 (0.762 - 0.786) | 0.821 (0.804 - 0.837) | 0.672 (0.651 - 0.695) | 0.601 (0.591 - 0.610) | 0.630 (0.616 - 0.644) | 0.534 (0.513 - 0.553) | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 0.562 (0.552 - 0.575) | 0.612 (0.597 - 0.626) | 0.467 (0.450 - 0.487) | 0.795 (0.786 - 0.805) | 0.848 (0.838 - 0.857) | 0.683 (0.661 - 0.703) | 0.626 (0.617 - 0.633) | 0.658 (0.646 - 0.669) | 0.550 (0.530 - 0.566) | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.055 (0.048 - 0.064) | 0.062 (0.050 - 0.074) | 0.030 (0.015 - 0.044) | 0.830 (0.820 - 0.842) | 0.882 (0.873 - 0.892) | 0.714 (0.690 - 0.735) | 0.129 (0.123 - 0.135) | 0.135 (0.126 - 0.143) | 0.115 (0.103 - 0.126) | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 0.391 (0.379 - 0.405) | 0.437 (0.423 - 0.458) | 0.295 (0.278 - 0.311) | 0.596 (0.584 - 0.610) | 0.638 (0.617 - 0.660) | 0.484 (0.463 - 0.505) | 0.466 (0.454 - 0.478) | 0.513 (0.500 - 0.527) | 0.347 (0.326 - 0.365) | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 0.351 (0.333 - 0.372) | 0.394 (0.370 - 0.422) | 0.257 (0.236 - 0.275) | 0.562 (0.532 - 0.587) | 0.612 (0.586 - 0.640) | 0.434 (0.391 - 0.477) | 0.390 (0.373 - 0.414) | 0.422 (0.400 - 0.452) | 0.311 (0.288 - 0.329) | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 0.380 (0.361 - 0.406) | 0.427 (0.400 - 0.459) | 0.278 (0.257 - 0.296) | 0.575 (0.553 - 0.594) | 0.625 (0.604 - 0.650) | 0.444 (0.409 - 0.480) | 0.408 (0.389 - 0.432) | 0.442 (0.416 - 0.474) | 0.323 (0.302 - 0.341) | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.200 (0.164 - 0.225) | 0.227 (0.185 - 0.256) | 0.153 (0.117 - 0.179) | 0.646 (0.626 - 0.664) | 0.681 (0.656 - 0.706) | 0.556 (0.525 - 0.590) | 0.163 (0.141 - 0.181) | 0.173 (0.148 - 0.192) | 0.146 (0.123 - 0.164) | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.141 (0.124 - 0.155) | 0.160 (0.136 - 0.179) | 0.105 (0.089 - 0.121) | 0.673 (0.652 - 0.700) | 0.708 (0.683 - 0.735) | 0.580 (0.550 - 0.610) | 0.155 (0.135 - 0.172) | 0.161 (0.137 - 0.180) | 0.144 (0.126 - 0.160) | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 0.365 (0.353 - 0.377) | 0.403 (0.388 - 0.424) | 0.281 (0.260 - 0.298) | 0.603 (0.589 - 0.617) | 0.656 (0.636 - 0.676) | 0.473 (0.453 - 0.496) | 0.437 (0.426 - 0.448) | 0.469 (0.455 - 0.483) | 0.353 (0.331 - 0.371) | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 0.412 (0.400 - 0.424) | 0.450 (0.432 - 0.472) | 0.334 (0.314 - 0.352) | 0.663 (0.652 - 0.676) | 0.707 (0.690 - 0.724) | 0.556 (0.531 - 0.574) | 0.472 (0.462 - 0.480) | 0.504 (0.491 - 0.515) | 0.390 (0.370 - 0.407) | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 0.459 (0.431 - 0.484) | 0.504 (0.472 - 0.533) | 0.368 (0.330 - 0.400) | 0.701 (0.681 - 0.720) | 0.750 (0.732 - 0.771) | 0.590 (0.556 - 0.625) | 0.545 (0.530 - 0.564) | 0.582 (0.567 - 0.602) | 0.453 (0.434 - 0.474) | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 0.591 (0.574 - 0.611) | 0.645 (0.624 - 0.668) | 0.489 (0.466 - 0.515) | 0.831 (0.823 - 0.841) | 0.868 (0.859 - 0.877) | 0.751 (0.733 - 0.768) | 0.646 (0.634 - 0.658) | 0.684 (0.670 - 0.697) | 0.554 (0.534 - 0.573) | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 0.581 (0.553 - 0.600) | 0.633 (0.604 - 0.655) | 0.483 (0.456 - 0.503) | 0.835 (0.826 - 0.846) | 0.871 (0.862 - 0.881) | 0.754 (0.736 - 0.772) | 0.638 (0.622 - 0.654) | 0.675 (0.658 - 0.691) | 0.549 (0.522 - 0.572) | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 0.539 (0.528 - 0.549) | 0.598 (0.581 - 0.614) | 0.423 (0.404 - 0.437) | 0.816 (0.806 - 0.831) | 0.858 (0.848 - 0.870) | 0.710 (0.688 - 0.733) | 0.569 (0.559 - 0.577) | 0.605 (0.594 - 0.614) | 0.481 (0.460 - 0.501) | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 0.546 (0.536 - 0.556) | 0.596 (0.583 - 0.611) | 0.432 (0.418 - 0.448) | 0.831 (0.823 - 0.842) | 0.875 (0.868 - 0.884) | 0.735 (0.715 - 0.754) | 0.582 (0.574 - 0.589) | 0.615 (0.603 - 0.626) | 0.498 (0.481 - 0.515) | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 0.440 (0.429 - 0.454) | 0.491 (0.472 - 0.508) | 0.331 (0.314 - 0.349) | 0.686 (0.673 - 0.699) | 0.730 (0.711 - 0.751) | 0.567 (0.547 - 0.588) | 0.504 (0.497 - 0.514) | 0.547 (0.533 - 0.561) | 0.397 (0.379 - 0.413) | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.059 (0.049 - 0.068) | 0.076 (0.064 - 0.086) | 0.054 (0.041 - 0.067) | 0.707 (0.694 - 0.723) | 0.739 (0.723 - 0.758) | 0.634 (0.610 - 0.654) | 0.131 (0.125 - 0.137) | 0.140 (0.131 - 0.148) | 0.116 (0.105 - 0.127) |
With ensembling
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | PSDS 1 (Development dataset) | PSDS 2 (Development dataset) |
---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | 0.359 | 0.562 | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | 0.491 | 0.787 | |
Li_USTC_task4a_1 | TAFT and SdMT | Li2023 | 1.54 | 0.539 (0.527 - 0.551) | 0.769 (0.758 - 0.778) | 0.562 | 0.795 | |
Li_USTC_task4a_2 | Pseudo labeling | Li2023 | 1.58 | 0.556 (0.544 - 0.569) | 0.781 (0.769 - 0.795) | 0.554 | 0.799 | |
Li_USTC_task4a_3 | TAFT and AFL | Li2023 | 1.54 | 0.546 (0.535 - 0.558) | 0.756 (0.745 - 0.769) | 0.558 | 0.798 | |
Li_USTC_task4a_4 | MaxFilter | Li2023 | 0.89 | 0.061 (0.050 - 0.070) | 0.852 (0.843 - 0.863) | 0.093 | 0.899 | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | 0.555 | 0.791 | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | 0.552 | 0.795 | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | 0.451 | 0.662 | |
Liu_NSYSU_task4_1 | DCASE2023 FDY_WeakSED_Ensemble | Liu2023 | 0.80 | 0.051 (0.042 - 0.060) | 0.779 (0.767 - 0.791) | 0.063 | 0.711 | |
Liu_NSYSU_task4_2 | FDY_Ensemble | Liu2023 | 1.36 | 0.466 (0.455 - 0.480) | 0.701 (0.688 - 0.714) | 0.473 | 0.714 | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | 0.437 | 0.682 | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | 0.456 | 0.687 | |
Liu_NSYSU_task4_5 | DCASE2023 FDY_BEATs_WeakSED | Liu2023 | 0.82 | 0.045 (0.035 - 0.053) | 0.806 (0.794 - 0.818) | 0.061 | 0.839 | |
Liu_NSYSU_task4_6 | DCASE2023 FDY_BEATs | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | 0.527 | 0.803 | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | 0.492 | 0.800 | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | 0.511 | 0.780 | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | 0.437 | 0.654 | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | 0.070 | 0.734 | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | 0.496 | 0.788 | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | 0.516 | 0.781 | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | 0.424 | 0.633 | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | 0.529 | 0.780 | |
Chen_CHT_task4_3 | VGGSK+BEATs | Chen2023b | 1.66 | 0.596 (0.585 - 0.606) | 0.810 (0.800 - 0.822) | 0.552 | 0.794 | |
Chen_CHT_task4_4 | multi+BEATs | Chen2023b | 1.66 | 0.590 (0.578 - 0.601) | 0.820 (0.810 - 0.831) | 0.542 | 0.799 | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Xiao2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | 0.464 | 0.711 | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | 0.543 | 0.801 | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | 0.098 | 0.845 | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | 0.539 | 0.793 | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_5_ensemble_model | Xiao2023 | 1.61 | 0.555 (0.545 - 0.567) | 0.821 (0.811 - 0.834) | 0.544 | 0.801 | |
Xiao_FMSG_task4a_6 | Xiao_FMSG_task4a_6_ensemble_model | Xiao2023 | 1.61 | 0.551 (0.541 - 0.561) | 0.829 (0.819 - 0.842) | 0.557 | 0.812 | |
Xiao_FMSG_task4a_7 | Xiao_FMSG_task4a_7_ensemble_model | Xiao2023 | 0.87 | 0.075 (0.066 - 0.084) | 0.811 (0.800 - 0.822) | 0.098 | 0.854 | |
Xiao_FMSG_task4a_8 | Xiao_FMSG_task4a_8_ensemble_model | Xiao2023 | 1.62 | 0.549 (0.540 - 0.560) | 0.834 (0.824 - 0.847) | 0.551 | 0.813 | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.536 (0.526 - 0.546) | 0.810 (0.800 - 0.822) | 0.523 | 0.790 | |
Guan_HIT_task4a_2 | Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.082 (0.074 - 0.090) | 0.862 (0.852 - 0.872) | 0.115 | 0.890 | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | 0.517 | 0.782 | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | 0.113 | 0.885 | |
Guan_HIT_task4a_5 | Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.488 (0.475 - 0.503) | 0.708 (0.696 - 0.720) | 0.492 | 0.705 | |
Guan_HIT_task4a_6 | Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.088 (0.080 - 0.096) | 0.797 (0.787 - 0.810) | 0.109 | 0.839 | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | 0.527 | 0.790 | |
Wang_XiaoRice_task4a_2 | SED Embed | Wang2023 | 1.52 | 0.497 (0.486 - 0.510) | 0.814 (0.803 - 0.828) | 0.534 | 0.811 | |
Wang_XiaoRice_task4a_3 | L-TAG | Wang2023 | 0.91 | 0.088 (0.076 - 0.098) | 0.835 (0.824 - 0.844) | 0.102 | 0.886 | |
Zhang_IOA_task4_1 | strong_ensemble | Zhang2023 | 1.75 | 0.622 (0.613 - 0.634) | 0.857 (0.849 - 0.866) | 0.598 | 0.837 | |
Zhang_IOA_task4_2 | segment tagging model | Zhang2023 | 0.95 | 0.070 (0.060 - 0.080) | 0.903 (0.895 - 0.911) | 0.071 | 0.921 | |
Zhang_IOA_task4_3 | strong_ensemble_all | Zhang2023 | 1.71 | 0.613 (0.603 - 0.625) | 0.828 (0.821 - 0.839) | 0.601 | 0.847 | |
Zhang_IOA_task4_4 | strong_ensemble_1 | Zhang2023 | 1.75 | 0.625 (0.615 - 0.637) | 0.855 (0.847 - 0.864) | 0.602 | 0.841 | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | 0.498 | 0.746 | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | 0.552 | 0.794 | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | 0.065 | 0.865 | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | 0.429 | 0.644 | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | 0.525 | 0.780 | |
Wu_NCUT_task4a_3 | Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.497 (0.486 - 0.509) | 0.793 (0.783 - 0.806) | 0.521 | 0.783 | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | 0.374 | 0.575 | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | 0.387 | 0.585 | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | 0.224 | 0.696 | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | 0.164 | 0.740 | |
Barahona_AUDIAS_task4a_5 | 4-Resolution CRNN | Barahona2023 | 1.14 | 0.378 (0.365 - 0.392) | 0.604 (0.590 - 0.622) | 0.405 | 0.624 | |
Barahona_AUDIAS_task4a_6 | 4-Resolution CRNN with class-dependent median filtering | Barahona2023 | 1.18 | 0.401 (0.390 - 0.414) | 0.612 (0.596 - 0.630) | 0.416 | 0.626 | |
Barahona_AUDIAS_task4a_7 | 5-Resolution Conformer | Barahona2023 | 1.06 | 0.274 (0.262 - 0.287) | 0.684 (0.671 - 0.699) | 0.306 | 0.727 | |
Barahona_AUDIAS_task4a_8 | 5-Resolution Conformer with class-wise median filtering | Barahona2023 | 1.00 | 0.213 (0.201 - 0.226) | 0.729 (0.710 - 0.752) | 0.243 | 0.781 | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | 0.402 | 0.620 | |
Gan_NCUT_task4_2 | Gan_NCUT_SED_system_2 | Gan2023 | 1.52 | 0.511 (0.498 - 0.524) | 0.799 (0.785 - 0.813) | 0.521 | 0.792 | |
Gan_NCUT_task4_3 | Gan_NCUT_SED_system_3 | Gan2023 | 1.50 | 0.483 (0.467 - 0.498) | 0.816 (0.805 - 0.828) | 0.497 | 0.825 | |
Liu_SRCN_task4a_1 | DCASE2023 t4a system1 | Chen2023a | 1.65 | 0.585 (0.572 - 0.598) | 0.817 (0.804 - 0.834) | 0.570 | 0.843 | |
Liu_SRCN_task4a_2 | DCASE2023 t4a system2 | Chen2023a | 1.40 | 0.380 (0.369 - 0.392) | 0.877 (0.867 - 0.885) | 0.414 | 0.884 | |
Liu_SRCN_task4a_3 | DCASE2023 t4a system3 | Chen2023a | 1.65 | 0.556 (0.544 - 0.569) | 0.861 (0.852 - 0.870) | 0.554 | 0.833 | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | 0.436 | 0.675 | |
Liu_SRCN_task4a_5 | DCASE2023 t4a system5 | Chen2023a | 0.94 | 0.098 (0.086 - 0.108) | 0.851 (0.841 - 0.860) | 0.118 | 0.889 | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | 0.471 | 0.715 | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | 0.546 | 0.807 | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | 0.543 | 0.806 | |
Kim_GIST-HanwhaVision_task4a_4 | FDYLKA BEATs pool 1d stage1 | Kim2023 | 1.63 | 0.576 (0.549 - 0.595) | 0.809 (0.797 - 0.821) | 0.525 | 0.770 | |
Kim_GIST-HanwhaVision_task4a_5 | FDYLKA BEATs all ensemble 48 | Kim2023 | 1.72 | 0.611 (0.598 - 0.623) | 0.846 (0.838 - 0.855) | 0.566 | 0.815 | |
Kim_GIST-HanwhaVision_task4a_6 | FDYLKA BEATs PSDS1 ensemble 16 | Kim2023 | 1.72 | 0.611 (0.590 - 0.628) | 0.841 (0.832 - 0.851) | 0.564 | 0.810 | |
Kim_GIST-HanwhaVision_task4a_7 | FDYLKA BEATs PSDS2 ensemble 16 | Kim2023 | 1.69 | 0.591 (0.574 - 0.604) | 0.844 (0.835 - 0.853) | 0.554 | 0.817 | |
Kim_GIST-HanwhaVision_task4a_8 | FDYLKA BEATs PSDS sum ensemble 16 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.841 (0.831 - 0.851) | 0.567 | 0.810 | |
Wenxin_TJU_task4a_1 | ensemble-pretrained-psds1-0 | Wenxin2023 | 1.63 | 0.555 (0.543 - 0.566) | 0.837 (0.828 - 0.847) | 0.535 | 0.806 | |
Wenxin_TJU_task4a_2 | ensemble-pretrained-psds1-1 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | 0.530 | 0.804 | |
Wenxin_TJU_task4a_3 | ensemble-pretrained-psds2-0 | Wenxin2023 | 0.88 | 0.080 (0.071 - 0.088) | 0.815 (0.802 - 0.825) | 0.087 | 0.875 | |
Wenxin_TJU_task4a_4 | ensemble-pretrained-psds2-1 | Wenxin2023 | 0.90 | 0.081 (0.071 - 0.090) | 0.838 (0.828 - 0.849) | 0.087 | 0.875 | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | 0.521 | 0.793 | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | 0.512 | 0.808 | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | 0.460 | 0.699 | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | 0.067 | 0.781 |
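
The ranking scores in the table above are consistent with averaging the two PSDS values after dividing each by the corresponding baseline value (Baseline_task4a_1 on the evaluation dataset: PSDS 1 = 0.327, PSDS 2 = 0.538). The sketch below assumes that formula. The function name `ranking_score` and the choice of rows are illustrative only, and because the published PSDS values are rounded to three decimals, a recomputed score may occasionally differ from the table in the last digit.

```python
# Baseline PSDS values on the evaluation dataset (Baseline_task4a_1 row).
BASELINE_PSDS1 = 0.327
BASELINE_PSDS2 = 0.538


def ranking_score(psds1: float, psds2: float) -> float:
    """Assumed formula: mean of the baseline-normalized PSDS 1 and PSDS 2."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)


# (PSDS 1, PSDS 2) on the evaluation dataset, copied from the table rows.
rows = {
    "Baseline_task4a_1": (0.327, 0.538),
    "Kim_GIST-HanwhaVision_task4a_2": (0.591, 0.831),
    "Liu_SRCN_task4a_5": (0.098, 0.851),
    "Wenxin_TJU_task4a_8": (0.059, 0.707),
}

for code, (psds1, psds2) in rows.items():
    print(f"{code}: {ranking_score(psds1, psds2):.2f}")
```

Under this formula, a system tuned for only one scenario (for example Liu_SRCN_task4a_5, with a high PSDS 2 but a very low PSDS 1) ends up below the baseline score of 1.00, which matches the 0.94 reported for it.
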
Supplementary metrics
Rank | Submission code | Submission name | Technical Report | PSDS 1 (Evaluation dataset) | PSDS 1 (Public evaluation) | PSDS 1 (Vimeo dataset) | PSDS 2 (Evaluation dataset) | PSDS 2 (Public evaluation) | PSDS 2 (Vimeo dataset) | F-score (Evaluation dataset) | F-score (Public evaluation) | F-score (Vimeo dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 0.327 (0.317 - 0.339) | 0.366 (0.347 - 0.385) | 0.247 (0.220 - 0.275) | 0.538 (0.515 - 0.566) | 0.580 (0.552 - 0.612) | 0.430 (0.397 - 0.466) | 0.377 (0.351 - 0.402) | 0.408 (0.379 - 0.441) | 0.299 (0.269 - 0.330) | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 0.510 (0.496 - 0.523) | 0.560 (0.541 - 0.579) | 0.414 (0.395 - 0.435) | 0.798 (0.782 - 0.811) | 0.841 (0.829 - 0.853) | 0.697 (0.671 - 0.718) | 0.567 (0.544 - 0.588) | 0.603 (0.571 - 0.629) | 0.480 (0.454 - 0.504) | |
Li_USTC_task4a_1 | TAFT and SdMT | Li2023 | 0.539 (0.527 - 0.551) | 0.588 (0.574 - 0.605) | 0.435 (0.418 - 0.451) | 0.769 (0.758 - 0.778) | 0.810 (0.800 - 0.820) | 0.669 (0.647 - 0.687) | 0.595 (0.584 - 0.606) | 0.632 (0.619 - 0.646) | 0.501 (0.478 - 0.522) | |
Li_USTC_task4a_2 | Pseudo labeling | Li2023 | 0.556 (0.544 - 0.569) | 0.603 (0.589 - 0.618) | 0.453 (0.435 - 0.469) | 0.781 (0.769 - 0.795) | 0.809 (0.794 - 0.823) | 0.706 (0.683 - 0.725) | 0.615 (0.598 - 0.629) | 0.647 (0.631 - 0.663) | 0.534 (0.507 - 0.559) | |
Li_USTC_task4a_3 | TAFT and AFL | Li2023 | 0.546 (0.535 - 0.558) | 0.591 (0.577 - 0.607) | 0.446 (0.430 - 0.464) | 0.756 (0.745 - 0.769) | 0.792 (0.780 - 0.805) | 0.675 (0.655 - 0.697) | 0.590 (0.581 - 0.599) | 0.629 (0.615 - 0.642) | 0.494 (0.475 - 0.513) | |
Li_USTC_task4a_4 | MaxFilter | Li2023 | 0.061 (0.050 - 0.070) | 0.076 (0.064 - 0.088) | 0.028 (0.016 - 0.040) | 0.852 (0.843 - 0.863) | 0.891 (0.882 - 0.900) | 0.764 (0.743 - 0.786) | 0.137 (0.130 - 0.143) | 0.141 (0.131 - 0.150) | 0.127 (0.113 - 0.139) | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 0.531 (0.520 - 0.544) | 0.577 (0.562 - 0.595) | 0.431 (0.409 - 0.457) | 0.762 (0.751 - 0.773) | 0.800 (0.789 - 0.812) | 0.663 (0.637 - 0.688) | 0.599 (0.584 - 0.613) | 0.634 (0.620 - 0.653) | 0.509 (0.477 - 0.543) | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 0.546 (0.529 - 0.562) | 0.593 (0.573 - 0.614) | 0.451 (0.429 - 0.473) | 0.783 (0.771 - 0.796) | 0.810 (0.797 - 0.825) | 0.703 (0.679 - 0.724) | 0.603 (0.589 - 0.615) | 0.635 (0.618 - 0.651) | 0.523 (0.502 - 0.545) | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 0.404 (0.389 - 0.421) | 0.451 (0.431 - 0.473) | 0.303 (0.281 - 0.320) | 0.630 (0.612 - 0.648) | 0.673 (0.654 - 0.693) | 0.502 (0.466 - 0.532) | 0.478 (0.467 - 0.489) | 0.516 (0.502 - 0.530) | 0.384 (0.363 - 0.405) | |
Liu_NSYSU_task4_1 | DCASE2023 FDY_WeakSED_Ensemble | Liu2023 | 0.051 (0.042 - 0.060) | 0.062 (0.050 - 0.072) | 0.018 (0.007 - 0.029) | 0.779 (0.767 - 0.791) | 0.807 (0.794 - 0.822) | 0.697 (0.675 - 0.721) | 0.128 (0.122 - 0.135) | 0.136 (0.126 - 0.145) | 0.110 (0.097 - 0.123) | |
Liu_NSYSU_task4_2 | FDY_Ensemble | Liu2023 | 0.466 (0.455 - 0.480) | 0.521 (0.505 - 0.536) | 0.354 (0.337 - 0.370) | 0.701 (0.688 - 0.714) | 0.756 (0.741 - 0.773) | 0.577 (0.551 - 0.599) | 0.516 (0.505 - 0.527) | 0.550 (0.537 - 0.564) | 0.427 (0.407 - 0.444) | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 0.434 (0.420 - 0.448) | 0.489 (0.472 - 0.506) | 0.314 (0.292 - 0.334) | 0.646 (0.633 - 0.660) | 0.704 (0.688 - 0.721) | 0.510 (0.485 - 0.531) | 0.468 (0.447 - 0.485) | 0.500 (0.477 - 0.518) | 0.389 (0.359 - 0.414) | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 0.413 (0.394 - 0.438) | 0.461 (0.438 - 0.488) | 0.318 (0.293 - 0.342) | 0.655 (0.638 - 0.673) | 0.715 (0.693 - 0.737) | 0.527 (0.500 - 0.549) | 0.484 (0.470 - 0.497) | 0.518 (0.502 - 0.535) | 0.397 (0.369 - 0.422) | |
Liu_NSYSU_task4_5 | DCASE2023 FDY_BEATs_WeakSED | Liu2023 | 0.045 (0.035 - 0.053) | 0.059 (0.047 - 0.069) | 0.007 (0.001 - 0.019) | 0.806 (0.794 - 0.818) | 0.835 (0.823 - 0.849) | 0.725 (0.704 - 0.750) | 0.142 (0.135 - 0.149) | 0.151 (0.141 - 0.161) | 0.124 (0.112 - 0.135) | |
Liu_NSYSU_task4_6 | DCASE2023 FDY_BEATs | Liu2023 | 0.552 (0.540 - 0.563) | 0.600 (0.583 - 0.619) | 0.452 (0.437 - 0.467) | 0.838 (0.829 - 0.848) | 0.879 (0.871 - 0.889) | 0.746 (0.728 - 0.763) | 0.589 (0.578 - 0.599) | 0.625 (0.613 - 0.637) | 0.504 (0.488 - 0.519) | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 0.521 (0.510 - 0.531) | 0.569 (0.555 - 0.586) | 0.424 (0.404 - 0.446) | 0.813 (0.796 - 0.831) | 0.858 (0.839 - 0.876) | 0.717 (0.694 - 0.742) | 0.564 (0.551 - 0.575) | 0.598 (0.586 - 0.611) | 0.481 (0.460 - 0.503) | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 0.515 (0.488 - 0.536) | 0.564 (0.532 - 0.587) | 0.416 (0.390 - 0.440) | 0.805 (0.791 - 0.818) | 0.850 (0.832 - 0.868) | 0.699 (0.676 - 0.719) | 0.553 (0.529 - 0.574) | 0.586 (0.557 - 0.608) | 0.469 (0.446 - 0.498) | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 0.425 (0.415 - 0.440) | 0.475 (0.458 - 0.492) | 0.320 (0.302 - 0.339) | 0.634 (0.618 - 0.648) | 0.683 (0.662 - 0.704) | 0.514 (0.490 - 0.542) | 0.470 (0.459 - 0.481) | 0.513 (0.500 - 0.528) | 0.364 (0.342 - 0.384) | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.104 (0.090 - 0.117) | 0.118 (0.098 - 0.136) | 0.090 (0.075 - 0.105) | 0.674 (0.661 - 0.690) | 0.707 (0.690 - 0.727) | 0.592 (0.560 - 0.622) | 0.137 (0.119 - 0.151) | 0.150 (0.126 - 0.169) | 0.106 (0.091 - 0.119) | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 0.516 (0.504 - 0.529) | 0.573 (0.555 - 0.593) | 0.411 (0.391 - 0.433) | 0.796 (0.784 - 0.808) | 0.841 (0.828 - 0.854) | 0.697 (0.675 - 0.719) | 0.577 (0.566 - 0.588) | 0.615 (0.599 - 0.632) | 0.486 (0.469 - 0.504) | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 0.487 (0.475 - 0.502) | 0.540 (0.521 - 0.560) | 0.389 (0.366 - 0.410) | 0.759 (0.745 - 0.773) | 0.804 (0.785 - 0.823) | 0.656 (0.633 - 0.682) | 0.555 (0.543 - 0.566) | 0.596 (0.580 - 0.611) | 0.454 (0.432 - 0.477) | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 0.441 (0.403 - 0.468) | 0.488 (0.440 - 0.523) | 0.333 (0.289 - 0.370) | 0.620 (0.567 - 0.652) | 0.666 (0.608 - 0.707) | 0.496 (0.428 - 0.548) | 0.504 (0.449 - 0.543) | 0.544 (0.486 - 0.585) | 0.406 (0.351 - 0.447) | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 0.563 (0.550 - 0.574) | 0.621 (0.600 - 0.639) | 0.451 (0.431 - 0.471) | 0.779 (0.768 - 0.792) | 0.821 (0.809 - 0.834) | 0.690 (0.665 - 0.715) | 0.628 (0.615 - 0.641) | 0.669 (0.653 - 0.686) | 0.530 (0.508 - 0.552) | |
Chen_CHT_task4_3 | VGGSK+BEATs | Chen2023b | 0.596 (0.585 - 0.606) | 0.655 (0.643 - 0.668) | 0.482 (0.464 - 0.500) | 0.810 (0.800 - 0.822) | 0.849 (0.837 - 0.860) | 0.733 (0.714 - 0.752) | 0.638 (0.630 - 0.648) | 0.673 (0.660 - 0.685) | 0.552 (0.533 - 0.573) | |
Chen_CHT_task4_4 | multi+BEATs | Chen2023b | 0.590 (0.578 - 0.601) | 0.649 (0.634 - 0.664) | 0.476 (0.460 - 0.493) | 0.820 (0.810 - 0.831) | 0.862 (0.851 - 0.872) | 0.731 (0.710 - 0.751) | 0.639 (0.629 - 0.649) | 0.676 (0.662 - 0.688) | 0.547 (0.528 - 0.565) | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Zhang2023 | 0.403 (0.392 - 0.417) | 0.455 (0.439 - 0.472) | 0.309 (0.292 - 0.326) | 0.660 (0.646 - 0.672) | 0.705 (0.690 - 0.724) | 0.549 (0.527 - 0.572) | 0.483 (0.472 - 0.493) | 0.522 (0.510 - 0.534) | 0.388 (0.373 - 0.402) | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 0.525 (0.516 - 0.538) | 0.566 (0.549 - 0.584) | 0.438 (0.424 - 0.454) | 0.808 (0.796 - 0.821) | 0.848 (0.837 - 0.862) | 0.705 (0.683 - 0.729) | 0.579 (0.569 - 0.588) | 0.613 (0.601 - 0.627) | 0.498 (0.481 - 0.517) | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.071 (0.062 - 0.080) | 0.084 (0.070 - 0.096) | 0.061 (0.050 - 0.074) | 0.807 (0.796 - 0.818) | 0.845 (0.833 - 0.859) | 0.723 (0.701 - 0.742) | 0.131 (0.124 - 0.137) | 0.138 (0.128 - 0.147) | 0.118 (0.106 - 0.130) | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 0.551 (0.543 - 0.562) | 0.605 (0.591 - 0.621) | 0.451 (0.433 - 0.469) | 0.813 (0.802 - 0.827) | 0.855 (0.844 - 0.868) | 0.718 (0.698 - 0.736) | 0.581 (0.573 - 0.591) | 0.628 (0.616 - 0.641) | 0.467 (0.449 - 0.485) | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_5_ensemble_model | Xiao2023 | 0.555 (0.545 - 0.567) | 0.606 (0.592 - 0.623) | 0.457 (0.442 - 0.474) | 0.821 (0.811 - 0.834) | 0.859 (0.850 - 0.873) | 0.735 (0.717 - 0.751) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | |
Xiao_FMSG_task4a_6 | Xiao_FMSG_task4a_6_ensemble_model | Xiao2023 | 0.551 (0.541 - 0.561) | 0.595 (0.581 - 0.612) | 0.464 (0.449 - 0.479) | 0.829 (0.819 - 0.842) | 0.867 (0.856 - 0.880) | 0.741 (0.724 - 0.760) | 0.599 (0.590 - 0.609) | 0.643 (0.630 - 0.657) | 0.495 (0.477 - 0.512) | |
Xiao_FMSG_task4a_7 | Xiao_FMSG_task4a_7_ensemble_model | Xiao2023 | 0.075 (0.066 - 0.084) | 0.088 (0.074 - 0.101) | 0.074 (0.059 - 0.088) | 0.811 (0.800 - 0.822) | 0.847 (0.837 - 0.861) | 0.731 (0.710 - 0.751) | 0.132 (0.126 - 0.138) | 0.140 (0.130 - 0.149) | 0.117 (0.106 - 0.128) | |
Xiao_FMSG_task4a_8 | Xiao_FMSG_task4a_8_ensemble_model | Xiao2023 | 0.549 (0.540 - 0.560) | 0.594 (0.578 - 0.613) | 0.464 (0.447 - 0.481) | 0.834 (0.824 - 0.847) | 0.870 (0.861 - 0.883) | 0.747 (0.728 - 0.762) | 0.602 (0.593 - 0.612) | 0.641 (0.627 - 0.656) | 0.509 (0.493 - 0.527) | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_1 | Guan2023 | 0.536 (0.526 - 0.546) | 0.579 (0.565 - 0.598) | 0.445 (0.428 - 0.460) | 0.810 (0.800 - 0.822) | 0.851 (0.841 - 0.863) | 0.727 (0.709 - 0.747) | 0.559 (0.547 - 0.570) | 0.598 (0.585 - 0.614) | 0.465 (0.445 - 0.484) | |
Guan_HIT_task4a_2 | Guan_HIT_task4a_2 | Guan2023 | 0.082 (0.074 - 0.090) | 0.096 (0.086 - 0.107) | 0.055 (0.042 - 0.069) | 0.862 (0.852 - 0.872) | 0.899 (0.889 - 0.909) | 0.783 (0.764 - 0.802) | 0.143 (0.137 - 0.149) | 0.151 (0.140 - 0.160) | 0.127 (0.114 - 0.138) | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 0.526 (0.513 - 0.539) | 0.572 (0.552 - 0.590) | 0.435 (0.418 - 0.450) | 0.800 (0.788 - 0.813) | 0.840 (0.825 - 0.857) | 0.716 (0.695 - 0.735) | 0.548 (0.533 - 0.563) | 0.584 (0.567 - 0.604) | 0.462 (0.444 - 0.482) | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.082 (0.073 - 0.091) | 0.096 (0.083 - 0.107) | 0.057 (0.043 - 0.071) | 0.855 (0.844 - 0.867) | 0.890 (0.871 - 0.903) | 0.775 (0.756 - 0.796) | 0.142 (0.134 - 0.151) | 0.150 (0.139 - 0.160) | 0.126 (0.113 - 0.138) | |
Guan_HIT_task4a_5 | Guan_HIT_task4a_5 | Guan2023 | 0.488 (0.475 - 0.503) | 0.535 (0.517 - 0.554) | 0.394 (0.374 - 0.412) | 0.708 (0.696 - 0.720) | 0.758 (0.743 - 0.774) | 0.592 (0.568 - 0.619) | 0.511 (0.501 - 0.521) | 0.548 (0.533 - 0.561) | 0.422 (0.407 - 0.437) | |
Guan_HIT_task4a_6 | Guan_HIT_task4a_6 | Guan2023 | 0.088 (0.080 - 0.096) | 0.100 (0.088 - 0.110) | 0.059 (0.042 - 0.075) | 0.797 (0.787 - 0.810) | 0.838 (0.826 - 0.850) | 0.698 (0.679 - 0.720) | 0.137 (0.130 - 0.144) | 0.146 (0.136 - 0.156) | 0.115 (0.100 - 0.127) | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 0.494 (0.477 - 0.510) | 0.551 (0.532 - 0.574) | 0.380 (0.362 - 0.402) | 0.801 (0.789 - 0.815) | 0.838 (0.823 - 0.854) | 0.713 (0.686 - 0.742) | 0.487 (0.465 - 0.513) | 0.514 (0.491 - 0.543) | 0.423 (0.396 - 0.451) | |
Wang_XiaoRice_task4a_2 | SED Embed | Wang2023 | 0.497 (0.486 - 0.510) | 0.556 (0.538 - 0.576) | 0.387 (0.366 - 0.406) | 0.814 (0.803 - 0.828) | 0.849 (0.837 - 0.862) | 0.727 (0.704 - 0.753) | 0.482 (0.467 - 0.496) | 0.512 (0.492 - 0.532) | 0.413 (0.393 - 0.431) | |
Wang_XiaoRice_task4a_3 | L-TAG | Wang2023 | 0.088 (0.076 - 0.098) | 0.100 (0.086 - 0.113) | 0.069 (0.055 - 0.084) | 0.835 (0.824 - 0.844) | 0.864 (0.851 - 0.875) | 0.755 (0.733 - 0.772) | 0.122 (0.115 - 0.130) | 0.130 (0.120 - 0.142) | 0.103 (0.087 - 0.117) | |
Zhang_IOA_task4_1 | strong_ensemble | Zhang2023 | 0.622 (0.613 - 0.634) | 0.671 (0.657 - 0.687) | 0.523 (0.506 - 0.541) | 0.857 (0.849 - 0.866) | 0.892 (0.884 - 0.902) | 0.748 (0.728 - 0.770) | 0.666 (0.658 - 0.675) | 0.690 (0.679 - 0.700) | 0.616 (0.600 - 0.635) | |
Zhang_IOA_task4_2 | segment tagging model | Zhang2023 | 0.070 (0.060 - 0.080) | 0.078 (0.066 - 0.089) | 0.040 (0.027 - 0.054) | 0.903 (0.895 - 0.911) | 0.951 (0.946 - 0.957) | 0.800 (0.782 - 0.821) | 0.154 (0.147 - 0.161) | 0.164 (0.155 - 0.173) | 0.131 (0.118 - 0.142) | |
Zhang_IOA_task4_3 | strong_ensemble_all | Zhang2023 | 0.613 (0.603 - 0.625) | 0.669 (0.656 - 0.683) | 0.518 (0.503 - 0.535) | 0.828 (0.821 - 0.839) | 0.870 (0.860 - 0.882) | 0.743 (0.722 - 0.764) | 0.651 (0.643 - 0.659) | 0.677 (0.665 - 0.690) | 0.588 (0.572 - 0.606) | |
Zhang_IOA_task4_4 | strong_ensemble_1 | Zhang2023 | 0.625 (0.615 - 0.637) | 0.673 (0.659 - 0.689) | 0.526 (0.508 - 0.543) | 0.855 (0.847 - 0.864) | 0.891 (0.883 - 0.901) | 0.745 (0.725 - 0.767) | 0.668 (0.659 - 0.676) | 0.691 (0.680 - 0.701) | 0.619 (0.603 - 0.638) | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 0.524 (0.513 - 0.537) | 0.565 (0.549 - 0.579) | 0.445 (0.421 - 0.472) | 0.774 (0.762 - 0.786) | 0.821 (0.804 - 0.837) | 0.672 (0.651 - 0.695) | 0.601 (0.591 - 0.610) | 0.630 (0.616 - 0.644) | 0.534 (0.513 - 0.553) | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 0.562 (0.552 - 0.575) | 0.612 (0.597 - 0.626) | 0.467 (0.450 - 0.487) | 0.795 (0.786 - 0.805) | 0.848 (0.838 - 0.857) | 0.683 (0.661 - 0.703) | 0.626 (0.617 - 0.633) | 0.658 (0.646 - 0.669) | 0.550 (0.530 - 0.566) | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.055 (0.048 - 0.064) | 0.062 (0.050 - 0.074) | 0.030 (0.015 - 0.044) | 0.830 (0.820 - 0.842) | 0.882 (0.873 - 0.892) | 0.714 (0.690 - 0.735) | 0.129 (0.123 - 0.135) | 0.135 (0.126 - 0.143) | 0.115 (0.103 - 0.126) | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 0.391 (0.379 - 0.405) | 0.437 (0.423 - 0.458) | 0.295 (0.278 - 0.311) | 0.596 (0.584 - 0.610) | 0.638 (0.617 - 0.660) | 0.484 (0.463 - 0.505) | 0.466 (0.454 - 0.478) | 0.513 (0.500 - 0.527) | 0.347 (0.326 - 0.365) | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 0.519 (0.507 - 0.531) | 0.576 (0.560 - 0.596) | 0.429 (0.412 - 0.444) | 0.793 (0.783 - 0.806) | 0.840 (0.830 - 0.851) | 0.693 (0.672 - 0.716) | 0.587 (0.577 - 0.596) | 0.620 (0.608 - 0.635) | 0.506 (0.485 - 0.529) | |
Wu_NCUT_task4a_3 | Wu_NCUT_task4a_3 | Wu2023 | 0.497 (0.486 - 0.509) | 0.553 (0.537 - 0.575) | 0.418 (0.402 - 0.434) | 0.793 (0.783 - 0.806) | 0.840 (0.830 - 0.850) | 0.691 (0.669 - 0.715) | 0.572 (0.562 - 0.581) | 0.605 (0.592 - 0.618) | 0.491 (0.470 - 0.512) | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 0.351 (0.333 - 0.372) | 0.394 (0.370 - 0.422) | 0.257 (0.236 - 0.275) | 0.562 (0.532 - 0.587) | 0.612 (0.586 - 0.640) | 0.434 (0.391 - 0.477) | 0.390 (0.373 - 0.414) | 0.422 (0.400 - 0.452) | 0.311 (0.288 - 0.329) | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 0.380 (0.361 - 0.406) | 0.427 (0.400 - 0.459) | 0.278 (0.257 - 0.296) | 0.575 (0.553 - 0.594) | 0.625 (0.604 - 0.650) | 0.444 (0.409 - 0.480) | 0.408 (0.389 - 0.432) | 0.442 (0.416 - 0.474) | 0.323 (0.302 - 0.341) | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.200 (0.164 - 0.225) | 0.227 (0.185 - 0.256) | 0.153 (0.117 - 0.179) | 0.646 (0.626 - 0.664) | 0.681 (0.656 - 0.706) | 0.556 (0.525 - 0.590) | 0.163 (0.141 - 0.181) | 0.173 (0.148 - 0.192) | 0.146 (0.123 - 0.164) | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.141 (0.124 - 0.155) | 0.160 (0.136 - 0.179) | 0.105 (0.089 - 0.121) | 0.673 (0.652 - 0.700) | 0.708 (0.683 - 0.735) | 0.580 (0.550 - 0.610) | 0.155 (0.135 - 0.172) | 0.161 (0.137 - 0.180) | 0.144 (0.126 - 0.160) | |
Barahona_AUDIAS_task4a_5 | 4-Resolution CRNN | Barahona2023 | 0.378 (0.365 - 0.392) | 0.424 (0.407 - 0.442) | 0.287 (0.266 - 0.304) | 0.604 (0.590 - 0.622) | 0.655 (0.627 - 0.683) | 0.480 (0.450 - 0.511) | 0.427 (0.415 - 0.439) | 0.457 (0.442 - 0.473) | 0.349 (0.330 - 0.366) | |
Barahona_AUDIAS_task4a_6 | 4-Resolution CRNN with class-dependent median filtering | Barahona2023 | 0.401 (0.390 - 0.414) | 0.449 (0.433 - 0.466) | 0.300 (0.277 - 0.320) | 0.612 (0.596 - 0.630) | 0.664 (0.639 - 0.690) | 0.487 (0.449 - 0.525) | 0.435 (0.417 - 0.449) | 0.467 (0.448 - 0.483) | 0.358 (0.335 - 0.378) | |
Barahona_AUDIAS_task4a_7 | 5-Resolution Conformer | Barahona2023 | 0.274 (0.262 - 0.287) | 0.310 (0.295 - 0.325) | 0.227 (0.208 - 0.247) | 0.684 (0.671 - 0.699) | 0.718 (0.697 - 0.741) | 0.605 (0.585 - 0.628) | 0.239 (0.231 - 0.248) | 0.253 (0.240 - 0.264) | 0.209 (0.196 - 0.223) | |
Barahona_AUDIAS_task4a_8 | 5-Resolution Conformer with class-wise median filtering | Barahona2023 | 0.213 (0.201 - 0.226) | 0.239 (0.220 - 0.257) | 0.173 (0.154 - 0.188) | 0.729 (0.710 - 0.752) | 0.761 (0.736 - 0.789) | 0.648 (0.621 - 0.672) | 0.204 (0.197 - 0.213) | 0.215 (0.203 - 0.226) | 0.182 (0.169 - 0.196) | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 0.365 (0.353 - 0.377) | 0.403 (0.388 - 0.424) | 0.281 (0.260 - 0.298) | 0.603 (0.589 - 0.617) | 0.656 (0.636 - 0.676) | 0.473 (0.453 - 0.496) | 0.437 (0.426 - 0.448) | 0.469 (0.455 - 0.483) | 0.353 (0.331 - 0.371) | |
Gan_NCUT_task4_2 | Gan_NCUT_SED_system_2 | Gan2023 | 0.511 (0.498 - 0.524) | 0.562 (0.545 - 0.581) | 0.412 (0.396 - 0.429) | 0.799 (0.785 - 0.813) | 0.846 (0.835 - 0.858) | 0.699 (0.676 - 0.721) | 0.569 (0.553 - 0.584) | 0.604 (0.589 - 0.620) | 0.483 (0.457 - 0.510) | |
Gan_NCUT_task4_3 | Gan_NCUT_SED_system_3 | Gan2023 | 0.483 (0.467 - 0.498) | 0.531 (0.515 - 0.549) | 0.391 (0.374 - 0.414) | 0.816 (0.805 - 0.828) | 0.853 (0.842 - 0.865) | 0.729 (0.707 - 0.751) | 0.569 (0.558 - 0.579) | 0.604 (0.591 - 0.618) | 0.482 (0.461 - 0.501) | |
Liu_SRCN_task4a_1 | DCASE2023 t4a system1 | Chen2023a | 0.585 (0.572 - 0.598) | 0.636 (0.618 - 0.655) | 0.484 (0.467 - 0.501) | 0.817 (0.804 - 0.834) | 0.853 (0.839 - 0.870) | 0.725 (0.700 - 0.750) | 0.632 (0.620 - 0.642) | 0.672 (0.660 - 0.685) | 0.541 (0.525 - 0.556) | |
Liu_SRCN_task4a_2 | DCASE2023 t4a system2 | Chen2023a | 0.380 (0.369 - 0.392) | 0.419 (0.405 - 0.435) | 0.313 (0.297 - 0.328) | 0.877 (0.867 - 0.885) | 0.914 (0.907 - 0.920) | 0.806 (0.788 - 0.825) | 0.436 (0.426 - 0.446) | 0.465 (0.453 - 0.477) | 0.371 (0.355 - 0.389) | |
Liu_SRCN_task4a_3 | DCASE2023 t4a system3 | Chen2023a | 0.556 (0.544 - 0.569) | 0.601 (0.586 - 0.620) | 0.469 (0.451 - 0.487) | 0.861 (0.852 - 0.870) | 0.901 (0.893 - 0.909) | 0.775 (0.756 - 0.799) | 0.634 (0.623 - 0.645) | 0.671 (0.658 - 0.686) | 0.541 (0.523 - 0.557) | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 0.412 (0.400 - 0.424) | 0.450 (0.432 - 0.472) | 0.334 (0.314 - 0.352) | 0.663 (0.652 - 0.676) | 0.707 (0.690 - 0.724) | 0.556 (0.531 - 0.574) | 0.472 (0.462 - 0.480) | 0.504 (0.491 - 0.515) | 0.390 (0.370 - 0.407) | |
Liu_SRCN_task4a_5 | DCASE2023 t4a system5 | Chen2023a | 0.098 (0.086 - 0.108) | 0.110 (0.097 - 0.123) | 0.048 (0.033 - 0.067) | 0.851 (0.841 - 0.860) | 0.879 (0.869 - 0.890) | 0.770 (0.750 - 0.786) | 0.158 (0.151 - 0.165) | 0.165 (0.155 - 0.176) | 0.142 (0.129 - 0.154) | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 0.459 (0.431 - 0.484) | 0.504 (0.472 - 0.533) | 0.368 (0.330 - 0.400) | 0.701 (0.681 - 0.720) | 0.750 (0.732 - 0.771) | 0.590 (0.556 - 0.625) | 0.545 (0.530 - 0.564) | 0.582 (0.567 - 0.602) | 0.453 (0.434 - 0.474) | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 0.591 (0.574 - 0.611) | 0.645 (0.624 - 0.668) | 0.489 (0.466 - 0.515) | 0.831 (0.823 - 0.841) | 0.868 (0.859 - 0.877) | 0.751 (0.733 - 0.768) | 0.646 (0.634 - 0.658) | 0.684 (0.670 - 0.697) | 0.554 (0.534 - 0.573) | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 0.581 (0.553 - 0.600) | 0.633 (0.604 - 0.655) | 0.483 (0.456 - 0.503) | 0.835 (0.826 - 0.846) | 0.871 (0.862 - 0.881) | 0.754 (0.736 - 0.772) | 0.638 (0.622 - 0.654) | 0.675 (0.658 - 0.691) | 0.549 (0.522 - 0.572) | |
Kim_GIST-HanwhaVision_task4a_4 | FDYLKA BEATs pool 1d stage1 | Kim2023 | 0.576 (0.549 - 0.595) | 0.628 (0.595 - 0.654) | 0.479 (0.441 - 0.508) | 0.809 (0.797 - 0.821) | 0.854 (0.842 - 0.867) | 0.712 (0.693 - 0.731) | 0.612 (0.585 - 0.632) | 0.656 (0.624 - 0.681) | 0.503 (0.483 - 0.526) | |
Kim_GIST-HanwhaVision_task4a_5 | FDYLKA BEATs all ensemble 48 | Kim2023 | 0.611 (0.598 - 0.623) | 0.661 (0.647 - 0.678) | 0.511 (0.494 - 0.529) | 0.846 (0.838 - 0.855) | 0.883 (0.875 - 0.891) | 0.760 (0.743 - 0.777) | 0.655 (0.641 - 0.671) | 0.694 (0.676 - 0.715) | 0.558 (0.540 - 0.575) | |
Kim_GIST-HanwhaVision_task4a_6 | FDYLKA BEATs PSDS1 ensemble 16 | Kim2023 | 0.611 (0.590 - 0.628) | 0.661 (0.640 - 0.682) | 0.512 (0.485 - 0.535) | 0.841 (0.832 - 0.851) | 0.877 (0.867 - 0.887) | 0.762 (0.743 - 0.779) | 0.658 (0.639 - 0.674) | 0.698 (0.675 - 0.717) | 0.561 (0.540 - 0.583) | |
Kim_GIST-HanwhaVision_task4a_7 | FDYLKA BEATs PSDS2 ensemble 16 | Kim2023 | 0.591 (0.574 - 0.604) | 0.643 (0.621 - 0.662) | 0.489 (0.467 - 0.509) | 0.844 (0.835 - 0.853) | 0.882 (0.873 - 0.892) | 0.759 (0.741 - 0.776) | 0.638 (0.620 - 0.652) | 0.676 (0.655 - 0.696) | 0.543 (0.522 - 0.563) | |
Kim_GIST-HanwhaVision_task4a_8 | FDYLKA BEATs PSDS sum ensemble 16 | Kim2023 | 0.612 (0.599 - 0.626) | 0.659 (0.644 - 0.675) | 0.517 (0.500 - 0.536) | 0.841 (0.831 - 0.851) | 0.877 (0.867 - 0.887) | 0.759 (0.740 - 0.777) | 0.657 (0.644 - 0.671) | 0.696 (0.679 - 0.712) | 0.562 (0.543 - 0.581) | |
Wenxin_TJU_task4a_1 | ensemble-pretrained-psds1-0 | Wenxin2023 | 0.555 (0.543 - 0.566) | 0.608 (0.592 - 0.627) | 0.439 (0.425 - 0.454) | 0.837 (0.828 - 0.847) | 0.879 (0.871 - 0.888) | 0.738 (0.719 - 0.754) | 0.590 (0.581 - 0.600) | 0.626 (0.614 - 0.640) | 0.503 (0.488 - 0.519) | |
Wenxin_TJU_task4a_2 | ensemble-pretrained-psds1-1 | Wenxin2023 | 0.570 (0.559 - 0.580) | 0.623 (0.606 - 0.638) | 0.445 (0.429 - 0.458) | 0.844 (0.836 - 0.854) | 0.884 (0.876 - 0.894) | 0.752 (0.732 - 0.770) | 0.603 (0.595 - 0.612) | 0.641 (0.629 - 0.653) | 0.513 (0.495 - 0.531) | |
Wenxin_TJU_task4a_3 | ensemble-pretrained-psds2-0 | Wenxin2023 | 0.080 (0.071 - 0.088) | 0.095 (0.083 - 0.105) | 0.075 (0.063 - 0.088) | 0.815 (0.802 - 0.825) | 0.844 (0.833 - 0.860) | 0.723 (0.700 - 0.744) | 0.141 (0.135 - 0.147) | 0.144 (0.132 - 0.152) | 0.137 (0.125 - 0.150) | |
Wenxin_TJU_task4a_4 | ensemble-pretrained-psds2-1 | Wenxin2023 | 0.081 (0.071 - 0.090) | 0.095 (0.085 - 0.105) | 0.078 (0.063 - 0.093) | 0.838 (0.828 - 0.849) | 0.867 (0.857 - 0.878) | 0.760 (0.738 - 0.780) | 0.143 (0.136 - 0.150) | 0.150 (0.140 - 0.158) | 0.129 (0.117 - 0.140) | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 0.539 (0.528 - 0.549) | 0.598 (0.581 - 0.614) | 0.423 (0.404 - 0.437) | 0.816 (0.806 - 0.831) | 0.858 (0.848 - 0.870) | 0.710 (0.688 - 0.733) | 0.569 (0.559 - 0.577) | 0.605 (0.594 - 0.614) | 0.481 (0.460 - 0.501) | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 0.546 (0.536 - 0.556) | 0.596 (0.583 - 0.611) | 0.432 (0.418 - 0.448) | 0.831 (0.823 - 0.842) | 0.875 (0.868 - 0.884) | 0.735 (0.715 - 0.754) | 0.582 (0.574 - 0.589) | 0.615 (0.603 - 0.626) | 0.498 (0.481 - 0.515) | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 0.440 (0.429 - 0.454) | 0.491 (0.472 - 0.508) | 0.331 (0.314 - 0.349) | 0.686 (0.673 - 0.699) | 0.730 (0.711 - 0.751) | 0.567 (0.547 - 0.588) | 0.504 (0.497 - 0.514) | 0.547 (0.533 - 0.561) | 0.397 (0.379 - 0.413) | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.059 (0.049 - 0.068) | 0.076 (0.064 - 0.086) | 0.054 (0.041 - 0.067) | 0.707 (0.694 - 0.723) | 0.739 (0.723 - 0.758) | 0.634 (0.610 - 0.654) | 0.131 (0.125 - 0.137) | 0.140 (0.131 - 0.148) | 0.116 (0.105 - 0.127) |
Class-wise performance
PSDS scenario 1
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | Alarm bell ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 1.00 | 0.389 (0.345 - 0.443) | 0.654 (0.604 - 0.702) | 0.574 (0.515 - 0.622) | 0.168 (0.147 - 0.197) | 0.311 (0.286 - 0.340) | 0.546 (0.481 - 0.627) | 0.620 (0.531 - 0.688) | 0.408 (0.371 - 0.454) | 0.744 (0.713 - 0.766) | 0.699 (0.639 - 0.748) | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 1.52 | 0.662 (0.608 - 0.736) | 0.871 (0.832 - 0.909) | 0.771 (0.741 - 0.799) | 0.286 (0.269 - 0.303) | 0.495 (0.458 - 0.533) | 0.778 (0.709 - 0.837) | 0.773 (0.716 - 0.815) | 0.641 (0.608 - 0.675) | 0.782 (0.769 - 0.796) | 0.864 (0.837 - 0.894) | |
Li_USTC_task4a_1 | TAFT and SdMT | Li2023 | 1.54 | 0.708 (0.666 - 0.752) | 0.800 (0.765 - 0.834) | 0.807 (0.786 - 0.825) | 0.380 (0.362 - 0.401) | 0.481 (0.453 - 0.510) | 0.773 (0.745 - 0.805) | 0.819 (0.792 - 0.842) | 0.597 (0.566 - 0.631) | 0.781 (0.767 - 0.796) | 0.863 (0.838 - 0.885) | |
Li_USTC_task4a_2 | Pseudo labeling | Li2023 | 1.58 | 0.704 (0.661 - 0.757) | 0.829 (0.799 - 0.858) | 0.828 (0.807 - 0.846) | 0.399 (0.380 - 0.421) | 0.475 (0.448 - 0.501) | 0.862 (0.833 - 0.893) | 0.800 (0.759 - 0.839) | 0.643 (0.604 - 0.678) | 0.804 (0.791 - 0.817) | 0.860 (0.834 - 0.885) | |
Li_USTC_task4a_3 | TAFT and AFL | Li2023 | 1.54 | 0.729 (0.698 - 0.770) | 0.831 (0.801 - 0.856) | 0.819 (0.799 - 0.836) | 0.379 (0.362 - 0.399) | 0.494 (0.465 - 0.523) | 0.802 (0.774 - 0.826) | 0.807 (0.786 - 0.827) | 0.608 (0.576 - 0.640) | 0.764 (0.750 - 0.779) | 0.893 (0.872 - 0.916) | |
Li_USTC_task4a_4 | MaxFilter | Li2023 | 0.89 | 0.248 (0.219 - 0.281) | 0.286 (0.243 - 0.322) | 0.066 (0.051 - 0.081) | 0.018 (0.011 - 0.023) | 0.089 (0.062 - 0.110) | 0.497 (0.459 - 0.541) | 0.720 (0.682 - 0.758) | 0.485 (0.448 - 0.524) | 0.076 (0.063 - 0.089) | 0.787 (0.758 - 0.816) | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 1.52 | 0.708 (0.665 - 0.756) | 0.775 (0.741 - 0.813) | 0.806 (0.784 - 0.825) | 0.377 (0.358 - 0.398) | 0.480 (0.442 - 0.520) | 0.774 (0.746 - 0.808) | 0.806 (0.764 - 0.837) | 0.570 (0.535 - 0.608) | 0.777 (0.761 - 0.793) | 0.857 (0.816 - 0.892) | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 1.56 | 0.682 (0.623 - 0.743) | 0.819 (0.789 - 0.852) | 0.830 (0.807 - 0.852) | 0.400 (0.377 - 0.429) | 0.469 (0.433 - 0.505) | 0.851 (0.825 - 0.884) | 0.797 (0.754 - 0.846) | 0.613 (0.556 - 0.659) | 0.802 (0.789 - 0.817) | 0.856 (0.830 - 0.885) | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 1.20 | 0.482 (0.422 - 0.536) | 0.718 (0.677 - 0.757) | 0.656 (0.624 - 0.690) | 0.235 (0.214 - 0.261) | 0.322 (0.287 - 0.369) | 0.676 (0.644 - 0.710) | 0.663 (0.630 - 0.697) | 0.555 (0.526 - 0.587) | 0.756 (0.742 - 0.769) | 0.771 (0.736 - 0.812) | |
Liu_NSYSU_task4_1 | DCASE2023 FDY_WeakSED_Ensemble | Liu2023 | 0.80 | 0.190 (0.163 - 0.220) | 0.268 (0.230 - 0.303) | 0.043 (0.030 - 0.056) | 0.012 (0.004 - 0.016) | 0.087 (0.059 - 0.107) | 0.485 (0.441 - 0.527) | 0.714 (0.674 - 0.748) | 0.460 (0.429 - 0.496) | 0.092 (0.078 - 0.107) | 0.770 (0.740 - 0.802) | |
Liu_NSYSU_task4_2 | FDY_Ensemble | Liu2023 | 1.36 | 0.554 (0.508 - 0.615) | 0.781 (0.746 - 0.815) | 0.701 (0.671 - 0.729) | 0.277 (0.262 - 0.293) | 0.461 (0.438 - 0.497) | 0.683 (0.648 - 0.722) | 0.757 (0.728 - 0.783) | 0.547 (0.520 - 0.578) | 0.795 (0.785 - 0.807) | 0.818 (0.790 - 0.848) | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 1.26 | 0.555 (0.514 - 0.612) | 0.719 (0.684 - 0.755) | 0.675 (0.643 - 0.708) | 0.239 (0.228 - 0.255) | 0.435 (0.394 - 0.474) | 0.677 (0.629 - 0.737) | 0.735 (0.694 - 0.785) | 0.494 (0.459 - 0.527) | 0.756 (0.741 - 0.774) | 0.719 (0.689 - 0.748) | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 1.24 | 0.488 (0.440 - 0.549) | 0.765 (0.708 - 0.814) | 0.670 (0.634 - 0.707) | 0.244 (0.217 - 0.284) | 0.380 (0.332 - 0.441) | 0.654 (0.622 - 0.691) | 0.690 (0.641 - 0.743) | 0.507 (0.432 - 0.564) | 0.742 (0.727 - 0.758) | 0.812 (0.783 - 0.843) | |
Liu_NSYSU_task4_5 | DCASE2023 FDY_BEATs_WeakSED | Liu2023 | 0.82 | 0.199 (0.169 - 0.230) | 0.290 (0.249 - 0.329) | 0.052 (0.035 - 0.068) | 0.010 (0.003 - 0.014) | 0.086 (0.059 - 0.107) | 0.509 (0.468 - 0.553) | 0.733 (0.695 - 0.771) | 0.507 (0.471 - 0.550) | 0.031 (0.025 - 0.037) | 0.786 (0.757 - 0.816) | |
Liu_NSYSU_task4_6 | DCASE2023 FDY_BEATs | Liu2023 | 1.62 | 0.684 (0.634 - 0.750) | 0.908 (0.888 - 0.927) | 0.797 (0.773 - 0.820) | 0.340 (0.324 - 0.356) | 0.590 (0.563 - 0.618) | 0.731 (0.703 - 0.765) | 0.812 (0.786 - 0.842) | 0.652 (0.617 - 0.690) | 0.801 (0.787 - 0.814) | 0.886 (0.861 - 0.908) | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 1.55 | 0.696 (0.656 - 0.746) | 0.831 (0.795 - 0.865) | 0.772 (0.747 - 0.800) | 0.318 (0.301 - 0.334) | 0.552 (0.527 - 0.582) | 0.685 (0.644 - 0.727) | 0.805 (0.775 - 0.834) | 0.600 (0.563 - 0.642) | 0.730 (0.713 - 0.747) | 0.860 (0.831 - 0.890) | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 1.53 | 0.667 (0.576 - 0.735) | 0.866 (0.827 - 0.898) | 0.774 (0.745 - 0.800) | 0.312 (0.291 - 0.332) | 0.551 (0.523 - 0.581) | 0.660 (0.597 - 0.718) | 0.818 (0.791 - 0.846) | 0.607 (0.568 - 0.649) | 0.755 (0.719 - 0.788) | 0.872 (0.835 - 0.905) | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 1.24 | 0.528 (0.482 - 0.596) | 0.752 (0.717 - 0.781) | 0.673 (0.637 - 0.703) | 0.252 (0.235 - 0.272) | 0.408 (0.381 - 0.443) | 0.631 (0.599 - 0.669) | 0.622 (0.590 - 0.658) | 0.515 (0.485 - 0.547) | 0.789 (0.773 - 0.803) | 0.783 (0.752 - 0.821) | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.79 | 0.229 (0.181 - 0.289) | 0.311 (0.258 - 0.369) | 0.143 (0.106 - 0.182) | 0.032 (0.025 - 0.040) | 0.113 (0.087 - 0.137) | 0.449 (0.402 - 0.499) | 0.619 (0.578 - 0.663) | 0.452 (0.414 - 0.489) | 0.166 (0.136 - 0.200) | 0.694 (0.646 - 0.741) | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 1.53 | 0.652 (0.600 - 0.722) | 0.867 (0.838 - 0.892) | 0.786 (0.760 - 0.809) | 0.321 (0.303 - 0.339) | 0.497 (0.473 - 0.528) | 0.824 (0.797 - 0.856) | 0.788 (0.762 - 0.822) | 0.592 (0.553 - 0.628) | 0.775 (0.758 - 0.793) | 0.865 (0.836 - 0.892) | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 1.45 | 0.485 (0.441 - 0.542) | 0.815 (0.783 - 0.846) | 0.781 (0.751 - 0.809) | 0.316 (0.302 - 0.332) | 0.519 (0.494 - 0.552) | 0.789 (0.758 - 0.820) | 0.707 (0.671 - 0.745) | 0.577 (0.540 - 0.610) | 0.814 (0.797 - 0.831) | 0.870 (0.839 - 0.898) | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 1.25 | 0.523 (0.385 - 0.629) | 0.704 (0.660 - 0.745) | 0.666 (0.622 - 0.707) | 0.271 (0.224 - 0.307) | 0.468 (0.437 - 0.500) | 0.707 (0.659 - 0.751) | 0.771 (0.717 - 0.815) | 0.471 (0.438 - 0.505) | 0.730 (0.717 - 0.746) | 0.830 (0.789 - 0.870) | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 1.58 | 0.679 (0.616 - 0.730) | 0.876 (0.841 - 0.904) | 0.802 (0.777 - 0.825) | 0.376 (0.356 - 0.395) | 0.596 (0.565 - 0.629) | 0.812 (0.782 - 0.843) | 0.804 (0.768 - 0.844) | 0.598 (0.569 - 0.630) | 0.824 (0.812 - 0.839) | 0.891 (0.856 - 0.921) | |
Chen_CHT_task4_3 | VGGSK+BEATs | Chen2023b | 1.66 | 0.720 (0.682 - 0.765) | 0.899 (0.879 - 0.918) | 0.821 (0.799 - 0.841) | 0.395 (0.375 - 0.413) | 0.615 (0.591 - 0.641) | 0.860 (0.834 - 0.886) | 0.830 (0.801 - 0.861) | 0.671 (0.644 - 0.700) | 0.837 (0.824 - 0.850) | 0.913 (0.893 - 0.933) | |
Chen_CHT_task4_4 | multi+BEATs | Chen2023b | 1.66 | 0.692 (0.648 - 0.749) | 0.896 (0.875 - 0.919) | 0.832 (0.811 - 0.852) | 0.384 (0.366 - 0.400) | 0.610 (0.587 - 0.634) | 0.859 (0.834 - 0.883) | 0.803 (0.776 - 0.829) | 0.688 (0.656 - 0.719) | 0.839 (0.828 - 0.853) | 0.912 (0.895 - 0.934) | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Xiao2023 | 1.23 | 0.393 (0.356 - 0.440) | 0.746 (0.712 - 0.777) | 0.690 (0.662 - 0.717) | 0.228 (0.214 - 0.241) | 0.396 (0.374 - 0.431) | 0.711 (0.681 - 0.750) | 0.720 (0.694 - 0.746) | 0.514 (0.487 - 0.541) | 0.787 (0.776 - 0.799) | 0.824 (0.800 - 0.851) | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 1.55 | 0.635 (0.591 - 0.713) | 0.849 (0.819 - 0.889) | 0.774 (0.754 - 0.790) | 0.303 (0.286 - 0.321) | 0.513 (0.483 - 0.551) | 0.786 (0.758 - 0.819) | 0.864 (0.843 - 0.888) | 0.647 (0.617 - 0.673) | 0.798 (0.787 - 0.813) | 0.899 (0.878 - 0.923) | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.86 | 0.196 (0.169 - 0.225) | 0.359 (0.326 - 0.389) | 0.071 (0.056 - 0.084) | 0.042 (0.035 - 0.050) | 0.095 (0.071 - 0.114) | 0.523 (0.488 - 0.565) | 0.702 (0.667 - 0.730) | 0.445 (0.412 - 0.480) | 0.095 (0.088 - 0.106) | 0.787 (0.760 - 0.817) | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 1.60 | 0.692 (0.658 - 0.736) | 0.818 (0.788 - 0.855) | 0.790 (0.771 - 0.810) | 0.360 (0.345 - 0.376) | 0.544 (0.517 - 0.577) | 0.774 (0.749 - 0.805) | 0.759 (0.732 - 0.788) | 0.624 (0.592 - 0.657) | 0.813 (0.803 - 0.823) | 0.888 (0.864 - 0.915) | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_5_ensemble_model | Xiao2023 | 1.61 | 0.685 (0.641 - 0.739) | 0.837 (0.806 - 0.873) | 0.790 (0.766 - 0.811) | 0.362 (0.345 - 0.377) | 0.553 (0.524 - 0.583) | 0.796 (0.771 - 0.822) | 0.770 (0.742 - 0.801) | 0.634 (0.604 - 0.663) | 0.811 (0.799 - 0.823) | 0.887 (0.861 - 0.909) | |
Xiao_FMSG_task4a_6 | Xiao_FMSG_task4a_6_ensemble_model | Xiao2023 | 1.61 | 0.673 (0.629 - 0.733) | 0.837 (0.810 - 0.871) | 0.810 (0.788 - 0.830) | 0.334 (0.317 - 0.347) | 0.556 (0.529 - 0.588) | 0.815 (0.792 - 0.844) | 0.808 (0.784 - 0.835) | 0.640 (0.607 - 0.673) | 0.823 (0.813 - 0.834) | 0.905 (0.886 - 0.929) | |
Xiao_FMSG_task4a_7 | Xiao_FMSG_task4a_7_ensemble_model | Xiao2023 | 0.87 | 0.202 (0.175 - 0.231) | 0.381 (0.344 - 0.416) | 0.072 (0.057 - 0.086) | 0.043 (0.033 - 0.051) | 0.097 (0.073 - 0.117) | 0.510 (0.471 - 0.553) | 0.704 (0.669 - 0.733) | 0.464 (0.427 - 0.501) | 0.099 (0.092 - 0.109) | 0.791 (0.764 - 0.821) | |
Xiao_FMSG_task4a_8 | Xiao_FMSG_task4a_8_ensemble_model | Xiao2023 | 1.62 | 0.678 (0.634 - 0.752) | 0.850 (0.820 - 0.889) | 0.803 (0.783 - 0.822) | 0.332 (0.315 - 0.349) | 0.553 (0.522 - 0.586) | 0.821 (0.799 - 0.849) | 0.804 (0.777 - 0.833) | 0.640 (0.613 - 0.669) | 0.816 (0.805 - 0.827) | 0.899 (0.878 - 0.925) | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.698 (0.661 - 0.740) | 0.851 (0.820 - 0.887) | 0.813 (0.791 - 0.839) | 0.322 (0.308 - 0.336) | 0.466 (0.436 - 0.500) | 0.850 (0.831 - 0.878) | 0.838 (0.811 - 0.867) | 0.668 (0.634 - 0.699) | 0.792 (0.779 - 0.806) | 0.905 (0.884 - 0.925) | |
Guan_HIT_task4a_2 | Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.269 (0.235 - 0.311) | 0.494 (0.456 - 0.533) | 0.063 (0.049 - 0.076) | 0.024 (0.016 - 0.030) | 0.090 (0.068 - 0.109) | 0.621 (0.590 - 0.662) | 0.777 (0.744 - 0.803) | 0.509 (0.477 - 0.546) | 0.089 (0.081 - 0.099) | 0.865 (0.839 - 0.890) | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.686 (0.638 - 0.739) | 0.842 (0.811 - 0.879) | 0.806 (0.781 - 0.831) | 0.320 (0.306 - 0.335) | 0.452 (0.417 - 0.488) | 0.843 (0.819 - 0.874) | 0.822 (0.773 - 0.868) | 0.651 (0.619 - 0.686) | 0.781 (0.764 - 0.797) | 0.901 (0.878 - 0.922) | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.269 (0.235 - 0.310) | 0.497 (0.458 - 0.535) | 0.064 (0.049 - 0.078) | 0.025 (0.017 - 0.030) | 0.090 (0.066 - 0.110) | 0.617 (0.584 - 0.663) | 0.765 (0.731 - 0.798) | 0.498 (0.459 - 0.539) | 0.091 (0.069 - 0.109) | 0.865 (0.839 - 0.890) | |
Guan_HIT_task4a_5 | Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.500 (0.455 - 0.556) | 0.777 (0.745 - 0.822) | 0.760 (0.731 - 0.789) | 0.326 (0.310 - 0.342) | 0.461 (0.434 - 0.491) | 0.744 (0.711 - 0.793) | 0.738 (0.709 - 0.771) | 0.621 (0.590 - 0.651) | 0.811 (0.801 - 0.823) | 0.887 (0.859 - 0.913) | |
Guan_HIT_task4a_6 | Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.226 (0.194 - 0.266) | 0.447 (0.410 - 0.478) | 0.049 (0.035 - 0.063) | 0.021 (0.015 - 0.025) | 0.093 (0.070 - 0.114) | 0.558 (0.520 - 0.607) | 0.715 (0.683 - 0.747) | 0.512 (0.477 - 0.550) | 0.185 (0.167 - 0.205) | 0.847 (0.819 - 0.874) | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 1.50 | 0.553 (0.497 - 0.622) | 0.816 (0.788 - 0.845) | 0.753 (0.723 - 0.778) | 0.334 (0.310 - 0.355) | 0.493 (0.467 - 0.523) | 0.703 (0.621 - 0.802) | 0.742 (0.695 - 0.781) | 0.595 (0.566 - 0.627) | 0.767 (0.751 - 0.784) | 0.870 (0.843 - 0.896) | |
Wang_XiaoRice_task4a_2 | SED Embed | Wang2023 | 1.52 | 0.524 (0.476 - 0.580) | 0.833 (0.808 - 0.859) | 0.732 (0.699 - 0.759) | 0.341 (0.323 - 0.360) | 0.477 (0.452 - 0.504) | 0.671 (0.637 - 0.709) | 0.749 (0.723 - 0.779) | 0.666 (0.632 - 0.700) | 0.785 (0.772 - 0.798) | 0.833 (0.800 - 0.863) | |
Wang_XiaoRice_task4a_3 | L-TAG | Wang2023 | 0.91 | 0.214 (0.185 - 0.246) | 0.310 (0.272 - 0.348) | 0.068 (0.050 - 0.087) | 0.022 (0.014 - 0.027) | 0.114 (0.086 - 0.137) | 0.512 (0.473 - 0.556) | 0.728 (0.691 - 0.767) | 0.521 (0.485 - 0.561) | 0.208 (0.191 - 0.226) | 0.792 (0.761 - 0.822) | |
Zhang_IOA_task4_1 | strong_ensemble | Zhang2023 | 1.75 | 0.786 (0.747 - 0.831) | 0.925 (0.908 - 0.940) | 0.911 (0.897 - 0.926) | 0.395 (0.377 - 0.413) | 0.588 (0.556 - 0.626) | 0.826 (0.802 - 0.856) | 0.878 (0.860 - 0.905) | 0.786 (0.762 - 0.814) | 0.855 (0.845 - 0.869) | 0.929 (0.912 - 0.947) | |
Zhang_IOA_task4_2 | segment tagging model | Zhang2023 | 0.95 | 0.220 (0.190 - 0.252) | 0.320 (0.280 - 0.357) | 0.055 (0.043 - 0.068) | 0.016 (0.010 - 0.019) | 0.109 (0.080 - 0.132) | 0.511 (0.470 - 0.555) | 0.744 (0.705 - 0.781) | 0.537 (0.506 - 0.577) | 0.105 (0.095 - 0.116) | 0.800 (0.769 - 0.834) | |
Zhang_IOA_task4_3 | strong_ensemble_all | Zhang2023 | 1.71 | 0.782 (0.744 - 0.833) | 0.919 (0.902 - 0.935) | 0.888 (0.874 - 0.906) | 0.425 (0.406 - 0.447) | 0.575 (0.543 - 0.611) | 0.818 (0.793 - 0.847) | 0.878 (0.859 - 0.902) | 0.701 (0.673 - 0.725) | 0.817 (0.806 - 0.830) | 0.921 (0.903 - 0.940) | |
Zhang_IOA_task4_4 | strong_ensemble_1 | Zhang2023 | 1.75 | 0.786 (0.747 - 0.831) | 0.925 (0.908 - 0.940) | 0.911 (0.897 - 0.926) | 0.404 (0.384 - 0.423) | 0.588 (0.556 - 0.626) | 0.826 (0.802 - 0.856) | 0.878 (0.860 - 0.905) | 0.786 (0.762 - 0.814) | 0.855 (0.845 - 0.869) | 0.929 (0.912 - 0.947) | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 1.52 | 0.718 (0.675 - 0.769) | 0.887 (0.857 - 0.919) | 0.861 (0.845 - 0.876) | 0.344 (0.325 - 0.364) | 0.400 (0.368 - 0.433) | 0.795 (0.767 - 0.829) | 0.834 (0.809 - 0.860) | 0.647 (0.615 - 0.677) | 0.779 (0.760 - 0.797) | 0.899 (0.879 - 0.922) | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 1.60 | 0.747 (0.713 - 0.796) | 0.912 (0.893 - 0.934) | 0.873 (0.856 - 0.893) | 0.391 (0.374 - 0.410) | 0.452 (0.419 - 0.489) | 0.807 (0.784 - 0.839) | 0.856 (0.836 - 0.878) | 0.674 (0.642 - 0.702) | 0.801 (0.788 - 0.814) | 0.928 (0.912 - 0.947) | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.86 | 0.196 (0.166 - 0.226) | 0.294 (0.261 - 0.327) | 0.059 (0.041 - 0.074) | 0.013 (0.007 - 0.017) | 0.090 (0.064 - 0.112) | 0.497 (0.459 - 0.536) | 0.717 (0.688 - 0.744) | 0.468 (0.434 - 0.505) | 0.085 (0.075 - 0.096) | 0.799 (0.765 - 0.829) | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.379 (0.344 - 0.427) | 0.670 (0.634 - 0.698) | 0.640 (0.610 - 0.670) | 0.243 (0.228 - 0.260) | 0.355 (0.332 - 0.394) | 0.689 (0.660 - 0.719) | 0.714 (0.690 - 0.739) | 0.521 (0.494 - 0.552) | 0.752 (0.741 - 0.767) | 0.768 (0.744 - 0.800) | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.495 (0.455 - 0.547) | 0.824 (0.800 - 0.853) | 0.775 (0.746 - 0.799) | 0.334 (0.321 - 0.347) | 0.558 (0.533 - 0.585) | 0.837 (0.812 - 0.859) | 0.840 (0.817 - 0.862) | 0.645 (0.618 - 0.671) | 0.809 (0.795 - 0.824) | 0.893 (0.870 - 0.915) | |
Wu_NCUT_task4a_3 | Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.490 (0.449 - 0.544) | 0.830 (0.808 - 0.856) | 0.775 (0.746 - 0.800) | 0.297 (0.283 - 0.312) | 0.500 (0.478 - 0.534) | 0.837 (0.813 - 0.859) | 0.841 (0.819 - 0.863) | 0.646 (0.619 - 0.671) | 0.816 (0.804 - 0.831) | 0.893 (0.870 - 0.915) | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 1.06 | 0.418 (0.373 - 0.475) | 0.657 (0.621 - 0.692) | 0.597 (0.548 - 0.636) | 0.194 (0.168 - 0.224) | 0.341 (0.316 - 0.370) | 0.518 (0.451 - 0.612) | 0.658 (0.619 - 0.696) | 0.429 (0.404 - 0.460) | 0.766 (0.749 - 0.781) | 0.716 (0.680 - 0.752) | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 1.12 | 0.437 (0.390 - 0.487) | 0.661 (0.625 - 0.697) | 0.596 (0.546 - 0.637) | 0.224 (0.199 - 0.255) | 0.410 (0.379 - 0.438) | 0.522 (0.459 - 0.606) | 0.669 (0.628 - 0.707) | 0.450 (0.412 - 0.482) | 0.763 (0.745 - 0.779) | 0.733 (0.697 - 0.768) | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.91 | 0.265 (0.210 - 0.318) | 0.504 (0.467 - 0.547) | 0.291 (0.194 - 0.361) | 0.053 (0.041 - 0.064) | 0.128 (0.099 - 0.154) | 0.557 (0.457 - 0.650) | 0.604 (0.525 - 0.663) | 0.463 (0.423 - 0.500) | 0.581 (0.549 - 0.612) | 0.736 (0.705 - 0.768) | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.84 | 0.242 (0.198 - 0.293) | 0.457 (0.426 - 0.495) | 0.070 (0.053 - 0.088) | 0.029 (0.020 - 0.037) | 0.111 (0.083 - 0.132) | 0.546 (0.465 - 0.619) | 0.631 (0.556 - 0.687) | 0.482 (0.442 - 0.520) | 0.551 (0.465 - 0.610) | 0.752 (0.721 - 0.785) | |
Barahona_AUDIAS_task4a_5 | 4-Resolution CRNN | Barahona2023 | 1.14 | 0.420 (0.378 - 0.472) | 0.711 (0.669 - 0.750) | 0.650 (0.612 - 0.687) | 0.210 (0.193 - 0.228) | 0.358 (0.334 - 0.385) | 0.616 (0.559 - 0.691) | 0.675 (0.633 - 0.725) | 0.465 (0.437 - 0.497) | 0.782 (0.767 - 0.797) | 0.771 (0.729 - 0.818) | |
Barahona_AUDIAS_task4a_6 | 4-Resolution CRNN with class-dependent median filtering | Barahona2023 | 1.18 | 0.426 (0.393 - 0.466) | 0.703 (0.664 - 0.744) | 0.646 (0.607 - 0.683) | 0.238 (0.217 - 0.259) | 0.418 (0.391 - 0.444) | 0.615 (0.556 - 0.693) | 0.686 (0.647 - 0.733) | 0.479 (0.451 - 0.507) | 0.777 (0.762 - 0.794) | 0.774 (0.735 - 0.821) | |
Barahona_AUDIAS_task4a_7 | 5-Resolution Conformer | Barahona2023 | 1.06 | 0.275 (0.240 - 0.315) | 0.579 (0.534 - 0.618) | 0.489 (0.450 - 0.532) | 0.135 (0.123 - 0.151) | 0.189 (0.158 - 0.214) | 0.571 (0.532 - 0.613) | 0.659 (0.624 - 0.691) | 0.490 (0.454 - 0.527) | 0.742 (0.729 - 0.756) | 0.760 (0.731 - 0.791) | |
Barahona_AUDIAS_task4a_8 | 5-Resolution Conformer with class-wise median filtering | Barahona2023 | 1.00 | 0.263 (0.234 - 0.296) | 0.576 (0.531 - 0.627) | 0.281 (0.239 - 0.331) | 0.061 (0.052 - 0.069) | 0.137 (0.110 - 0.161) | 0.567 (0.528 - 0.604) | 0.679 (0.643 - 0.709) | 0.503 (0.468 - 0.538) | 0.662 (0.628 - 0.692) | 0.774 (0.743 - 0.803) | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 1.12 | 0.370 (0.329 - 0.416) | 0.705 (0.676 - 0.733) | 0.614 (0.569 - 0.647) | 0.224 (0.211 - 0.238) | 0.311 (0.285 - 0.340) | 0.658 (0.633 - 0.689) | 0.622 (0.589 - 0.663) | 0.500 (0.474 - 0.529) | 0.757 (0.745 - 0.769) | 0.757 (0.727 - 0.796) | |
Gan_NCUT_task4_2 | Gan_NCUT_SED_system_2 | Gan2023 | 1.52 | 0.565 (0.515 - 0.632) | 0.855 (0.825 - 0.884) | 0.783 (0.756 - 0.807) | 0.306 (0.290 - 0.322) | 0.498 (0.470 - 0.532) | 0.836 (0.811 - 0.864) | 0.788 (0.762 - 0.816) | 0.624 (0.593 - 0.654) | 0.827 (0.813 - 0.841) | 0.881 (0.859 - 0.906) | |
Gan_NCUT_task4_3 | Gan_NCUT_SED_system_3 | Gan2023 | 1.50 | 0.507 (0.458 - 0.568) | 0.842 (0.815 - 0.868) | 0.782 (0.757 - 0.804) | 0.291 (0.275 - 0.308) | 0.457 (0.427 - 0.492) | 0.826 (0.801 - 0.856) | 0.776 (0.752 - 0.806) | 0.633 (0.602 - 0.664) | 0.817 (0.804 - 0.831) | 0.880 (0.858 - 0.904) | |
Liu_SRCN_task4a_1 | DCASE2023 t4a system1 | Chen2023a | 1.65 | 0.682 (0.631 - 0.747) | 0.872 (0.847 - 0.893) | 0.836 (0.815 - 0.859) | 0.400 (0.384 - 0.413) | 0.554 (0.523 - 0.583) | 0.784 (0.757 - 0.816) | 0.902 (0.880 - 0.923) | 0.694 (0.660 - 0.723) | 0.830 (0.819 - 0.842) | 0.915 (0.896 - 0.936) | |
Liu_SRCN_task4a_2 | DCASE2023 t4a system2 | Chen2023a | 1.40 | 0.576 (0.520 - 0.634) | 0.851 (0.821 - 0.886) | 0.686 (0.655 - 0.715) | 0.124 (0.110 - 0.138) | 0.262 (0.238 - 0.296) | 0.823 (0.797 - 0.854) | 0.905 (0.886 - 0.924) | 0.687 (0.658 - 0.717) | 0.562 (0.547 - 0.576) | 0.916 (0.896 - 0.934) | |
Liu_SRCN_task4a_3 | DCASE2023 t4a system3 | Chen2023a | 1.65 | 0.697 (0.639 - 0.757) | 0.919 (0.900 - 0.942) | 0.836 (0.814 - 0.856) | 0.338 (0.323 - 0.354) | 0.465 (0.440 - 0.498) | 0.876 (0.856 - 0.900) | 0.892 (0.874 - 0.916) | 0.722 (0.688 - 0.751) | 0.851 (0.841 - 0.863) | 0.921 (0.902 - 0.941) | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 1.25 | 0.400 (0.367 - 0.446) | 0.730 (0.695 - 0.772) | 0.694 (0.669 - 0.718) | 0.251 (0.239 - 0.267) | 0.395 (0.366 - 0.422) | 0.681 (0.650 - 0.720) | 0.761 (0.732 - 0.788) | 0.538 (0.517 - 0.565) | 0.768 (0.755 - 0.784) | 0.864 (0.837 - 0.888) | |
Liu_SRCN_task4a_5 | DCASE2023 t4a system5 | Chen2023a | 0.94 | 0.230 (0.196 - 0.263) | 0.322 (0.279 - 0.363) | 0.057 (0.039 - 0.073) | 0.015 (0.004 - 0.020) | 0.118 (0.087 - 0.142) | 0.531 (0.490 - 0.573) | 0.747 (0.711 - 0.784) | 0.549 (0.516 - 0.590) | 0.275 (0.256 - 0.293) | 0.804 (0.774 - 0.833) | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 1.35 | 0.504 (0.455 - 0.573) | 0.806 (0.762 - 0.853) | 0.719 (0.665 - 0.760) | 0.298 (0.273 - 0.329) | 0.431 (0.345 - 0.495) | 0.719 (0.687 - 0.755) | 0.790 (0.758 - 0.819) | 0.572 (0.538 - 0.607) | 0.716 (0.660 - 0.750) | 0.842 (0.812 - 0.875) | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 1.68 | 0.740 (0.684 - 0.803) | 0.868 (0.843 - 0.894) | 0.823 (0.798 - 0.848) | 0.395 (0.372 - 0.415) | 0.608 (0.557 - 0.676) | 0.835 (0.811 - 0.864) | 0.811 (0.784 - 0.839) | 0.665 (0.628 - 0.699) | 0.814 (0.801 - 0.827) | 0.882 (0.859 - 0.903) | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 1.66 | 0.723 (0.659 - 0.786) | 0.862 (0.837 - 0.888) | 0.814 (0.785 - 0.840) | 0.388 (0.367 - 0.408) | 0.583 (0.508 - 0.657) | 0.840 (0.807 - 0.871) | 0.829 (0.791 - 0.865) | 0.652 (0.599 - 0.697) | 0.816 (0.803 - 0.829) | 0.883 (0.859 - 0.905) | |
Kim_GIST-HanwhaVision_task4a_4 | FDYLKA BEATs pool 1d stage1 | Kim2023 | 1.63 | 0.715 (0.646 - 0.788) | 0.871 (0.839 - 0.898) | 0.797 (0.755 - 0.834) | 0.387 (0.340 - 0.423) | 0.601 (0.566 - 0.640) | 0.833 (0.800 - 0.867) | 0.811 (0.765 - 0.844) | 0.636 (0.599 - 0.669) | 0.797 (0.784 - 0.812) | 0.867 (0.842 - 0.896) | |
Kim_GIST-HanwhaVision_task4a_5 | FDYLKA BEATs all ensemble 48 | Kim2023 | 1.72 | 0.750 (0.709 - 0.796) | 0.891 (0.869 - 0.915) | 0.827 (0.804 - 0.849) | 0.415 (0.395 - 0.434) | 0.642 (0.611 - 0.679) | 0.860 (0.837 - 0.886) | 0.851 (0.831 - 0.873) | 0.664 (0.626 - 0.705) | 0.835 (0.824 - 0.850) | 0.894 (0.872 - 0.917) | |
Kim_GIST-HanwhaVision_task4a_6 | FDYLKA BEATs PSDS1 ensemble 16 | Kim2023 | 1.72 | 0.755 (0.701 - 0.809) | 0.884 (0.853 - 0.910) | 0.835 (0.811 - 0.855) | 0.419 (0.392 - 0.442) | 0.631 (0.576 - 0.681) | 0.855 (0.830 - 0.882) | 0.845 (0.823 - 0.870) | 0.668 (0.631 - 0.703) | 0.830 (0.817 - 0.843) | 0.894 (0.872 - 0.916) | |
Kim_GIST-HanwhaVision_task4a_7 | FDYLKA BEATs PSDS2 ensemble 16 | Kim2023 | 1.69 | 0.725 (0.680 - 0.774) | 0.875 (0.850 - 0.900) | 0.814 (0.790 - 0.835) | 0.389 (0.371 - 0.407) | 0.620 (0.580 - 0.661) | 0.851 (0.824 - 0.882) | 0.848 (0.822 - 0.875) | 0.652 (0.613 - 0.692) | 0.828 (0.814 - 0.844) | 0.890 (0.859 - 0.917) | |
Kim_GIST-HanwhaVision_task4a_8 | FDYLKA BEATs PSDS sum ensemble 16 | Kim2023 | 1.72 | 0.752 (0.711 - 0.798) | 0.887 (0.864 - 0.913) | 0.829 (0.806 - 0.851) | 0.416 (0.397 - 0.433) | 0.641 (0.608 - 0.679) | 0.860 (0.838 - 0.887) | 0.849 (0.828 - 0.873) | 0.672 (0.634 - 0.712) | 0.834 (0.822 - 0.847) | 0.892 (0.869 - 0.916) | |
Wenxin_TJU_task4a_1 | ensemble-pretrained-psds1-0 | Wenxin2023 | 1.63 | 0.616 (0.569 - 0.673) | 0.871 (0.849 - 0.894) | 0.773 (0.747 - 0.797) | 0.385 (0.371 - 0.400) | 0.561 (0.536 - 0.589) | 0.752 (0.722 - 0.787) | 0.862 (0.842 - 0.890) | 0.669 (0.635 - 0.706) | 0.802 (0.789 - 0.816) | 0.872 (0.848 - 0.899) | |
Wenxin_TJU_task4a_2 | ensemble-pretrained-psds1-1 | Wenxin2023 | 1.66 | 0.662 (0.624 - 0.704) | 0.865 (0.838 - 0.889) | 0.780 (0.757 - 0.803) | 0.402 (0.387 - 0.417) | 0.564 (0.539 - 0.589) | 0.748 (0.722 - 0.779) | 0.869 (0.851 - 0.891) | 0.700 (0.672 - 0.738) | 0.784 (0.770 - 0.798) | 0.879 (0.853 - 0.901) | |
Wenxin_TJU_task4a_3 | ensemble-pretrained-psds2-0 | Wenxin2023 | 0.88 | 0.241 (0.214 - 0.274) | 0.325 (0.301 - 0.349) | 0.064 (0.048 - 0.079) | 0.024 (0.020 - 0.027) | 0.111 (0.084 - 0.133) | 0.506 (0.465 - 0.550) | 0.723 (0.688 - 0.757) | 0.500 (0.465 - 0.536) | 0.140 (0.127 - 0.152) | 0.782 (0.752 - 0.813) | |
Wenxin_TJU_task4a_4 | ensemble-pretrained-psds2-1 | Wenxin2023 | 0.90 | 0.227 (0.200 - 0.259) | 0.317 (0.285 - 0.347) | 0.069 (0.054 - 0.084) | 0.027 (0.024 - 0.030) | 0.112 (0.085 - 0.134) | 0.511 (0.470 - 0.553) | 0.735 (0.698 - 0.769) | 0.523 (0.487 - 0.559) | 0.146 (0.134 - 0.160) | 0.791 (0.763 - 0.820) | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 1.58 | 0.606 (0.561 - 0.652) | 0.857 (0.834 - 0.884) | 0.767 (0.739 - 0.791) | 0.371 (0.354 - 0.389) | 0.547 (0.521 - 0.572) | 0.713 (0.679 - 0.751) | 0.837 (0.814 - 0.860) | 0.642 (0.609 - 0.670) | 0.788 (0.776 - 0.801) | 0.842 (0.813 - 0.869) | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 1.61 | 0.660 (0.624 - 0.704) | 0.852 (0.831 - 0.876) | 0.757 (0.728 - 0.784) | 0.373 (0.357 - 0.387) | 0.545 (0.522 - 0.575) | 0.693 (0.663 - 0.724) | 0.874 (0.855 - 0.898) | 0.682 (0.653 - 0.717) | 0.748 (0.732 - 0.764) | 0.873 (0.849 - 0.896) | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 1.31 | 0.442 (0.403 - 0.493) | 0.745 (0.716 - 0.779) | 0.738 (0.711 - 0.763) | 0.301 (0.286 - 0.319) | 0.425 (0.401 - 0.457) | 0.642 (0.608 - 0.681) | 0.710 (0.678 - 0.740) | 0.538 (0.512 - 0.566) | 0.785 (0.774 - 0.796) | 0.821 (0.796 - 0.848) | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.75 | 0.210 (0.184 - 0.238) | 0.266 (0.237 - 0.294) | 0.046 (0.036 - 0.055) | 0.031 (0.025 - 0.035) | 0.105 (0.080 - 0.126) | 0.491 (0.450 - 0.533) | 0.677 (0.642 - 0.712) | 0.425 (0.397 - 0.458) | 0.084 (0.073 - 0.096) | 0.723 (0.691 - 0.764) |
PSDS scenario 2
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | Alarm Bell Ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 1.00 | 0.665 (0.613 - 0.727) | 0.742 (0.705 - 0.780) | 0.818 (0.784 - 0.850) | 0.376 (0.338 - 0.410) | 0.689 (0.644 - 0.739) | 0.800 (0.773 - 0.829) | 0.734 (0.646 - 0.803) | 0.478 (0.430 - 0.545) | 0.861 (0.824 - 0.887) | 0.761 (0.709 - 0.801) | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 1.52 | 0.917 (0.887 - 0.959) | 0.914 (0.889 - 0.935) | 0.963 (0.954 - 0.974) | 0.709 (0.680 - 0.737) | 0.918 (0.900 - 0.938) | 0.920 (0.901 - 0.941) | 0.892 (0.866 - 0.920) | 0.778 (0.751 - 0.805) | 0.886 (0.871 - 0.902) | 0.916 (0.896 - 0.936) | |
Li_USTC_task4a_1 | TAFT and SdMT | Li2023 | 1.54 | 0.917 (0.896 - 0.938) | 0.860 (0.836 - 0.884) | 0.954 (0.944 - 0.965) | 0.706 (0.686 - 0.728) | 0.880 (0.861 - 0.897) | 0.922 (0.908 - 0.938) | 0.887 (0.867 - 0.907) | 0.706 (0.677 - 0.734) | 0.870 (0.856 - 0.886) | 0.930 (0.915 - 0.945) | |
Li_USTC_task4a_2 | Pseudo labeling | Li2023 | 1.58 | 0.886 (0.854 - 0.940) | 0.867 (0.842 - 0.892) | 0.964 (0.955 - 0.975) | 0.733 (0.714 - 0.756) | 0.890 (0.874 - 0.905) | 0.942 (0.930 - 0.958) | 0.886 (0.862 - 0.909) | 0.723 (0.690 - 0.752) | 0.880 (0.866 - 0.897) | 0.923 (0.907 - 0.941) | |
Li_USTC_task4a_3 | TAFT and AFL | Li2023 | 1.54 | 0.919 (0.896 - 0.938) | 0.859 (0.837 - 0.883) | 0.958 (0.949 - 0.969) | 0.699 (0.679 - 0.726) | 0.888 (0.871 - 0.907) | 0.956 (0.947 - 0.968) | 0.867 (0.852 - 0.884) | 0.675 (0.646 - 0.703) | 0.853 (0.838 - 0.872) | 0.958 (0.946 - 0.968) | |
Li_USTC_task4a_4 | MaxFilter | Li2023 | 0.89 | 0.943 (0.928 - 0.959) | 0.931 (0.918 - 0.948) | 0.967 (0.959 - 0.976) | 0.815 (0.797 - 0.838) | 0.960 (0.951 - 0.968) | 0.969 (0.960 - 0.981) | 0.937 (0.925 - 0.948) | 0.782 (0.756 - 0.806) | 0.956 (0.949 - 0.964) | 0.957 (0.946 - 0.971) | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 1.52 | 0.917 (0.895 - 0.939) | 0.838 (0.809 - 0.868) | 0.950 (0.935 - 0.966) | 0.699 (0.675 - 0.728) | 0.880 (0.861 - 0.900) | 0.909 (0.886 - 0.931) | 0.871 (0.838 - 0.900) | 0.704 (0.676 - 0.732) | 0.867 (0.852 - 0.884) | 0.920 (0.886 - 0.946) | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 1.56 | 0.881 (0.847 - 0.932) | 0.869 (0.847 - 0.892) | 0.965 (0.955 - 0.977) | 0.723 (0.699 - 0.749) | 0.887 (0.868 - 0.906) | 0.942 (0.925 - 0.960) | 0.878 (0.854 - 0.904) | 0.747 (0.717 - 0.774) | 0.877 (0.864 - 0.895) | 0.924 (0.903 - 0.946) | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 1.20 | 0.681 (0.634 - 0.733) | 0.776 (0.742 - 0.810) | 0.858 (0.836 - 0.879) | 0.514 (0.460 - 0.560) | 0.741 (0.699 - 0.778) | 0.825 (0.786 - 0.868) | 0.694 (0.667 - 0.727) | 0.646 (0.616 - 0.678) | 0.860 (0.844 - 0.874) | 0.842 (0.805 - 0.877) | |
Liu_NSYSU_task4_1 | DCASE2023 FDY_WeakSED_Ensemble | Liu2023 | 0.80 | 0.914 (0.893 - 0.938) | 0.871 (0.847 - 0.894) | 0.896 (0.881 - 0.912) | 0.703 (0.680 - 0.730) | 0.920 (0.906 - 0.941) | 0.937 (0.923 - 0.951) | 0.903 (0.886 - 0.917) | 0.719 (0.693 - 0.749) | 0.881 (0.864 - 0.898) | 0.919 (0.904 - 0.935) | |
Liu_NSYSU_task4_2 | FDY_Ensemble | Liu2023 | 1.36 | 0.813 (0.779 - 0.854) | 0.861 (0.834 - 0.886) | 0.907 (0.892 - 0.925) | 0.559 (0.529 - 0.591) | 0.745 (0.716 - 0.782) | 0.914 (0.901 - 0.928) | 0.903 (0.886 - 0.916) | 0.693 (0.668 - 0.721) | 0.900 (0.890 - 0.912) | 0.918 (0.903 - 0.935) | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 1.26 | 0.806 (0.768 - 0.840) | 0.794 (0.765 - 0.820) | 0.876 (0.860 - 0.898) | 0.496 (0.470 - 0.523) | 0.718 (0.687 - 0.755) | 0.866 (0.843 - 0.895) | 0.850 (0.825 - 0.873) | 0.625 (0.585 - 0.663) | 0.856 (0.839 - 0.873) | 0.833 (0.803 - 0.857) | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 1.24 | 0.749 (0.684 - 0.820) | 0.844 (0.807 - 0.887) | 0.872 (0.855 - 0.891) | 0.521 (0.478 - 0.567) | 0.707 (0.674 - 0.749) | 0.893 (0.874 - 0.914) | 0.847 (0.810 - 0.894) | 0.640 (0.566 - 0.692) | 0.862 (0.834 - 0.894) | 0.870 (0.846 - 0.895) | |
Liu_NSYSU_task4_5 | DCASE2023 FDY_BEATs_WeakSED | Liu2023 | 0.82 | 0.976 (0.967 - 0.985) | 0.933 (0.919 - 0.948) | 0.914 (0.903 - 0.928) | 0.716 (0.691 - 0.740) | 0.954 (0.943 - 0.966) | 0.967 (0.959 - 0.975) | 0.942 (0.931 - 0.952) | 0.841 (0.815 - 0.869) | 0.804 (0.778 - 0.830) | 0.954 (0.944 - 0.967) | |
Liu_NSYSU_task4_6 | DCASE2023 FDY_BEATs | Liu2023 | 1.62 | 0.947 (0.927 - 0.973) | 0.948 (0.937 - 0.959) | 0.967 (0.959 - 0.976) | 0.742 (0.722 - 0.763) | 0.941 (0.931 - 0.957) | 0.957 (0.947 - 0.968) | 0.943 (0.932 - 0.954) | 0.840 (0.817 - 0.866) | 0.896 (0.883 - 0.911) | 0.949 (0.938 - 0.962) | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 1.55 | 0.936 (0.913 - 0.975) | 0.921 (0.891 - 0.946) | 0.962 (0.952 - 0.973) | 0.736 (0.702 - 0.764) | 0.926 (0.915 - 0.943) | 0.934 (0.917 - 0.954) | 0.919 (0.905 - 0.933) | 0.798 (0.757 - 0.843) | 0.860 (0.841 - 0.879) | 0.920 (0.894 - 0.942) | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 1.53 | 0.916 (0.880 - 0.970) | 0.921 (0.891 - 0.946) | 0.963 (0.954 - 0.974) | 0.716 (0.686 - 0.759) | 0.917 (0.894 - 0.938) | 0.939 (0.924 - 0.957) | 0.925 (0.913 - 0.937) | 0.787 (0.759 - 0.819) | 0.872 (0.848 - 0.895) | 0.929 (0.894 - 0.956) | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 1.24 | 0.794 (0.758 - 0.843) | 0.820 (0.793 - 0.846) | 0.902 (0.882 - 0.923) | 0.523 (0.495 - 0.554) | 0.699 (0.669 - 0.731) | 0.830 (0.796 - 0.865) | 0.729 (0.697 - 0.761) | 0.586 (0.551 - 0.622) | 0.889 (0.878 - 0.901) | 0.821 (0.793 - 0.853) | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.79 | 0.826 (0.765 - 0.908) | 0.803 (0.771 - 0.838) | 0.890 (0.872 - 0.910) | 0.617 (0.586 - 0.648) | 0.831 (0.804 - 0.860) | 0.838 (0.801 - 0.874) | 0.757 (0.720 - 0.795) | 0.609 (0.571 - 0.648) | 0.846 (0.812 - 0.872) | 0.797 (0.746 - 0.847) | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 1.53 | 0.925 (0.899 - 0.969) | 0.914 (0.897 - 0.929) | 0.967 (0.957 - 0.977) | 0.736 (0.709 - 0.767) | 0.912 (0.892 - 0.934) | 0.934 (0.918 - 0.956) | 0.906 (0.888 - 0.924) | 0.739 (0.707 - 0.772) | 0.874 (0.857 - 0.893) | 0.922 (0.905 - 0.940) | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 1.45 | 0.825 (0.791 - 0.869) | 0.883 (0.862 - 0.907) | 0.954 (0.933 - 0.968) | 0.672 (0.646 - 0.701) | 0.924 (0.908 - 0.940) | 0.911 (0.886 - 0.932) | 0.890 (0.873 - 0.909) | 0.702 (0.668 - 0.737) | 0.906 (0.891 - 0.920) | 0.922 (0.908 - 0.940) | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 1.25 | 0.695 (0.544 - 0.796) | 0.789 (0.758 - 0.818) | 0.865 (0.828 - 0.890) | 0.520 (0.445 - 0.571) | 0.697 (0.658 - 0.742) | 0.832 (0.764 - 0.882) | 0.824 (0.774 - 0.868) | 0.556 (0.523 - 0.594) | 0.842 (0.829 - 0.855) | 0.863 (0.830 - 0.896) | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 1.58 | 0.921 (0.888 - 0.950) | 0.906 (0.886 - 0.926) | 0.965 (0.954 - 0.977) | 0.697 (0.676 - 0.720) | 0.912 (0.890 - 0.932) | 0.922 (0.893 - 0.948) | 0.897 (0.869 - 0.928) | 0.706 (0.675 - 0.735) | 0.909 (0.898 - 0.920) | 0.933 (0.917 - 0.953) | |
Chen_CHT_task4_3 | VGGSK+BEATs | Chen2023b | 1.66 | 0.943 (0.929 - 0.958) | 0.918 (0.901 - 0.936) | 0.965 (0.958 - 0.974) | 0.706 (0.684 - 0.727) | 0.915 (0.900 - 0.933) | 0.925 (0.910 - 0.941) | 0.928 (0.914 - 0.942) | 0.789 (0.761 - 0.815) | 0.912 (0.901 - 0.923) | 0.950 (0.937 - 0.963) | |
Chen_CHT_task4_4 | multi+BEATs | Chen2023b | 1.66 | 0.947 (0.933 - 0.962) | 0.918 (0.902 - 0.936) | 0.965 (0.958 - 0.974) | 0.715 (0.693 - 0.737) | 0.918 (0.904 - 0.934) | 0.942 (0.931 - 0.954) | 0.936 (0.924 - 0.948) | 0.809 (0.781 - 0.834) | 0.915 (0.903 - 0.926) | 0.952 (0.939 - 0.965) | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Xiao2023 | 1.23 | 0.750 (0.712 - 0.794) | 0.833 (0.808 - 0.864) | 0.904 (0.888 - 0.923) | 0.523 (0.495 - 0.552) | 0.774 (0.750 - 0.804) | 0.857 (0.834 - 0.883) | 0.814 (0.789 - 0.835) | 0.615 (0.586 - 0.647) | 0.896 (0.885 - 0.909) | 0.878 (0.861 - 0.898) | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 1.55 | 0.864 (0.832 - 0.914) | 0.928 (0.914 - 0.941) | 0.951 (0.940 - 0.963) | 0.711 (0.692 - 0.733) | 0.914 (0.900 - 0.934) | 0.944 (0.932 - 0.958) | 0.926 (0.914 - 0.945) | 0.799 (0.776 - 0.818) | 0.918 (0.908 - 0.928) | 0.950 (0.938 - 0.966) | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.86 | 0.891 (0.862 - 0.943) | 0.889 (0.870 - 0.908) | 0.929 (0.918 - 0.942) | 0.743 (0.720 - 0.763) | 0.951 (0.940 - 0.966) | 0.933 (0.918 - 0.950) | 0.893 (0.875 - 0.914) | 0.764 (0.740 - 0.793) | 0.899 (0.886 - 0.915) | 0.957 (0.947 - 0.968) | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 1.60 | 0.898 (0.870 - 0.948) | 0.917 (0.902 - 0.937) | 0.960 (0.951 - 0.970) | 0.724 (0.701 - 0.755) | 0.931 (0.920 - 0.948) | 0.933 (0.922 - 0.947) | 0.911 (0.893 - 0.930) | 0.786 (0.759 - 0.811) | 0.919 (0.910 - 0.927) | 0.949 (0.936 - 0.962) | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_5_ensemble_model | Xiao2023 | 1.61 | 0.900 (0.873 - 0.949) | 0.927 (0.914 - 0.947) | 0.963 (0.956 - 0.973) | 0.721 (0.700 - 0.746) | 0.937 (0.927 - 0.953) | 0.952 (0.942 - 0.962) | 0.935 (0.922 - 0.948) | 0.806 (0.780 - 0.830) | 0.918 (0.908 - 0.928) | 0.960 (0.950 - 0.970) | |
Xiao_FMSG_task4a_6 | Xiao_FMSG_task4a_6_ensemble_model | Xiao2023 | 1.61 | 0.903 (0.876 - 0.953) | 0.935 (0.923 - 0.951) | 0.963 (0.954 - 0.973) | 0.737 (0.716 - 0.762) | 0.947 (0.938 - 0.960) | 0.956 (0.948 - 0.967) | 0.941 (0.930 - 0.952) | 0.808 (0.781 - 0.831) | 0.927 (0.916 - 0.936) | 0.964 (0.955 - 0.973) | |
Xiao_FMSG_task4a_7 | Xiao_FMSG_task4a_7_ensemble_model | Xiao2023 | 0.87 | 0.903 (0.876 - 0.957) | 0.894 (0.875 - 0.912) | 0.932 (0.922 - 0.944) | 0.745 (0.724 - 0.767) | 0.953 (0.943 - 0.970) | 0.932 (0.918 - 0.945) | 0.892 (0.872 - 0.912) | 0.774 (0.749 - 0.804) | 0.896 (0.884 - 0.913) | 0.954 (0.943 - 0.965) | |
Xiao_FMSG_task4a_8 | Xiao_FMSG_task4a_8_ensemble_model | Xiao2023 | 1.62 | 0.907 (0.881 - 0.955) | 0.940 (0.930 - 0.955) | 0.965 (0.957 - 0.974) | 0.740 (0.719 - 0.765) | 0.944 (0.934 - 0.957) | 0.958 (0.949 - 0.969) | 0.943 (0.932 - 0.955) | 0.821 (0.796 - 0.843) | 0.925 (0.914 - 0.935) | 0.966 (0.957 - 0.977) | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.937 (0.922 - 0.957) | 0.947 (0.937 - 0.957) | 0.960 (0.952 - 0.971) | 0.714 (0.689 - 0.739) | 0.897 (0.882 - 0.913) | 0.946 (0.933 - 0.958) | 0.941 (0.928 - 0.954) | 0.777 (0.753 - 0.805) | 0.905 (0.892 - 0.917) | 0.960 (0.948 - 0.972) | |
Guan_HIT_task4a_2 | Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.976 (0.969 - 0.983) | 0.957 (0.947 - 0.968) | 0.969 (0.960 - 0.979) | 0.824 (0.803 - 0.842) | 0.967 (0.960 - 0.976) | 0.962 (0.953 - 0.972) | 0.934 (0.922 - 0.949) | 0.797 (0.772 - 0.825) | 0.941 (0.931 - 0.957) | 0.958 (0.947 - 0.972) | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.930 (0.905 - 0.953) | 0.940 (0.926 - 0.955) | 0.958 (0.948 - 0.970) | 0.702 (0.678 - 0.730) | 0.888 (0.869 - 0.909) | 0.941 (0.926 - 0.955) | 0.936 (0.922 - 0.951) | 0.762 (0.731 - 0.794) | 0.900 (0.888 - 0.913) | 0.952 (0.938 - 0.968) | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.973 (0.963 - 0.982) | 0.955 (0.943 - 0.968) | 0.967 (0.957 - 0.978) | 0.817 (0.798 - 0.835) | 0.965 (0.954 - 0.976) | 0.959 (0.948 - 0.970) | 0.928 (0.914 - 0.944) | 0.782 (0.746 - 0.822) | 0.946 (0.936 - 0.960) | 0.951 (0.936 - 0.966) | |
Guan_HIT_task4a_5 | Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.773 (0.735 - 0.818) | 0.864 (0.842 - 0.887) | 0.903 (0.886 - 0.921) | 0.570 (0.545 - 0.600) | 0.782 (0.756 - 0.811) | 0.877 (0.857 - 0.898) | 0.895 (0.877 - 0.911) | 0.711 (0.678 - 0.746) | 0.912 (0.903 - 0.921) | 0.915 (0.896 - 0.933) | |
Guan_HIT_task4a_6 | Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.850 (0.815 - 0.890) | 0.886 (0.865 - 0.908) | 0.931 (0.914 - 0.948) | 0.750 (0.728 - 0.774) | 0.889 (0.872 - 0.905) | 0.908 (0.890 - 0.927) | 0.897 (0.879 - 0.913) | 0.742 (0.713 - 0.776) | 0.956 (0.949 - 0.965) | 0.916 (0.897 - 0.936) | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 1.50 | 0.884 (0.847 - 0.924) | 0.870 (0.848 - 0.896) | 0.967 (0.958 - 0.977) | 0.724 (0.698 - 0.751) | 0.933 (0.921 - 0.948) | 0.946 (0.926 - 0.968) | 0.875 (0.849 - 0.903) | 0.785 (0.760 - 0.811) | 0.905 (0.895 - 0.918) | 0.924 (0.905 - 0.946) | |
Wang_XiaoRice_task4a_2 | SED Embed | Wang2023 | 1.52 | 0.862 (0.824 - 0.907) | 0.894 (0.875 - 0.916) | 0.955 (0.946 - 0.966) | 0.719 (0.697 - 0.745) | 0.929 (0.918 - 0.942) | 0.956 (0.944 - 0.970) | 0.905 (0.885 - 0.927) | 0.837 (0.816 - 0.862) | 0.913 (0.897 - 0.925) | 0.920 (0.904 - 0.939) | |
Wang_XiaoRice_task4a_3 | L-TAG | Wang2023 | 0.91 | 0.947 (0.921 - 0.968) | 0.913 (0.897 - 0.932) | 0.930 (0.917 - 0.943) | 0.737 (0.714 - 0.761) | 0.948 (0.937 - 0.965) | 0.982 (0.976 - 0.988) | 0.950 (0.938 - 0.964) | 0.821 (0.795 - 0.845) | 0.923 (0.907 - 0.935) | 0.964 (0.954 - 0.975) | |
Zhang_IOA_task4_1 | strong_ensemble | Zhang2023 | 1.75 | 0.931 (0.914 - 0.952) | 0.950 (0.940 - 0.964) | 0.982 (0.977 - 0.988) | 0.757 (0.738 - 0.778) | 0.964 (0.957 - 0.972) | 0.923 (0.913 - 0.937) | 0.942 (0.930 - 0.956) | 0.897 (0.880 - 0.922) | 0.930 (0.922 - 0.941) | 0.953 (0.940 - 0.967) | |
Zhang_IOA_task4_2 | segment tagging model | Zhang2023 | 0.95 | 0.992 (0.990 - 0.996) | 0.988 (0.984 - 0.997) | 0.973 (0.965 - 0.982) | 0.835 (0.818 - 0.855) | 0.967 (0.959 - 0.978) | 0.997 (0.997 - 0.999) | 0.964 (0.954 - 0.975) | 0.897 (0.880 - 0.917) | 0.962 (0.954 - 0.974) | 0.981 (0.974 - 0.991) | |
Zhang_IOA_task4_3 | strong_ensemble_all | Zhang2023 | 1.71 | 0.931 (0.915 - 0.952) | 0.947 (0.936 - 0.961) | 0.979 (0.974 - 0.986) | 0.736 (0.718 - 0.756) | 0.959 (0.951 - 0.966) | 0.930 (0.918 - 0.941) | 0.940 (0.928 - 0.956) | 0.799 (0.778 - 0.823) | 0.907 (0.895 - 0.921) | 0.952 (0.940 - 0.966) | |
Zhang_IOA_task4_4 | strong_ensemble_1 | Zhang2023 | 1.75 | 0.931 (0.914 - 0.952) | 0.950 (0.940 - 0.964) | 0.982 (0.977 - 0.988) | 0.751 (0.733 - 0.775) | 0.964 (0.957 - 0.972) | 0.923 (0.913 - 0.937) | 0.942 (0.930 - 0.956) | 0.897 (0.880 - 0.922) | 0.930 (0.922 - 0.941) | 0.953 (0.940 - 0.967) | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 1.52 | 0.867 (0.819 - 0.905) | 0.923 (0.902 - 0.942) | 0.960 (0.950 - 0.970) | 0.672 (0.651 - 0.696) | 0.876 (0.853 - 0.903) | 0.895 (0.876 - 0.915) | 0.905 (0.889 - 0.925) | 0.738 (0.710 - 0.768) | 0.898 (0.886 - 0.912) | 0.929 (0.910 - 0.948) | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 1.60 | 0.896 (0.873 - 0.920) | 0.943 (0.931 - 0.957) | 0.965 (0.957 - 0.976) | 0.704 (0.685 - 0.727) | 0.881 (0.866 - 0.900) | 0.914 (0.901 - 0.930) | 0.923 (0.911 - 0.939) | 0.749 (0.722 - 0.775) | 0.906 (0.896 - 0.919) | 0.954 (0.941 - 0.967) | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.86 | 0.932 (0.920 - 0.948) | 0.935 (0.922 - 0.950) | 0.930 (0.916 - 0.943) | 0.752 (0.726 - 0.777) | 0.929 (0.916 - 0.944) | 0.956 (0.947 - 0.967) | 0.914 (0.900 - 0.931) | 0.787 (0.758 - 0.811) | 0.954 (0.947 - 0.961) | 0.946 (0.934 - 0.959) | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.665 (0.618 - 0.705) | 0.774 (0.744 - 0.803) | 0.857 (0.836 - 0.886) | 0.440 (0.414 - 0.466) | 0.719 (0.690 - 0.758) | 0.826 (0.800 - 0.856) | 0.768 (0.741 - 0.792) | 0.586 (0.557 - 0.622) | 0.871 (0.856 - 0.885) | 0.812 (0.789 - 0.840) | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.865 (0.838 - 0.895) | 0.897 (0.878 - 0.917) | 0.953 (0.942 - 0.964) | 0.672 (0.653 - 0.695) | 0.935 (0.926 - 0.949) | 0.936 (0.923 - 0.950) | 0.909 (0.894 - 0.924) | 0.799 (0.772 - 0.824) | 0.907 (0.896 - 0.918) | 0.945 (0.931 - 0.959) | |
Wu_NCUT_task4a_3 | Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.870 (0.846 - 0.898) | 0.899 (0.879 - 0.918) | 0.955 (0.945 - 0.966) | 0.662 (0.643 - 0.685) | 0.933 (0.924 - 0.946) | 0.938 (0.926 - 0.952) | 0.911 (0.897 - 0.928) | 0.803 (0.777 - 0.829) | 0.912 (0.903 - 0.923) | 0.946 (0.933 - 0.961) | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 1.06 | 0.673 (0.626 - 0.730) | 0.734 (0.699 - 0.770) | 0.819 (0.769 - 0.855) | 0.415 (0.364 - 0.461) | 0.683 (0.611 - 0.741) | 0.769 (0.737 - 0.804) | 0.747 (0.692 - 0.811) | 0.521 (0.492 - 0.559) | 0.871 (0.859 - 0.884) | 0.766 (0.731 - 0.802) | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 1.12 | 0.653 (0.597 - 0.711) | 0.742 (0.707 - 0.780) | 0.821 (0.771 - 0.857) | 0.437 (0.389 - 0.480) | 0.693 (0.624 - 0.747) | 0.777 (0.727 - 0.817) | 0.779 (0.728 - 0.835) | 0.540 (0.503 - 0.582) | 0.868 (0.855 - 0.881) | 0.787 (0.755 - 0.820) | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.91 | 0.745 (0.684 - 0.805) | 0.787 (0.746 - 0.823) | 0.884 (0.864 - 0.906) | 0.534 (0.482 - 0.574) | 0.788 (0.748 - 0.822) | 0.873 (0.852 - 0.897) | 0.735 (0.648 - 0.790) | 0.611 (0.573 - 0.652) | 0.868 (0.843 - 0.899) | 0.858 (0.833 - 0.883) | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.84 | 0.776 (0.708 - 0.847) | 0.828 (0.783 - 0.871) | 0.882 (0.855 - 0.915) | 0.549 (0.492 - 0.619) | 0.842 (0.816 - 0.873) | 0.888 (0.864 - 0.909) | 0.768 (0.682 - 0.827) | 0.638 (0.599 - 0.677) | 0.853 (0.838 - 0.872) | 0.875 (0.853 - 0.897) | |
Barahona_AUDIAS_task4a_5 | 4-Resolution CRNN | Barahona2023 | 1.14 | 0.714 (0.666 - 0.773) | 0.804 (0.774 - 0.835) | 0.852 (0.823 - 0.883) | 0.467 (0.432 - 0.508) | 0.718 (0.683 - 0.760) | 0.855 (0.831 - 0.882) | 0.792 (0.754 - 0.827) | 0.533 (0.502 - 0.567) | 0.888 (0.875 - 0.899) | 0.822 (0.786 - 0.856) | |
Barahona_AUDIAS_task4a_6 | 4-Resolution CRNN with class-dependent median filtering | Barahona2023 | 1.18 | 0.696 (0.645 - 0.754) | 0.801 (0.770 - 0.834) | 0.852 (0.823 - 0.882) | 0.486 (0.457 - 0.523) | 0.719 (0.682 - 0.762) | 0.876 (0.855 - 0.900) | 0.815 (0.780 - 0.848) | 0.545 (0.512 - 0.582) | 0.886 (0.875 - 0.897) | 0.828 (0.796 - 0.862) | |
Barahona_AUDIAS_task4a_7 | 5-Resolution Conformer | Barahona2023 | 1.06 | 0.760 (0.717 - 0.811) | 0.795 (0.768 - 0.821) | 0.891 (0.875 - 0.910) | 0.577 (0.546 - 0.609) | 0.796 (0.767 - 0.831) | 0.895 (0.879 - 0.911) | 0.817 (0.793 - 0.841) | 0.654 (0.619 - 0.689) | 0.887 (0.874 - 0.901) | 0.876 (0.854 - 0.900) | |
Barahona_AUDIAS_task4a_8 | 5-Resolution Conformer with class-wise median filtering | Barahona2023 | 1.00 | 0.816 (0.779 - 0.860) | 0.838 (0.812 - 0.867) | 0.906 (0.887 - 0.926) | 0.628 (0.586 - 0.687) | 0.859 (0.834 - 0.884) | 0.922 (0.908 - 0.936) | 0.848 (0.825 - 0.871) | 0.683 (0.653 - 0.718) | 0.882 (0.870 - 0.894) | 0.896 (0.875 - 0.918) | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 1.12 | 0.670 (0.622 - 0.728) | 0.795 (0.767 - 0.822) | 0.842 (0.819 - 0.867) | 0.501 (0.476 - 0.527) | 0.661 (0.630 - 0.703) | 0.825 (0.800 - 0.850) | 0.737 (0.708 - 0.766) | 0.569 (0.541 - 0.603) | 0.866 (0.854 - 0.878) | 0.803 (0.777 - 0.832) | |
Gan_NCUT_task4_2 | Gan_NCUT_SED_system_2 | Gan2023 | 1.52 | 0.912 (0.887 - 0.942) | 0.906 (0.889 - 0.925) | 0.959 (0.949 - 0.969) | 0.676 (0.647 - 0.706) | 0.937 (0.926 - 0.951) | 0.932 (0.919 - 0.947) | 0.918 (0.903 - 0.932) | 0.785 (0.758 - 0.813) | 0.922 (0.911 - 0.932) | 0.948 (0.936 - 0.962) | |
Gan_NCUT_task4_3 | Gan_NCUT_SED_system_3 | Gan2023 | 1.50 | 0.898 (0.871 - 0.931) | 0.889 (0.869 - 0.913) | 0.963 (0.954 - 0.975) | 0.742 (0.717 - 0.767) | 0.943 (0.932 - 0.955) | 0.943 (0.930 - 0.958) | 0.891 (0.875 - 0.908) | 0.794 (0.768 - 0.820) | 0.926 (0.916 - 0.937) | 0.960 (0.950 - 0.972) | |
Liu_SRCN_task4a_1 | DCASE2023 t4a system1 | Chen2023a | 1.65 | 0.883 (0.851 - 0.935) | 0.906 (0.887 - 0.928) | 0.975 (0.969 - 0.985) | 0.781 (0.759 - 0.806) | 0.950 (0.940 - 0.967) | 0.807 (0.778 - 0.839) | 0.935 (0.923 - 0.946) | 0.818 (0.796 - 0.841) | 0.934 (0.925 - 0.943) | 0.960 (0.948 - 0.973) | |
Liu_SRCN_task4a_2 | DCASE2023 t4a system2 | Chen2023a | 1.40 | 0.967 (0.959 - 0.978) | 0.950 (0.941 - 0.963) | 0.978 (0.971 - 0.987) | 0.816 (0.796 - 0.835) | 0.966 (0.960 - 0.974) | 0.965 (0.957 - 0.976) | 0.956 (0.947 - 0.965) | 0.854 (0.830 - 0.875) | 0.921 (0.912 - 0.929) | 0.967 (0.958 - 0.978) | |
Liu_SRCN_task4a_3 | DCASE2023 t4a system3 | Chen2023a | 1.65 | 0.962 (0.952 - 0.974) | 0.956 (0.948 - 0.965) | 0.979 (0.973 - 0.987) | 0.784 (0.763 - 0.807) | 0.940 (0.930 - 0.953) | 0.957 (0.946 - 0.970) | 0.951 (0.942 - 0.960) | 0.845 (0.820 - 0.869) | 0.931 (0.922 - 0.940) | 0.965 (0.955 - 0.977) | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 1.25 | 0.730 (0.684 - 0.773) | 0.808 (0.775 - 0.844) | 0.885 (0.865 - 0.902) | 0.561 (0.539 - 0.587) | 0.785 (0.762 - 0.809) | 0.825 (0.799 - 0.852) | 0.870 (0.847 - 0.892) | 0.593 (0.566 - 0.625) | 0.890 (0.878 - 0.901) | 0.878 (0.855 - 0.901) | |
Liu_SRCN_task4a_5 | DCASE2023 t4a system5 | Chen2023a | 0.94 | 0.972 (0.965 - 0.980) | 0.926 (0.912 - 0.942) | 0.936 (0.926 - 0.948) | 0.763 (0.742 - 0.785) | 0.965 (0.957 - 0.975) | 0.961 (0.952 - 0.974) | 0.951 (0.941 - 0.959) | 0.847 (0.820 - 0.866) | 0.952 (0.946 - 0.957) | 0.966 (0.957 - 0.977) | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 1.35 | 0.803 (0.744 - 0.845) | 0.867 (0.818 - 0.908) | 0.931 (0.918 - 0.945) | 0.586 (0.549 - 0.624) | 0.748 (0.695 - 0.804) | 0.908 (0.885 - 0.937) | 0.859 (0.840 - 0.879) | 0.709 (0.674 - 0.746) | 0.825 (0.760 - 0.866) | 0.892 (0.847 - 0.933) | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 1.68 | 0.949 (0.925 - 0.973) | 0.930 (0.908 - 0.950) | 0.973 (0.965 - 0.984) | 0.732 (0.712 - 0.754) | 0.929 (0.911 - 0.948) | 0.948 (0.935 - 0.960) | 0.934 (0.922 - 0.945) | 0.831 (0.805 - 0.859) | 0.901 (0.887 - 0.914) | 0.953 (0.941 - 0.965) | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 1.66 | 0.952 (0.933 - 0.970) | 0.931 (0.907 - 0.950) | 0.974 (0.967 - 0.983) | 0.741 (0.718 - 0.766) | 0.929 (0.914 - 0.947) | 0.946 (0.935 - 0.960) | 0.938 (0.926 - 0.949) | 0.831 (0.803 - 0.859) | 0.907 (0.895 - 0.919) | 0.949 (0.936 - 0.963) | |
Kim_GIST-HanwhaVision_task4a_4 | FDYLKA BEATs pool 1d stage1 | Kim2023 | 1.63 | 0.946 (0.920 - 0.970) | 0.910 (0.887 - 0.930) | 0.970 (0.960 - 0.981) | 0.721 (0.693 - 0.747) | 0.908 (0.872 - 0.937) | 0.955 (0.939 - 0.969) | 0.911 (0.888 - 0.928) | 0.804 (0.772 - 0.834) | 0.887 (0.869 - 0.905) | 0.911 (0.879 - 0.947) | |
Kim_GIST-HanwhaVision_task4a_5 | FDYLKA BEATs all ensemble 48 | Kim2023 | 1.72 | 0.961 (0.947 - 0.978) | 0.935 (0.923 - 0.948) | 0.974 (0.966 - 0.983) | 0.755 (0.737 - 0.776) | 0.937 (0.925 - 0.952) | 0.961 (0.951 - 0.972) | 0.943 (0.930 - 0.955) | 0.841 (0.817 - 0.865) | 0.912 (0.900 - 0.923) | 0.951 (0.939 - 0.965) | |
Kim_GIST-HanwhaVision_task4a_6 | FDYLKA BEATs PSDS1 ensemble 16 | Kim2023 | 1.72 | 0.956 (0.941 - 0.974) | 0.930 (0.917 - 0.945) | 0.974 (0.967 - 0.983) | 0.751 (0.731 - 0.772) | 0.935 (0.924 - 0.950) | 0.957 (0.937 - 0.973) | 0.946 (0.934 - 0.956) | 0.835 (0.808 - 0.861) | 0.910 (0.898 - 0.922) | 0.949 (0.937 - 0.962) | |
Kim_GIST-HanwhaVision_task4a_7 | FDYLKA BEATs PSDS2 ensemble 16 | Kim2023 | 1.69 | 0.963 (0.949 - 0.979) | 0.931 (0.916 - 0.946) | 0.973 (0.966 - 0.982) | 0.755 (0.734 - 0.777) | 0.940 (0.928 - 0.953) | 0.959 (0.949 - 0.970) | 0.938 (0.926 - 0.949) | 0.840 (0.815 - 0.863) | 0.910 (0.896 - 0.922) | 0.951 (0.933 - 0.965) | |
Kim_GIST-HanwhaVision_task4a_8 | FDYLKA BEATs PSDS sum ensemble 16 | Kim2023 | 1.72 | 0.959 (0.940 - 0.976) | 0.925 (0.910 - 0.940) | 0.974 (0.966 - 0.983) | 0.753 (0.736 - 0.773) | 0.933 (0.920 - 0.951) | 0.955 (0.943 - 0.968) | 0.945 (0.933 - 0.957) | 0.833 (0.803 - 0.861) | 0.914 (0.902 - 0.925) | 0.947 (0.932 - 0.962) | |
Wenxin_TJU_task4a_1 | ensemble-pretrained-psds1-0 | Wenxin2023 | 1.63 | 0.931 (0.917 - 0.945) | 0.939 (0.930 - 0.952) | 0.973 (0.966 - 0.982) | 0.742 (0.722 - 0.766) | 0.896 (0.879 - 0.915) | 0.969 (0.962 - 0.977) | 0.955 (0.946 - 0.964) | 0.868 (0.851 - 0.885) | 0.892 (0.878 - 0.907) | 0.949 (0.936 - 0.963) | |
Wenxin_TJU_task4a_2 | ensemble-pretrained-psds1-1 | Wenxin2023 | 1.66 | 0.937 (0.923 - 0.950) | 0.941 (0.932 - 0.952) | 0.966 (0.958 - 0.977) | 0.753 (0.733 - 0.775) | 0.906 (0.891 - 0.923) | 0.972 (0.966 - 0.982) | 0.958 (0.949 - 0.968) | 0.891 (0.871 - 0.909) | 0.886 (0.873 - 0.900) | 0.944 (0.932 - 0.958) | |
Wenxin_TJU_task4a_3 | ensemble-pretrained-psds2-0 | Wenxin2023 | 0.88 | 0.902 (0.873 - 0.956) | 0.894 (0.878 - 0.916) | 0.917 (0.904 - 0.932) | 0.730 (0.704 - 0.754) | 0.939 (0.927 - 0.956) | 0.965 (0.958 - 0.974) | 0.926 (0.914 - 0.939) | 0.785 (0.758 - 0.811) | 0.927 (0.917 - 0.936) | 0.929 (0.913 - 0.946) | |
Wenxin_TJU_task4a_4 | ensemble-pretrained-psds2-1 | Wenxin2023 | 0.90 | 0.946 (0.930 - 0.966) | 0.900 (0.881 - 0.922) | 0.925 (0.913 - 0.939) | 0.739 (0.717 - 0.764) | 0.959 (0.951 - 0.969) | 0.982 (0.977 - 0.986) | 0.950 (0.942 - 0.960) | 0.832 (0.807 - 0.852) | 0.936 (0.929 - 0.946) | 0.946 (0.933 - 0.959) | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 1.58 | 0.905 (0.887 - 0.927) | 0.914 (0.899 - 0.931) | 0.961 (0.952 - 0.972) | 0.705 (0.679 - 0.741) | 0.885 (0.867 - 0.904) | 0.963 (0.955 - 0.971) | 0.933 (0.922 - 0.945) | 0.845 (0.823 - 0.863) | 0.897 (0.886 - 0.910) | 0.933 (0.920 - 0.948) | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 1.61 | 0.931 (0.917 - 0.944) | 0.924 (0.908 - 0.939) | 0.964 (0.957 - 0.973) | 0.734 (0.714 - 0.760) | 0.902 (0.889 - 0.919) | 0.965 (0.957 - 0.976) | 0.955 (0.945 - 0.963) | 0.882 (0.865 - 0.901) | 0.868 (0.854 - 0.885) | 0.940 (0.926 - 0.953) | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 1.31 | 0.702 (0.659 - 0.750) | 0.866 (0.843 - 0.890) | 0.918 (0.904 - 0.936) | 0.571 (0.543 - 0.602) | 0.755 (0.724 - 0.786) | 0.864 (0.842 - 0.888) | 0.868 (0.848 - 0.890) | 0.700 (0.673 - 0.726) | 0.887 (0.877 - 0.896) | 0.869 (0.848 - 0.891) | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.75 | 0.779 (0.735 - 0.838) | 0.841 (0.818 - 0.862) | 0.892 (0.877 - 0.913) | 0.665 (0.635 - 0.695) | 0.893 (0.877 - 0.914) | 0.905 (0.891 - 0.920) | 0.846 (0.825 - 0.864) | 0.664 (0.636 - 0.695) | 0.809 (0.792 - 0.832) | 0.828 (0.805 - 0.856) |
Energy Consumption
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Energy (kWh) (training, normalized) | Energy (kWh) (test, normalized) | EW-PSDS 1 (training energy) | EW-PSDS 2 (training energy) | EW-PSDS 1 (test energy) | EW-PSDS 2 (test energy) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | DCASE2023 baseline system | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | 1.390 | 0.019 | 0.327 | 0.538 | 0.327 | 0.538 | |
Baseline_task4a_2 | DCASE2023 baseline system (Audioset+Beats) | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | 1.821 | 0.020 | 0.389 | 0.609 | 0.484 | 0.758 | |
Li_USTC_task4a_1 | TAFT and SdMT | Li2023 | 1.54 | 0.539 (0.527 - 0.551) | 0.769 (0.758 - 0.778) | 10.045 | 0.026 | 0.075 | 0.106 | 0.394 | 0.562 | |
Li_USTC_task4a_2 | Pseudo labeling | Li2023 | 1.58 | 0.556 (0.544 - 0.569) | 0.781 (0.769 - 0.795) | 6.496 | 0.019 | 0.119 | 0.167 | 0.556 | 0.781 | |
Li_USTC_task4a_3 | TAFT and AFL | Li2023 | 1.54 | 0.546 (0.535 - 0.558) | 0.756 (0.745 - 0.769) | 6.496 | 0.019 | 0.117 | 0.162 | 0.546 | 0.756 | |
Li_USTC_task4a_4 | MaxFilter | Li2023 | 0.89 | 0.061 (0.050 - 0.070) | 0.852 (0.843 - 0.863) | 6.496 | 0.019 | 0.013 | 0.182 | 0.061 | 0.852 | |
Li_USTC_task4a_5 | TAFT and SdMT and single | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | 3.347 | 0.010 | 0.221 | 0.316 | 1.009 | 1.447 | |
Li_USTC_task4a_6 | Pseudo labeling and single | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | 3.347 | 0.010 | 0.227 | 0.325 | 1.037 | 1.488 | |
Li_USTC_task4a_7 | SKCRNN MT | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | 5.472 | 0.003 | 0.103 | 0.160 | 2.559 | 3.990 | |
Liu_NSYSU_task4_1 | DCASE2023 FDY_WeakSED_Ensemble | Liu2023 | 0.80 | 0.051 (0.042 - 0.060) | 0.779 (0.767 - 0.791) | 12.405 | 6.240 | 0.006 | 0.087 | 0.000 | 0.002 | |
Liu_NSYSU_task4_2 | FDY_Ensemble | Liu2023 | 1.36 | 0.466 (0.455 - 0.480) | 0.701 (0.688 - 0.714) | 6.453 | 1.957 | 0.100 | 0.151 | 0.005 | 0.007 | |
Liu_NSYSU_task4_3 | DCASE2023 VGGSK_Single | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | 0.192 | 0.016 | 3.141 | 4.675 | 0.515 | 0.767 | |
Liu_NSYSU_task4_4 | DCASE2023 FDY_Single | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | 1.077 | 0.325 | 0.533 | 0.846 | 0.024 | 0.038 | |
Liu_NSYSU_task4_5 | DCASE2023 FDY_BEATs_WeakSED | Liu2023 | 0.82 | 0.045 (0.035 - 0.053) | 0.806 (0.794 - 0.818) | 4.608 | 1.399 | 0.013 | 0.243 | 0.001 | 0.011 | |
Liu_NSYSU_task4_6 | DCASE2023 FDY_BEATs | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | 4.608 | 1.399 | 0.167 | 0.253 | 0.007 | 0.011 | |
Liu_NSYSU_task4_7 | DCASE2023 FDY_BEATs | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | 0.923 | 0.279 | 0.784 | 1.225 | 0.035 | 0.055 | |
Liu_NSYSU_task4_8 | DCASE2023 FDY_BEATs | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | 0.923 | 0.279 | 0.775 | 1.212 | 0.035 | 0.055 | |
Lee_CAU_task4A_1 | CAU_ET | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | 2.686 | 0.016 | 0.220 | 0.328 | 0.505 | 0.753 | |
Lee_CAU_task4A_2 | CAU_ET | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | 3.011 | 0.012 | 0.048 | 0.311 | 0.164 | 1.068 | |
Cheimariotis_DUTH_task4a_1 | DuthApida | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | 1.666 | 0.033 | 0.430 | 0.664 | 0.297 | 0.458 | |
Cheimariotis_DUTH_task4a_2 | DuthApida | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | 1.964 | 0.375 | 0.345 | 0.537 | 0.025 | 0.038 | |
Chen_CHT_task4_1 | VGGSK | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | 0.655 | 0.005 | 0.935 | 1.315 | 1.675 | 2.355 | |
Chen_CHT_task4_2 | VGGSK+BEATs | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | 1.354 | 0.005 | 0.578 | 0.800 | 2.139 | 2.961 | |
Chen_CHT_task4_3 | VGGSK+BEATs | Chen2023b | 1.66 | 0.596 (0.585 - 0.606) | 0.810 (0.800 - 0.822) | 1.354 | 0.005 | 0.612 | 0.832 | 2.267 | 3.080 | |
Chen_CHT_task4_4 | multi+BEATs | Chen2023b | 1.66 | 0.590 (0.578 - 0.601) | 0.820 (0.810 - 0.831) | 1.354 | 0.005 | 0.606 | 0.842 | 2.243 | 3.118 | |
Xiao_FMSG_task4a_1 | Xiao_FMSG_task4a_1_single_model_without_external | Zhang2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | 0.800 | 0.010 | 0.701 | 1.146 | 0.766 | 1.253 | |
Xiao_FMSG_task4a_2 | Xiao_FMSG_task4a_2_single_model | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | 0.971 | 0.007 | 0.752 | 1.156 | 1.425 | 2.193 | |
Xiao_FMSG_task4a_3 | Xiao_FMSG_task4a_3_single_model_psds2 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | 0.853 | 0.007 | 0.116 | 1.314 | 0.193 | 2.189 | |
Xiao_FMSG_task4a_4 | Xiao_FMSG_task4a_4_single_model | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | 0.811 | 0.006 | 0.945 | 1.394 | 1.746 | 2.575 | |
Xiao_FMSG_task4a_5 | Xiao_FMSG_task4a_5_ensemble_model | Xiao2023 | 1.61 | 0.555 (0.545 - 0.567) | 0.821 (0.811 - 0.834) | 4.587 | 0.032 | 0.168 | 0.249 | 0.330 | 0.488 | |
Xiao_FMSG_task4a_6 | Xiao_FMSG_task4a_6_ensemble_model | Xiao2023 | 1.61 | 0.551 (0.541 - 0.561) | 0.829 (0.819 - 0.842) | 9.707 | 0.075 | 0.079 | 0.119 | 0.140 | 0.210 | |
Xiao_FMSG_task4a_7 | Xiao_FMSG_task4a_7_ensemble_model | Xiao2023 | 0.87 | 0.075 (0.066 - 0.084) | 0.811 (0.800 - 0.822) | 3.426 | 0.075 | 0.030 | 0.329 | 0.019 | 0.206 | |
Xiao_FMSG_task4a_8 | Xiao_FMSG_task4a_8_ensemble_model | Xiao2023 | 1.62 | 0.549 (0.540 - 0.560) | 0.834 (0.824 - 0.847) | 10.240 | 0.075 | 0.075 | 0.113 | 0.139 | 0.211 | |
Guan_HIT_task4a_1 | Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.536 (0.526 - 0.546) | 0.810 (0.800 - 0.822) | 6.673 | 0.038 | 0.112 | 0.169 | 0.268 | 0.405 | |
Guan_HIT_task4a_2 | Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.082 (0.074 - 0.090) | 0.862 (0.852 - 0.872) | 7.313 | 0.038 | 0.016 | 0.164 | 0.041 | 0.431 | |
Guan_HIT_task4a_3 | Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | 8.806 | 0.049 | 0.083 | 0.126 | 0.204 | 0.310 | |
Guan_HIT_task4a_4 | Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | 8.806 | 0.049 | 0.013 | 0.135 | 0.032 | 0.332 | |
Guan_HIT_task4a_5 | Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.488 (0.475 - 0.503) | 0.708 (0.696 - 0.720) | 8.981 | 0.049 | 0.076 | 0.110 | 0.189 | 0.274 | |
Guan_HIT_task4a_6 | Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.088 (0.080 - 0.096) | 0.797 (0.787 - 0.810) | 8.981 | 0.049 | 0.014 | 0.123 | 0.034 | 0.309 | |
Wang_XiaoRice_task4a_1 | SINGLE | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | 0.358 | 0.056 | 1.918 | 3.110 | 0.168 | 0.272 | |
Wang_XiaoRice_task4a_2 | SED Embed | Wang2023 | 1.52 | 0.497 (0.486 - 0.510) | 0.814 (0.803 - 0.828) | 1.882 | 0.056 | 0.367 | 0.601 | 0.168 | 0.276 | |
Wang_XiaoRice_task4a_3 | L-TAG | Wang2023 | 0.91 | 0.088 (0.076 - 0.098) | 0.835 (0.824 - 0.844) | 1.882 | 0.056 | 0.065 | 0.617 | 0.030 | 0.283 | |
Zhang_IOA_task4_1 | strong_ensemble | Zhang2023 | 1.75 | 0.622 (0.613 - 0.634) | 0.857 (0.849 - 0.866) | 69.120 | 1.024 | 0.013 | 0.017 | 0.012 | 0.016 | |
Zhang_IOA_task4_2 | segment tagging model | Zhang2023 | 0.95 | 0.070 (0.060 - 0.080) | 0.903 (0.895 - 0.911) | 35.840 | 0.640 | 0.003 | 0.035 | 0.002 | 0.027 | |
Zhang_IOA_task4_3 | strong_ensemble_all | Zhang2023 | 1.71 | 0.613 (0.603 - 0.625) | 0.828 (0.821 - 0.839) | 128.000 | 1.280 | 0.007 | 0.009 | 0.009 | 0.012 | |
Zhang_IOA_task4_4 | strong_ensemble_1 | Zhang2023 | 1.75 | 0.625 (0.615 - 0.637) | 0.855 (0.847 - 0.864) | 69.120 | 1.024 | 0.013 | 0.017 | 0.012 | 0.016 | |
Zhang_IOA_task4_5 | base system | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | 15.360 | 0.640 | 0.047 | 0.070 | 0.016 | 0.023 | |
Zhang_IOA_task4_6 | strong_single | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | 20.480 | 0.640 | 0.038 | 0.054 | 0.017 | 0.024 | |
Zhang_IOA_task4_7 | weak single | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | 23.040 | 0.640 | 0.003 | 0.050 | 0.002 | 0.025 | |
Wu_NCUT_task4a_1 | Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | 1.002 | 0.002 | 0.543 | 0.826 | 3.715 | 5.659 | |
Wu_NCUT_task4a_2 | Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | 3.238 | 0.010 | 0.223 | 0.341 | 0.985 | 1.507 | |
Wu_NCUT_task4a_3 | Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.497 (0.486 - 0.509) | 0.793 (0.783 - 0.806) | 4.317 | 0.014 | 0.160 | 0.255 | 0.675 | 1.076 | |
Barahona_AUDIAS_task4a_1 | CRNN T++ resolution | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | 10.007 | 0.095 | 0.049 | 0.078 | 0.070 | 0.112 | |
Barahona_AUDIAS_task4a_2 | CRNN T++ resolution with class-wise median filtering | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | 10.007 | 0.095 | 0.053 | 0.080 | 0.076 | 0.115 | |
Barahona_AUDIAS_task4a_3 | Conformer F+ resolution | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | 19.345 | 0.095 | 0.014 | 0.046 | 0.040 | 0.129 | |
Barahona_AUDIAS_task4a_4 | Conformer F+ resolution with class-wise median filtering | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | 19.345 | 0.095 | 0.010 | 0.048 | 0.028 | 0.135 | |
Barahona_AUDIAS_task4a_5 | 4-Resolution CRNN | Barahona2023 | 1.14 | 0.378 (0.365 - 0.392) | 0.604 (0.590 - 0.622) | 37.104 | 0.365 | 0.014 | 0.023 | 0.020 | 0.031 | |
Barahona_AUDIAS_task4a_6 | 4-Resolution CRNN with class-dependent median filtering | Barahona2023 | 1.18 | 0.401 (0.390 - 0.414) | 0.612 (0.596 - 0.630) | 37.104 | 0.365 | 0.015 | 0.023 | 0.021 | 0.032 | |
Barahona_AUDIAS_task4a_7 | 5-Resolution Conformer | Barahona2023 | 1.06 | 0.274 (0.262 - 0.287) | 0.684 (0.671 - 0.699) | 124.323 | 0.400 | 0.003 | 0.008 | 0.013 | 0.033 | |
Barahona_AUDIAS_task4a_8 | 5-Resolution Conformer with class-wise median filtering | Barahona2023 | 1.00 | 0.213 (0.201 - 0.226) | 0.729 (0.710 - 0.752) | 124.323 | 0.400 | 0.002 | 0.008 | 0.010 | 0.035 | |
Gan_NCUT_task4_1 | Gan_NCUT_SED_system_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | 1.175 | 0.006 | 0.432 | 0.714 | 1.156 | 1.911 | |
Gan_NCUT_task4_2 | Gan_NCUT_SED_system_2 | Gan2023 | 1.52 | 0.511 (0.498 - 0.524) | 0.799 (0.785 - 0.813) | 7.477 | 0.039 | 0.095 | 0.149 | 0.249 | 0.389 | |
Gan_NCUT_task4_3 | Gan_NCUT_SED_system_3 | Gan2023 | 1.50 | 0.483 (0.467 - 0.498) | 0.816 (0.805 - 0.828) | 8.971 | 0.075 | 0.075 | 0.126 | 0.122 | 0.207 | |
Liu_SRCN_task4a_1 | DCASE2023 t4a system1 | Chen2023a | 1.65 | 0.585 (0.572 - 0.598) | 0.817 (0.804 - 0.834) | 19.611 | 0.087 | 0.041 | 0.058 | 0.128 | 0.178 | |
Liu_SRCN_task4a_2 | DCASE2023 t4a system2 | Chen2023a | 1.40 | 0.380 (0.369 - 0.392) | 0.877 (0.867 - 0.885) | 18.171 | 0.064 | 0.029 | 0.067 | 0.113 | 0.260 | |
Liu_SRCN_task4a_3 | DCASE2023 t4a system3 | Chen2023a | 1.65 | 0.556 (0.544 - 0.569) | 0.861 (0.852 - 0.870) | 20.457 | 0.091 | 0.038 | 0.059 | 0.116 | 0.180 | |
Liu_SRCN_task4a_4 | DCASE2023 t4a system4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | 4.425 | 0.014 | 0.129 | 0.208 | 0.559 | 0.899 | |
Liu_SRCN_task4a_5 | DCASE2023 t4a system5 | Chen2023a | 0.94 | 0.098 (0.086 - 0.108) | 0.851 (0.841 - 0.860) | 18.171 | 0.064 | 0.007 | 0.065 | 0.029 | 0.253 | |
Kim_GIST-HanwhaVision_task4a_1 | DCASE2023 FDY-LKA CRNN without external single | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | 3.241 | 0.021 | 0.197 | 0.301 | 0.415 | 0.635 | |
Kim_GIST-HanwhaVision_task4a_2 | FDYLKA BEATs pool1d Stage2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | 3.912 | 0.028 | 0.210 | 0.295 | 0.401 | 0.564 | |
Kim_GIST-HanwhaVision_task4a_3 | LKAFDY BEATs Stage 2 interpolate | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | 3.432 | 0.023 | 0.235 | 0.338 | 0.480 | 0.690 | |
Kim_GIST-HanwhaVision_task4a_4 | FDYLKA BEATs pool 1d stage1 | Kim2023 | 1.63 | 0.576 (0.549 - 0.595) | 0.809 (0.797 - 0.821) | 2.783 | 0.028 | 0.288 | 0.404 | 0.391 | 0.549 | |
Kim_GIST-HanwhaVision_task4a_5 | FDYLKA BEATs all ensemble 48 | Kim2023 | 1.72 | 0.611 (0.598 - 0.623) | 0.846 (0.838 - 0.855) | 12.334 | 1.195 | 0.069 | 0.095 | 0.010 | 0.013 | |
Kim_GIST-HanwhaVision_task4a_6 | FDYLKA BEATs PSDS1 ensemble 16 | Kim2023 | 1.72 | 0.611 (0.590 - 0.628) | 0.841 (0.832 - 0.851) | 12.334 | 0.398 | 0.069 | 0.095 | 0.029 | 0.040 | |
Kim_GIST-HanwhaVision_task4a_7 | FDYLKA BEATs PSDS2 ensemble 16 | Kim2023 | 1.69 | 0.591 (0.574 - 0.604) | 0.844 (0.835 - 0.853) | 12.334 | 0.398 | 0.067 | 0.095 | 0.028 | 0.040 | |
Kim_GIST-HanwhaVision_task4a_8 | FDYLKA BEATs PSDS sum ensemble 16 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.841 (0.831 - 0.851) | 12.334 | 0.398 | 0.069 | 0.095 | 0.029 | 0.040 | |
Wenxin_TJU_task4a_1 | ensemble-pretrained-psds1-0 | Wenxin2023 | 1.63 | 0.555 (0.543 - 0.566) | 0.837 (0.828 - 0.847) | 190.549 | 0.474 | 0.004 | 0.006 | 0.022 | 0.034 | |
Wenxin_TJU_task4a_2 | ensemble-pretrained-psds1-1 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | 95.275 | 0.237 | 0.008 | 0.012 | 0.046 | 0.068 | |
Wenxin_TJU_task4a_3 | ensemble-pretrained-psds2-0 | Wenxin2023 | 0.88 | 0.080 (0.071 - 0.088) | 0.815 (0.802 - 0.825) | 171.494 | 0.427 | 0.001 | 0.007 | 0.004 | 0.036 | |
Wenxin_TJU_task4a_4 | ensemble-pretrained-psds2-1 | Wenxin2023 | 0.90 | 0.081 (0.071 - 0.090) | 0.838 (0.828 - 0.849) | 190.549 | 0.474 | 0.001 | 0.006 | 0.003 | 0.034 | |
Wenxin_TJU_task4a_5 | single-pretrained-psds1-0 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | 9.527 | 0.024 | 0.079 | 0.119 | 0.427 | 0.646 | |
Wenxin_TJU_task4a_6 | single-pretrained-psds1-1 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | 9.527 | 0.024 | 0.080 | 0.121 | 0.432 | 0.658 | |
Wenxin_TJU_task4a_7 | single-psds1 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | 7.582 | 0.017 | 0.081 | 0.126 | 0.492 | 0.766 | |
Wenxin_TJU_task4a_8 | single-psds2 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | 7.582 | 0.017 | 0.011 | 0.130 | 0.065 | 0.790 |
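
The derived columns in the table above can be cross-checked with a few lines of Python. The sketch below is not the official scoring code. It assumes two definitions that are consistent with the tabulated values: the ranking score is the mean of PSDS 1 and PSDS 2, each divided by the corresponding baseline value, and each EW-PSDS value is the PSDS multiplied by the ratio of baseline energy to system energy. The function and constant names are hypothetical. The baseline numbers are copied from the Baseline_task4a_1 row.

```python
# Baseline values copied from the Baseline_task4a_1 row of the table above.
BASELINE_PSDS1 = 0.327
BASELINE_PSDS2 = 0.538
BASELINE_TRAIN_KWH = 1.390
BASELINE_TEST_KWH = 0.019


def ranking_score(psds1: float, psds2: float) -> float:
    """Mean of baseline-normalized PSDS 1 and PSDS 2 (assumed definition)."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)


def energy_weighted_psds(psds: float, system_kwh: float, baseline_kwh: float) -> float:
    """PSDS scaled by the baseline-to-system energy ratio (assumed definition)."""
    if system_kwh <= 0:
        raise ValueError("system energy must be positive")
    return psds * baseline_kwh / system_kwh


# Baseline_task4a_2 row: PSDS 1 = 0.510, PSDS 2 = 0.798, training energy = 1.821 kWh.
print(round(ranking_score(0.510, 0.798), 2))                                 # 1.52
print(round(energy_weighted_psds(0.510, 1.821, BASELINE_TRAIN_KWH), 3))      # 0.389
print(round(energy_weighted_psds(0.798, 1.821, BASELINE_TRAIN_KWH), 3))      # 0.609
```

The three printed values match the ranking score, EW-PSDS 1 (training energy), and EW-PSDS 2 (training energy) listed for Baseline_task4a_2. The energies in the table are rounded to three decimals, so recomputed values for systems with very small test energy can differ from the table in the last digit.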
System characteristics
General characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Data augmentation | Features |
---|---|---|---|---|---|---|---|
Baseline_task4a_1 | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | mixup | log-mel energies | |
Baseline_task4a_2 | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | mixup | log-mel energies | |
Li_USTC_task4a_1 | Li2023 | 1.54 | 0.539 (0.527 - 0.551) | 0.769 (0.758 - 0.778) | specaugmentation | log-mel energies | |
Li_USTC_task4a_2 | Li2023 | 1.58 | 0.556 (0.544 - 0.569) | 0.781 (0.769 - 0.795) | specaugmentation, mixup | log-mel energies | |
Li_USTC_task4a_3 | Li2023 | 1.54 | 0.546 (0.535 - 0.558) | 0.756 (0.745 - 0.769) | specaugmentation, mixup | log-mel energies | |
Li_USTC_task4a_4 | Li2023 | 0.89 | 0.061 (0.050 - 0.070) | 0.852 (0.843 - 0.863) | specaugmentation, mixup | log-mel energies | |
Li_USTC_task4a_5 | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | specaugmentation | log-mel energies | |
Li_USTC_task4a_6 | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | specaugmentation, mixup | log-mel energies | |
Li_USTC_task4a_7 | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | specaugmentation | log-mel energies | |
Liu_NSYSU_task4_1 | Liu2023 | 0.80 | 0.051 (0.042 - 0.060) | 0.779 (0.767 - 0.791) | mixup, filter augment | log-mel energies | |
Liu_NSYSU_task4_2 | Liu2023 | 1.36 | 0.466 (0.455 - 0.480) | 0.701 (0.688 - 0.714) | mixup, filter augment | log-mel energies | |
Liu_NSYSU_task4_3 | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | mixup, filter augment, time shifting, pitch shifting, spec augment | log-mel energies | |
Liu_NSYSU_task4_4 | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | mixup, filter augment, time shifting, pitch shifting, spec augment | log-mel energies | |
Liu_NSYSU_task4_5 | Liu2023 | 0.82 | 0.045 (0.035 - 0.053) | 0.806 (0.794 - 0.818) | mixup, filter augment | log-mel energies | |
Liu_NSYSU_task4_6 | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | mixup, filter augment | log-mel energies | |
Liu_NSYSU_task4_7 | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | mixup, filter augment | log-mel energies | |
Liu_NSYSU_task4_8 | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | mixup, filter augment | log-mel energies | |
Lee_CAU_task4A_1 | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | mixup, time-masking, filteraugment | log-mel spectrogram | |
Lee_CAU_task4A_2 | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | mixup, time-masking, filteraugment | log-mel spectrogram | |
Cheimariotis_DUTH_task4a_1 | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | mixup | log-mel spectrogram | |
Cheimariotis_DUTH_task4a_2 | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | mixup | log-mel spectrogram | |
Chen_CHT_task4_1 | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | mix-up, noise, sct | log-mel energies | |
Chen_CHT_task4_2 | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | mix-up, ict | log-mel energies | |
Chen_CHT_task4_3 | Chen2023b | 1.66 | 0.596 (0.585 - 0.606) | 0.810 (0.800 - 0.822) | mix-up, ict, noise, mask | log-mel energies | |
Chen_CHT_task4_4 | Chen2023b | 1.66 | 0.590 (0.578 - 0.601) | 0.820 (0.810 - 0.831) | mix-up, ict, noise, mask | log-mel energies | |
Xiao_FMSG_task4a_1 | Zhang2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | time masking, frequency shifting, mixup, filter-augmentation | log-mel energies | |
Xiao_FMSG_task4a_2 | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_3 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_4 | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_5 | Xiao2023 | 1.61 | 0.555 (0.545 - 0.567) | 0.821 (0.811 - 0.834) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_6 | Xiao2023 | 1.61 | 0.551 (0.541 - 0.561) | 0.829 (0.819 - 0.842) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_7 | Xiao2023 | 0.87 | 0.075 (0.066 - 0.084) | 0.811 (0.800 - 0.822) | time masking, frequency masking, mixup | log-mel energies | |
Xiao_FMSG_task4a_8 | Xiao2023 | 1.62 | 0.549 (0.540 - 0.560) | 0.834 (0.824 - 0.847) | time masking, frequency masking, mixup | log-mel energies | |
Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.536 (0.526 - 0.546) | 0.810 (0.800 - 0.822) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.082 (0.074 - 0.090) | 0.862 (0.852 - 0.872) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.488 (0.475 - 0.503) | 0.708 (0.696 - 0.720) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.088 (0.080 - 0.096) | 0.797 (0.787 - 0.810) | mixup, time mask, pitch shift, time shift | log-mel energies | |
Wang_XiaoRice_task4a_1 | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | Embeddings | ||
Wang_XiaoRice_task4a_2 | Wang2023 | 1.52 | 0.497 (0.486 - 0.510) | 0.814 (0.803 - 0.828) | Embeddings | ||
Wang_XiaoRice_task4a_3 | Wang2023 | 0.91 | 0.088 (0.076 - 0.098) | 0.835 (0.824 - 0.844) | Embeddings | ||
Zhang_IOA_task4_1 | Zhang2023 | 1.75 | 0.622 (0.613 - 0.634) | 0.857 (0.849 - 0.866) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_2 | Zhang2023 | 0.95 | 0.070 (0.060 - 0.080) | 0.903 (0.895 - 0.911) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_3 | Zhang2023 | 1.71 | 0.613 (0.603 - 0.625) | 0.828 (0.821 - 0.839) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_4 | Zhang2023 | 1.75 | 0.625 (0.615 - 0.637) | 0.855 (0.847 - 0.864) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_5 | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_6 | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Zhang_IOA_task4_7 | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | specaugment, mixup, frame_shift, FilterAug | log-mel energies | |
Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | mixup, frameshift, FilterAugment | log-mel energies | |
Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | mixup, FilterAugment | log-mel energies | |
Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.497 (0.486 - 0.509) | 0.793 (0.783 - 0.806) | mixup, frameshift, FilterAugment | log-mel energies | |
Barahona_AUDIAS_task4a_1 | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | mixup, time shifting | log-mel energies | |
Barahona_AUDIAS_task4a_2 | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | mixup, time shifting | log-mel energies | |
Barahona_AUDIAS_task4a_3 | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | mixup, filteraugment | log-mel energies | |
Barahona_AUDIAS_task4a_4 | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | mixup, filteraugment | log-mel energies | |
Barahona_AUDIAS_task4a_5 | Barahona2023 | 1.14 | 0.378 (0.365 - 0.392) | 0.604 (0.590 - 0.622) | mixup, time shifting | log-mel energies | |
Barahona_AUDIAS_task4a_6 | Barahona2023 | 1.18 | 0.401 (0.390 - 0.414) | 0.612 (0.596 - 0.630) | mixup, time shifting | log-mel energies | |
Barahona_AUDIAS_task4a_7 | Barahona2023 | 1.06 | 0.274 (0.262 - 0.287) | 0.684 (0.671 - 0.699) | mixup, filteraugment | log-mel energies | |
Barahona_AUDIAS_task4a_8 | Barahona2023 | 1.00 | 0.213 (0.201 - 0.226) | 0.729 (0.710 - 0.752) | mixup, filteraugment | log-mel energies | |
Gan_NCUT_task4_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | frame shift, time mask, mixup, FilterAugment | log-mel energies | |
Gan_NCUT_task4_2 | Gan2023 | 1.52 | 0.511 (0.498 - 0.524) | 0.799 (0.785 - 0.813) | frame shift, time mask, mixup, FilterAugment | log-mel energies | |
Gan_NCUT_task4_3 | Gan2023 | 1.50 | 0.483 (0.467 - 0.498) | 0.816 (0.805 - 0.828) | frame shift, time mask, mixup, FilterAugment | log-mel energies | |
Liu_SRCN_task4a_1 | Chen2023a | 1.65 | 0.585 (0.572 - 0.598) | 0.817 (0.804 - 0.834) | mixup | log-mel energies | |
Liu_SRCN_task4a_2 | Chen2023a | 1.40 | 0.380 (0.369 - 0.392) | 0.877 (0.867 - 0.885) | mixup | log-mel energies | |
Liu_SRCN_task4a_3 | Chen2023a | 1.65 | 0.556 (0.544 - 0.569) | 0.861 (0.852 - 0.870) | mixup | log-mel energies | |
Liu_SRCN_task4a_4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | mixup, time stretching, pitch shifting, time mask, frequency mask | log-mel energies | |
Liu_SRCN_task4a_5 | Chen2023a | 0.94 | 0.098 (0.086 - 0.108) | 0.851 (0.841 - 0.860) | mixup | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_1 | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | frame shift, frequency shift, time masking, filter augment | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_3 | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_4 | Kim2023 | 1.63 | 0.576 (0.549 - 0.595) | 0.809 (0.797 - 0.821) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_5 | Kim2023 | 1.72 | 0.611 (0.598 - 0.623) | 0.846 (0.838 - 0.855) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_6 | Kim2023 | 1.72 | 0.611 (0.590 - 0.628) | 0.841 (0.832 - 0.851) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_7 | Kim2023 | 1.69 | 0.591 (0.574 - 0.604) | 0.844 (0.835 - 0.853) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Kim_GIST-HanwhaVision_task4a_8 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.841 (0.831 - 0.851) | time masking, time and frequency shift, filter-augment, mix-up, Gaussian noise | log-mel energies | |
Wenxin_TJU_task4a_1 | Wenxin2023 | 1.63 | 0.555 (0.543 - 0.566) | 0.837 (0.828 - 0.847) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_2 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_3 | Wenxin2023 | 0.88 | 0.080 (0.071 - 0.088) | 0.815 (0.802 - 0.825) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_4 | Wenxin2023 | 0.90 | 0.081 (0.071 - 0.090) | 0.838 (0.828 - 0.849) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_5 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_6 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_7 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies | |
Wenxin_TJU_task4a_8 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | FilterAugment, mixup, frameshift, SpecAugment, ICT, SCT | log-mel energies |
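The ranking score column is not defined in this section. The values listed in these tables are consistent with the mean of the two PSDS values, each normalized by the corresponding value of the first baseline (0.327 for PSDS 1 and 0.538 for PSDS 2). The sketch below illustrates that reading; the formula is an assumption inferred from the listed numbers, and the function name `ranking_score` is hypothetical rather than taken from any official evaluation code.

```python
# Baseline_task4a_1 values as listed in the tables (PSDS 1, PSDS 2).
BASELINE_PSDS1 = 0.327
BASELINE_PSDS2 = 0.538


def ranking_score(psds1, psds2,
                  base_psds1=BASELINE_PSDS1, base_psds2=BASELINE_PSDS2):
    """Assumed ranking score: mean of the baseline-normalized PSDS values."""
    return 0.5 * (psds1 / base_psds1 + psds2 / base_psds2)


# Rows taken from the tables: (PSDS 1, PSDS 2, listed ranking score).
rows = {
    "Baseline_task4a_2": (0.510, 0.798, 1.52),
    "Zhang_IOA_task4_1": (0.622, 0.857, 1.75),
    "Guan_HIT_task4a_2": (0.082, 0.862, 0.93),
}

for code, (psds1, psds2, listed) in rows.items():
    # Compare the recomputed score, rounded to two decimals, with the table.
    print(code, round(ranking_score(psds1, psds2), 2), listed)
```

Under this reading, a system with a very low PSDS 1 (for example one tuned only for scenario 2) can still reach a ranking score near 0.9, which matches the pattern seen in the rows above.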
Machine learning characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Classifier | Semi-supervised approach | Post-processing | Segmentation method | Decision making |
---|---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | CRNN | Mean-teacher student | median filtering | |||
Baseline_task4a_2 | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | CRNN | Mean-teacher student | median filtering | |||
Li_USTC_task4a_1 | Li2023 | 1.54 | 0.539 (0.527 - 0.551) | 0.769 (0.758 - 0.778) | PaSST-SED | Mean-teacher student | median filtering | |||
Li_USTC_task4a_2 | Li2023 | 1.58 | 0.556 (0.544 - 0.569) | 0.781 (0.769 - 0.795) | PaSST-SED | Mean-teacher student, Pseudo-labelling | median filtering | |||
Li_USTC_task4a_3 | Li2023 | 1.54 | 0.546 (0.535 - 0.558) | 0.756 (0.745 - 0.769) | PaSST-SED | Mean-teacher student | median filtering | |||
Li_USTC_task4a_4 | Li2023 | 0.89 | 0.061 (0.050 - 0.070) | 0.852 (0.843 - 0.863) | PaSST-SED | Mean-teacher student | max filtering | |||
Li_USTC_task4a_5 | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | PaSST-SED | Mean-teacher student | median filtering | |||
Li_USTC_task4a_6 | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | PaSST-SED | Pseudo-labelling, Mean-teacher student | median filtering | |||
Li_USTC_task4a_7 | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | SKCRNN | Mean-teacher student | median filtering | |||
Liu_NSYSU_task4_1 | Liu2023 | 0.80 | 0.051 (0.042 - 0.060) | 0.779 (0.767 - 0.791) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Liu_NSYSU_task4_2 | Liu2023 | 1.36 | 0.466 (0.455 - 0.480) | 0.701 (0.688 - 0.714) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Liu_NSYSU_task4_3 | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | CRNN | Mean-teacher student | median filtering (93ms) | |||
Liu_NSYSU_task4_4 | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | CRNN | Mean-teacher student | median filtering (93ms) | |||
Liu_NSYSU_task4_5 | Liu2023 | 0.82 | 0.045 (0.035 - 0.053) | 0.806 (0.794 - 0.818) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Liu_NSYSU_task4_6 | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Liu_NSYSU_task4_7 | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | CRNN | Mean-teacher student | median filtering (93ms) | |||
Liu_NSYSU_task4_8 | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | CRNN | Mean-teacher student | median filtering (93ms) | |||
Lee_CAU_task4A_1 | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | CRNN | Mean-teacher student | median filtering | attention layers | None | |
Lee_CAU_task4A_2 | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | CRNN | Mean-teacher student | median filtering | attention layers | None | |
Cheimariotis_DUTH_task4a_1 | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | CRNN-FDY | Mean-teacher student | median filtering (93ms) | |||
Cheimariotis_DUTH_task4a_2 | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | CRNN-FDY | Mean-teacher student | median filtering (93ms) | |||
Chen_CHT_task4_1 | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | CRNN | Mean-teacher student | median filtering | average | ||
Chen_CHT_task4_2 | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | CRNN | Mean-teacher student | median filtering | average | ||
Chen_CHT_task4_3 | Chen2023b | 1.66 | 0.596 (0.585 - 0.606) | 0.810 (0.800 - 0.822) | CRNN | Mean-teacher student | median filtering | average | ||
Chen_CHT_task4_4 | Chen2023b | 1.66 | 0.590 (0.578 - 0.601) | 0.820 (0.810 - 0.831) | CRNN | Mean-teacher student | median filtering | average | ||
Xiao_FMSG_task4a_1 | Zhang2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | FDY_CRNN | Mean-teacher student | classwise median filtering | |||
Xiao_FMSG_task4a_2 | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | FDY_CRNN | Mean-teacher student | classwise median filtering | |||
Xiao_FMSG_task4a_3 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | FDY_CRNN | Mean-teacher student | classwise median filtering | |||
Xiao_FMSG_task4a_4 | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | FDY_CRNN | Mean-teacher student | classwise median filtering | |||
Xiao_FMSG_task4a_5 | Xiao2023 | 1.61 | 0.555 (0.545 - 0.567) | 0.821 (0.811 - 0.834) | FDY_CRNN | Mean-teacher student | classwise median filtering | average | ||
Xiao_FMSG_task4a_6 | Xiao2023 | 1.61 | 0.551 (0.541 - 0.561) | 0.829 (0.819 - 0.842) | FDY_CRNN | Mean-teacher student | classwise median filtering | average | ||
Xiao_FMSG_task4a_7 | Xiao2023 | 0.87 | 0.075 (0.066 - 0.084) | 0.811 (0.800 - 0.822) | FDY_CRNN | Mean-teacher student | classwise median filtering | average | ||
Xiao_FMSG_task4a_8 | Xiao2023 | 1.62 | 0.549 (0.540 - 0.560) | 0.834 (0.824 - 0.847) | FDY_CRNN | Mean-teacher student | classwise median filtering | average | ||
Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.536 (0.526 - 0.546) | 0.810 (0.800 - 0.822) | CRNN | Mean-teacher student | classwise median filtering | average | ||
Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.082 (0.074 - 0.090) | 0.862 (0.852 - 0.872) | CRNN | Mean-teacher student | classwise median filtering | average | ||
Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | CRNN | Mean-teacher student | classwise median filtering | |||
Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | CRNN | Mean-teacher student | classwise median filtering | |||
Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.488 (0.475 - 0.503) | 0.708 (0.696 - 0.720) | CRNN | Mean-teacher student | classwise median filtering | average | ||
Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.088 (0.080 - 0.096) | 0.797 (0.787 - 0.810) | CRNN | Mean-teacher student | classwise median filtering | average | ||
Wang_XiaoRice_task4a_1 | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | GRU | Mean-teacher student | median filtering (320ms) | average | ||
Wang_XiaoRice_task4a_2 | Wang2023 | 1.52 | 0.497 (0.486 - 0.510) | 0.814 (0.803 - 0.828) | GRU | Mean-teacher student | median filtering (93ms) | |||
Wang_XiaoRice_task4a_3 | Wang2023 | 0.91 | 0.088 (0.076 - 0.098) | 0.835 (0.824 - 0.844) | DNN | Unsupervised data augmentation | median filtering (93ms) | average | ||
Zhang_IOA_task4_1 | Zhang2023 | 1.75 | 0.622 (0.613 - 0.634) | 0.857 (0.849 - 0.866) | CRNN,Transformer | Mean-teacher student,Pseudo-labelling | median filtering | average | ||
Zhang_IOA_task4_2 | Zhang2023 | 0.95 | 0.070 (0.060 - 0.080) | 0.903 (0.895 - 0.911) | CRNN,Transformer | Mean-teacher student,Pseudo-labelling | median filtering | average | ||
Zhang_IOA_task4_3 | Zhang2023 | 1.71 | 0.613 (0.603 - 0.625) | 0.828 (0.821 - 0.839) | CRNN,Transformer | Mean-teacher student,Pseudo-labelling | median filtering | average | ||
Zhang_IOA_task4_4 | Zhang2023 | 1.75 | 0.625 (0.615 - 0.637) | 0.855 (0.847 - 0.864) | CRNN,Transformer | Mean-teacher student,Pseudo-labelling | median filtering | average | ||
Zhang_IOA_task4_5 | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | CRNN | Mean-teacher student | median filtering | |||
Zhang_IOA_task4_6 | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | CRNN | Mean-teacher student, Pseudo-labelling | median filtering | |||
Zhang_IOA_task4_7 | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | CRNN,Transformer | Mean-teacher student,Pseudo-labelling | median filtering | |||
Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | RCRNN | Mean-teacher student | median filtering (93ms) | |||
Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | RCRNN | Mean-teacher student | median filtering (93ms) | average | ||
Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.497 (0.486 - 0.509) | 0.793 (0.783 - 0.806) | RCRNN | Mean-teacher student | median filtering (93ms) | average | ||
Barahona_AUDIAS_task4a_1 | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | CRNN | Mean-teacher student | median filtering (450ms) | |||
Barahona_AUDIAS_task4a_2 | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | CRNN | Mean-teacher student | median filtering (class dependent) | |||
Barahona_AUDIAS_task4a_3 | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | Conformer | Mean-teacher student | median filtering (1344 ms) | |||
Barahona_AUDIAS_task4a_4 | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | Conformer | Mean-teacher student | median filtering (class-dependent) | |||
Barahona_AUDIAS_task4a_5 | Barahona2023 | 1.14 | 0.378 (0.365 - 0.392) | 0.604 (0.590 - 0.622) | CRNN | Mean-teacher student | median filtering (450ms) | averaging | ||
Barahona_AUDIAS_task4a_6 | Barahona2023 | 1.18 | 0.401 (0.390 - 0.414) | 0.612 (0.596 - 0.630) | CRNN | Mean-teacher student | median filtering (class-dependent) | averaging | ||
Barahona_AUDIAS_task4a_7 | Barahona2023 | 1.06 | 0.274 (0.262 - 0.287) | 0.684 (0.671 - 0.699) | Conformer | Mean-teacher student | median filtering (1344ms) | averaging | ||
Barahona_AUDIAS_task4a_8 | Barahona2023 | 1.00 | 0.213 (0.201 - 0.226) | 0.729 (0.710 - 0.752) | Conformer | Mean-teacher student | median filtering (class-dependent) | averaging | ||
Gan_NCUT_task4_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | CRNN | Mean-teacher student | median filtering (93ms) | |||
Gan_NCUT_task4_2 | Gan2023 | 1.52 | 0.511 (0.498 - 0.524) | 0.799 (0.785 - 0.813) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Gan_NCUT_task4_3 | Gan2023 | 1.50 | 0.483 (0.467 - 0.498) | 0.816 (0.805 - 0.828) | CRNN | Mean-teacher student | median filtering (93ms) | average | ||
Liu_SRCN_task4a_1 | Chen2023a | 1.65 | 0.585 (0.572 - 0.598) | 0.817 (0.804 - 0.834) | CRNN,Transformer,ensemble | Mean-teacher student | median filtering | |||
Liu_SRCN_task4a_2 | Chen2023a | 1.40 | 0.380 (0.369 - 0.392) | 0.877 (0.867 - 0.885) | CRNN,Transformer,ensemble | Mean-teacher student | median filtering | |||
Liu_SRCN_task4a_3 | Chen2023a | 1.65 | 0.556 (0.544 - 0.569) | 0.861 (0.852 - 0.870) | CRNN,Transformer,ensemble | Mean-teacher student | median filtering | |||
Liu_SRCN_task4a_4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | CRNN | Mean-teacher student | ||||
Liu_SRCN_task4a_5 | Chen2023a | 0.94 | 0.098 (0.086 - 0.108) | 0.851 (0.841 - 0.860) | CRNN,Transformer,ensemble | Mean-teacher student | time pool | |||
Kim_GIST-HanwhaVision_task4a_1 | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | CRNN | Mean-teacher student | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | |||
Kim_GIST-HanwhaVision_task4a_2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | CRNN with pretrained BEATs | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | |||
Kim_GIST-HanwhaVision_task4a_3 | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | CRNN with pretrained BEATs | Mean-teacher student, Pseudo-labelling | class-wise median filtering (28 ms for alarm, 44 ms for blender, 40ms for cat, 32 ms for dog, 32 ms for dishes, 88 ms for electric shaver, 148 ms for frying, 124 ms for running water, 28 ms for speech, 60 ms for vacuum cleaner) | |||
Kim_GIST-HanwhaVision_task4a_4 | Kim2023 | 1.63 | 0.576 (0.549 - 0.595) | 0.809 (0.797 - 0.821) | CRNN with pretrained BEATs, ensemble | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | Average | ||
Kim_GIST-HanwhaVision_task4a_5 | Kim2023 | 1.72 | 0.611 (0.598 - 0.623) | 0.846 (0.838 - 0.855) | CRNN with pretrained BEATs, ensemble | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | Average | ||
Kim_GIST-HanwhaVision_task4a_6 | Kim2023 | 1.72 | 0.611 (0.590 - 0.628) | 0.841 (0.832 - 0.851) | CRNN with pretrained BEATs, ensemble | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | Average | ||
Kim_GIST-HanwhaVision_task4a_7 | Kim2023 | 1.69 | 0.591 (0.574 - 0.604) | 0.844 (0.835 - 0.853) | CRNN with pretrained BEATs, ensemble | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | Average | ||
Kim_GIST-HanwhaVision_task4a_8 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.841 (0.831 - 0.851) | CRNN with pretrained BEATs, ensemble | Mean-teacher student, Pseudo-labelling | class-wise median filtering (20 ms for alarm, 44 ms for blender, 20ms for cat, 20 ms for dog, 20 ms for dishes, 268 ms for electric shaver, 244 ms for frying, 196 ms for running water, 20 ms for speech, 68 ms for vacuum cleaner) | Average | ||
Wenxin_TJU_task4a_1 | Wenxin2023 | 1.63 | 0.555 (0.543 - 0.566) | 0.837 (0.828 - 0.847) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_2 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_3 | Wenxin2023 | 0.88 | 0.080 (0.071 - 0.088) | 0.815 (0.802 - 0.825) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_4 | Wenxin2023 | 0.90 | 0.081 (0.071 - 0.090) | 0.838 (0.828 - 0.849) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_5 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_6 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_7 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average | |
Wenxin_TJU_task4a_8 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | FDYCRNN | Mutual mean teaching | median filtering (320ms 705ms 320ms 320ms 320ms 4295ms 3910ms 3141ms 320ms 1090ms) | attention layers | Average |
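Most systems in the table above report median filtering as post-processing, sometimes with one window length per class. As a rough illustration only, the sketch below smooths per-class frame scores with a class-dependent median window. It is not any team's implementation: the frame hop `hop_ms`, the edge-padding strategy, and all function names are assumptions made for the example.

```python
from statistics import median


def ms_to_frames(window_ms, hop_ms):
    """Convert a window length in ms to an odd frame count (at least 1)."""
    n = max(1, round(window_ms / hop_ms))
    return n if n % 2 == 1 else n + 1


def median_filter(seq, size):
    """Median-filter a 1-D sequence with an odd window, repeating edge values."""
    if not seq:
        return []
    half = size // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + size]) for i in range(len(seq))]


def classwise_median_filter(scores, windows_ms, hop_ms):
    """Apply a per-class window (in ms) to a dict of class -> frame scores."""
    return {
        label: median_filter(frames, ms_to_frames(windows_ms[label], hop_ms))
        for label, frames in scores.items()
    }


# Hypothetical binary frame decisions for one class, with a 10 ms hop.
scores = {"Speech": [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]}
smoothed = classwise_median_filter(scores, {"Speech": 30}, hop_ms=10)
print(smoothed["Speech"])
```

With a three-frame window, the isolated single-frame activation is removed and the one-frame gap inside the longer event is filled, which is the usual motivation for this kind of smoothing.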
Complexity
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS 1 (Evaluation dataset) | PSDS 2 (Evaluation dataset) | Model complexity | MACS | Ensemble subsystems | Training time |
---|---|---|---|---|---|---|---|---|---|
Baseline_task4a_1 | Turpault2023 | 1.00 | 0.327 (0.317 - 0.339) | 0.538 (0.515 - 0.566) | 1112420 | 930902000 | 3h (A100 40Gb) | ||
Baseline_task4a_2 | Turpault2023 | 1.52 | 0.510 (0.496 - 0.523) | 0.798 (0.782 - 0.811) | 1227236 | 948793000 | 3h (A100 40Gb) | ||
Li_USTC_task4a_1 | Li2023 | 1.54 | 0.539 (0.527 - 0.551) | 0.769 (0.758 - 0.778) | 623304000 | 714906000000 | 6 | 18h (2 GTX 3090) | |
Li_USTC_task4a_2 | Li2023 | 1.58 | 0.556 (0.544 - 0.569) | 0.781 (0.769 - 0.795) | 415536000 | 476604000000 | 4 | 6h (2 GTX 3090) | |
Li_USTC_task4a_3 | Li2023 | 1.54 | 0.546 (0.535 - 0.558) | 0.756 (0.745 - 0.769) | 415536000 | 476604000000 | 4 | 6h (2 GTX 3090) | |
Li_USTC_task4a_4 | Li2023 | 0.89 | 0.061 (0.050 - 0.070) | 0.852 (0.843 - 0.863) | 415536000 | 476604000000 | 4 | 6h (2 GTX 3090) | |
Li_USTC_task4a_5 | Li2023 | 1.52 | 0.531 (0.520 - 0.544) | 0.762 (0.751 - 0.773) | 103884000 | 119151000000 | 6h (2 GTX 3090) | ||
Li_USTC_task4a_6 | Li2023 | 1.56 | 0.546 (0.529 - 0.562) | 0.783 (0.771 - 0.796) | 103884000 | 119151000000 | 3h (2 GTX 3090) | ||
Li_USTC_task4a_7 | Li2023 | 1.20 | 0.404 (0.389 - 0.421) | 0.630 (0.612 - 0.648) | 2684000 | 15481000000 | 2.5h (2 GTX 3090) | ||
Liu_NSYSU_task4_1 | Liu2023 | 0.80 | 0.051 (0.042 - 0.060) | 0.779 (0.767 - 0.791) | 132600000 | 5470000000 | 6 | 36h (1 RTX3060) | |
Liu_NSYSU_task4_2 | Liu2023 | 1.36 | 0.466 (0.455 - 0.480) | 0.701 (0.688 - 0.714) | 132600000 | 5470000000 | 6 | 36h (1 RTX3060) | |
Liu_NSYSU_task4_3 | Liu2023 | 1.26 | 0.434 (0.420 - 0.448) | 0.646 (0.633 - 0.660) | 6600000 | 4632000000 | 6h (1 RTX3060) | ||
Liu_NSYSU_task4_4 | Liu2023 | 1.24 | 0.413 (0.394 - 0.438) | 0.655 (0.638 - 0.673) | 22100000 | 911717000 | 6h (1 RTX3060) | ||
Liu_NSYSU_task4_5 | Liu2023 | 0.82 | 0.045 (0.035 - 0.053) | 0.806 (0.794 - 0.818) | 110500000 | 4558000000 | 5 | 30h (1 RTX3060) | |
Liu_NSYSU_task4_6 | Liu2023 | 1.62 | 0.552 (0.540 - 0.563) | 0.838 (0.829 - 0.848) | 110500000 | 4558000000 | 5 | 30h (1 RTX3060) | |
Liu_NSYSU_task4_7 | Liu2023 | 1.55 | 0.521 (0.510 - 0.531) | 0.813 (0.796 - 0.831) | 22100000 | 911717000 | 6h (1 RTX3060) | ||
Liu_NSYSU_task4_8 | Liu2023 | 1.53 | 0.515 (0.488 - 0.536) | 0.805 (0.791 - 0.818) | 22100000 | 911717000 | 6h (1 RTX3060) | ||
Lee_CAU_task4A_1 | Lee2023 | 1.24 | 0.425 (0.415 - 0.440) | 0.634 (0.618 - 0.648) | 6200000 | 331754000000 | None | 6h (1 Quadro RTX 8000) | |
Lee_CAU_task4A_2 | Lee2023 | 0.79 | 0.104 (0.090 - 0.117) | 0.674 (0.661 - 0.690) | 6600000 | 335673000000 | None | 5h (1 Quadro RTX 8000) | |
Cheimariotis_DUTH_task4a_1 | Cheimariotis2023 | 1.53 | 0.516 (0.504 - 0.529) | 0.796 (0.784 - 0.808) | 6600000 | 3497000000 | 8h (1 A6000) | ||
Cheimariotis_DUTH_task4a_2 | Cheimariotis2023 | 1.45 | 0.487 (0.475 - 0.502) | 0.759 (0.745 - 0.773) | 6600000 | 3497000000 | 8h (1 A6000) | ||
Chen_CHT_task4_1 | Chen2023b | 1.25 | 0.441 (0.403 - 0.468) | 0.620 (0.567 - 0.652) | 4920104 | 5707000000 | 13h (1 A100) | ||
Chen_CHT_task4_2 | Chen2023b | 1.58 | 0.563 (0.550 - 0.574) | 0.779 (0.768 - 0.792) | 6329384 | 5854000000 | 23h (1 A100) | ||
Chen_CHT_task4_3 | Chen2023b | 1.66 | 0.596 (0.585 - 0.606) | 0.810 (0.800 - 0.822) | 37976304 | 35124000000 | 6 | 138h (1 A100) | |
Chen_CHT_task4_4 | Chen2023b | 1.66 | 0.590 (0.578 - 0.601) | 0.820 (0.810 - 0.831) | 94940760 | 87814000000 | 15 | 345h (1 A100) | |
Xiao_FMSG_task4a_1 | Zhang2023 | 1.23 | 0.403 (0.392 - 0.417) | 0.660 (0.646 - 0.672) | 2770884 | 120350000 | 4h (1 RTX A5000) | ||
Xiao_FMSG_task4a_2 | Xiao2023 | 1.55 | 0.525 (0.516 - 0.538) | 0.808 (0.796 - 0.821) | 8832068 | 3726000000 | 5h (1 RTX A5000) | ||
Xiao_FMSG_task4a_3 | Xiao2023 | 0.86 | 0.071 (0.062 - 0.080) | 0.807 (0.796 - 0.818) | 4729412 | 1483000000 | 5h (1 RTX A5000) | ||
Xiao_FMSG_task4a_4 | Xiao2023 | 1.60 | 0.551 (0.543 - 0.562) | 0.813 (0.802 - 0.827) | 4171844 | 459620000 | 4.5h (1 RTX A5000) | ||
Xiao_FMSG_task4a_5 | Xiao2023 | 1.61 | 0.555 (0.545 - 0.567) | 0.821 (0.811 - 0.834) | 21629908 | 9322000000 | 5 | 25h (1 RTX A5000) | |
Xiao_FMSG_task4a_6 | Xiao2023 | 1.61 | 0.551 (0.541 - 0.561) | 0.829 (0.819 - 0.842) | 51596456 | 12125000000 | 10 | 50h (1 RTX A5000) | |
Xiao_FMSG_task4a_7 | Xiao2023 | 0.87 | 0.075 (0.066 - 0.084) | 0.811 (0.800 - 0.822) | 16687376 | 5917000000 | 4 | 20h (1 A4000) | |
Xiao_FMSG_task4a_8 | Xiao2023 | 1.62 | 0.549 (0.540 - 0.560) | 0.834 (0.824 - 0.847) | 60916904 | 18646000000 | 10 | 50h (1 RTX A5000) | |
Guan_HIT_task4a_1 | Guan2023 | 1.57 | 0.536 (0.526 - 0.546) | 0.810 (0.800 - 0.822) | 17840496 | 88200000000 | 4 | 6h (1 GTX 3090) | |
Guan_HIT_task4a_2 | Guan2023 | 0.93 | 0.082 (0.074 - 0.090) | 0.862 (0.852 - 0.872) | 35680992 | 88200000000 | 8 | 6h (1 GTX 3090) | |
Guan_HIT_task4a_3 | Guan2023 | 1.55 | 0.526 (0.513 - 0.539) | 0.800 (0.788 - 0.813) | 4460124 | 88200000000 | 6h (1 GTX 3090) | ||
Guan_HIT_task4a_4 | Guan2023 | 0.92 | 0.082 (0.073 - 0.091) | 0.855 (0.844 - 0.867) | 4460124 | 88200000000 | 6h (1 GTX 3090) | ||
Guan_HIT_task4a_5 | Guan2023 | 1.40 | 0.488 (0.475 - 0.503) | 0.708 (0.696 - 0.720) | 49061364 | 88200000000 | 11 | 6h (1 GTX 3090) | |
Guan_HIT_task4a_6 | Guan2023 | 0.88 | 0.088 (0.080 - 0.096) | 0.797 (0.787 - 0.810) | 49061364 | 88200000000 | 11 | 6h (1 GTX 3090) | |
Wang_XiaoRice_task4a_1 | Wang2023 | 1.50 | 0.494 (0.477 - 0.510) | 0.801 (0.789 - 0.815) | 671000 | 105200000 | 20min (1 V100) | ||
Wang_XiaoRice_task4a_2 | Wang2023 | 1.52 | 0.497 (0.486 - 0.510) | 0.814 (0.803 - 0.828) | 3873000 | 606000000 | 6 | 30 min (1 GTX 1080 Ti) | |
Wang_XiaoRice_task4a_3 | Wang2023 | 0.91 | 0.088 (0.076 - 0.098) | 0.835 (0.824 - 0.844) | 1979000 | 350000000 | 6 | 30min (1 V100) | |
Zhang_IOA_task4_1 | Zhang2023 | 1.75 | 0.622 (0.613 - 0.634) | 0.857 (0.849 - 0.866) | 240652180 | 10518000000000 | 25 | 30h (4 Tesla A100) | |
Zhang_IOA_task4_2 | Zhang2023 | 0.95 | 0.070 (0.060 - 0.080) | 0.903 (0.895 - 0.911) | 64870560 | 2480000000000 | 16 | 20h (4 Tesla A100) | |
Zhang_IOA_task4_3 | Zhang2023 | 1.71 | 0.613 (0.603 - 0.625) | 0.828 (0.821 - 0.839) | 481304360 | 21036000000000 | 50 | 60h (4 Tesla A100) | |
Zhang_IOA_task4_4 | Zhang2023 | 1.75 | 0.625 (0.615 - 0.637) | 0.855 (0.847 - 0.864) | 240652180 | 10518000000000 | 25 | 30h (4 Tesla A100) | |
Zhang_IOA_task4_5 | Zhang2023 | 1.52 | 0.524 (0.513 - 0.537) | 0.774 (0.762 - 0.786) | 11325746 | 460000000000 | 6h (1 Tesla A100) | ||
Zhang_IOA_task4_6 | Zhang2023 | 1.60 | 0.562 (0.552 - 0.575) | 0.795 (0.786 - 0.805) | 9160520 | 368000000000 | 6h (1 Tesla A100) | ||
Zhang_IOA_task4_7 | Zhang2023 | 0.86 | 0.055 (0.048 - 0.064) | 0.830 (0.820 - 0.842) | 4670390 | 182000000000 | 6h (1 Tesla A100) | ||
Wu_NCUT_task4a_1 | Wu2023 | 1.15 | 0.391 (0.379 - 0.405) | 0.596 (0.584 - 0.610) | 17307000 | 98754000000 | 8h (1 GTX 4090) | ||
Wu_NCUT_task4a_2 | Wu2023 | 1.53 | 0.519 (0.507 - 0.531) | 0.793 (0.783 - 0.806) | 66637000 | 487389000000 | 4 | 29h (1 GTX 4090) | |
Wu_NCUT_task4a_3 | Wu2023 | 1.50 | 0.497 (0.486 - 0.509) | 0.793 (0.783 - 0.806) | 66637000 | 492389000000 | 4 | 29h (1 GTX 4090) | |
Barahona_AUDIAS_task4a_1 | Barahona2023 | 1.06 | 0.351 (0.333 - 0.372) | 0.562 (0.532 - 0.587) | 1112420 | 1824000000 | 14h (1 GeForce RTX 2080 Ti) | ||
Barahona_AUDIAS_task4a_2 | Barahona2023 | 1.12 | 0.380 (0.361 - 0.406) | 0.575 (0.553 - 0.594) | 1112420 | 1824000000 | 14h (1 GeForce RTX 2080 Ti) | ||
Barahona_AUDIAS_task4a_3 | Barahona2023 | 0.91 | 0.200 (0.164 - 0.225) | 0.646 (0.626 - 0.664) | 12637170 | 633627000 | 21h (1 GeForce RTX 2080 Ti) | ||
Barahona_AUDIAS_task4a_4 | Barahona2023 | 0.84 | 0.141 (0.124 - 0.155) | 0.673 (0.652 - 0.700) | 12637170 | 633627000 | 21h (1 GeForce RTX 2080 Ti) | ||
Barahona_AUDIAS_task4a_5 | Barahona2023 | 1.14 | 0.378 (0.365 - 0.392) | 0.604 (0.590 - 0.622) | 4449680 | 5432000000 | 4 | 56h (1 GeForce RTX 2080 Ti) | |
Barahona_AUDIAS_task4a_6 | Barahona2023 | 1.18 | 0.401 (0.390 - 0.414) | 0.612 (0.596 - 0.630) | 4449680 | 5432000000 | 4 | 56h (1 GeForce RTX 2080 Ti) | |
Barahona_AUDIAS_task4a_7 | Barahona2023 | 1.06 | 0.274 (0.262 - 0.287) | 0.684 (0.671 - 0.699) | 63185850 | 4426000000 | 5 | 133h (1 GeForce RTX 2080 Ti) | |
Barahona_AUDIAS_task4a_8 | Barahona2023 | 1.00 | 0.213 (0.201 - 0.226) | 0.729 (0.710 - 0.752) | 63185850 | 5432000000 | 5 | 133h (1 GeForce RTX 2080 Ti) | |
Gan_NCUT_task4_1 | Gan2023 | 1.12 | 0.365 (0.353 - 0.377) | 0.603 (0.589 - 0.617) | 11200000 | 911727000 | 7h (1 RTX 3090) | ||
Gan_NCUT_task4_2 | Gan2023 | 1.52 | 0.511 (0.498 - 0.524) | 0.799 (0.785 - 0.813) | 132500000 | 56778000000 | 6 | 40h (1 RTX 3090) | |
Gan_NCUT_task4_3 | Gan2023 | 1.50 | 0.483 (0.467 - 0.498) | 0.816 (0.805 - 0.828) | 452500000 | 138254000000 | 10 | 44h (1 RTX 3090) | |
Liu_SRCN_task4a_1 | Chen2023a | 1.65 | 0.585 (0.572 - 0.598) | 0.817 (0.804 - 0.834) | 1804800000 | 765529000000 | 26 | 92h (NVIDIA A100-PCIE-40GB) | |
Liu_SRCN_task4a_2 | Chen2023a | 1.40 | 0.380 (0.369 - 0.392) | 0.877 (0.867 - 0.885) | 1057200000 | 578115000000 | 20 | 87h (NVIDIA A100-PCIE-40GB) | |
Liu_SRCN_task4a_3 | Chen2023a | 1.65 | 0.556 (0.544 - 0.569) | 0.861 (0.852 - 0.870) | 1872000000 | 1011000000000 | 33 | 98h (NVIDIA A100-PCIE-40GB) | |
Liu_SRCN_task4a_4 | Chen2023a | 1.25 | 0.412 (0.400 - 0.424) | 0.663 (0.652 - 0.676) | 5300000 | 896000000 | 15h (NVIDIA A100-PCIE-40GB) | ||
Liu_SRCN_task4a_5 | Chen2023a | 0.94 | 0.098 (0.086 - 0.108) | 0.851 (0.841 - 0.860) | 1057200000 | 578115000000 | 20 | 87h (NVIDIA A100-PCIE-40GB) | |
Kim_GIST-HanwhaVision_task4a_1 | Kim2023 | 1.35 | 0.459 (0.431 - 0.484) | 0.701 (0.681 - 0.720) | 4542556 | 7234000000 | 29h (3 RTX A6000) | ||
Kim_GIST-HanwhaVision_task4a_2 | Kim2023 | 1.68 | 0.591 (0.574 - 0.611) | 0.831 (0.823 - 0.841) | 4804956 | 7300000000 | 14h 36m (4 RTX A6000) | ||
Kim_GIST-HanwhaVision_task4a_3 | Kim2023 | 1.66 | 0.581 (0.553 - 0.600) | 0.835 (0.826 - 0.846) | 4804956 | 7300000000 | 15h 31m (4 RTX A6000) | ||
Kim_GIST-HanwhaVision_task4a_4 | Kim2023 | 1.63 | 0.576 (0.549 - 0.595) | 0.809 (0.797 - 0.821) | 4804956 | 7300000000 | 46 | 12h (4 RTX A6000) | |
Kim_GIST-HanwhaVision_task4a_5 | Kim2023 | 1.72 | 0.611 (0.598 - 0.623) | 0.846 (0.838 - 0.855) | 9609912 | 335800000000 | 46 | 36h (4 RTX A6000) | |
Kim_GIST-HanwhaVision_task4a_6 | Kim2023 | 1.72 | 0.611 (0.590 - 0.628) | 0.841 (0.832 - 0.851) | 9609912 | 116800000000 | 46 | 36h (4 RTX A6000) | |
Kim_GIST-HanwhaVision_task4a_7 | Kim2023 | 1.69 | 0.591 (0.574 - 0.604) | 0.844 (0.835 - 0.853) | 9609912 | 116800000000 | 46 | 36h (4 RTX A6000) | |
Kim_GIST-HanwhaVision_task4a_8 | Kim2023 | 1.72 | 0.612 (0.599 - 0.626) | 0.841 (0.831 - 0.851) | 9609912 | 116800000000 | 46 | 36h (4 RTX A6000) | |
Wenxin_TJU_task4a_1 | Wenxin2023 | 1.63 | 0.555 (0.543 - 0.566) | 0.837 (0.828 - 0.847) | 942000000 | 20320000000 | 20 | 1d 7h * 20 (1 Tesla V100) | |
Wenxin_TJU_task4a_2 | Wenxin2023 | 1.66 | 0.570 (0.559 - 0.580) | 0.844 (0.836 - 0.854) | 471000000 | 10160000000 | 10 | 1d 7h * 10 (1 Tesla V100) | |
Wenxin_TJU_task4a_3 | Wenxin2023 | 0.88 | 0.080 (0.071 - 0.088) | 0.815 (0.802 - 0.825) | 847800000 | 18288000000 | 18 | 1d 7h * 18 (1 Tesla V100) | |
Wenxin_TJU_task4a_4 | Wenxin2023 | 0.90 | 0.081 (0.071 - 0.090) | 0.838 (0.828 - 0.849) | 942000000 | 20320000000 | 20 | 1d 7h * 20 (1 Tesla V100) | |
Wenxin_TJU_task4a_5 | Wenxin2023 | 1.58 | 0.539 (0.528 - 0.549) | 0.816 (0.806 - 0.831) | 47700000 | 1045000000 | 1d 7h (1 Tesla V100) | ||
Wenxin_TJU_task4a_6 | Wenxin2023 | 1.61 | 0.546 (0.536 - 0.556) | 0.831 (0.823 - 0.842) | 47700000 | 1045000000 | 1d 7h (1 Tesla V100) | ||
Wenxin_TJU_task4a_7 | Wenxin2023 | 1.31 | 0.440 (0.429 - 0.454) | 0.686 (0.673 - 0.699) | 44200000 | 911727000 | 1d 7h (1 Tesla V100) | ||
Wenxin_TJU_task4a_8 | Wenxin2023 | 0.75 | 0.059 (0.049 - 0.068) | 0.707 (0.694 - 0.723) | 44200000 | 911727000 | 1d 7h (1 Tesla V100) |
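The ranking score column in the rows above is numerically consistent with the mean of the two PSDS values after each is divided by the corresponding baseline value. The sketch below works under that assumption: the baseline values 0.327 (PSDS 1) and 0.538 (PSDS 2) are read from the Baseline row of the team ranking table, and the function name is illustrative, not part of the challenge tooling.

```python
# Assumed baseline values, read from the "Baseline" row of the team ranking table.
BASELINE_PSDS1 = 0.327
BASELINE_PSDS2 = 0.538


def ranking_score(psds1: float, psds2: float) -> float:
    """Mean of the baseline-normalized PSDS 1 and PSDS 2 (assumed formula)."""
    return 0.5 * (psds1 / BASELINE_PSDS1 + psds2 / BASELINE_PSDS2)


# Kim_GIST-HanwhaVision_task4a_2: PSDS 1 = 0.591, PSDS 2 = 0.831
print(round(ranking_score(0.591, 0.831), 2))
```

Because the published PSDS values are themselves rounded, scores recomputed this way may differ from the table in the last digit for some rows.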
Technical reports
OPTIMIZING MULTI-RESOLUTION CONFORMER AND CRNN MODELS FOR DIFFERENT PSDS SCENARIOS IN DCASE CHALLENGE 2023 TASK 4A
Barahona, Sara and de Benito-Gorron, Diego and Segovia, Sergio and Ramos, Daniel and Toledano, Doroteo
Universidad Autónoma de Madrid, Madrid, Spain
Barahona_AUDIAS_task4a_1 Barahona_AUDIAS_task4a_2 Barahona_AUDIAS_task4a_3 Barahona_AUDIAS_task4a_4 Barahona_AUDIAS_task4a_5 Barahona_AUDIAS_task4a_6 Barahona_AUDIAS_task4a_7 Barahona_AUDIAS_task4a_8
Abstract
In this technical report we describe our submission to DCASE 2023 Task 4A: Sound Event Detection with Weak Labels and Synthetic Soundscapes. Considering that the different scenarios proposed for the Polyphonic Sound Detection Score (PSDS) highlight diverse properties of a Sound Event Detection (SED) system, we have employed two different architectures, one optimized for each scenario. Whereas we exploit the temporal benefits of Convolutional Recurrent Neural Networks (CRNNs) for maximizing PSDS1, we employ a Conformer network for improving sound event classification and therefore enhancing PSDS2. Additionally, we follow the multi-resolution approach successfully employed in previous DCASE editions to take advantage of the temporal and spectral disparities among the different sound event categories.
System characteristics
SOUND EVENT DETECTION OF DOMESTIC ACTIVITIES USING FREQUENCY DYNAMIC CONVOLUTION AND BEATS EMBEDDINGS
Cheimariotis, Grigorios-Aris and Mitianoudis, Nikolaos
Democritus University of Thrace, Xanthi, Greece
Abstract
This technical report describes one submission for DCASE 2023 Task 4A, “Sound event detection of domestic activities”. The proposed methods are based on the baseline system provided by the organizers and consist mainly of feature extraction by passing spectrograms through a frequency dynamic convolution network, concatenation of these features with BEATs embeddings, and use of a BiGRU for sequence modelling. A mean-teacher model is also employed. The results for the submissions are PSDS1 0.496 and PSDS2 0.788 when using the AudioSet real strongly labelled data, and PSDS1 0.516 and PSDS2 0.781 when that data subset is not used.
System characteristics
DCASE 2023 CHALLENGE TASK4 TECHNICAL REPORT
Chen, Minjun and Jin, Yongbin and Shao, Jun and Liu, Yangyang and Peng, Bo and Chen, Jie
Samsung Research China-Nanjing, Nanjing, China
Liu_SRCN_task4a_1 Liu_SRCN_task4a_2 Liu_SRCN_task4a_3 Liu_SRCN_task4a_4 Liu_SRCN_task4a_5
Abstract
We describe our submitted systems for DCASE 2023 Task 4 in this technical report: Sound Event Detection with Weak Labels and Synthetic Soundscapes (Subtask A), and Sound Event Detection with Soft Labels (Subtask B). We focus on constructing a CRNN model that fuses the embeddings extracted by the BEATs or AST pre-trained model and uses frequency dynamic convolution (FDY-CRNN) and channel-wise selective kernel attention (SKA) to obtain an adaptive receptive field. To obtain multiple models with different architectures for ensembling, we also fine-tune multiple BEATs models on the SED dataset. To make further use of the weakly labeled and unlabeled subsets of the DESED dataset, we pseudo-label these subsets through multiple iterations of self-training. We also use a small number of audio files from the AudioSet dataset, which follow the same self-training procedure. We train these models using two different settings, one for optimizing the PSDS1 score and the other for optimizing the PSDS2 score. Our proposed systems achieve polyphonic sound detection scores (PSDS) of 0.570 (PSDS-scenario1) and 0.889 (PSDS-scenario2) on the development dataset of Subtask A, and a macro-average F1 score with optimum threshold per class (F1MO) of 49.70 on the development dataset of Subtask B.
System characteristics
SOUND EVENT DETECTION SYSTEM USING PRE-TRAINED MODEL FOR DCASE 2023 TASK 4
Chen, Wei-Yu and Lu, Chung-Li and Chuang, Hsiang-Feng and Cheng, Yu-Han and Chan, Bo-Cheng
Chunghwa Telecom Laboratories, Taiwan
Chen_CHT_task4a_1 Chen_CHT_task4a_2 Chen_CHT_task4a_3 Chen_CHT_task4a_4
Abstract
In this technical report, we briefly describe the system we designed for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 4: Sound Event Detection with Weak Labels and Synthetic Soundscapes. Our best single system combines the embeddings obtained from VGGSK and BEATs and uses a GRU to classify sound events for each frame. Thresholding and smoothing are applied during the post-processing stage. The mean teacher method is applied for semi-supervised learning, with an EMA strategy to update the parameters of the teacher model. To utilize unlabeled data, pseudo labels are generated by the student model. For data augmentation, we use techniques such as mix-up, Gaussian noise and embedding masking. The submitted single system trained with extra data achieves a PSDS1 of 0.529 and a PSDS2 of 0.78 on the validation set.
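The EMA update of the teacher model mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard mean teacher rule, teacher = alpha * teacher + (1 - alpha) * student. The dictionary of scalar parameters stands in for real model tensors, and the function name and the value of alpha are illustrative, not taken from the report.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.999) -> None:
    """Update teacher parameters in place as an exponential moving
    average of the student parameters."""
    for name, student_value in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * student_value


# Toy parameters: the teacher moves a small step towards the student.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}
ema_update(teacher, student, alpha=0.9)
print(teacher)
```

In a training loop this update would typically be called once after each optimizer step on the student, so the teacher receives no gradient updates of its own.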
System characteristics
SEMI-SUPERVISED SOUND EVENT DETECTION BASED ON PRETRAINED MODELS FOR DCASE 2023 TASK 4A
Gan, Yanggang and Qiao, Ziling and Wu, Juan and Cai, Xichang and Wu, Menglong
North China University of Technology, Beijing, China
Gan_NCUT_task4_1 Gan_NCUT_task4_2 Gan_NCUT_task4_3
Abstract
In this technical report, we present our submission system for DCASE 2023 Task 4A: Sound Event Detection with Weak Labels and Synthetic Soundscapes. The proposed system is based on the mean teacher framework for semi-supervised learning, a selective kernel multi-scale convolutional network and a frequency dynamic convolutional network. We extract the frame embeddings of the pre-trained BEATs model, use adaptive average pooling to unify the embeddings to a fixed dimension, and finally fuse them in the channel dimension with the features extracted by the convolutional layer of the SED model. Our systems achieve a PSDS-scenario1 of 52.1% and a PSDS-scenario2 of 82.5% on the validation set.
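The pooling and fusion step described in this abstract can be sketched as follows, assuming the adaptive average pooling is applied along time so that the pretrained embeddings and the convolutional features have the same number of frames before they are concatenated per frame. Plain Python lists stand in for tensors, and all names are hypothetical rather than taken from the report.

```python
from typing import List

Frame = List[float]


def adaptive_avg_pool_time(frames: List[Frame], out_len: int) -> List[Frame]:
    """Average-pool a sequence of frames along time to exactly out_len frames."""
    in_len = len(frames)
    pooled = []
    for i in range(out_len):
        start = (i * in_len) // out_len                      # floor
        end = ((i + 1) * in_len + out_len - 1) // out_len    # ceil
        window = frames[start:end]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def fuse_channels(cnn_frames: List[Frame], emb_frames: List[Frame]) -> List[Frame]:
    """Pool embeddings to the CNN frame rate, then concatenate per frame."""
    pooled = adaptive_avg_pool_time(emb_frames, len(cnn_frames))
    return [c + e for c, e in zip(cnn_frames, pooled)]


cnn = [[0.1, 0.2], [0.3, 0.4]]       # 2 frames, 2 channels
emb = [[1.0], [3.0], [5.0], [7.0]]   # 4 frames, 1 channel
print(fuse_channels(cnn, emb))
```

The window bounds use a floor for the start and a ceiling for the end, so every output frame has a non-empty window even when the input is shorter than the output.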
System characteristics
SEMI-SUPERVISED SOUND EVENT DETECTION SYSTEM FOR DCASE 2023 TASK 4
Guan, Yadong and Shang, Qijie
Harbin Institute of Technology, Harbin, China
Guan_HIT_task4a_1 Guan_HIT_task4a_2 Guan_HIT_task4a_3 Guan_HIT_task4a_4 Guan_HIT_task4a_5 Guan_HIT_task4a_6
Abstract
In this report, we describe our submissions for Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge: Sound Event Detection in Domestic Environments. Our methods are mainly based on a Convolutional Recurrent Neural Network. We propose to use sound activity detection (SAD) as an auxiliary task for sound event detection and train the two tasks simultaneously with a multi-task learning approach, thus improving the generalization performance of the model. Moreover, we propose a new local weak prediction to improve PSDS2. To prevent overfitting, we adopt data augmentation using hard mixup, pitch shift, and time shift. In addition, we use external data and a pretrained model named BEATs to further improve performance, and try an ensemble of multiple subsystems to enhance the generalization capability of our system. Our final systems achieve a PSDS1/PSDS2 score of 0.523/0.890 on the development dataset.
System characteristics
SEMI-SUPERVISED LEARNING-BASED SOUND EVENT DETECTION USING FREQUENCY DYNAMIC CONVOLUTION WITH LARGE KERNEL ATTENTION FOR DCASE CHALLENGE 2023 TASK 4
Kim, Ji Won1 and Son, Sang Won1 and Song, Yoonah1 and Kim, Hong Kook1,2 and Song, Il Hoon3 and Lim, Jeong Eun3
1AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Korea 2School of EECS, Gwangju Institute of Science and Technology, Gwangju, Korea 3AI Lab. R&D Center, Hanwha Vision, Seongnam-si, Gyeonggi-do, Korea
Kim_GIST-HanwhaVision_task4a_1 Kim_GIST-HanwhaVision_task4a_2 Kim_GIST-HanwhaVision_task4a_3 Kim_GIST-HanwhaVision_task4a_4 Kim_GIST-HanwhaVision_task4a_5 Kim_GIST-HanwhaVision_task4a_6 Kim_GIST-HanwhaVision_task4a_7 Kim_GIST-HanwhaVision_task4a_8
Abstract
In this technical report, we present our submission system for DCASE 2023 Task 4A: Sound Event Detection with Weak Labels and Synthetic Soundscapes. The proposed system is based on the mean teacher framework for semi-supervised learning, a selective kernel multi-scale convolutional network and a frequency dynamic convolutional network. We extract the frame embeddings of the pre-trained BEATs model, use adaptive average pooling to unify the embeddings to a fixed dimension, and finally fuse them in the channel dimension with the features extracted by the convolutional layer of the SED model. Our systems achieve a PSDS-scenario1 of 52.1% and a PSDS-scenario2 of 82.5% on the validation set.
System characteristics
SOUND EVENT DETECTION USING CONVOLUTION ATTENTION MODULE FOR DCASE 2023 CHALLENGE TASK4A
Lee, Sumi and Kim, Narin and Lee, Juhyun and Hwang, Chaewon and Jang, Sojung and Kwak, Il-Youp
Chung-Ang University, Department of Applied Statistics, Seoul, South Korea
Lee_CAUET_task4a_1 Lee_CAUET_task4a_2
Abstract
In this technical report, we propose sound event detection models based on CRNN for DCASE 2023 Challenge Task 4A. DCASE Task 4 evaluates models with two main metrics, PSDS1 and PSDS2, which have different characteristics, making it difficult to raise both substantially with a single model. Therefore, we have developed two models with different objectives. The first model is Flcam-CRNN, which targets PSDS1. Flcam is an attention module designed to reflect the characteristics of 2D audio features in the time-frequency domain. The second model is Mha-CRNN, which targets PSDS2. SED data characteristically contains several sounds from the same space. Therefore, multi-head attention is used to extract features from various perspectives.
System characteristics
LI USTC TEAM’S SUBMISSION FOR DCASE 2023 CHALLENGE TASK4A
Li, Kang and Cai, Pengfei and Song, Yan
University of Science and Technology of China, Hefei, China
Li_USTC_task4a_1 Li_USTC_task4a_2 Li_USTC_task4a_3 Li_USTC_task4a_4 Li_USTC_task4a_5 Li_USTC_task4a_6 Li_USTC_task4a_7
Abstract
In this technical report, we present our submissions for DCASE 2023 Challenge Task 4A. We mainly study how to fine-tune the patchout fast spectrogram transformer (PaSST) for the sound event detection task (PaSST-SED). Firstly, we fine-tune PaSST with the weakly labeled DESED dataset. Task-aware fine-tuning (TAFT) and self-distillated mean teacher (SdMT) are used as fine-tuning strategies: TAFT helps exploit both local and semantic information from PaSST, and SdMT helps train a robust model with soft knowledge distillation. Secondly, we fine-tune PaSST with pseudo-labeled DESED data, using pseudo labels from the DCASE 2022 rank-1 system; mix-up is used to mix audio clips with true or pseudo labels. Besides, when testing with the PaSST-SED model, slide window clipping (SWC) is used to compensate for the loss of temporal resolution in the PaSST features. We also evaluate post-processing methods including median filtering and max filtering. Experiments on the DCASE 2023 Task 4A validation dataset demonstrate the effectiveness of the techniques used in our systems. Specifically, our systems achieve best PSDS1/PSDS2 scores of 0.5624/0.8990.
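The two post-processing filters named in this abstract can be sketched on a sequence of frame-level scores for a single class. The sketch assumes a centered window that is truncated at the sequence edges. The window width and the function names are illustrative, and the report's actual filter lengths are not given here.

```python
from statistics import median
from typing import Callable, List, Sequence


def sliding_filter(scores: Sequence[float], width: int,
                   reduce: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply reduce() over a centered window of odd width, truncated at the edges."""
    half = width // 2
    return [reduce(scores[max(0, t - half): t + half + 1])
            for t in range(len(scores))]


def median_filter(scores: Sequence[float], width: int = 3) -> List[float]:
    """Remove isolated spikes and fill isolated gaps."""
    return sliding_filter(scores, width, median)


def max_filter(scores: Sequence[float], width: int = 3) -> List[float]:
    """Extend active regions by half a window on each side."""
    return sliding_filter(scores, width, max)


frames = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(median_filter(frames))
print(max_filter(frames))
```

Median filtering tends to give cleaner event boundaries, while max filtering lengthens detected events, so the two would be expected to trade off differently between PSDS1 and PSDS2.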
System characteristics
CHT+NSYSU SOUND EVENT DETECTION SYSTEM WITH PRETRAINED EMBEDDINGS EXTRACTED FROM BEATS MODEL FOR DCASE 2023 TASK 4
Liu, Chia-Chuan1 and Kuo, Tzu-Hao1 and Chen, Chia-Ping1 and Lu, Chung-Li2 and Chan, Bo-Cheng2 and Cheng, Yu-Han2 and Chuang, Hsiang-Feng2
1National Sun Yat-Sen University, Taiwan 2Chunghwa Telecom Laboratories, Taiwan
Liu_NSYSU_task4_1 Liu_NSYSU_task4_2 Liu_NSYSU_task4_3 Liu_NSYSU_task4_4 Liu_NSYSU_task4_5 Liu_NSYSU_task4_6 Liu_NSYSU_task4_7 Liu_NSYSU_task4_8
Abstract
In this technical report, we describe our submission system for DCASE 2023 Task 4: sound event detection in domestic environments. We propose FDY CRNN systems using BEATs embeddings. The system adopts late fusion to concatenate the feature maps from Frequency Dynamic Convolution with the frame-level embeddings from BEATs. A classification layer then produces the prediction from the late-fusion features. The system is trained with the mean teacher framework. We use Asymmetric Focal Loss as the supervised loss to alleviate the imbalance between positive and negative samples. Furthermore, we apply two-stage mean teacher training to make adequate use of the training data. Compared with the baseline system using BEATs embeddings, which reaches a PSDS-scenario 1 of 50% and a PSDS-scenario 2 of 76.2%, our FDY CRNN system achieves 50.1% and 79.8%, respectively. The ensemble of FDY CRNN systems further improves PSDS-scenario 1 to 52.5% and PSDS-scenario 2 to 80.4%.
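To illustrate how an asymmetric focal loss can rebalance positive and negative samples, the sketch below uses one commonly cited per-frame form, -[(1 - p)^gamma * y * log(p) + p^zeta * (1 - y) * log(1 - p)]. This form and the exponent values are assumptions for illustration, since the abstract does not state the exact variant or hyperparameters used.

```python
import math


def asymmetric_focal_loss(p: float, y: float, gamma: float = 0.0,
                          zeta: float = 1.0, eps: float = 1e-7) -> float:
    """Per-frame asymmetric focal loss for a predicted probability p and target y.

    gamma weights the active (y = 1) term and zeta the inactive (y = 0) term;
    gamma = zeta = 0 reduces to binary cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    active = (1.0 - p) ** gamma * y * math.log(p)
    inactive = p ** zeta * (1.0 - y) * math.log(1.0 - p)
    return -(active + inactive)


# An easy negative (p = 0.2, y = 0) is down-weighted relative to plain BCE.
print(asymmetric_focal_loss(0.2, 0.0, gamma=0.0, zeta=1.0))
print(asymmetric_focal_loss(0.2, 0.0, gamma=0.0, zeta=0.0))
```

With zeta greater than zero, confidently rejected inactive frames contribute less to the loss, which is one way the far more numerous negative frames can be kept from dominating training.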
System characteristics
PEPE: PLAIN EFFICIENT PRETRAINED EMBEDDINGS FOR SOUND EVENT DETECTION
Wang, Yongqing and Dinkel, Heinrich and Yan, Zhiyong and Zhang, Junbo and Wang, Yujun
Xiaomi Corporation, Beijing, China
Wang_XiaoRice_task4a_1 Wang_XiaoRice_task4a_2 Wang_XiaoRice_task4a_3
Abstract
This paper is a system description of the XiaoRice team submission to the DCASE 2023 Task 4 challenge. In light of the increasing availability of pretrained audio embedding models, our research addresses the need for efficient utilization of these resources, taking into account their environmental impact. Our method named plain efficient pretrained (audio) embeddings (PEPE) integrates a linear classifier or a bidirectional gated recurrent network (BiGRU) with those embeddings while prioritizing energy efficiency, training speed and minimizing carbon emissions. By employing a streamlined approach, we demonstrate that a linear classifier with 52K parameters surpasses the challenge baseline for PSDS-2 scores, highlighting the potential of eco-friendly solutions in achieving superior performance. We achieve a polyphonic sound detection score (PSDS)-1 score of 53.44 via a 6-way ensemble and a PSDS-2 score of 88.60 with a simple linear classifier using PEPE. Through our work, we aim to emphasize the adoption of environmentally conscious practices in the field.
System characteristics
SEMI-SUPERVISED SOUND EVENT DETECTION SYSTEM FOR DCASE 2023 TASK4A
Duo, Wenxin1 and Fang, Xiang2 and Li, Jie2
1Tianjin University, School of Electrical and Information Engineering, Tianjin, China 2China Telecom Corporation Ltd., Data&AI Technology Company, Beijing, China
Wenxin_TJU_task4a_1 Wenxin_TJU_task4a_2 Wenxin_TJU_task4a_3 Wenxin_TJU_task4a_4 Wenxin_TJU_task4a_5 Wenxin_TJU_task4a_6 Wenxin_TJU_task4a_7 Wenxin_TJU_task4a_8
Abstract
In this technical report, we describe our systems for DCASE 2023 Challenge Task4a. Our systems are mainly based on Frequency Dynamic Convolutional Recurrent Neural Network (FDYCRNN) and Mutual Mean Teaching (MMT) semi-supervised strategy. In order to prevent overfitting, we adopt data augmentation using mixup, frame shift, SpecAugment, FilterAugment, Interpolation Consistency Training (ICT) and Shift Consistency Training (SCT). Besides, we utilize strongly labeled AudioSet data as external data and several pretrained models to further improve performance, and try an ensemble of multiple systems with different pretrained models to enhance the generalization capability of our system.
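Of the augmentations listed in this abstract, mixup is the simplest to illustrate. The sketch assumes the standard formulation, in which two examples and their label vectors are blended with the same coefficient drawn from a Beta distribution. The Beta parameters and all names are illustrative rather than taken from the report.

```python
import random
from typing import List, Tuple


def mixup(x1: List[float], y1: List[float],
          x2: List[float], y2: List[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Blend two feature vectors and their label vectors with weight lam."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y


# Illustrative choice: lam ~ Beta(0.2, 0.2), which favours values near 0 or 1.
lam = random.betavariate(0.2, 0.2)
features, labels = mixup([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam)
print(lam, features, labels)
```

The hard mixup variant mentioned in another report on this page differs in how the labels are combined and is not covered by this sketch.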
System characteristics
SEMI-SUPERVISED SOUND EVENT DETECTION SYSTEM WITH PRETRAINED MODEL
Wu, Juan and Gan, Yanggang and Cai, Xichang and Wu, Menglong
North China University of Technology, Beijing, China
Wu_NCUT_task4a_1 Wu_NCUT_task4a_2 Wu_NCUT_task4a_3
Abstract
In this report, we present our sound event detection system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 4: Sound Event Detection with Weak Labels and Synthetic Soundscapes. For Task 4A, we designed an SED system based on the Mean Teacher [1] architecture to detect events and their start and stop times in audio sequences, using semi-supervised learning to address the lack of labeled data in the DCASE 2023 Challenge task. In addition, we use pre-trained models to leverage external data and further improve the stability of the system. Finally, we integrated multiple systems, achieving a best PSDS1 of 0.525 and PSDS2 of 0.783.
System characteristics
FMSG SUBMISSION FOR DCASE 2023 CHALLENGE TASK 4 ON SOUND EVENT DETECTION WITH WEAK LABELS AND SYNTHETIC SOUNDSCAPES
Xiao, Yang and Khandelwal, Tanmay and Das, Rohan Kumar
Fortemedia Singapore, Singapore
Xiao_FMSG_task4a_1 Xiao_FMSG_task4a_2 Xiao_FMSG_task4a_3 Xiao_FMSG_task4a_4 Xiao_FMSG_task4a_5 Xiao_FMSG_task4a_6 Xiao_FMSG_task4a_7 Xiao_FMSG_task4a_8
Abstract
This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) for DCASE 2023 Task 4A, which focuses on sound event detection with weak labels and synthetic soundscapes. Our approach primarily involves integrating features from Bidirectional Encoder representation from Audio Transformers (BEATs) and frequency dynamic (FDY)-convolutional recurrent neural network (CRNN) into a single-stage setup. We focus on three main directions to enhance our approach. Firstly, we curate an external dataset from AudioSet by establishing relationships between AudioSet sound event categories and the target sound events. Secondly, we utilize multiple aggregation methods to leverage the strengths of different methods. Lastly, we employ the asymmetric focal loss (AFL) function to adjust the training weights based on the model’s training difficulty. Additionally, we use data augmentation techniques to prevent overfitting, apply adaptive post-processing methods, and experiment with an ensemble of multiple subsystems to improve the generalization capability of our system. Our method achieves the top PSDS1 and PSDS2 scores of 0.557 and 0.854, respectively, on the development set. Further, on the public evaluation set, our approach achieves the highest PSDS1 and PSDS2 scores of 0.607 and 0.875, respectively.
System characteristics
SOUND EVENT DETECTION WITH WEAK PREDICTION FOR DCASE 2023 CHALLENGE TASK4A
Xiao, Shengchang and Shen, Jiakun and Hu, Aolin and Zhang, Xueshuai and Zhang, Pengyuan and Yan, Yonghong
Institute of Acoustics, Beijing, China
Zhang_IOA_task4a_1 Zhang_IOA_task4a_2 Zhang_IOA_task4a_3 Zhang_IOA_task4a_4 Zhang_IOA_task4a_5 Zhang_IOA_task4a_6 Zhang_IOA_task4a_7
Abstract
In this technical report, we describe our submitted systems for DCASE 2023 Challenge Task 4A: Sound Event Detection with Weak Labels and Synthetic Soundscapes. Specifically, we design two different systems, one for PSDS1 and one for PSDS2. As in previous editions of the Challenge, we also predict weak labels of clips to improve PSDS2; the difference is that this year we use shorter segments for specific classes. Moreover, we adopt the energy-difference-based log-mel spectrogram to improve feature representation, use multi-dimensional frequency dynamic convolution (MFDConv) to strengthen the feature extraction ability of the convolutional kernels, and use a confidence-weighted BCE loss in the self-training stage. In addition, we set higher weights for classes with worse performance. For post-processing, we optimize the probability values of intervals between events to obtain sharper boundaries.