Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels


Challenge results

Task description

A more detailed task description can be found on the task description page.

All confidence intervals are computed from the three runs per system, using bootstrapping on the evaluation set.
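For readers who want to reproduce a comparable interval on their own scores, below is a minimal percentile-bootstrap sketch. It assumes a flat array of per-clip (or per-run) metric values; it is not the official evaluation code, which computes PSDS and mpAUC with the dedicated toolkits before aggregating the three runs.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    `values` stands in for per-clip (or per-run) metric contributions;
    this is an illustrative sketch, not the official DCASE evaluation code.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    stats = [
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ]
    low, high = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

# Hypothetical per-clip scores pooled over the three runs of one system.
scores = [0.41, 0.56, 0.47, 0.62, 0.50, 0.44, 0.58, 0.49]
print(bootstrap_ci(scores))
```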

Team Ranking

The tables below report only the best-ranked system per submitting team, first excluding ensembles and then allowing them.
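As a rough, hypothetical sketch of how such a per-team table can be derived from the full list of systems, the snippet below keeps, for each team, the non-ensemble submission with the highest ranking score. The column names (`team`, `ranking_score`, `is_ensemble`) and the example rows are illustrative assumptions, not the official scoring code.

```python
import pandas as pd

# Illustrative per-system results (scores taken from the tables below);
# the "team" and "is_ensemble" columns are hypothetical bookkeeping fields.
systems = pd.DataFrame(
    {
        "submission": ["Schmid_CPJKU_task4_2", "Schmid_CPJKU_task4_1", "Baseline"],
        "team": ["CPJKU", "CPJKU", "Baseline"],
        "ranking_score": [1.35, 1.31, 1.13],
        "is_ensemble": [False, False, False],
    }
)

# Keep only non-ensemble systems, then the best-scoring submission per team,
# and list teams from highest to lowest ranking score.
best_per_team = (
    systems[~systems["is_ensemble"]]
    .sort_values("ranking_score", ascending=False)
    .groupby("team", as_index=False)
    .first()
    .sort_values("ranking_score", ascending=False)
)
print(best_per_team)
```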

Submission code | Technical report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset)
Schmid_CPJKU_task4_2 Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717)
Nam_KAIST_task4_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745)
Zhang_BUPT_task4_1 Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705)
Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739)
Kim_GIST-HanwhaVision_task4_1 Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677)
Chen_NCUT_task4_3 Chen2024 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675)
LEE_KT_task4_1 Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693)
Baseline Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653)
XIAO_FMSG-JLESS_task4_3 Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553)
Lyu_SCUT_task4_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624)
Niu_XJU_task4_1 Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610)
Cai_USTC_task4_2 Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050)
Huang_SJTU_task4_4 Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685)

With ensembling

Submission code | Technical report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset)
Schmid_CPJKU_task4_4 Schmid2024 1.42 0.680 (0.679 - 0.682) 0.739 (0.736 - 0.742)
Nam_KAIST_task4_4 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.745)
Zhang_BUPT_task4_2 Yue2024 1.27 0.570 (0.566 - 0.573) 0.691 (0.691 - 0.691)
Chen_NCUT_task4_4 Chen2024a 1.25 0.565 (0.563 - 0.566) 0.684 (0.684 - 0.684)
Chen_CHT_task4_3 Chen2024 1.25 0.527 (0.524 - 0.530) 0.711 (0.709 - 0.712)
Kim_GIST-HanwhaVision_task4_1 Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677)
LEE_KT_task4_4 Lee2024 1.20 0.509 (0.509 - 0.509) 0.690 (0.690 - 0.690)
XIAO_FMSG-JLESS_task4_4 Xiao2024 1.17 0.606 (0.606 - 0.606) 0.566 (0.566 - 0.566)
Baseline Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653)
Lyu_SCUT_task4_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624)
Niu_XJU_task4_1 Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610)
Cai_USTC_task4_2 Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050)
Huang_SJTU_task4_4 Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685)

Systems ranking

Performance obtained without ensembling.

Submission code | Submission name | Technical report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | PSDS (Development dataset) | mpAUC (Development dataset)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) 0.617 0.749
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) 0.539 0.773
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) 0.617 0.749
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) 0.571 0.788
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) 0.543 0.763
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) 0.498 0.726
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) 0.481 0.686
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) 0.531 0.773
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) 0.514 0.697
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) 0.521 0.659
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) 0.525 0.651
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) 0.467 0.734
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) 0.475 0.730
Baseline DCASE2024 baseline system Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) 0.491 0.695
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) 0.503 0.737
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) 0.479 0.748
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) 0.508 0.693
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) 0.494 0.655
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) 0.493 0.657
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) 0.506 0.734
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) 0.588 0.000
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) 0.587 0.000
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) 0.527 0.737

Supplementary metrics

DESED dataset

Submission code | Submission name | Technical report | PSDS (Development dataset) | PSDS (Evaluation dataset) | PSDS (DESED public evaluation) | PSDS (DESED Vimeo dataset) | Segment-based F1, threshold = 0.5 (DESED evaluation) | Segment-based F1, optimal threshold (DESED evaluation) | Collar-based F1, threshold = 0.5 (DESED evaluation) | Collar-based F1, optimal threshold (DESED evaluation) | Intersection-based F1, threshold = 0.5 (DESED evaluation) | Intersection-based F1, optimal threshold (DESED evaluation)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 0.617 0.646 (0.640 - 0.654) 0.695 (0.692 - 0.698) 0.525 (0.517 - 0.541) 0.853 (0.849 - 0.858) 0.883 (0.882 - 0.884) 0.642 (0.634 - 0.648) 0.672 (0.666 - 0.676) 0.772 (0.768 - 0.778) 0.803 (0.802 - 0.805)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 0.539 0.586 (0.585 - 0.589) 0.640 (0.637 - 0.644) 0.446 (0.442 - 0.452) 0.858 (0.857 - 0.859) 0.884 (0.883 - 0.886) 0.654 (0.653 - 0.654) 0.673 (0.671 - 0.675) 0.772 (0.771 - 0.775) 0.788 (0.787 - 0.791)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 0.617 0.644 (0.640 - 0.647) 0.687 (0.684 - 0.688) 0.545 (0.539 - 0.553) 0.851 (0.843 - 0.855) 0.885 (0.881 - 0.889) 0.638 (0.635 - 0.642) 0.678 (0.675 - 0.679) 0.767 (0.765 - 0.769) 0.805 (0.800 - 0.809)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 0.571 0.584 (0.582 - 0.587) 0.629 (0.624 - 0.636) 0.470 (0.463 - 0.476) 0.860 (0.859 - 0.861) 0.885 (0.884 - 0.886) 0.652 (0.649 - 0.655) 0.673 (0.673 - 0.673) 0.772 (0.771 - 0.774) 0.788 (0.787 - 0.789)
Zhang_BUPT_task4_1 single_model Yue2024 0.543 0.523 (0.523 - 0.524) 0.572 (0.571 - 0.573) 0.425 (0.422 - 0.427) 0.776 (0.774 - 0.777) 0.866 (0.866 - 0.866) 0.527 (0.526 - 0.528) 0.607 (0.606 - 0.608) 0.662 (0.662 - 0.663) 0.749 (0.747 - 0.751)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 0.498 0.495 (0.486 - 0.503) 0.540 (0.532 - 0.548) 0.412 (0.397 - 0.420) 0.854 (0.850 - 0.859) 0.880 (0.878 - 0.885) 0.561 (0.558 - 0.567) 0.591 (0.587 - 0.597) 0.740 (0.738 - 0.745) 0.763 (0.761 - 0.765)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.481 0.567 (0.558 - 0.573) 0.610 (0.597 - 0.622) 0.464 (0.460 - 0.470) 0.846 (0.833 - 0.863) 0.891 (0.889 - 0.893) 0.577 (0.568 - 0.589) 0.630 (0.621 - 0.636) 0.736 (0.726 - 0.747) 0.780 (0.778 - 0.781)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 0.531 0.527 (0.524 - 0.530) 0.581 (0.577 - 0.584) 0.422 (0.421 - 0.423) 0.879 (0.879 - 0.880) 0.899 (0.898 - 0.899) 0.562 (0.560 - 0.563) 0.616 (0.615 - 0.617) 0.747 (0.745 - 0.749) 0.782 (0.781 - 0.782)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 0.514 0.526 (0.524 - 0.527) 0.575 (0.572 - 0.576) 0.430 (0.429 - 0.430) 0.878 (0.877 - 0.878) 0.887 (0.887 - 0.887) 0.572 (0.572 - 0.573) 0.615 (0.614 - 0.616) 0.756 (0.754 - 0.757) 0.773 (0.773 - 0.774)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 0.521 0.525 (0.523 - 0.527) 0.575 (0.574 - 0.577) 0.431 (0.430 - 0.432) 0.861 (0.861 - 0.861) 0.877 (0.877 - 0.877) 0.547 (0.547 - 0.547) 0.590 (0.589 - 0.590) 0.740 (0.740 - 0.741) 0.766 (0.765 - 0.766)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 0.525 0.519 (0.485 - 0.537) 0.576 (0.543 - 0.594) 0.398 (0.358 - 0.419) 0.857 (0.855 - 0.860) 0.873 (0.872 - 0.874) 0.535 (0.518 - 0.546) 0.592 (0.577 - 0.601) 0.723 (0.711 - 0.731) 0.767 (0.758 - 0.772)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 0.467 0.506 (0.482 - 0.548) 0.550 (0.528 - 0.585) 0.380 (0.340 - 0.441) 0.816 (0.809 - 0.829) 0.842 (0.835 - 0.855) 0.517 (0.479 - 0.575) 0.558 (0.534 - 0.595) 0.682 (0.657 - 0.722) 0.705 (0.685 - 0.739)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 0.475 0.474 (0.471 - 0.479) 0.519 (0.515 - 0.524) 0.378 (0.374 - 0.383) 0.814 (0.811 - 0.817) 0.849 (0.841 - 0.862) 0.477 (0.463 - 0.495) 0.537 (0.517 - 0.566) 0.666 (0.664 - 0.668) 0.712 (0.702 - 0.723)
Baseline DCASE2024 baseline system Cornell2024 0.491 0.475 (0.469 - 0.479) 0.522 (0.516 - 0.527) 0.380 (0.365 - 0.389) 0.858 (0.855 - 0.862) 0.867 (0.863 - 0.873) 0.474 (0.470 - 0.480) 0.545 (0.540 - 0.552) 0.682 (0.674 - 0.687) 0.726 (0.722 - 0.733)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 0.503 0.574 (0.574 - 0.574) 0.631 (0.631 - 0.631) 0.443 (0.443 - 0.443) 0.869 (0.869 - 0.869) 0.885 (0.885 - 0.885) 0.592 (0.592 - 0.592) 0.611 (0.611 - 0.611) 0.775 (0.775 - 0.775) 0.787 (0.787 - 0.787)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 0.479 0.597 (0.597 - 0.597) 0.639 (0.639 - 0.639) 0.489 (0.489 - 0.489) 0.869 (0.869 - 0.869) 0.887 (0.887 - 0.887) 0.598 (0.598 - 0.598) 0.621 (0.621 - 0.621) 0.768 (0.768 - 0.768) 0.786 (0.786 - 0.786)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 0.508 0.478 (0.474 - 0.481) 0.532 (0.530 - 0.533) 0.369 (0.367 - 0.371) 0.856 (0.853 - 0.858) 0.867 (0.865 - 0.869) 0.519 (0.515 - 0.523) 0.558 (0.553 - 0.562) 0.697 (0.693 - 0.703) 0.735 (0.733 - 0.736)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 0.494 0.474 (0.469 - 0.482) 0.529 (0.523 - 0.539) 0.361 (0.357 - 0.363) 0.845 (0.843 - 0.846) 0.860 (0.859 - 0.861) 0.456 (0.454 - 0.458) 0.494 (0.493 - 0.494) 0.683 (0.678 - 0.687) 0.710 (0.709 - 0.711)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 0.493 0.465 (0.462 - 0.467) 0.511 (0.510 - 0.512) 0.367 (0.363 - 0.369) 0.863 (0.861 - 0.864) 0.874 (0.871 - 0.877) 0.543 (0.537 - 0.547) 0.565 (0.560 - 0.568) 0.713 (0.708 - 0.716) 0.730 (0.727 - 0.731)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 0.506 0.575 (0.575 - 0.575) 0.627 (0.627 - 0.627) 0.450 (0.450 - 0.450) 0.866 (0.866 - 0.866) 0.885 (0.885 - 0.885) 0.579 (0.579 - 0.579) 0.604 (0.604 - 0.604) 0.764 (0.764 - 0.764) 0.779 (0.779 - 0.779)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.588 0.574 (0.573 - 0.574) 0.632 (0.632 - 0.632) 0.473 (0.473 - 0.473) 0.836 (0.836 - 0.836) 0.870 (0.870 - 0.870) 0.615 (0.615 - 0.616) 0.651 (0.650 - 0.651) 0.757 (0.757 - 0.757) 0.788 (0.788 - 0.788)
Cai_USTC_task4_1 MAT-SED Cai2024 0.587 0.561 (0.560 - 0.561) 0.607 (0.606 - 0.607) 0.478 (0.477 - 0.479) 0.824 (0.823 - 0.825) 0.869 (0.868 - 0.869) 0.590 (0.589 - 0.591) 0.630 (0.629 - 0.631) 0.741 (0.739 - 0.742) 0.769 (0.768 - 0.769)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 0.527 0.519 (0.516 - 0.522) 0.568 (0.561 - 0.575) 0.417 (0.408 - 0.430) 0.858 (0.855 - 0.861) 0.871 (0.870 - 0.874) 0.550 (0.547 - 0.556) 0.593 (0.590 - 0.595) 0.731 (0.730 - 0.732) 0.757 (0.755 - 0.759)

MAESTRO dataset

Submission code | Submission name | Technical report | mpAUC (MAESTRO development dataset) | mpAUC (MAESTRO evaluation dataset) | Segment-based F1, threshold = 0.5 (MAESTRO evaluation) | Segment-based F1, optimal threshold (MAESTRO evaluation)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 0.749 0.711 (0.704 - 0.717) 0.385 (0.376 - 0.392) 0.585 (0.581 - 0.592)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 0.773 0.738 (0.732 - 0.745) 0.219 (0.218 - 0.221) 0.593 (0.592 - 0.594)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 0.749 0.672 (0.669 - 0.676) 0.380 (0.371 - 0.391) 0.560 (0.556 - 0.566)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 0.788 0.726 (0.720 - 0.733) 0.218 (0.216 - 0.221) 0.588 (0.582 - 0.593)
Zhang_BUPT_task4_1 single_model Yue2024 0.763 0.704 (0.704 - 0.705) 0.474 (0.473 - 0.477) 0.570 (0.570 - 0.571)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 0.726 0.733 (0.730 - 0.739) 0.347 (0.332 - 0.361) 0.603 (0.598 - 0.609)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.686 0.665 (0.646 - 0.677) 0.129 (0.105 - 0.168) 0.544 (0.533 - 0.552)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 0.773 0.691 (0.663 - 0.708) 0.366 (0.359 - 0.377) 0.570 (0.552 - 0.583)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 0.697 0.675 (0.675 - 0.675) 0.344 (0.332 - 0.361) 0.559 (0.559 - 0.559)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 0.659 0.667 (0.667 - 0.667) 0.422 (0.419 - 0.426) 0.542 (0.541 - 0.542)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 0.651 0.665 (0.659 - 0.669) 0.478 (0.470 - 0.486) 0.542 (0.539 - 0.543)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 0.734 0.684 (0.672 - 0.693) 0.258 (0.247 - 0.266) 0.569 (0.557 - 0.576)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 0.730 0.676 (0.666 - 0.690) 0.230 (0.219 - 0.238) 0.567 (0.561 - 0.573)
Baseline DCASE2024 baseline system Cornell2024 0.695 0.646 (0.641 - 0.653) 0.459 (0.435 - 0.475) 0.534 (0.530 - 0.537)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 0.737 0.553 (0.553 - 0.553) 0.113 (0.113 - 0.113) 0.491 (0.491 - 0.491)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 0.748 0.530 (0.530 - 0.530) 0.096 (0.096 - 0.096) 0.480 (0.480 - 0.480)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 0.693 0.612 (0.596 - 0.624) 0.370 (0.354 - 0.380) 0.523 (0.520 - 0.526)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 0.655 0.602 (0.586 - 0.619) 0.368 (0.346 - 0.387) 0.510 (0.503 - 0.517)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 0.657 0.603 (0.599 - 0.610) 0.261 (0.243 - 0.292) 0.515 (0.512 - 0.521)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 0.734 0.490 (0.490 - 0.490) 0.096 (0.096 - 0.096) 0.455 (0.455 - 0.455)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Cai_USTC_task4_1 MAT-SED Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 0.737 0.678 (0.669 - 0.685) 0.410 (0.393 - 0.434) 0.556 (0.553 - 0.561)

With ensembling

Submission code | Submission name | Technical report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | PSDS (Development dataset) | mpAUC (Development dataset)
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 1.42 0.680 (0.679 - 0.682) 0.739 (0.736 - 0.742) 0.632 0.746
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 1.39 0.676 (0.674 - 0.678) 0.715 (0.714 - 0.718) 0.632 0.743
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) 0.617 0.749
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.745) 0.491 0.695
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.744) 0.575 0.788
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) 0.539 0.773
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) 0.617 0.749
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) 0.571 0.788
Zhang_BUPT_task4_2 ensemble_model Yue2024 1.27 0.570 (0.566 - 0.573) 0.691 (0.691 - 0.691) 0.575 0.756
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 1.25 0.565 (0.563 - 0.566) 0.684 (0.684 - 0.684) 0.535 0.677
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 1.25 0.527 (0.524 - 0.530) 0.711 (0.709 - 0.712) 0.531 0.740
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) 0.543 0.763
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) 0.498 0.726
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) 0.481 0.686
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) 0.531 0.773
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 1.22 0.586 (0.578 - 0.597) 0.638 (0.620 - 0.654) 0.509 0.700
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.21 0.580 (0.560 - 0.599) 0.629 (0.620 - 0.639) 0.486 0.700
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) 0.514 0.697
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) 0.521 0.659
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 1.20 0.509 (0.509 - 0.509) 0.690 (0.690 - 0.690) 0.507 0.757
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 1.20 0.500 (0.498 - 0.504) 0.691 (0.663 - 0.708) 0.525 0.773
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) 0.525 0.651
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) 0.467 0.734
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 1.18 0.542 (0.525 - 0.560) 0.637 (0.628 - 0.652) 0.505 0.696
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 1.17 0.468 (0.468 - 0.468) 0.692 (0.692 - 0.692) 0.510 0.692
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 1.17 0.606 (0.606 - 0.606) 0.566 (0.566 - 0.566) 0.519 0.762
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) 0.475 0.730
Baseline DCASE2024 baseline system Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) 0.491 0.695
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) 0.503 0.737
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) 0.479 0.748
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) 0.508 0.693
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) 0.494 0.655
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) 0.493 0.657
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) 0.506 0.734
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) 0.588 0.000
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) 0.587 0.000
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.56 0.506 (0.505 - 0.507) 0.050 (0.050 - 0.050) 0.600 0.000
Cai_USTC_task4_3 MAT-ATST Cai2024 0.47 0.417 (0.402 - 0.428) 0.050 (0.050 - 0.050) 0.600 0.000
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.20 0.000 (0.000 - 0.000) 0.196 (0.189 - 0.202) 0.545 0.759
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.17 0.000 (0.000 - 0.000) 0.172 (0.165 - 0.179) 0.545 0.757
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.15 0.000 (0.000 - 0.000) 0.149 (0.137 - 0.159) 0.541 0.758
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) 0.527 0.737

Supplementary metrics

DESED dataset

Submission code | Submission name | Technical report | PSDS (Development dataset) | PSDS (Evaluation dataset) | PSDS (DESED public evaluation) | PSDS (DESED Vimeo dataset) | Segment-based F1, threshold = 0.5 (DESED evaluation) | Segment-based F1, optimal threshold (DESED evaluation) | Collar-based F1, threshold = 0.5 (DESED evaluation) | Collar-based F1, optimal threshold (DESED evaluation) | Intersection-based F1, threshold = 0.5 (DESED evaluation) | Intersection-based F1, optimal threshold (DESED evaluation)
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 0.632 0.680 (0.679 - 0.682) 0.733 (0.730 - 0.737) 0.555 (0.553 - 0.559) 0.874 (0.872 - 0.876) 0.903 (0.902 - 0.904) 0.677 (0.674 - 0.683) 0.710 (0.708 - 0.711) 0.801 (0.798 - 0.803) 0.829 (0.827 - 0.831)
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 0.632 0.676 (0.674 - 0.678) 0.724 (0.722 - 0.726) 0.560 (0.555 - 0.565) 0.875 (0.873 - 0.877) 0.904 (0.904 - 0.905) 0.670 (0.666 - 0.673) 0.703 (0.696 - 0.709) 0.796 (0.793 - 0.797) 0.827 (0.826 - 0.827)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 0.617 0.646 (0.640 - 0.654) 0.695 (0.692 - 0.698) 0.525 (0.517 - 0.541) 0.853 (0.849 - 0.858) 0.883 (0.882 - 0.884) 0.642 (0.634 - 0.648) 0.672 (0.666 - 0.676) 0.772 (0.768 - 0.778) 0.803 (0.802 - 0.805)
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 0.491 0.610 (0.609 - 0.611) 0.664 (0.663 - 0.665) 0.468 (0.468 - 0.469) 0.857 (0.857 - 0.858) 0.889 (0.888 - 0.889) 0.664 (0.663 - 0.665) 0.683 (0.682 - 0.684) 0.776 (0.775 - 0.776) 0.796 (0.795 - 0.796)
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 0.575 0.610 (0.609 - 0.611) 0.664 (0.663 - 0.666) 0.470 (0.469 - 0.470) 0.858 (0.858 - 0.858) 0.889 (0.889 - 0.889) 0.663 (0.662 - 0.665) 0.683 (0.681 - 0.685) 0.777 (0.776 - 0.777) 0.796 (0.796 - 0.797)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 0.539 0.586 (0.585 - 0.589) 0.640 (0.637 - 0.644) 0.446 (0.442 - 0.452) 0.858 (0.857 - 0.859) 0.884 (0.883 - 0.886) 0.654 (0.653 - 0.654) 0.673 (0.671 - 0.675) 0.772 (0.771 - 0.775) 0.788 (0.787 - 0.791)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 0.617 0.644 (0.640 - 0.647) 0.687 (0.684 - 0.688) 0.545 (0.539 - 0.553) 0.851 (0.843 - 0.855) 0.885 (0.881 - 0.889) 0.638 (0.635 - 0.642) 0.678 (0.675 - 0.679) 0.767 (0.765 - 0.769) 0.805 (0.800 - 0.809)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 0.571 0.584 (0.582 - 0.587) 0.629 (0.624 - 0.636) 0.470 (0.463 - 0.476) 0.860 (0.859 - 0.861) 0.885 (0.884 - 0.886) 0.652 (0.649 - 0.655) 0.673 (0.673 - 0.673) 0.772 (0.771 - 0.774) 0.788 (0.787 - 0.789)
Zhang_BUPT_task4_2 ensemble_model Yue2024 0.575 0.570 (0.566 - 0.573) 0.626 (0.623 - 0.630) 0.469 (0.463 - 0.473) 0.853 (0.852 - 0.854) 0.877 (0.875 - 0.879) 0.614 (0.610 - 0.617) 0.664 (0.661 - 0.666) 0.769 (0.765 - 0.772) 0.792 (0.787 - 0.796)
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 0.535 0.565 (0.563 - 0.566) 0.613 (0.612 - 0.614) 0.460 (0.459 - 0.460) 0.868 (0.867 - 0.869) 0.888 (0.888 - 0.888) 0.592 (0.591 - 0.593) 0.658 (0.657 - 0.658) 0.755 (0.754 - 0.756) 0.792 (0.791 - 0.792)
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 0.531 0.527 (0.524 - 0.530) 0.581 (0.577 - 0.584) 0.422 (0.421 - 0.423) 0.879 (0.879 - 0.880) 0.899 (0.898 - 0.899) 0.562 (0.560 - 0.563) 0.616 (0.615 - 0.617) 0.747 (0.745 - 0.749) 0.782 (0.781 - 0.782)
Zhang_BUPT_task4_1 single_model Yue2024 0.543 0.523 (0.523 - 0.524) 0.572 (0.571 - 0.573) 0.425 (0.422 - 0.427) 0.776 (0.774 - 0.777) 0.866 (0.866 - 0.866) 0.527 (0.526 - 0.528) 0.607 (0.606 - 0.608) 0.662 (0.662 - 0.663) 0.749 (0.747 - 0.751)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 0.498 0.495 (0.486 - 0.503) 0.540 (0.532 - 0.548) 0.412 (0.397 - 0.420) 0.854 (0.850 - 0.859) 0.880 (0.878 - 0.885) 0.561 (0.558 - 0.567) 0.591 (0.587 - 0.597) 0.740 (0.738 - 0.745) 0.763 (0.761 - 0.765)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.481 0.567 (0.558 - 0.573) 0.610 (0.597 - 0.622) 0.464 (0.460 - 0.470) 0.846 (0.833 - 0.863) 0.891 (0.889 - 0.893) 0.577 (0.568 - 0.589) 0.630 (0.621 - 0.636) 0.736 (0.726 - 0.747) 0.780 (0.778 - 0.781)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 0.531 0.527 (0.524 - 0.530) 0.581 (0.577 - 0.584) 0.422 (0.421 - 0.423) 0.879 (0.879 - 0.880) 0.899 (0.898 - 0.899) 0.562 (0.560 - 0.563) 0.616 (0.615 - 0.617) 0.747 (0.745 - 0.749) 0.782 (0.781 - 0.782)
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 0.509 0.586 (0.578 - 0.597) 0.622 (0.609 - 0.639) 0.496 (0.495 - 0.497) 0.732 (0.658 - 0.823) 0.899 (0.897 - 0.902) 0.557 (0.547 - 0.566) 0.649 (0.619 - 0.675) 0.668 (0.618 - 0.731) 0.792 (0.788 - 0.797)
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.486 0.580 (0.560 - 0.599) 0.617 (0.597 - 0.634) 0.490 (0.472 - 0.514) 0.843 (0.826 - 0.863) 0.898 (0.896 - 0.900) 0.579 (0.550 - 0.614) 0.624 (0.608 - 0.648) 0.745 (0.722 - 0.771) 0.787 (0.776 - 0.799)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 0.514 0.526 (0.524 - 0.527) 0.575 (0.572 - 0.576) 0.430 (0.429 - 0.430) 0.878 (0.877 - 0.878) 0.887 (0.887 - 0.887) 0.572 (0.572 - 0.573) 0.615 (0.614 - 0.616) 0.756 (0.754 - 0.757) 0.773 (0.773 - 0.774)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 0.521 0.525 (0.523 - 0.527) 0.575 (0.574 - 0.577) 0.431 (0.430 - 0.432) 0.861 (0.861 - 0.861) 0.877 (0.877 - 0.877) 0.547 (0.547 - 0.547) 0.590 (0.589 - 0.590) 0.740 (0.740 - 0.741) 0.766 (0.765 - 0.766)
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 0.507 0.509 (0.509 - 0.509) 0.544 (0.544 - 0.544) 0.412 (0.412 - 0.412) 0.796 (0.796 - 0.796) 0.850 (0.850 - 0.850) 0.524 (0.524 - 0.524) 0.557 (0.557 - 0.557) 0.688 (0.688 - 0.688) 0.718 (0.718 - 0.718)
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 0.525 0.500 (0.498 - 0.504) 0.546 (0.541 - 0.551) 0.412 (0.411 - 0.415) 0.872 (0.870 - 0.873) 0.882 (0.880 - 0.883) 0.520 (0.520 - 0.521) 0.570 (0.568 - 0.572) 0.728 (0.724 - 0.732) 0.756 (0.753 - 0.760)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 0.525 0.519 (0.485 - 0.537) 0.576 (0.543 - 0.594) 0.398 (0.358 - 0.419) 0.857 (0.855 - 0.860) 0.873 (0.872 - 0.874) 0.535 (0.518 - 0.546) 0.592 (0.577 - 0.601) 0.723 (0.711 - 0.731) 0.767 (0.758 - 0.772)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 0.467 0.506 (0.482 - 0.548) 0.550 (0.528 - 0.585) 0.380 (0.340 - 0.441) 0.816 (0.809 - 0.829) 0.842 (0.835 - 0.855) 0.517 (0.479 - 0.575) 0.558 (0.534 - 0.595) 0.682 (0.657 - 0.722) 0.705 (0.685 - 0.739)
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 0.505 0.542 (0.525 - 0.560) 0.572 (0.552 - 0.601) 0.464 (0.456 - 0.473) 0.841 (0.832 - 0.847) 0.894 (0.890 - 0.898) 0.545 (0.532 - 0.555) 0.597 (0.576 - 0.613) 0.723 (0.717 - 0.727) 0.764 (0.762 - 0.768)
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 0.510 0.468 (0.468 - 0.468) 0.515 (0.515 - 0.515) 0.386 (0.386 - 0.387) 0.801 (0.801 - 0.802) 0.857 (0.857 - 0.857) 0.481 (0.481 - 0.481) 0.555 (0.555 - 0.555) 0.667 (0.667 - 0.667) 0.724 (0.724 - 0.724)
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 0.519 0.606 (0.606 - 0.606) 0.656 (0.656 - 0.656) 0.479 (0.479 - 0.479) 0.875 (0.875 - 0.875) 0.896 (0.896 - 0.896) 0.626 (0.626 - 0.626) 0.645 (0.645 - 0.645) 0.786 (0.786 - 0.786) 0.804 (0.804 - 0.804)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 0.475 0.474 (0.471 - 0.479) 0.519 (0.515 - 0.524) 0.378 (0.374 - 0.383) 0.814 (0.811 - 0.817) 0.849 (0.841 - 0.862) 0.477 (0.463 - 0.495) 0.537 (0.517 - 0.566) 0.666 (0.664 - 0.668) 0.712 (0.702 - 0.723)
Baseline DCASE2024 baseline system Cornell2024 0.491 0.475 (0.469 - 0.479) 0.522 (0.516 - 0.527) 0.380 (0.365 - 0.389) 0.858 (0.855 - 0.862) 0.867 (0.863 - 0.873) 0.474 (0.470 - 0.480) 0.545 (0.540 - 0.552) 0.682 (0.674 - 0.687) 0.726 (0.722 - 0.733)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 0.503 0.574 (0.574 - 0.574) 0.631 (0.631 - 0.631) 0.443 (0.443 - 0.443) 0.869 (0.869 - 0.869) 0.885 (0.885 - 0.885) 0.592 (0.592 - 0.592) 0.611 (0.611 - 0.611) 0.775 (0.775 - 0.775) 0.787 (0.787 - 0.787)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 0.479 0.597 (0.597 - 0.597) 0.639 (0.639 - 0.639) 0.489 (0.489 - 0.489) 0.869 (0.869 - 0.869) 0.887 (0.887 - 0.887) 0.598 (0.598 - 0.598) 0.621 (0.621 - 0.621) 0.768 (0.768 - 0.768) 0.786 (0.786 - 0.786)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 0.508 0.478 (0.474 - 0.481) 0.532 (0.530 - 0.533) 0.369 (0.367 - 0.371) 0.856 (0.853 - 0.858) 0.867 (0.865 - 0.869) 0.519 (0.515 - 0.523) 0.558 (0.553 - 0.562) 0.697 (0.693 - 0.703) 0.735 (0.733 - 0.736)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 0.494 0.474 (0.469 - 0.482) 0.529 (0.523 - 0.539) 0.361 (0.357 - 0.363) 0.845 (0.843 - 0.846) 0.860 (0.859 - 0.861) 0.456 (0.454 - 0.458) 0.494 (0.493 - 0.494) 0.683 (0.678 - 0.687) 0.710 (0.709 - 0.711)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 0.493 0.465 (0.462 - 0.467) 0.511 (0.510 - 0.512) 0.367 (0.363 - 0.369) 0.863 (0.861 - 0.864) 0.874 (0.871 - 0.877) 0.543 (0.537 - 0.547) 0.565 (0.560 - 0.568) 0.713 (0.708 - 0.716) 0.730 (0.727 - 0.731)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 0.506 0.575 (0.575 - 0.575) 0.627 (0.627 - 0.627) 0.450 (0.450 - 0.450) 0.866 (0.866 - 0.866) 0.885 (0.885 - 0.885) 0.579 (0.579 - 0.579) 0.604 (0.604 - 0.604) 0.764 (0.764 - 0.764) 0.779 (0.779 - 0.779)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.588 0.574 (0.573 - 0.574) 0.632 (0.632 - 0.632) 0.473 (0.473 - 0.473) 0.836 (0.836 - 0.836) 0.870 (0.870 - 0.870) 0.615 (0.615 - 0.616) 0.651 (0.650 - 0.651) 0.757 (0.757 - 0.757) 0.788 (0.788 - 0.788)
Cai_USTC_task4_1 MAT-SED Cai2024 0.587 0.561 (0.560 - 0.561) 0.607 (0.606 - 0.607) 0.478 (0.477 - 0.479) 0.824 (0.823 - 0.825) 0.869 (0.868 - 0.869) 0.590 (0.589 - 0.591) 0.630 (0.629 - 0.631) 0.741 (0.739 - 0.742) 0.769 (0.768 - 0.769)
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.600 0.506 (0.505 - 0.507) 0.557 (0.556 - 0.557) 0.406 (0.405 - 0.406) 0.829 (0.829 - 0.830) 0.852 (0.852 - 0.852) 0.603 (0.603 - 0.603) 0.632 (0.631 - 0.632) 0.747 (0.747 - 0.747) 0.771 (0.771 - 0.772)
Cai_USTC_task4_3 MAT-ATST Cai2024 0.600 0.417 (0.402 - 0.428) 0.467 (0.450 - 0.480) 0.330 (0.321 - 0.338) 0.752 (0.729 - 0.772) 0.823 (0.822 - 0.825) 0.472 (0.424 - 0.510) 0.536 (0.514 - 0.556) 0.635 (0.592 - 0.671) 0.697 (0.673 - 0.715)
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.545 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.275 (0.268 - 0.280) 0.000 (0.000 - 0.000) 0.100 (0.094 - 0.105) 0.000 (0.000 - 0.000) 0.181 (0.176 - 0.186)
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.545 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.001) 0.298 (0.287 - 0.308) 0.000 (0.000 - 0.000) 0.122 (0.109 - 0.130) 0.000 (0.000 - 0.000) 0.198 (0.189 - 0.204)
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.541 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.000 (0.000 - 0.000) 0.006 (0.000 - 0.015) 0.279 (0.274 - 0.283) 0.000 (0.000 - 0.000) 0.083 (0.058 - 0.101) 0.000 (0.000 - 0.000) 0.166 (0.137 - 0.184)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 0.527 0.519 (0.516 - 0.522) 0.568 (0.561 - 0.575) 0.417 (0.408 - 0.430) 0.858 (0.855 - 0.861) 0.871 (0.870 - 0.874) 0.550 (0.547 - 0.556) 0.593 (0.590 - 0.595) 0.731 (0.730 - 0.732) 0.757 (0.755 - 0.759)

MAESTRO dataset

Submission code | Submission name | Technical report | mpAUC (MAESTRO development dataset) | mpAUC (MAESTRO evaluation dataset) | Segment-based F1, threshold = 0.5 (MAESTRO evaluation) | Segment-based F1, optimal threshold (MAESTRO evaluation)
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 0.746 0.739 (0.736 - 0.742) 0.392 (0.387 - 0.394) 0.600 (0.597 - 0.604)
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 0.743 0.715 (0.714 - 0.718) 0.379 (0.374 - 0.384) 0.585 (0.583 - 0.587)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 0.749 0.711 (0.704 - 0.717) 0.385 (0.376 - 0.392) 0.585 (0.581 - 0.592)
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 0.695 0.744 (0.744 - 0.745) 0.214 (0.214 - 0.215) 0.601 (0.601 - 0.602)
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 0.788 0.744 (0.744 - 0.744) 0.213 (0.213 - 0.214) 0.600 (0.600 - 0.601)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 0.773 0.738 (0.732 - 0.745) 0.219 (0.218 - 0.221) 0.593 (0.592 - 0.594)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 0.749 0.672 (0.669 - 0.676) 0.380 (0.371 - 0.391) 0.560 (0.556 - 0.566)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 0.788 0.726 (0.720 - 0.733) 0.218 (0.216 - 0.221) 0.588 (0.582 - 0.593)
Zhang_BUPT_task4_2 ensemble_model Yue2024 0.756 0.691 (0.691 - 0.691) 0.485 (0.480 - 0.491) 0.565 (0.563 - 0.568)
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 0.677 0.684 (0.684 - 0.684) 0.461 (0.419 - 0.489) 0.560 (0.559 - 0.560)
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 0.740 0.711 (0.709 - 0.712) 0.344 (0.337 - 0.349) 0.589 (0.588 - 0.591)
Zhang_BUPT_task4_1 single_model Yue2024 0.763 0.704 (0.704 - 0.705) 0.474 (0.473 - 0.477) 0.570 (0.570 - 0.571)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 0.726 0.733 (0.730 - 0.739) 0.347 (0.332 - 0.361) 0.603 (0.598 - 0.609)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.686 0.665 (0.646 - 0.677) 0.129 (0.105 - 0.168) 0.544 (0.533 - 0.552)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 0.773 0.691 (0.663 - 0.708) 0.366 (0.359 - 0.377) 0.570 (0.552 - 0.583)
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 0.700 0.638 (0.620 - 0.654) 0.049 (0.012 - 0.111) 0.542 (0.533 - 0.553)
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 0.700 0.629 (0.620 - 0.639) 0.131 (0.104 - 0.156) 0.533 (0.530 - 0.537)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 0.697 0.675 (0.675 - 0.675) 0.344 (0.332 - 0.361) 0.559 (0.559 - 0.559)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 0.659 0.667 (0.667 - 0.667) 0.422 (0.419 - 0.426) 0.542 (0.541 - 0.542)
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 0.757 0.690 (0.690 - 0.690) 0.232 (0.232 - 0.232) 0.572 (0.572 - 0.572)
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 0.773 0.691 (0.663 - 0.708) 0.366 (0.359 - 0.377) 0.570 (0.552 - 0.583)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 0.651 0.665 (0.659 - 0.669) 0.478 (0.470 - 0.486) 0.542 (0.539 - 0.543)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 0.734 0.684 (0.672 - 0.693) 0.258 (0.247 - 0.266) 0.569 (0.557 - 0.576)
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 0.696 0.637 (0.628 - 0.652) 0.144 (0.122 - 0.179) 0.537 (0.528 - 0.550)
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 0.692 0.692 (0.692 - 0.692) 0.211 (0.211 - 0.211) 0.575 (0.575 - 0.576)
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 0.762 0.566 (0.566 - 0.566) 0.091 (0.091 - 0.091) 0.517 (0.517 - 0.517)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 0.730 0.676 (0.666 - 0.690) 0.230 (0.219 - 0.238) 0.567 (0.561 - 0.573)
Baseline DCASE2024 baseline system Cornell2024 0.695 0.646 (0.641 - 0.653) 0.459 (0.435 - 0.475) 0.534 (0.530 - 0.537)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 0.737 0.553 (0.553 - 0.553) 0.113 (0.113 - 0.113) 0.491 (0.491 - 0.491)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 0.748 0.530 (0.530 - 0.530) 0.096 (0.096 - 0.096) 0.480 (0.480 - 0.480)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 0.693 0.612 (0.596 - 0.624) 0.370 (0.354 - 0.380) 0.523 (0.520 - 0.526)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 0.655 0.602 (0.586 - 0.619) 0.368 (0.346 - 0.387) 0.510 (0.503 - 0.517)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 0.657 0.603 (0.599 - 0.610) 0.261 (0.243 - 0.292) 0.515 (0.512 - 0.521)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 0.734 0.490 (0.490 - 0.490) 0.096 (0.096 - 0.096) 0.455 (0.455 - 0.455)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Cai_USTC_task4_1 MAT-SED Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Cai_USTC_task4_3 MAT-ATST Cai2024 0.000 0.050 (0.050 - 0.050) 0.000 (0.000 - 0.000) 0.168 (0.168 - 0.168)
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.759 0.196 (0.189 - 0.202) 0.000 (0.000 - 0.000) 0.270 (0.268 - 0.272)
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.757 0.172 (0.165 - 0.179) 0.000 (0.000 - 0.000) 0.294 (0.291 - 0.296)
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.758 0.149 (0.137 - 0.159) 0.000 (0.000 - 0.000) 0.252 (0.240 - 0.262)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 0.737 0.678 (0.669 - 0.685) 0.410 (0.393 - 0.434) 0.556 (0.553 - 0.561)

Class-wise performance

DESED

PSDS

Submission code | Submission name | Technical report | Ranking score (DESED evaluation dataset) | Alarm bell ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 1.42 0.812 (0.797 - 0.830) 0.948 (0.945 - 0.950) 0.877 (0.872 - 0.881) 0.523 (0.517 - 0.527) 0.704 (0.696 - 0.708) 0.823 (0.799 - 0.843) 0.880 (0.879 - 0.881) 0.710 (0.706 - 0.717) 0.854 (0.851 - 0.857) 0.930 (0.922 - 0.939)
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 1.39 0.795 (0.793 - 0.797) 0.964 (0.959 - 0.970) 0.874 (0.872 - 0.875) 0.512 (0.508 - 0.515) 0.717 (0.705 - 0.725) 0.815 (0.790 - 0.835) 0.884 (0.880 - 0.889) 0.708 (0.697 - 0.715) 0.852 (0.851 - 0.853) 0.920 (0.914 - 0.924)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.772 (0.752 - 0.787) 0.928 (0.921 - 0.936) 0.881 (0.873 - 0.888) 0.475 (0.462 - 0.485) 0.699 (0.691 - 0.712) 0.776 (0.749 - 0.807) 0.830 (0.823 - 0.835) 0.681 (0.668 - 0.698) 0.841 (0.838 - 0.843) 0.895 (0.882 - 0.907)
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 1.35 0.741 (0.739 - 0.744) 0.929 (0.928 - 0.930) 0.829 (0.827 - 0.831) 0.413 (0.412 - 0.415) 0.627 (0.626 - 0.627) 0.864 (0.860 - 0.866) 0.778 (0.778 - 0.779) 0.712 (0.711 - 0.713) 0.779 (0.778 - 0.779) 0.930 (0.930 - 0.930)
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 1.35 0.739 (0.738 - 0.740) 0.931 (0.931 - 0.932) 0.833 (0.831 - 0.834) 0.416 (0.415 - 0.417) 0.625 (0.622 - 0.627) 0.869 (0.866 - 0.871) 0.787 (0.783 - 0.790) 0.703 (0.702 - 0.705) 0.779 (0.778 - 0.780) 0.930 (0.929 - 0.931)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.741 (0.736 - 0.745) 0.895 (0.888 - 0.908) 0.831 (0.827 - 0.836) 0.394 (0.389 - 0.398) 0.616 (0.609 - 0.621) 0.825 (0.813 - 0.833) 0.764 (0.756 - 0.771) 0.636 (0.633 - 0.640) 0.762 (0.758 - 0.765) 0.914 (0.909 - 0.918)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.785 (0.773 - 0.796) 0.922 (0.913 - 0.930) 0.893 (0.887 - 0.899) 0.458 (0.448 - 0.468) 0.701 (0.688 - 0.719) 0.809 (0.795 - 0.819) 0.820 (0.812 - 0.828) 0.673 (0.659 - 0.685) 0.843 (0.839 - 0.848) 0.892 (0.887 - 0.895)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.764 (0.756 - 0.770) 0.874 (0.864 - 0.884) 0.829 (0.826 - 0.832) 0.395 (0.384 - 0.401) 0.604 (0.596 - 0.610) 0.820 (0.805 - 0.841) 0.755 (0.747 - 0.760) 0.623 (0.599 - 0.639) 0.771 (0.766 - 0.776) 0.912 (0.905 - 0.916)
Zhang_BUPT_task4_2 ensemble_model Yue2024 1.27 0.712 (0.702 - 0.729) 0.855 (0.852 - 0.858) 0.840 (0.840 - 0.841) 0.395 (0.394 - 0.398) 0.581 (0.578 - 0.584) 0.737 (0.708 - 0.764) 0.766 (0.745 - 0.783) 0.614 (0.606 - 0.621) 0.827 (0.826 - 0.828) 0.895 (0.890 - 0.898)
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 1.25 0.731 (0.731 - 0.731) 0.842 (0.838 - 0.846) 0.792 (0.791 - 0.792) 0.352 (0.352 - 0.353) 0.571 (0.570 - 0.571) 0.780 (0.780 - 0.780) 0.819 (0.819 - 0.820) 0.663 (0.663 - 0.664) 0.815 (0.813 - 0.817) 0.887 (0.887 - 0.887)
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 1.25 0.659 (0.650 - 0.668) 0.799 (0.797 - 0.803) 0.779 (0.777 - 0.781) 0.330 (0.328 - 0.333) 0.491 (0.486 - 0.500) 0.806 (0.800 - 0.812) 0.780 (0.772 - 0.787) 0.647 (0.637 - 0.660) 0.804 (0.801 - 0.807) 0.898 (0.897 - 0.898)
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.610 (0.608 - 0.611) 0.865 (0.863 - 0.868) 0.814 (0.813 - 0.814) 0.366 (0.366 - 0.366) 0.494 (0.492 - 0.496) 0.762 (0.761 - 0.762) 0.802 (0.798 - 0.807) 0.588 (0.587 - 0.589) 0.760 (0.757 - 0.762) 0.870 (0.864 - 0.874)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.648 (0.627 - 0.668) 0.763 (0.756 - 0.771) 0.810 (0.801 - 0.822) 0.305 (0.289 - 0.315) 0.474 (0.451 - 0.504) 0.815 (0.804 - 0.821) 0.763 (0.751 - 0.779) 0.571 (0.554 - 0.597) 0.741 (0.728 - 0.757) 0.877 (0.863 - 0.888)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.764 (0.757 - 0.774) 0.883 (0.880 - 0.885) 0.796 (0.793 - 0.798) 0.346 (0.344 - 0.347) 0.519 (0.495 - 0.539) 0.861 (0.858 - 0.865) 0.884 (0.878 - 0.890) 0.687 (0.679 - 0.692) 0.790 (0.770 - 0.806) 0.898 (0.889 - 0.904)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.659 (0.650 - 0.668) 0.799 (0.797 - 0.803) 0.779 (0.777 - 0.781) 0.330 (0.328 - 0.333) 0.491 (0.486 - 0.500) 0.806 (0.800 - 0.812) 0.780 (0.772 - 0.787) 0.647 (0.637 - 0.660) 0.804 (0.801 - 0.807) 0.898 (0.897 - 0.898)
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 1.22 0.774 (0.760 - 0.784) 0.918 (0.916 - 0.919) 0.819 (0.803 - 0.831) 0.361 (0.348 - 0.377) 0.540 (0.526 - 0.560) 0.886 (0.879 - 0.895) 0.902 (0.896 - 0.906) 0.720 (0.713 - 0.726) 0.799 (0.791 - 0.806) 0.916 (0.908 - 0.926)
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.21 0.791 (0.763 - 0.832) 0.902 (0.899 - 0.904) 0.800 (0.790 - 0.809) 0.362 (0.341 - 0.383) 0.518 (0.488 - 0.541) 0.890 (0.882 - 0.900) 0.900 (0.892 - 0.910) 0.713 (0.707 - 0.721) 0.795 (0.785 - 0.806) 0.925 (0.921 - 0.928)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.736 (0.735 - 0.736) 0.849 (0.848 - 0.850) 0.716 (0.716 - 0.717) 0.330 (0.329 - 0.330) 0.455 (0.453 - 0.457) 0.830 (0.830 - 0.831) 0.791 (0.790 - 0.791) 0.657 (0.657 - 0.657) 0.787 (0.785 - 0.789) 0.883 (0.883 - 0.883)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.724 (0.724 - 0.725) 0.829 (0.824 - 0.835) 0.788 (0.787 - 0.789) 0.339 (0.339 - 0.339) 0.562 (0.561 - 0.563) 0.593 (0.589 - 0.597) 0.812 (0.812 - 0.812) 0.582 (0.581 - 0.582) 0.811 (0.810 - 0.812) 0.885 (0.883 - 0.886)
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 1.20 0.633 (0.633 - 0.633) 0.835 (0.835 - 0.835) 0.761 (0.761 - 0.761) 0.320 (0.320 - 0.320) 0.455 (0.455 - 0.455) 0.725 (0.725 - 0.725) 0.712 (0.712 - 0.712) 0.659 (0.659 - 0.659) 0.766 (0.766 - 0.766) 0.853 (0.853 - 0.853)
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 1.20 0.649 (0.627 - 0.668) 0.778 (0.778 - 0.778) 0.802 (0.781 - 0.827) 0.282 (0.282 - 0.282) 0.455 (0.455 - 0.455) 0.724 (0.724 - 0.724) 0.816 (0.816 - 0.816) 0.652 (0.652 - 0.652) 0.783 (0.783 - 0.783) 0.853 (0.853 - 0.853)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.697 (0.683 - 0.706) 0.831 (0.818 - 0.841) 0.763 (0.759 - 0.766) 0.314 (0.304 - 0.320) 0.553 (0.546 - 0.558) 0.721 (0.679 - 0.748) 0.780 (0.756 - 0.795) 0.608 (0.575 - 0.627) 0.750 (0.677 - 0.795) 0.858 (0.845 - 0.865)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.615 (0.556 - 0.678) 0.821 (0.786 - 0.871) 0.714 (0.689 - 0.740) 0.348 (0.302 - 0.416) 0.454 (0.437 - 0.467) 0.748 (0.719 - 0.793) 0.719 (0.685 - 0.760) 0.615 (0.572 - 0.667) 0.756 (0.752 - 0.759) 0.808 (0.760 - 0.881)
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 1.18 0.711 (0.688 - 0.734) 0.910 (0.906 - 0.914) 0.804 (0.782 - 0.821) 0.331 (0.312 - 0.347) 0.455 (0.409 - 0.504) 0.877 (0.867 - 0.882) 0.876 (0.865 - 0.892) 0.690 (0.680 - 0.700) 0.780 (0.757 - 0.795) 0.922 (0.917 - 0.929)
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 1.17 0.586 (0.586 - 0.586) 0.796 (0.796 - 0.796) 0.744 (0.744 - 0.744) 0.273 (0.273 - 0.273) 0.472 (0.472 - 0.472) 0.605 (0.605 - 0.605) 0.728 (0.728 - 0.728) 0.574 (0.574 - 0.574) 0.792 (0.792 - 0.792) 0.856 (0.856 - 0.856)
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 1.17 0.792 (0.792 - 0.792) 0.901 (0.901 - 0.901) 0.779 (0.779 - 0.779) 0.397 (0.397 - 0.397) 0.595 (0.595 - 0.595) 0.879 (0.879 - 0.879) 0.904 (0.904 - 0.904) 0.714 (0.714 - 0.714) 0.792 (0.792 - 0.792) 0.869 (0.869 - 0.869)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.582 (0.535 - 0.626) 0.801 (0.761 - 0.854) 0.698 (0.683 - 0.714) 0.319 (0.309 - 0.325) 0.470 (0.382 - 0.540) 0.577 (0.549 - 0.624) 0.763 (0.742 - 0.797) 0.572 (0.547 - 0.601) 0.778 (0.762 - 0.790) 0.827 (0.805 - 0.851)
Baseline DCASE2024 baseline system Cornell2024 1.13 0.634 (0.611 - 0.654) 0.793 (0.781 - 0.808) 0.720 (0.707 - 0.734) 0.303 (0.289 - 0.316) 0.454 (0.442 - 0.466) 0.576 (0.558 - 0.592) 0.754 (0.728 - 0.786) 0.587 (0.578 - 0.598) 0.779 (0.774 - 0.783) 0.855 (0.840 - 0.872)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.769 (0.769 - 0.769) 0.871 (0.871 - 0.871) 0.764 (0.764 - 0.764) 0.354 (0.354 - 0.354) 0.575 (0.575 - 0.575) 0.861 (0.861 - 0.861) 0.893 (0.893 - 0.893) 0.669 (0.669 - 0.669) 0.760 (0.760 - 0.760) 0.858 (0.858 - 0.858)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.745 (0.745 - 0.745) 0.878 (0.878 - 0.878) 0.764 (0.764 - 0.764) 0.410 (0.410 - 0.410) 0.643 (0.643 - 0.643) 0.857 (0.857 - 0.857) 0.755 (0.755 - 0.755) 0.646 (0.646 - 0.646) 0.798 (0.798 - 0.798) 0.878 (0.878 - 0.878)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.582 (0.554 - 0.618) 0.837 (0.834 - 0.840) 0.757 (0.751 - 0.763) 0.302 (0.293 - 0.308) 0.467 (0.450 - 0.485) 0.781 (0.781 - 0.782) 0.683 (0.662 - 0.699) 0.552 (0.542 - 0.562) 0.748 (0.741 - 0.753) 0.870 (0.862 - 0.881)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.560 (0.534 - 0.591) 0.800 (0.796 - 0.807) 0.674 (0.671 - 0.677) 0.312 (0.302 - 0.322) 0.471 (0.465 - 0.481) 0.685 (0.671 - 0.706) 0.728 (0.718 - 0.738) 0.558 (0.549 - 0.571) 0.740 (0.731 - 0.751) 0.857 (0.839 - 0.874)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.605 (0.603 - 0.608) 0.807 (0.800 - 0.819) 0.765 (0.763 - 0.769) 0.285 (0.278 - 0.289) 0.368 (0.361 - 0.380) 0.761 (0.747 - 0.769) 0.777 (0.754 - 0.790) 0.575 (0.555 - 0.586) 0.758 (0.755 - 0.763) 0.861 (0.858 - 0.867)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.721 (0.721 - 0.721) 0.874 (0.874 - 0.874) 0.748 (0.748 - 0.748) 0.373 (0.373 - 0.373) 0.565 (0.565 - 0.565) 0.866 (0.866 - 0.866) 0.877 (0.877 - 0.877) 0.668 (0.668 - 0.668) 0.820 (0.820 - 0.820) 0.804 (0.804 - 0.804)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.694 (0.694 - 0.694) 0.804 (0.804 - 0.804) 0.850 (0.850 - 0.850) 0.429 (0.429 - 0.429) 0.590 (0.590 - 0.590) 0.830 (0.826 - 0.832) 0.814 (0.814 - 0.814) 0.559 (0.557 - 0.561) 0.804 (0.804 - 0.804) 0.908 (0.908 - 0.908)
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.655 (0.655 - 0.655) 0.797 (0.786 - 0.812) 0.821 (0.821 - 0.821) 0.414 (0.414 - 0.414) 0.556 (0.552 - 0.560) 0.744 (0.743 - 0.744) 0.829 (0.829 - 0.829) 0.595 (0.595 - 0.595) 0.802 (0.802 - 0.802) 0.863 (0.860 - 0.867)
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.56 0.637 (0.635 - 0.639) 0.719 (0.715 - 0.722) 0.803 (0.803 - 0.804) 0.386 (0.386 - 0.387) 0.547 (0.546 - 0.547) 0.676 (0.676 - 0.676) 0.795 (0.795 - 0.796) 0.546 (0.546 - 0.546) 0.699 (0.699 - 0.700) 0.842 (0.841 - 0.842)
Cai_USTC_task4_3 MAT-ATST Cai2024 0.47 0.502 (0.468 - 0.525) 0.625 (0.623 - 0.627) 0.678 (0.619 - 0.729) 0.282 (0.270 - 0.292) 0.483 (0.471 - 0.493) 0.606 (0.602 - 0.609) 0.790 (0.790 - 0.791) 0.498 (0.497 - 0.499) 0.624 (0.611 - 0.635) 0.830 (0.829 - 0.830)
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.20 0.000 (0.000 - 0.001) 0.000 (0.000 - 0.000) 0.018 (0.018 - 0.018) 0.001 (0.000 - 0.003) 0.007 (0.001 - 0.014) 0.005 (0.005 - 0.005) 0.036 (0.022 - 0.050) 0.000 (0.000 - 0.000) 0.001 (0.001 - 0.001) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.17 0.005 (0.001 - 0.011) 0.000 (0.000 - 0.000) 0.005 (0.001 - 0.007) 0.036 (0.029 - 0.045) 0.016 (0.013 - 0.019) 0.030 (0.016 - 0.040) 0.000 (0.000 - 0.001) 0.000 (0.000 - 0.000) 0.002 (0.001 - 0.002) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.15 0.002 (0.000 - 0.006) 0.000 (0.000 - 0.000) 0.009 (0.000 - 0.020) 0.000 (0.000 - 0.001) 0.010 (0.001 - 0.024) 0.004 (0.001 - 0.006) 0.063 (0.034 - 0.085) 0.001 (0.000 - 0.001) 0.002 (0.001 - 0.002) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.596 (0.566 - 0.622) 0.826 (0.817 - 0.837) 0.759 (0.748 - 0.771) 0.325 (0.320 - 0.329) 0.549 (0.545 - 0.554) 0.767 (0.763 - 0.770) 0.787 (0.778 - 0.800) 0.563 (0.555 - 0.574) 0.818 (0.818 - 0.819) 0.850 (0.847 - 0.852)

mpAUC

Submission code | Submission name | Technical report | Ranking score (DESED evaluation dataset) | Alarm bell ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 1.42 0.977 (0.975 - 0.978) 0.935 (0.931 - 0.941) 0.976 (0.975 - 0.976) 0.888 (0.887 - 0.889) 0.973 (0.972 - 0.974) 0.929 (0.926 - 0.931) 0.905 (0.904 - 0.907) 0.776 (0.774 - 0.779) 0.945 (0.944 - 0.945) 0.949 (0.946 - 0.953)
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 1.39 0.975 (0.975 - 0.976) 0.934 (0.930 - 0.938) 0.975 (0.975 - 0.976) 0.886 (0.884 - 0.888) 0.976 (0.975 - 0.976) 0.930 (0.930 - 0.931) 0.906 (0.903 - 0.909) 0.772 (0.770 - 0.775) 0.943 (0.942 - 0.944) 0.949 (0.949 - 0.950)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.961 (0.959 - 0.963) 0.908 (0.904 - 0.910) 0.977 (0.972 - 0.981) 0.878 (0.872 - 0.883) 0.969 (0.967 - 0.971) 0.905 (0.895 - 0.918) 0.836 (0.825 - 0.845) 0.720 (0.708 - 0.732) 0.938 (0.937 - 0.939) 0.923 (0.918 - 0.928)
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 1.35 0.959 (0.958 - 0.960) 0.885 (0.885 - 0.885) 0.973 (0.972 - 0.973) 0.858 (0.857 - 0.858) 0.967 (0.966 - 0.967) 0.924 (0.923 - 0.925) 0.861 (0.861 - 0.861) 0.783 (0.782 - 0.783) 0.917 (0.917 - 0.917) 0.945 (0.945 - 0.946)
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 1.35 0.959 (0.959 - 0.959) 0.887 (0.886 - 0.888) 0.971 (0.971 - 0.971) 0.857 (0.857 - 0.859) 0.967 (0.966 - 0.967) 0.924 (0.923 - 0.925) 0.861 (0.861 - 0.862) 0.779 (0.778 - 0.781) 0.918 (0.918 - 0.918) 0.945 (0.945 - 0.946)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.955 (0.953 - 0.956) 0.872 (0.868 - 0.876) 0.965 (0.963 - 0.967) 0.837 (0.834 - 0.839) 0.962 (0.960 - 0.964) 0.912 (0.906 - 0.917) 0.843 (0.839 - 0.846) 0.742 (0.737 - 0.748) 0.914 (0.911 - 0.918) 0.943 (0.940 - 0.947)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.960 (0.957 - 0.963) 0.913 (0.909 - 0.917) 0.977 (0.975 - 0.980) 0.867 (0.863 - 0.871) 0.974 (0.973 - 0.975) 0.915 (0.906 - 0.920) 0.822 (0.814 - 0.827) 0.709 (0.698 - 0.717) 0.937 (0.934 - 0.939) 0.923 (0.921 - 0.926)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.954 (0.953 - 0.956) 0.869 (0.862 - 0.873) 0.966 (0.964 - 0.968) 0.840 (0.836 - 0.843) 0.961 (0.960 - 0.962) 0.908 (0.904 - 0.911) 0.836 (0.832 - 0.839) 0.745 (0.742 - 0.748) 0.919 (0.916 - 0.923) 0.941 (0.936 - 0.944)
Zhang_BUPT_task4_2 ensemble_model Yue2024 1.27 0.960 (0.959 - 0.961) 0.888 (0.883 - 0.892) 0.979 (0.979 - 0.979) 0.875 (0.874 - 0.876) 0.971 (0.970 - 0.972) 0.919 (0.900 - 0.932) 0.857 (0.848 - 0.863) 0.737 (0.734 - 0.742) 0.967 (0.967 - 0.968) 0.927 (0.924 - 0.929)
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 1.25 0.976 (0.976 - 0.976) 0.883 (0.883 - 0.883) 0.977 (0.977 - 0.977) 0.858 (0.858 - 0.858) 0.976 (0.976 - 0.976) 0.922 (0.922 - 0.923) 0.886 (0.886 - 0.887) 0.821 (0.820 - 0.822) 0.960 (0.960 - 0.960) 0.947 (0.947 - 0.947)
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 1.25 0.970 (0.970 - 0.971) 0.877 (0.875 - 0.879) 0.978 (0.977 - 0.979) 0.872 (0.869 - 0.874) 0.976 (0.975 - 0.976) 0.941 (0.941 - 0.943) 0.904 (0.900 - 0.907) 0.840 (0.838 - 0.842) 0.958 (0.957 - 0.959) 0.959 (0.958 - 0.960)
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.929 (0.926 - 0.931) 0.865 (0.865 - 0.866) 0.966 (0.965 - 0.966) 0.857 (0.856 - 0.858) 0.957 (0.957 - 0.958) 0.898 (0.897 - 0.899) 0.859 (0.858 - 0.860) 0.682 (0.679 - 0.686) 0.938 (0.938 - 0.938) 0.924 (0.924 - 0.925)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.923 (0.900 - 0.942) 0.857 (0.851 - 0.862) 0.970 (0.968 - 0.973) 0.791 (0.780 - 0.798) 0.928 (0.926 - 0.930) 0.924 (0.912 - 0.938) 0.884 (0.870 - 0.903) 0.716 (0.688 - 0.754) 0.939 (0.936 - 0.944) 0.945 (0.938 - 0.951)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.971 (0.967 - 0.974) 0.895 (0.890 - 0.902) 0.970 (0.967 - 0.974) 0.865 (0.858 - 0.874) 0.971 (0.968 - 0.973) 0.935 (0.932 - 0.937) 0.894 (0.884 - 0.901) 0.791 (0.785 - 0.796) 0.926 (0.922 - 0.928) 0.942 (0.940 - 0.943)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.970 (0.970 - 0.971) 0.877 (0.875 - 0.879) 0.978 (0.977 - 0.979) 0.872 (0.869 - 0.874) 0.976 (0.975 - 0.976) 0.941 (0.941 - 0.943) 0.904 (0.900 - 0.907) 0.840 (0.838 - 0.842) 0.958 (0.957 - 0.959) 0.959 (0.958 - 0.960)
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 1.22 0.977 (0.976 - 0.977) 0.905 (0.901 - 0.908) 0.977 (0.975 - 0.979) 0.877 (0.873 - 0.879) 0.971 (0.966 - 0.974) 0.941 (0.940 - 0.943) 0.910 (0.906 - 0.913) 0.817 (0.814 - 0.819) 0.936 (0.932 - 0.940) 0.951 (0.949 - 0.953)
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.21 0.976 (0.975 - 0.978) 0.905 (0.902 - 0.907) 0.975 (0.974 - 0.978) 0.874 (0.869 - 0.881) 0.969 (0.965 - 0.971) 0.939 (0.937 - 0.941) 0.907 (0.905 - 0.909) 0.812 (0.805 - 0.817) 0.934 (0.931 - 0.936) 0.952 (0.951 - 0.953)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.970 (0.970 - 0.970) 0.886 (0.886 - 0.887) 0.973 (0.973 - 0.973) 0.851 (0.851 - 0.851) 0.967 (0.966 - 0.967) 0.924 (0.924 - 0.924) 0.867 (0.867 - 0.867) 0.809 (0.809 - 0.809) 0.948 (0.948 - 0.948) 0.945 (0.945 - 0.945)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.962 (0.962 - 0.962) 0.860 (0.859 - 0.860) 0.975 (0.975 - 0.975) 0.839 (0.838 - 0.839) 0.978 (0.978 - 0.978) 0.884 (0.882 - 0.885) 0.902 (0.902 - 0.903) 0.756 (0.756 - 0.756) 0.955 (0.955 - 0.955) 0.943 (0.943 - 0.943)
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 1.20 0.917 (0.917 - 0.917) 0.838 (0.838 - 0.838) 0.948 (0.948 - 0.948) 0.820 (0.820 - 0.820) 0.924 (0.924 - 0.924) 0.873 (0.873 - 0.873) 0.836 (0.836 - 0.836) 0.744 (0.744 - 0.745) 0.904 (0.904 - 0.904) 0.874 (0.874 - 0.874)
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 1.20 0.923 (0.900 - 0.942) 0.835 (0.835 - 0.835) 0.964 (0.957 - 0.972) 0.852 (0.852 - 0.852) 0.974 (0.974 - 0.974) 0.906 (0.906 - 0.906) 0.889 (0.889 - 0.889) 0.809 (0.809 - 0.809) 0.949 (0.949 - 0.949) 0.934 (0.934 - 0.934)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.954 (0.949 - 0.958) 0.851 (0.845 - 0.855) 0.959 (0.953 - 0.963) 0.809 (0.795 - 0.818) 0.964 (0.959 - 0.967) 0.898 (0.884 - 0.907) 0.869 (0.861 - 0.874) 0.765 (0.743 - 0.778) 0.950 (0.942 - 0.954) 0.926 (0.923 - 0.928)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.923 (0.913 - 0.935) 0.840 (0.829 - 0.855) 0.936 (0.928 - 0.946) 0.804 (0.781 - 0.837) 0.918 (0.913 - 0.926) 0.878 (0.864 - 0.898) 0.826 (0.802 - 0.854) 0.715 (0.696 - 0.746) 0.908 (0.907 - 0.910) 0.866 (0.844 - 0.894)
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 1.18 0.977 (0.975 - 0.978) 0.901 (0.900 - 0.902) 0.978 (0.976 - 0.979) 0.873 (0.868 - 0.878) 0.962 (0.945 - 0.974) 0.944 (0.941 - 0.948) 0.906 (0.902 - 0.910) 0.810 (0.801 - 0.818) 0.930 (0.921 - 0.940) 0.946 (0.943 - 0.950)
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 1.17 0.953 (0.953 - 0.953) 0.865 (0.865 - 0.865) 0.970 (0.970 - 0.970) 0.836 (0.836 - 0.836) 0.942 (0.942 - 0.942) 0.891 (0.891 - 0.891) 0.875 (0.875 - 0.875) 0.747 (0.747 - 0.747) 0.950 (0.950 - 0.950) 0.906 (0.906 - 0.906)
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 1.17 0.966 (0.966 - 0.966) 0.884 (0.884 - 0.884) 0.968 (0.968 - 0.968) 0.880 (0.880 - 0.880) 0.972 (0.972 - 0.972) 0.937 (0.937 - 0.937) 0.906 (0.906 - 0.906) 0.804 (0.804 - 0.804) 0.907 (0.907 - 0.907) 0.928 (0.928 - 0.928)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.941 (0.927 - 0.960) 0.868 (0.849 - 0.899) 0.952 (0.940 - 0.967) 0.841 (0.828 - 0.861) 0.939 (0.932 - 0.943) 0.861 (0.840 - 0.888) 0.852 (0.826 - 0.886) 0.733 (0.703 - 0.763) 0.942 (0.936 - 0.948) 0.884 (0.869 - 0.908)
Baseline DCASE2024 baseline system Cornell2024 1.13 0.942 (0.937 - 0.948) 0.853 (0.844 - 0.858) 0.967 (0.965 - 0.969) 0.838 (0.829 - 0.846) 0.967 (0.965 - 0.970) 0.866 (0.861 - 0.874) 0.848 (0.843 - 0.855) 0.749 (0.723 - 0.776) 0.947 (0.946 - 0.949) 0.925 (0.918 - 0.933)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.949 (0.949 - 0.949) 0.870 (0.870 - 0.870) 0.959 (0.959 - 0.959) 0.861 (0.861 - 0.861) 0.970 (0.970 - 0.970) 0.921 (0.921 - 0.921) 0.899 (0.899 - 0.899) 0.766 (0.766 - 0.766) 0.899 (0.899 - 0.899) 0.912 (0.912 - 0.912)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.967 (0.967 - 0.967) 0.868 (0.868 - 0.868) 0.963 (0.963 - 0.963) 0.850 (0.850 - 0.850) 0.971 (0.971 - 0.971) 0.934 (0.934 - 0.934) 0.884 (0.884 - 0.884) 0.808 (0.808 - 0.808) 0.923 (0.923 - 0.923) 0.926 (0.926 - 0.926)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.931 (0.928 - 0.937) 0.867 (0.862 - 0.872) 0.967 (0.964 - 0.969) 0.823 (0.818 - 0.829) 0.962 (0.959 - 0.963) 0.908 (0.905 - 0.913) 0.835 (0.826 - 0.844) 0.702 (0.687 - 0.723) 0.933 (0.930 - 0.936) 0.930 (0.925 - 0.935)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.945 (0.936 - 0.951) 0.860 (0.851 - 0.870) 0.944 (0.942 - 0.946) 0.810 (0.800 - 0.820) 0.958 (0.957 - 0.960) 0.880 (0.876 - 0.884) 0.834 (0.827 - 0.840) 0.733 (0.707 - 0.752) 0.930 (0.926 - 0.933) 0.925 (0.920 - 0.931)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.920 (0.917 - 0.924) 0.846 (0.845 - 0.849) 0.949 (0.949 - 0.949) 0.807 (0.804 - 0.812) 0.941 (0.937 - 0.946) 0.901 (0.899 - 0.905) 0.857 (0.852 - 0.861) 0.778 (0.746 - 0.796) 0.936 (0.935 - 0.938) 0.918 (0.914 - 0.923)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.954 (0.954 - 0.954) 0.874 (0.874 - 0.874) 0.966 (0.966 - 0.966) 0.868 (0.868 - 0.868) 0.972 (0.972 - 0.972) 0.934 (0.934 - 0.934) 0.901 (0.901 - 0.901) 0.796 (0.796 - 0.796) 0.926 (0.926 - 0.926) 0.904 (0.904 - 0.904)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.953 (0.953 - 0.953) 0.847 (0.846 - 0.848) 0.972 (0.972 - 0.972) 0.842 (0.842 - 0.842) 0.948 (0.948 - 0.948) 0.914 (0.914 - 0.914) 0.885 (0.885 - 0.885) 0.702 (0.701 - 0.703) 0.937 (0.937 - 0.937) 0.939 (0.939 - 0.939)
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.959 (0.959 - 0.959) 0.830 (0.824 - 0.835) 0.975 (0.975 - 0.975) 0.849 (0.849 - 0.849) 0.964 (0.964 - 0.964) 0.910 (0.909 - 0.910) 0.890 (0.890 - 0.891) 0.769 (0.768 - 0.770) 0.942 (0.942 - 0.943) 0.927 (0.925 - 0.929)
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.56 0.915 (0.915 - 0.915) 0.808 (0.807 - 0.808) 0.954 (0.954 - 0.954) 0.792 (0.792 - 0.792) 0.932 (0.932 - 0.932) 0.868 (0.867 - 0.868) 0.877 (0.877 - 0.877) 0.668 (0.668 - 0.668) 0.932 (0.931 - 0.932) 0.903 (0.903 - 0.904)
Cai_USTC_task4_3 MAT-ATST Cai2024 0.47 0.906 (0.905 - 0.907) 0.788 (0.786 - 0.789) 0.942 (0.939 - 0.946) 0.760 (0.758 - 0.763) 0.924 (0.923 - 0.925) 0.851 (0.850 - 0.852) 0.874 (0.874 - 0.875) 0.657 (0.656 - 0.657) 0.905 (0.901 - 0.908) 0.891 (0.891 - 0.892)
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.20 0.004 (0.003 - 0.005) 0.009 (0.003 - 0.017) 0.049 (0.048 - 0.050) 0.062 (0.047 - 0.081) 0.034 (0.007 - 0.072) 0.005 (0.004 - 0.005) 0.376 (0.352 - 0.397) 0.020 (0.020 - 0.020) 0.092 (0.074 - 0.111) 0.000 (0.000 - 0.001)
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.17 0.053 (0.040 - 0.072) 0.002 (0.000 - 0.003) 0.015 (0.004 - 0.022) 0.259 (0.223 - 0.300) 0.078 (0.065 - 0.085) 0.021 (0.003 - 0.034) 0.088 (0.045 - 0.137) 0.015 (0.014 - 0.015) 0.161 (0.089 - 0.224) 0.001 (0.001 - 0.001)
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.15 0.007 (0.005 - 0.009) 0.032 (0.005 - 0.065) 0.029 (0.005 - 0.048) 0.014 (0.004 - 0.021) 0.058 (0.009 - 0.134) 0.005 (0.004 - 0.006) 0.352 (0.318 - 0.378) 0.022 (0.019 - 0.023) 0.136 (0.096 - 0.175) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.941 (0.938 - 0.943) 0.855 (0.841 - 0.867) 0.967 (0.966 - 0.968) 0.840 (0.836 - 0.844) 0.972 (0.970 - 0.974) 0.909 (0.903 - 0.916) 0.865 (0.855 - 0.878) 0.748 (0.738 - 0.761) 0.957 (0.956 - 0.958) 0.923 (0.922 - 0.924)

MAESTRO mpAUC

Rank
Submission code
Submission name
Technical Report
Ranking score (Evaluation dataset)
Birds singing
Brakes squeaking
Car
Children voices
Cutlery and dishes
Footsteps
Large vehicle
Metro approaching
Metro leaving
People talking
Wind blowing
Schmid_CPJKU_task4_4 Ensemble_15 ATST, BEATs, PaSST Devtest Schmid2024 1.42 0.921 (0.917 - 0.924) 0.430 (0.411 - 0.460) 0.920 (0.918 - 0.923) 0.688 (0.678 - 0.699) 0.743 (0.729 - 0.752) 0.725 (0.722 - 0.726) 0.638 (0.635 - 0.642) 0.879 (0.878 - 0.880) 0.849 (0.844 - 0.855) 0.850 (0.849 - 0.850) 0.487 (0.461 - 0.510)
Schmid_CPJKU_task4_3 Ensemble_18 ATST, BEATs, PaSST Schmid2024 1.39 0.907 (0.903 - 0.912) 0.489 (0.481 - 0.496) 0.916 (0.913 - 0.920) 0.615 (0.592 - 0.628) 0.725 (0.720 - 0.733) 0.718 (0.715 - 0.723) 0.588 (0.583 - 0.592) 0.856 (0.852 - 0.859) 0.828 (0.826 - 0.831) 0.843 (0.842 - 0.845) 0.382 (0.361 - 0.401)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.886 (0.872 - 0.902) 0.499 (0.479 - 0.520) 0.888 (0.875 - 0.896) 0.594 (0.539 - 0.640) 0.635 (0.628 - 0.640) 0.710 (0.704 - 0.719) 0.655 (0.643 - 0.670) 0.853 (0.841 - 0.866) 0.808 (0.795 - 0.826) 0.829 (0.823 - 0.837) 0.466 (0.393 - 0.524)
Nam_KAIST_task4_4 NAM_SED_4 Nam2024 1.35 0.916 (0.916 - 0.917) 0.617 (0.615 - 0.620) 0.924 (0.924 - 0.924) 0.748 (0.747 - 0.748) 0.604 (0.603 - 0.606) 0.717 (0.717 - 0.717) 0.561 (0.561 - 0.562) 0.843 (0.842 - 0.843) 0.833 (0.833 - 0.833) 0.868 (0.868 - 0.868) 0.556 (0.554 - 0.559)
Nam_KAIST_task4_3 NAM_SED_3 Nam2024 1.35 0.916 (0.916 - 0.916) 0.612 (0.609 - 0.616) 0.924 (0.924 - 0.925) 0.746 (0.746 - 0.748) 0.603 (0.601 - 0.605) 0.718 (0.718 - 0.718) 0.558 (0.557 - 0.559) 0.844 (0.844 - 0.845) 0.833 (0.832 - 0.834) 0.868 (0.868 - 0.868) 0.561 (0.559 - 0.564)
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.907 (0.905 - 0.908) 0.606 (0.594 - 0.618) 0.916 (0.908 - 0.924) 0.732 (0.719 - 0.751) 0.579 (0.565 - 0.592) 0.686 (0.675 - 0.701) 0.567 (0.560 - 0.571) 0.837 (0.830 - 0.847) 0.834 (0.831 - 0.838) 0.853 (0.848 - 0.856) 0.598 (0.583 - 0.618)
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.869 (0.847 - 0.885) 0.467 (0.453 - 0.487) 0.885 (0.869 - 0.900) 0.405 (0.391 - 0.416) 0.616 (0.571 - 0.669) 0.723 (0.719 - 0.726) 0.611 (0.589 - 0.627) 0.815 (0.809 - 0.822) 0.795 (0.784 - 0.806) 0.825 (0.818 - 0.830) 0.380 (0.378 - 0.382)
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.912 (0.908 - 0.916) 0.587 (0.552 - 0.637) 0.925 (0.919 - 0.932) 0.730 (0.718 - 0.743) 0.564 (0.530 - 0.598) 0.684 (0.675 - 0.695) 0.521 (0.496 - 0.540) 0.828 (0.821 - 0.834) 0.819 (0.818 - 0.822) 0.852 (0.849 - 0.854) 0.565 (0.519 - 0.593)
Zhang_BUPT_task4_2 ensemble_model Yue2024 1.27 0.905 (0.897 - 0.915) 0.707 (0.693 - 0.717) 0.903 (0.899 - 0.905) 0.659 (0.653 - 0.662) 0.349 (0.331 - 0.366) 0.642 (0.639 - 0.645) 0.501 (0.496 - 0.507) 0.837 (0.835 - 0.840) 0.802 (0.801 - 0.804) 0.834 (0.831 - 0.839) 0.461 (0.435 - 0.480)
Chen_NCUT_task4_4 Chen_NCUT_SED_system_4 Chen2024a 1.25 0.874 (0.874 - 0.874) 0.474 (0.474 - 0.474) 0.907 (0.907 - 0.907) 0.657 (0.656 - 0.657) 0.598 (0.597 - 0.598) 0.663 (0.663 - 0.663) 0.531 (0.531 - 0.532) 0.850 (0.850 - 0.851) 0.821 (0.821 - 0.821) 0.849 (0.849 - 0.849) 0.303 (0.302 - 0.305)
Chen_CHT_task4_3 Chen_CHT_task4_3 Chen2024 1.25 0.886 (0.886 - 0.887) 0.504 (0.475 - 0.521) 0.920 (0.919 - 0.920) 0.684 (0.678 - 0.690) 0.672 (0.666 - 0.677) 0.691 (0.688 - 0.693) 0.539 (0.525 - 0.555) 0.872 (0.868 - 0.874) 0.859 (0.857 - 0.860) 0.828 (0.826 - 0.828) 0.365 (0.352 - 0.373)
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.903 (0.902 - 0.903) 0.727 (0.727 - 0.728) 0.902 (0.901 - 0.903) 0.671 (0.671 - 0.671) 0.417 (0.415 - 0.420) 0.640 (0.640 - 0.642) 0.518 (0.517 - 0.519) 0.833 (0.833 - 0.834) 0.804 (0.804 - 0.804) 0.833 (0.832 - 0.833) 0.498 (0.496 - 0.501)
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.867 (0.859 - 0.873) 0.691 (0.633 - 0.766) 0.897 (0.891 - 0.901) 0.663 (0.637 - 0.690) 0.765 (0.753 - 0.775) 0.666 (0.662 - 0.673) 0.542 (0.496 - 0.595) 0.871 (0.858 - 0.883) 0.850 (0.839 - 0.857) 0.812 (0.807 - 0.816) 0.442 (0.418 - 0.479)
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.871 (0.855 - 0.883) 0.202 (0.155 - 0.253) 0.906 (0.889 - 0.920) 0.633 (0.583 - 0.666) 0.717 (0.691 - 0.747) 0.614 (0.554 - 0.665) 0.592 (0.558 - 0.617) 0.839 (0.825 - 0.847) 0.792 (0.779 - 0.803) 0.805 (0.797 - 0.813) 0.342 (0.278 - 0.382)
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.872 (0.872 - 0.872) 0.491 (0.229 - 0.659) 0.905 (0.905 - 0.905) 0.680 (0.680 - 0.680) 0.558 (0.558 - 0.558) 0.690 (0.690 - 0.690) 0.542 (0.496 - 0.595) 0.852 (0.852 - 0.852) 0.854 (0.854 - 0.854) 0.842 (0.842 - 0.842) 0.315 (0.315 - 0.315)
Kim_GIST-HanwhaVision_task4_4 DCASE2024 ensemble model with mix Son2024 1.22 0.861 (0.856 - 0.867) 0.166 (0.145 - 0.186) 0.902 (0.901 - 0.904) 0.470 (0.392 - 0.542) 0.704 (0.685 - 0.734) 0.623 (0.614 - 0.635) 0.520 (0.477 - 0.551) 0.830 (0.828 - 0.833) 0.791 (0.778 - 0.801) 0.770 (0.730 - 0.804) 0.380 (0.347 - 0.431)
Kim_GIST-HanwhaVision_task4_2 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.21 0.857 (0.850 - 0.867) 0.161 (0.154 - 0.168) 0.902 (0.890 - 0.913) 0.435 (0.348 - 0.500) 0.684 (0.636 - 0.746) 0.631 (0.605 - 0.650) 0.518 (0.476 - 0.548) 0.834 (0.825 - 0.843) 0.801 (0.799 - 0.802) 0.743 (0.726 - 0.765) 0.358 (0.347 - 0.366)
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.862 (0.862 - 0.862) 0.358 (0.357 - 0.358) 0.871 (0.871 - 0.871) 0.717 (0.716 - 0.717) 0.601 (0.601 - 0.601) 0.663 (0.663 - 0.663) 0.537 (0.537 - 0.538) 0.853 (0.853 - 0.853) 0.826 (0.826 - 0.827) 0.822 (0.822 - 0.822) 0.313 (0.311 - 0.314)
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.813 (0.813 - 0.813) 0.458 (0.458 - 0.458) 0.905 (0.905 - 0.905) 0.596 (0.596 - 0.596) 0.712 (0.712 - 0.712) 0.648 (0.648 - 0.648) 0.523 (0.523 - 0.523) 0.824 (0.824 - 0.824) 0.746 (0.746 - 0.746) 0.820 (0.820 - 0.820) 0.296 (0.296 - 0.297)
LEE_KT_task4_4 Ensemble_FDY-Con_with_ATST_and_BEATs Lee2024 1.20 0.877 (0.877 - 0.877) 0.689 (0.689 - 0.689) 0.891 (0.891 - 0.891) 0.622 (0.622 - 0.622) 0.621 (0.621 - 0.621) 0.682 (0.682 - 0.682) 0.447 (0.447 - 0.447) 0.822 (0.822 - 0.822) 0.800 (0.800 - 0.800) 0.846 (0.846 - 0.846) 0.292 (0.292 - 0.292)
Chen_CHT_task4_4 Chen_CHT_task4_4 Chen2024 1.20 0.872 (0.872 - 0.872) 0.491 (0.229 - 0.659) 0.905 (0.905 - 0.905) 0.680 (0.680 - 0.680) 0.558 (0.558 - 0.558) 0.690 (0.690 - 0.690) 0.542 (0.496 - 0.595) 0.852 (0.852 - 0.852) 0.854 (0.854 - 0.854) 0.842 (0.842 - 0.842) 0.315 (0.315 - 0.315)
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.871 (0.868 - 0.872) 0.559 (0.550 - 0.565) 0.875 (0.871 - 0.878) 0.622 (0.619 - 0.623) 0.483 (0.470 - 0.491) 0.609 (0.607 - 0.611) 0.513 (0.511 - 0.515) 0.824 (0.818 - 0.828) 0.802 (0.801 - 0.804) 0.838 (0.834 - 0.841) 0.322 (0.300 - 0.335)
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.868 (0.859 - 0.878) 0.681 (0.637 - 0.722) 0.865 (0.863 - 0.868) 0.589 (0.536 - 0.646) 0.574 (0.567 - 0.581) 0.661 (0.650 - 0.672) 0.460 (0.444 - 0.472) 0.810 (0.806 - 0.817) 0.787 (0.777 - 0.797) 0.843 (0.820 - 0.860) 0.380 (0.376 - 0.386)
Kim_GIST-HanwhaVision_task4_3 DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder Son2024 1.18 0.864 (0.860 - 0.868) 0.181 (0.168 - 0.199) 0.896 (0.891 - 0.899) 0.457 (0.380 - 0.545) 0.726 (0.691 - 0.753) 0.588 (0.586 - 0.591) 0.514 (0.470 - 0.549) 0.824 (0.820 - 0.830) 0.783 (0.764 - 0.796) 0.778 (0.743 - 0.818) 0.402 (0.377 - 0.441)
LEE_KT_task4_3 Ensemble_FDY-CON Lee2024 1.17 0.883 (0.883 - 0.883) 0.641 (0.641 - 0.641) 0.853 (0.853 - 0.853) 0.684 (0.684 - 0.684) 0.587 (0.587 - 0.587) 0.709 (0.709 - 0.709) 0.461 (0.461 - 0.461) 0.809 (0.809 - 0.809) 0.821 (0.821 - 0.821) 0.854 (0.854 - 0.854) 0.306 (0.306 - 0.306)
XIAO_FMSG-JLESS_task4_4 XIAO_FMSG-JLESS_task4_4_ENSEMBLE Xiao2024 1.17 0.808 (0.808 - 0.808) 0.056 (0.056 - 0.056) 0.834 (0.834 - 0.834) 0.689 (0.689 - 0.689) 0.576 (0.576 - 0.576) 0.465 (0.465 - 0.465) 0.317 (0.317 - 0.317) 0.709 (0.709 - 0.709) 0.660 (0.660 - 0.660) 0.668 (0.668 - 0.668) 0.441 (0.441 - 0.441)
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.877 (0.874 - 0.882) 0.567 (0.538 - 0.588) 0.854 (0.841 - 0.863) 0.635 (0.557 - 0.690) 0.539 (0.475 - 0.617) 0.657 (0.624 - 0.684) 0.471 (0.442 - 0.488) 0.800 (0.751 - 0.844) 0.795 (0.783 - 0.805) 0.847 (0.836 - 0.863) 0.399 (0.342 - 0.451)
Baseline DCASE2024 baseline system Cornell2024 1.13 0.837 (0.829 - 0.849) 0.338 (0.288 - 0.405) 0.902 (0.901 - 0.903) 0.588 (0.575 - 0.596) 0.605 (0.585 - 0.618) 0.640 (0.625 - 0.664) 0.497 (0.472 - 0.532) 0.828 (0.824 - 0.831) 0.792 (0.763 - 0.811) 0.814 (0.808 - 0.821) 0.270 (0.266 - 0.275)
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.789 (0.789 - 0.789) 0.073 (0.073 - 0.073) 0.773 (0.773 - 0.773) 0.636 (0.636 - 0.636) 0.469 (0.469 - 0.469) 0.405 (0.405 - 0.405) 0.332 (0.332 - 0.332) 0.730 (0.730 - 0.730) 0.702 (0.702 - 0.702) 0.721 (0.721 - 0.721) 0.454 (0.454 - 0.454)
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.759 (0.759 - 0.759) 0.210 (0.210 - 0.210) 0.784 (0.784 - 0.784) 0.552 (0.552 - 0.552) 0.466 (0.466 - 0.466) 0.407 (0.407 - 0.407) 0.284 (0.284 - 0.284) 0.689 (0.689 - 0.689) 0.608 (0.608 - 0.608) 0.591 (0.591 - 0.591) 0.477 (0.477 - 0.477)
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.807 (0.797 - 0.822) 0.277 (0.217 - 0.341) 0.885 (0.876 - 0.898) 0.601 (0.581 - 0.622) 0.429 (0.378 - 0.497) 0.631 (0.606 - 0.655) 0.439 (0.424 - 0.451) 0.811 (0.807 - 0.817) 0.803 (0.773 - 0.822) 0.806 (0.803 - 0.809) 0.244 (0.180 - 0.291)
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.822 (0.808 - 0.833) 0.197 (0.139 - 0.274) 0.884 (0.876 - 0.888) 0.551 (0.535 - 0.569) 0.549 (0.506 - 0.583) 0.607 (0.596 - 0.619) 0.436 (0.414 - 0.459) 0.796 (0.793 - 0.798) 0.802 (0.773 - 0.823) 0.778 (0.764 - 0.789) 0.201 (0.149 - 0.243)
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.792 (0.784 - 0.807) 0.078 (0.062 - 0.105) 0.880 (0.871 - 0.896) 0.623 (0.603 - 0.657) 0.556 (0.543 - 0.578) 0.595 (0.587 - 0.599) 0.432 (0.428 - 0.438) 0.796 (0.795 - 0.797) 0.745 (0.737 - 0.759) 0.767 (0.755 - 0.775) 0.370 (0.327 - 0.395)
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.621 (0.621 - 0.621) 0.053 (0.053 - 0.053) 0.784 (0.784 - 0.784) 0.673 (0.673 - 0.673) 0.532 (0.532 - 0.532) 0.409 (0.409 - 0.409) 0.322 (0.322 - 0.322) 0.571 (0.571 - 0.571) 0.532 (0.532 - 0.532) 0.541 (0.541 - 0.541) 0.356 (0.356 - 0.356)
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050)
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050)
Cai_USTC_task4_4 MAT-ATST2 Cai2024 0.56 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050)
Cai_USTC_task4_3 MAT-ATST Cai2024 0.47 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050) 0.050 (0.050 - 0.050)
Huang_SJTU_task4_1 pl_mtl_ensemble Huang2024 0.20 0.002 (0.000 - 0.006) 0.001 (0.000 - 0.002) 0.791 (0.725 - 0.855) 0.005 (0.000 - 0.014) 0.001 (0.000 - 0.001) 0.119 (0.046 - 0.196) 0.020 (0.020 - 0.020) 0.000 (0.000 - 0.000) 0.583 (0.568 - 0.594) 0.633 (0.581 - 0.676) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_3 pl_mtl_ensemble Huang2024 0.17 0.022 (0.010 - 0.043) 0.125 (0.123 - 0.129) 0.752 (0.679 - 0.824) 0.230 (0.109 - 0.388) 0.000 (0.000 - 0.000) 0.047 (0.013 - 0.088) 0.019 (0.014 - 0.021) 0.000 (0.000 - 0.000) 0.528 (0.323 - 0.652) 0.153 (0.096 - 0.228) 0.018 (0.000 - 0.050)
Huang_SJTU_task4_2 pl_mtl_ensemble Huang2024 0.15 0.010 (0.000 - 0.026) 0.000 (0.000 - 0.000) 0.532 (0.499 - 0.585) 0.096 (0.002 - 0.245) 0.000 (0.000 - 0.000) 0.007 (0.001 - 0.015) 0.025 (0.017 - 0.032) 0.000 (0.000 - 0.000) 0.246 (0.227 - 0.270) 0.719 (0.661 - 0.781) 0.000 (0.000 - 0.000)
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.850 (0.842 - 0.858) 0.445 (0.403 - 0.485) 0.889 (0.878 - 0.898) 0.686 (0.676 - 0.695) 0.581 (0.561 - 0.598) 0.657 (0.646 - 0.669) 0.551 (0.521 - 0.584) 0.836 (0.823 - 0.851) 0.806 (0.802 - 0.810) 0.827 (0.818 - 0.838) 0.328 (0.300 - 0.368)

Energy Consumption

Rank
Submission code
Submission name
Technical Report
Ranking score (Evaluation dataset)
PSDS (DESED evaluation dataset)
mpAUC (MAESTRO evaluation dataset)
Energy (kWh) (training, normalized)
Energy (kWh) (GPU, training, normalized)
Energy (kWh) (test, normalized)
Energy (kWh) (GPU, test, normalized)
EW-PSDS (DESED, training energy)
EW-mpAUC (MAESTRO, training energy)
Schmid_CPJKU_task4_2 ATST S2.I2 Devtest Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) 3.461 1.302 0.059 0.014 0.517 0.572
Nam_KAIST_task4_2 NAM_SED_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) 12.504 10.242 0.119 0.012 0.056 0.071
Schmid_CPJKU_task4_1 ATST S2.I2 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) 3.461 1.302 0.059 0.014 0.515 0.540
Nam_KAIST_task4_1 NAM_SED_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) 12.504 10.232 0.042 0.034 0.056 0.070
Zhang_BUPT_task4_1 single_model Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) 11.596 8.459 0.061 0.036 0.125 0.167
Chen_CHT_task4_1 Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) 8.517 0.086 0.527 0.773
Kim_GIST-HanwhaVision_task4_1 DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) 8.777 3.956 0.176 0.050 0.122 0.144
Chen_CHT_task4_2 Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) 49.984 1.075 0.552 0.714
Chen_NCUT_task4_3 Chen_NCUT_SED_system_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) 0.155 0.085 0.016 0.009 1.403 1.790
Chen_NCUT_task4_1 Chen_NCUT_SED_system_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) 0.260 0.175 0.011 0.005 0.838 1.056
Chen_NCUT_task4_2 Chen_NCUT_SED_system_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) 0.788 0.621 0.018 0.010 0.273 0.347
LEE_KT_task4_1 CRNN-Con_with_ATST_BEATs Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) 0.069 0.126 8.999 12.221
LEE_KT_task4_2 FDY-CRNN_with_ATST_and_BEATs Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) 0.102 0.151 5.810 8.181
Baseline DCASE2024 baseline system Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) 0.946 0.113 0.119 0.013 0.483 0.648
XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_3_FDY Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) 4.140 1.910 0.100 0.040 0.188 0.182
XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_2_WIDE Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) 2.860 1.380 0.082 0.037 0.282 0.252
Lyu_SCUT_task4_2 CCRN_BEATs_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) 11.394 3.093 0.152 0.014 0.083 0.105
Lyu_SCUT_task4_1 CCRN_BEATs_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) 10.456 2.855 0.143 0.013 0.090 0.113
Niu_XJU_task4_1 DCASE2024 SED system Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) 3.273 1.606 0.062 0.018 0.177 0.227
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_1_ORL Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) 1.790 0.856 0.045 0.012 0.434 0.373
Cai_USTC_task4_2 MAT-SED-CNN Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) 1.180 0.113 0.119 0.013 0.603 0.052
Cai_USTC_task4_1 MAT-SED Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) 1.180 0.113 0.119 0.013 0.590 0.052
Huang_SJTU_task4_4 pl_mtl_single Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) 1.358 0.664 0.039 0.016 0.475 0.616

System characteristics

General characteristics

Rank
Code
Technical Report
Ranking score (Evaluation dataset)
PSDS (DESED evaluation dataset)
mpAUC (MAESTRO evaluation dataset)
Data augmentation
Features
Schmid_CPJKU_task4_4 Schmid2024 1.42 0.680 (0.679 - 0.682) 0.739 (0.736 - 0.742) Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting log-mel energies
Schmid_CPJKU_task4_3 Schmid2024 1.39 0.676 (0.674 - 0.678) 0.715 (0.714 - 0.718) Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting log-mel energies
Schmid_CPJKU_task4_2 Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting log-mel energies
Nam_KAIST_task4_4 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.745) Mixup, Frequency warping, Filter augmentation log-mel energies
Nam_KAIST_task4_3 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.744) Mixup, Frequency warping, Filter augmentation log-mel energies
Nam_KAIST_task4_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) Mixup, Frequency warping, Filter augmentation log-mel energies
Schmid_CPJKU_task4_1 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting log-mel energies
Nam_KAIST_task4_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) Mixup, Frequency warping, Filter augmentation log-mel energies
Zhang_BUPT_task4_2 Yue2024 1.27 0.570 (0.566 - 0.573) 0.691 (0.691 - 0.691) Mixup, Time masking, Frequency masking log-mel energies
Chen_NCUT_task4_4 Chen2024a 1.25 0.565 (0.563 - 0.566) 0.684 (0.684 - 0.684) Mixup log-mel energies
Chen_CHT_task4_3 Chen2024 1.25 0.527 (0.524 - 0.530) 0.711 (0.709 - 0.712) Mixup, SpecAugment log-mel energies
Zhang_BUPT_task4_1 Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) Mixup, Time masking, Frequency masking log-mel energies
Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) Mixup, SpecAugment log-mel energies
Kim_GIST-HanwhaVision_task4_1 Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) Mixup, Frequency shift, Time shifting, Time masking, filter augmentation, Adding Gaussian noise log-mel energies, MFCC
Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) Mixup, SpecAugment log-mel energies
Kim_GIST-HanwhaVision_task4_4 Son2024 1.22 0.586 (0.578 - 0.597) 0.638 (0.620 - 0.654) Mixup, Frequency shift, Time shifting, Time masking, filter augmentation, Adding Gaussian noise log-mel energies, MFCC
Kim_GIST-HanwhaVision_task4_2 Son2024 1.21 0.580 (0.560 - 0.599) 0.629 (0.620 - 0.639) Mixup, Frequency shift, Time shifting, Time masking, filter augmentation, Adding Gaussian noise log-mel energies, MFCC
Chen_NCUT_task4_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) Mixup, Frame shifting, time_mask, Filter augmentation, Frequency masking, Adding noise log-mel energies
Chen_NCUT_task4_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) Mixup, Frame shifting, time_mask, Filter augmentation, Frequency masking, Adding noise log-mel energies
LEE_KT_task4_4 Lee2024 1.20 0.509 (0.509 - 0.509) 0.690 (0.690 - 0.690) Frequency warping, Filter augmentation log-mel energies
Chen_CHT_task4_4 Chen2024 1.20 0.500 (0.498 - 0.504) 0.691 (0.663 - 0.708) Mixup, SpecAugment log-mel energies
Chen_NCUT_task4_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) Mixup, Frame shifting, time_mask, Filter augmentation, Frequency masking, Adding noise log-mel energies
LEE_KT_task4_1 Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) Frequency warping, Filter augmentation log-mel energies
Kim_GIST-HanwhaVision_task4_3 Son2024 1.18 0.542 (0.525 - 0.560) 0.637 (0.628 - 0.652) Mixup, Frequency shift, Time shifting, Time masking, filter augmentation, Adding Gaussian noise log-mel energies
LEE_KT_task4_3 Lee2024 1.17 0.468 (0.468 - 0.468) 0.692 (0.692 - 0.692) Frequency warping, Filter augmentation log-mel energies
XIAO_FMSG-JLESS_task4_4 Xiao2024 1.17 0.606 (0.606 - 0.606) 0.566 (0.566 - 0.566) SpecAugment, Filter augmentation, Mixup, Freq-MixStyle log-mel energies
LEE_KT_task4_2 Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) Frequency warping, Filter augmentation log-mel energies
Baseline Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) log-mel energies
XIAO_FMSG-JLESS_task4_3 Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) SpecAugment, Filter augmentation, Mixup, Freq-MixStyle log-mel energies
XIAO_FMSG-JLESS_task4_2 Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) SpecAugment, Filter augmentation, Mixup, Freq-MixStyle log-mel energies
Lyu_SCUT_task4_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) Mixup complex spectrogram
Lyu_SCUT_task4_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) Mixup complex spectrogram
Niu_XJU_task4_1 Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) Mixup, SpecAugment, Audio cutmix, Random linear fader log-mel energies
XIAO_FMSG-JLESS_task4_1 Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) SpecAugment, Filter augmentation, Mixup, Freq-MixStyle log-mel energies
Cai_USTC_task4_2 Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) Mixup, Frame shifting, Filter augmentation log-mel energies
Cai_USTC_task4_1 Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) Mixup, Frame shifting, Filter augmentation log-mel energies
Cai_USTC_task4_4 Cai2024 0.56 0.506 (0.505 - 0.507) 0.050 (0.050 - 0.050) Mixup, Frame shifting, Filter augmentation log-mel energies
Cai_USTC_task4_3 Cai2024 0.47 0.417 (0.402 - 0.428) 0.050 (0.050 - 0.050) Mixup, Frame shifting, Filter augmentation log-mel energies
Huang_SJTU_task4_1 Huang2024 0.20 0.000 (0.000 - 0.000) 0.196 (0.189 - 0.202) Mixup, SpecAugment log-mel energies
Huang_SJTU_task4_3 Huang2024 0.17 0.000 (0.000 - 0.000) 0.172 (0.165 - 0.179) Mixup, SpecAugment log-mel energies
Huang_SJTU_task4_2 Huang2024 0.15 0.000 (0.000 - 0.000) 0.149 (0.137 - 0.159) Mixup, SpecAugment log-mel energies
Huang_SJTU_task4_4 Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) mixup, specaugment log-mel energies



Machine learning characteristics

Rank
Code
Technical Report
Ranking score (Evaluation dataset)
PSDS (DESED evaluation dataset)
mpAUC (MAESTRO evaluation dataset)
Classifier
Semi-supervised approach
Post-processing
Segmentation method
Decision making
Schmid_CPJKU_task4_4 Schmid2024 1.42 0.680 (0.679 - 0.682) 0.739 (0.736 - 0.742) ATST_CRNN, PaSST_CRNN, BEATs_CRNN pseudo-labelling, mean-teacher student, interpolation consistency training Sound Event Bounding Boxes
Schmid_CPJKU_task4_3 Schmid2024 1.39 0.676 (0.674 - 0.678) 0.715 (0.714 - 0.718) ATST_CRNN, PaSST_CRNN, BEATs_CRNN pseudo-labelling, mean-teacher student, interpolation consistency training Sound Event Bounding Boxes
Schmid_CPJKU_task4_2 Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) ATST_CRNN pseudo-labelling, mean-teacher student, interpolation consistency training Sound Event Bounding Boxes
Nam_KAIST_task4_4 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.745) CRNN, ensemble mean-teacher student, self training cSEBBs mean
Nam_KAIST_task4_3 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.744) CRNN, ensemble mean-teacher student, self training cSEBBs mean
Nam_KAIST_task4_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) CRNN mean-teacher student, self training cSEBBs
Schmid_CPJKU_task4_1 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) ATST_CRNN pseudo-labelling, mean-teacher student, interpolation consistency training Sound Event Bounding Boxes
Nam_KAIST_task4_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) CRNN mean-teacher student, self training cSEBBs
Zhang_BUPT_task4_2 Yue2024 1.27 0.570 (0.566 - 0.573) 0.691 (0.691 - 0.691) CRNN with pretrained BEATs pseudo-labelling, mean-teacher student median filtering average
Chen_NCUT_task4_4 Chen2024a 1.25 0.565 (0.563 - 0.566) 0.684 (0.684 - 0.684) CRNN, FFDCRNN, RNN mean-teacher student median filtering, weak prediction masking weighted mean
Chen_CHT_task4_3 Chen2024 1.25 0.527 (0.524 - 0.530) 0.711 (0.709 - 0.712) Transformer, RNN mean-teacher student median filtering average
Zhang_BUPT_task4_1 Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) CRNN with pretrained BEATs pseudo-labelling, mean-teacher student median filtering
Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) Transformer, RNN mean-teacher student median filtering
Kim_GIST-HanwhaVision_task4_1 Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) CRNN with pretrained transformer mean-teacher student median filtering, csebbs
Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) Transformer, RNN mean-teacher student median filtering average, majority vote
Kim_GIST-HanwhaVision_task4_4 Son2024 1.22 0.586 (0.578 - 0.597) 0.638 (0.620 - 0.654) CRNN with pretrained transformer mean-teacher student median filtering, csebbs averaging
Kim_GIST-HanwhaVision_task4_2 Son2024 1.21 0.580 (0.560 - 0.599) 0.629 (0.620 - 0.639) CRNN with pretrained transformer mean-teacher student median filtering, csebbs averaging
Chen_NCUT_task4_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) RNN mean-teacher student median filtering, weak prediction masking
Chen_NCUT_task4_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) CRNN mean-teacher student median filtering, weak prediction masking
LEE_KT_task4_4 Lee2024 1.20 0.509 (0.509 - 0.509) 0.690 (0.690 - 0.690) CRNN, Conformer, ensemble mean-teacher student median filtering, Sound Event Bounding Boxes
Chen_CHT_task4_4 Chen2024 1.20 0.500 (0.498 - 0.504) 0.691 (0.663 - 0.708) Transformer, RNN mean-teacher student median filtering majority vote
Chen_NCUT_task4_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) FFDCRNN mean-teacher student, pseudo-labelling median filtering, weak prediction masking
LEE_KT_task4_1 Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) CRNN, Conformer mean-teacher student median filtering, Sound Event Bounding Boxes
Kim_GIST-HanwhaVision_task4_3 Son2024 1.18 0.542 (0.525 - 0.560) 0.637 (0.628 - 0.652) CRNN with pretrained transformer mean-teacher student median filtering, csebbs averaging
LEE_KT_task4_3 Lee2024 1.17 0.468 (0.468 - 0.468) 0.692 (0.692 - 0.692) CRNN, Conformer, ensemble mean-teacher student median filtering
XIAO_FMSG-JLESS_task4_4 Xiao2024 1.17 0.606 (0.606 - 0.606) 0.566 (0.566 - 0.566) FDYCRNN, CRNN mean-teacher student median filtering, sebbs
LEE_KT_task4_2 Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) CRNN mean-teacher student median filtering
Baseline Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) CRNN mean-teacher student median filtering
XIAO_FMSG-JLESS_task4_3 Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) FDYCRNN mean-teacher student median filtering, sebbs
XIAO_FMSG-JLESS_task4_2 Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) CRNN mean-teacher student median filtering, sebbs
Lyu_SCUT_task4_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) CRNN mean-teacher student median filtering
Lyu_SCUT_task4_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) CRNN mean-teacher student median filtering
Niu_XJU_task4_1 Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) CRNN mean-teacher student median filtering
XIAO_FMSG-JLESS_task4_1 Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) CRNN mean-teacher student median filtering, sebbs
Cai_USTC_task4_2 Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) MAT-SED mean-teacher student median filtering
Cai_USTC_task4_1 Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) MAT-SED mean-teacher student median filtering
Cai_USTC_task4_4 Cai2024 0.56 0.506 (0.505 - 0.507) 0.050 (0.050 - 0.050) MAT-SED, ATST-SED mean-teacher student median filtering
Cai_USTC_task4_3 Cai2024 0.47 0.417 (0.402 - 0.428) 0.050 (0.050 - 0.050) MAT-SED, ATST-SED mean-teacher student median filtering
Huang_SJTU_task4_1 Huang2024 0.20 0.000 (0.000 - 0.000) 0.196 (0.189 - 0.202) CRNN pseudo-labelling, mean-teacher student median filtering averaging
Huang_SJTU_task4_3 Huang2024 0.17 0.000 (0.000 - 0.000) 0.172 (0.165 - 0.179) CRNN pseudo-labelling, mean-teacher student median filtering averaging
Huang_SJTU_task4_2 Huang2024 0.15 0.000 (0.000 - 0.000) 0.149 (0.137 - 0.159) CRNN pseudo-labelling, mean-teacher student median filtering averaging
Huang_SJTU_task4_4 Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) CRNN pseudo-labelling, mean-teacher student median filtering

Complexity

Rank
Code
Technical Report
Ranking score (Evaluation dataset)
PSDS (DESED evaluation dataset)
mpAUC (MAESTRO evaluation dataset)
Model complexity
MACS
Ensemble subsystems
Training time
Schmid_CPJKU_task4_4 Schmid2024 1.42 0.680 (0.679 - 0.682) 0.739 (0.736 - 0.742) 1342986395 450300000000 15 160h (1 Nvidia A40)
Schmid_CPJKU_task4_3 Schmid2024 1.39 0.676 (0.674 - 0.678) 0.715 (0.714 - 0.718) 1608946202 560410000000 18 199h (1 Nvidia A40)
Schmid_CPJKU_task4_2 Schmid2024 1.35 0.646 (0.640 - 0.654) 0.711 (0.704 - 0.717) 88411541 22590000000 8h (1 Nvidia A40)
Nam_KAIST_task4_4 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.745) 181600000 26085000000 17 3h (1 GTX 1080 Ti)
Nam_KAIST_task4_3 Nam2024 1.35 0.610 (0.609 - 0.611) 0.744 (0.744 - 0.744) 181600000 26085000000 15 3h (1 GTX 1080 Ti)
Nam_KAIST_task4_2 Nam2024 1.32 0.586 (0.585 - 0.589) 0.738 (0.732 - 0.745) 181600000 26085000000 3h (1 GTX 1080 Ti)
Schmid_CPJKU_task4_1 Schmid2024 1.31 0.644 (0.640 - 0.647) 0.672 (0.669 - 0.676) 88411541 22590000000 8h (1 Nvidia A40)
Nam_KAIST_task4_1 Nam2024 1.31 0.584 (0.582 - 0.587) 0.726 (0.720 - 0.733) 181600000 26085000000 10h (1 RTX A6000)
Zhang_BUPT_task4_2 Yue2024 1.27 0.570 (0.566 - 0.573) 0.691 (0.691 - 0.691) 63300000 4986000000 6 4h (1 GeForce RTX 3090)
Chen_NCUT_task4_4 Chen2024a 1.25 0.565 (0.563 - 0.566) 0.684 (0.684 - 0.684) 40000000 4104000000 3 3h15m (1 GTX 4090)
Chen_CHT_task4_3 Chen2024 1.25 0.527 (0.524 - 0.530) 0.711 (0.709 - 0.712) 389600000 202214000000 11 137h (1 A100)
Zhang_BUPT_task4_1 Yue2024 1.23 0.523 (0.523 - 0.524) 0.704 (0.704 - 0.705) 10500000 831000000 34h (1 GeForce RTX 3090)
Chen_CHT_task4_1 Chen2024 1.23 0.495 (0.486 - 0.503) 0.733 (0.730 - 0.739) 92100000 45259000000 23h (1 A100)
Kim_GIST-HanwhaVision_task4_1 Son2024 1.23 0.567 (0.558 - 0.573) 0.665 (0.646 - 0.677) 4822398 7304359968 20~24h (3 A6000)
Chen_CHT_task4_2 Chen2024 1.23 0.527 (0.524 - 0.530) 0.691 (0.663 - 0.708) 393000000 202768000000 144h (1 A100)
Kim_GIST-HanwhaVision_task4_4 Son2024 1.22 0.586 (0.578 - 0.597) 0.638 (0.620 - 0.654) 617266944 7304359968 128 24h (3 A6000)
Kim_GIST-HanwhaVision_task4_2 Son2024 1.21 0.580 (0.560 - 0.599) 0.629 (0.620 - 0.639) 308633472 7304359968 64 24h (3 A6000)
Chen_NCUT_task4_3 Chen2024a 1.20 0.526 (0.524 - 0.527) 0.675 (0.675 - 0.675) 17400000 1362000000 30m (1 GTX 4090)
Chen_NCUT_task4_1 Chen2024a 1.20 0.525 (0.523 - 0.527) 0.667 (0.667 - 0.667) 2500000 950200000 45m (1 GTX 4090)
LEE_KT_task4_4 Lee2024 1.20 0.509 (0.509 - 0.509) 0.690 (0.690 - 0.690) 1834547 104724000000 4 2h (A6000)
Chen_CHT_task4_4 Chen2024 1.20 0.500 (0.498 - 0.504) 0.691 (0.663 - 0.708) 207000000 111214000000 9 95h (1 A100)
Chen_NCUT_task4_2 Chen2024a 1.19 0.519 (0.485 - 0.537) 0.665 (0.659 - 0.669) 20100000 1792000000 2h (1 GTX 4090)
LEE_KT_task4_1 Lee2024 1.19 0.506 (0.482 - 0.548) 0.684 (0.672 - 0.693) 1097897 26181000000 4h (1 NVIDIA GeForce RTX 4090)
Kim_GIST-HanwhaVision_task4_3 Son2024 1.18 0.542 (0.525 - 0.560) 0.637 (0.628 - 0.652) 308633472 7304359968 64 24h (3 A6000)
LEE_KT_task4_3 Lee2024 1.17 0.468 (0.468 - 0.468) 0.692 (0.692 - 0.692) 1898428 96896000000 4 2h (A6000)
XIAO_FMSG-JLESS_task4_4 Xiao2024 1.17 0.606 (0.606 - 0.606) 0.566 (0.566 - 0.566) 15658236 4380000000 6 30h (1 RTX A5000)
LEE_KT_task4_2 Lee2024 1.16 0.474 (0.471 - 0.479) 0.676 (0.666 - 0.690) 1161778 22267000000 4h (1 NVIDIA GeForce RTX 4090)
Baseline Cornell2024 1.13 0.475 (0.469 - 0.479) 0.646 (0.641 - 0.653) 1800000 1036000000 3h (1 GTX 1080 Ti)
XIAO_FMSG-JLESS_task4_3 Xiao2024 1.12 0.574 (0.574 - 0.574) 0.553 (0.553 - 0.553) 3438938 345260000 5h (1 RTX A5000)
XIAO_FMSG-JLESS_task4_2 Xiao2024 1.12 0.597 (0.597 - 0.597) 0.530 (0.530 - 0.530) 1780474 1659000000 6h (1 RTX A5000)
Lyu_SCUT_task4_2 Lyu2024 1.10 0.478 (0.474 - 0.481) 0.612 (0.596 - 0.624) 1100000 20730000000 10.5h (1 RTX 4090 D)
Lyu_SCUT_task4_1 Lyu2024 1.08 0.474 (0.469 - 0.482) 0.602 (0.586 - 0.619) 1400000 20822000000 10h (1 RTX 4090 D)
Niu_XJU_task4_1 Niu2024 1.07 0.465 (0.462 - 0.467) 0.603 (0.599 - 0.610) 28 1431000000 9h (1 GTX TITAN)
XIAO_FMSG-JLESS_task4_1 Xiao2024 1.06 0.575 (0.575 - 0.575) 0.490 (0.490 - 0.490) 1780474 1035000000 4h (1 RTX A5000)
Cai_USTC_task4_2 Cai2024 0.63 0.574 (0.573 - 0.574) 0.050 (0.050 - 0.050) 92608612 110175122688 12h (NVIDIA GeForce RTX 3090)
Cai_USTC_task4_1 Cai2024 0.61 0.561 (0.560 - 0.561) 0.050 (0.050 - 0.050) 90592532 108500000000 12h (NVIDIA GeForce RTX 3090)
Cai_USTC_task4_4 Cai2024 0.56 0.506 (0.505 - 0.507) 0.050 (0.050 - 0.050) 185217224 110175122688 2 12h (NVIDIA GeForce RTX 3090)
Cai_USTC_task4_3 Cai2024 0.47 0.417 (0.402 - 0.428) 0.050 (0.050 - 0.050) 185217224 110175122688 2 12h (NVIDIA GeForce RTX 3090)
Huang_SJTU_task4_1 Huang2024 0.20 0.000 (0.000 - 0.000) 0.196 (0.189 - 0.202) 9100000 1688000000 7 8h (1 NVIDIA A10)
Huang_SJTU_task4_3 Huang2024 0.17 0.000 (0.000 - 0.000) 0.172 (0.165 - 0.179) 26000000 1688000000 20 8h (1 NVIDIA A10)
Huang_SJTU_task4_2 Huang2024 0.15 0.000 (0.000 - 0.000) 0.149 (0.137 - 0.159) 13000000 1688000000 10 8h (1 NVIDIA A10)
Huang_SJTU_task4_4 Huang2024 1.20 0.519 (0.516 - 0.522) 0.678 (0.669 - 0.685) 1300000 1688000000 8h (1 NVIDIA A10)

Technical reports

TRANSFORMER-BASED SOUND EVENT DETECTION SYSTEM FOR DCASE2024 TASK4

Pengfei Cai, Yan Song
University of Science and Technology of China

Abstract

In this technical report, we describe our systems for DCASE 2024 Challenge Task 4. Our systems are mainly based on MAT-SED, a pure Transformer-based SED model with masked-reconstruction-based pre-training. In MAT-SED, a Transformer with relative positional encoding is designed as the context network instead of an RNN. The Transformer-based context network is pre-trained with a masked-reconstruction task on all available target data in a self-supervised way, and both the encoder and the context network are then jointly fine-tuned in a semi-supervised manner. Our final systems achieve a PSDS1 of 0.588 (single model) and 0.600 (ensemble) on the validation set of the DESED dataset.
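The masked-reconstruction pre-training described above can be pictured as masking random frames of the encoder's latent sequence and training the context network to reconstruct them. The sketch below is only an illustration under assumed shapes and masking ratio (and it omits MAT-SED's relative positional encoding), not the authors' code:

```python
import torch
import torch.nn as nn

def masked_reconstruction_loss(latent, context_net, mask_ratio=0.75):
    """latent: (batch, time, dim) frame-wise features from a frozen encoder.
    Randomly masks whole frames, lets the context network predict them,
    and returns the MSE on the masked positions only."""
    b, t, d = latent.shape
    mask = torch.rand(b, t, device=latent.device) < mask_ratio   # True = masked frame
    corrupted = latent.masked_fill(mask.unsqueeze(-1), 0.0)      # zero out masked frames
    reconstructed = context_net(corrupted)                       # (b, t, d)
    return ((reconstructed - latent) ** 2)[mask].mean()

# toy usage with a plain Transformer encoder standing in for the context network
context_net = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=2)
loss = masked_reconstruction_loss(torch.randn(8, 156, 256), context_net)
loss.backward()
```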


SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS FOR DCASE 2024 TASK 4

Wei-Yu Chen, Chung-Li Lu, Hsiang-Feng Chuang, Yu-Han Cheng, Bo-Cheng Chan
Advanced Technology Laboratory, Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., Taiwan

Abstract

In this technical report, we briefly describe the system we designed for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 4: Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels. Our best single system employs a two-stage training process. A pretrained BEATs [1] model is used as the front-end feature extractor, with a Bi-GRU module as the back-end frame-level classifier. We employ the mean-teacher method for semi-supervised learning, using an EMA strategy to update the parameters of the teacher model, and additionally generate pseudo-labels with the student model to leverage unlabeled data. For data augmentation, techniques such as mixup and SpecAugment [2] are employed, and a median filter is used for post-processing. Without ensembling, the submitted system achieves a polyphonic sound event detection score for scenario 1 (PSDS1) [3] of 0.50 and a mean partial AUC (mean pAUC) of 0.73; with ensembling, it achieves a PSDS1 of 0.53 and a mean pAUC of 0.77 on the validation set.
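The mean-teacher scheme with EMA updates mentioned here (and used by most submissions) boils down to maintaining a teacher whose weights are an exponential moving average of the student's. A minimal sketch, with the decay value chosen arbitrarily rather than taken from the report:

```python
import copy
import torch

@torch.no_grad()
def ema_update(student: torch.nn.Module, teacher: torch.nn.Module, decay: float = 0.999):
    """Update the teacher parameters as an exponential moving average of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

# usage: the teacher starts as a copy of the student and is updated after every optimizer step
student = torch.nn.Linear(10, 4)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
ema_update(student, teacher)
```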


SEMI-SUPERVISED SOUND EVENT DETECTION BASED ON PRETRAINED MODELS FOR DCASE 2024 TASK 4

Jingxuan Chen, Xichang Cai, Ziyi Liu, Haiyue Zhang, Liangxiao Zuo, Menglong Wu
North China University of Technology, China

Abstract

In this technical report, we present our submission for DCASE 2024 Task 4: Sound Event Detection in Domestic Environments with Heterogeneous Training Dataset and Potentially Missing Labels. Firstly, our proposed system employs a full-frequency dynamic convolution (FFD-Conv) network based on the mean-teacher semi-supervised learning framework. Secondly, we use a two-stage training framework in which, during the first stage, a large unlabeled in-domain set is converted into pseudo-weak labels to balance the amount of strongly labeled data used in the second stage. Additionally, we employ data augmentation, post-processing, and model ensembling to further enhance the generalization capability of the system. Ultimately, our system achieved a PSDS-scenario-1 score of 0.535 and a macro-average pAUC score of 0.697 on the validation set.
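The first-stage conversion of unlabeled clips into pseudo-weak labels can be illustrated as thresholding a first-stage model's clip-level probabilities; the threshold and the model's output interface below are assumptions, not the authors' settings:

```python
import torch

@torch.no_grad()
def pseudo_weak_labels(model, batch_features, threshold=0.5):
    """Turn clip-level probabilities from a first-stage model into hard
    pseudo-weak labels used to supervise the second training stage."""
    model.eval()
    frame_probs = model(batch_features)            # (batch, time, classes), assumed output shape
    clip_probs = frame_probs.max(dim=1).values     # weak (clip-level) prediction = max over time
    return (clip_probs >= threshold).float()       # (batch, classes) multi-hot pseudo-labels
```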


DCASE 2024 TASK 4: SOUND EVENT DETECTION WITH HETEROGENEOUS DATA AND MISSING LABELS

Samuele Cornell1, Janek Ebbers2, Constance Douwes3, Irene Martín-Morató4, Manu Harju4, Annamaria Mesaros4, Romain Serizel3
1Carnegie Mellon University, USA, 2Mitsubishi Electric Research Laboratories, USA, 3Université de Lorraine, CNRS, Inria, Loria, France, 4Tampere University, Finland

Abstract

The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty. Participants are challenged to explore how to best use training data from different domains and with varying annotation granularity (strong/weak temporal resolution, soft/hard labels) to obtain a robust SED system that can generalize across different scenarios. Crucially, annotation across the available training datasets can be inconsistent, so sound labels present in one dataset may be unannotated in the other and vice versa. As such, systems have to cope with potentially missing target labels during training. Moreover, as an additional novelty, systems are also evaluated on labels with different granularity in order to assess their robustness for different applications. To lower the entry barrier for participants, we developed an updated baseline system with several caveats to address the aforementioned problems. Results with our baseline system indicate that this research direction is promising and that it is possible to obtain a stronger SED system by using diverse-domain training data with missing labels than by training a separate SED system for each domain.
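A common way to cope with potentially missing labels across heterogeneous datasets, in the spirit of the baseline described above, is to restrict the classification loss to the classes actually annotated in each clip's dataset of origin. The masking scheme below is an illustrative sketch under that assumption, not necessarily the exact baseline implementation:

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(frame_logits, frame_targets, class_mask):
    """frame_logits / frame_targets: (batch, time, classes);
    class_mask: (batch, classes) with 1 where the class is annotated in the
    clip's source dataset and 0 where the label may be missing."""
    per_element = F.binary_cross_entropy_with_logits(
        frame_logits, frame_targets, reduction="none")       # (batch, time, classes)
    mask = class_mask.unsqueeze(1).expand_as(per_element)    # broadcast the mask over time
    return (per_element * mask).sum() / mask.sum().clamp(min=1.0)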


SOUND EVENT DETECTION ENHANCED BY SCENE INFORMATION FOR DCASE CHALLENGE 2024 TASK4

Wen Huang1, Bing Han1, Xie Chen1, Pingyi Fan2, Cheng Lu3, Zhiqiang Lv4, Jia Liu2,4, Wei-Qiang Zhang2, Yanmin Qian1
1Shanghai Jiao Tong University, Shanghai, China, 2Tsinghua University, Beijing, China, 3North China Electric Power University, Beijing, China, 4Huakong AI Plus Company Limited, Beijing, China

Abstract

In this technical report, we describe our submission to the DCASE 2024 Challenge Task 4: Sound Event Detection with Heterogeneous Training Data and Potentially Missing Labels. Our approach leverages a Convolutional Recurrent Neural Network (CRNN) architecture enhanced with pre-trained BEATs embeddings to perform robust sound event detection. To effectively utilize different sources of data, we integrate scene information to enhance event detection performance through multi-task learning. Additionally, we address the challenge of partially missing labels by employing a semi-supervised strategy that combines the mean teacher model with pseudo-labeling to improve performance. Our final ensemble system achieves a PSDS1 score of 0.545 on the DESED validation set and an mpAUC score of 0.759 on the MAESTRO real validation set. These results highlight the efficacy of incorporating scene information and semi-supervised learning strategies in sound event detection tasks with heterogeneous and incomplete datasets.
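The multi-task use of scene information can be sketched as a shared encoder with a frame-level event head and a clip-level scene head trained jointly; layer sizes, class counts, and the loss weighting below are assumptions rather than the authors' configuration:

```python
import torch
import torch.nn as nn

class MultiTaskSED(nn.Module):
    """Shared recurrent encoder with a frame-level event head and a clip-level scene head."""
    def __init__(self, feat_dim=128, hidden=256, n_events=27, n_scenes=5):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.event_head = nn.Linear(2 * hidden, n_events)   # per-frame event logits
        self.scene_head = nn.Linear(2 * hidden, n_scenes)   # per-clip scene logits

    def forward(self, x):                                   # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        event_logits = self.event_head(h)                   # (batch, time, n_events)
        scene_logits = self.scene_head(h.mean(dim=1))       # temporal average pooling
        return event_logits, scene_logits

model = MultiTaskSED()
events, scenes = model(torch.randn(4, 156, 128))
# a total loss would combine an SED loss on `events` and a scene loss on `scenes`,
# e.g. loss = sed_loss + lambda_scene * scene_loss (the weighting is an assumption)
```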


TECHNICAL REPORT ON LEE SUBMISSION: SOUND EVENT DETECTION USING CONFORMER AND ATST FRAMEWORK FOR DCASE CHALLENGE 2024 TASK 4

Yuna Lee, JaeHoon Jung
KT Corporation, Republic of Korea

Abstract

Sound event detection (SED) has shown promising performance in detecting and classifying meaningful events in a given audio signal. Since real-world scenarios do not provide well-labeled data, there has been a push to extend the research to more coarsely labeled datasets. In this report, we propose a novel model that performs robustly both on well-labeled datasets and on datasets with potentially missing labels by using large pre-trained audio transformers throughout the training process. Our method improves performance to 0.52 in PSDS1 and 0.77 in mean pAUC.


Semi-Supervised Sound Event Detection System Based on Complex Convolutional Recurrent Neural Network

Hong Lyu, Qianhua He
School of Electronic and Information Engineering, South China University of Technology, China

Abstract

This report describes the system we propose for Task 4 of DCASE 2024. To investigate the impact of complex-valued information on sound event detection, we designed a system based on a Complex Convolutional Recurrent Neural Network [1] for semi-supervised sound event detection (CCRN-SED). We use the mean teacher method [2] for semi-supervised learning, which addresses the challenge of unlabeled data. In addition, we use the pretrained BEATs model [3] to extract information from data outside the development set. The best PSDS1 and mean pAUC of CCRN-SED on the development test set are 0.508 and 0.693.
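The complex convolution underlying a CCRN can be emulated with two real-valued convolutions applied to the real and imaginary planes of the spectrogram; the block below is a generic sketch of that building block, not the authors' exact layer:

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """(a + ib) * (w_r + i w_i) = (a*w_r - b*w_i) + i(a*w_i + b*w_r),
    implemented with two real convolutions sharing the same geometry."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, real, imag):
        out_real = self.conv_r(real) - self.conv_i(imag)
        out_imag = self.conv_i(real) + self.conv_r(imag)
        return out_real, out_imag

# toy usage on a complex spectrogram split into real/imaginary planes
layer = ComplexConv2d(1, 16)
re, im = layer(torch.randn(2, 1, 128, 626), torch.randn(2, 1, 128, 626))
```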


SELF TRAINING AND ENSEMBLING FREQUENCY DEPENDENT NETWORKS WITH COARSE PREDICTION POOLING AND SOUND EVENT BOUNDING BOXES

Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park
Korea Advanced Institute of Science and Technology, South Korea

Abstract

To tackle the sound event detection (SED) task, we propose frequency-dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of three branches: an audio teacher-student transformer (ATST) branch, a BEATs branch, and a CNN branch including either partial dilated frequency dynamic convolution (PDFD) or squeeze-and-excitation (SE) with time-frame frequency-wise SE (tfwSE). To train on MAESTRO labels with coarse temporal resolution, we apply max pooling to the predictions for the MAESTRO dataset. Using the best ensemble model, we apply self-training to obtain pseudo-labels from the DESED weak set, the DESED unlabeled set, and AudioSet. The AudioSet pseudo-labels are filtered to keep only high-confidence labels and are used to train on the DESED labels only. We use change-detection-based sound event bounding boxes (cSEBBs) as post-processing for the ensemble models used in self-training and for the submission models.
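The pseudo-label filtering step can be pictured as keeping only clips whose ensemble predictions are uniformly confident; the thresholds and the exact decision rule below are assumptions rather than the authors' values:

```python
import torch

def filter_high_confidence(pseudo_probs, low=0.1, high=0.9):
    """pseudo_probs: (clips, classes) clip-level probabilities from the ensemble.
    Keep a clip only if every class is confidently absent (< low) or
    confidently present (> high); return kept indices and hard pseudo-labels."""
    confident = ((pseudo_probs < low) | (pseudo_probs > high)).all(dim=1)
    kept = confident.nonzero(as_tuple=True)[0]
    hard_labels = (pseudo_probs[kept] > high).float()
    return kept, hard_labels

kept, labels = filter_high_confidence(torch.rand(100, 10))
```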


A EFFICIENCE SOUND EVENT DETECTION SYSTEM FOR DCASE 2024 TASK 4

ZunXue Niu1,2, Ying Hu1,2, Xin Fan1,2, Jie Liu1,2, Ye Dong1,2, Fujie Xu1,2, ShangKun Tu1,2, KaiMin Cao1,2, JiaBo Jing1,2, Qiong Wu1,2, QingJing Wan1,2
1XinJiang University, School of Information Science and Engineering, China, 2Key Laboratory of Signal Detection and Processing in Xinjiang, China

Abstract

This technical report describes the system we submitted to DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels. Specifically, we apply three main techniques to improve the performance of the official baseline system. Firstly, we exploit a dual-branch convolutional recurrent neural network (CRNN) structure consisting of a main branch and an auxiliary branch, and adopt an SCT strategy that applies self-consistency regularization in addition to the mean teacher loss to maintain consistency between the outputs of the auxiliary and main branches. Secondly, an HTA module is designed to aggregate information at different temporal resolutions so that the receptive fields of the network can be adjusted according to short-term and long-term correlations. Thirdly, several data augmentation strategies are adopted to improve the robustness of the network. Experiments on the DCASE 2024 Task 4 validation dataset demonstrate the effectiveness of the techniques used in our system.
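The self-consistency regularization between the main and auxiliary branches can be sketched as an additional consistency term between the two branches' frame-level probabilities; treating the main branch as a detached target and the unit weighting are simplifying assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def self_consistency_loss(main_logits, aux_logits, weight=1.0):
    """Encourage the auxiliary branch to agree with the main branch.
    Both inputs: (batch, time, classes) frame-level logits."""
    main_probs = torch.sigmoid(main_logits)
    aux_probs = torch.sigmoid(aux_logits)
    # the main branch is detached here so the consistency term only updates the auxiliary branch
    return weight * F.mse_loss(aux_probs, main_probs.detach())
```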


IMPROVING AUDIO SPECTROGRAM TRANSFORMERS FOR SOUND EVENT DETECTION THROUGH MULTI-STAGE TRAINING

Florian Schmid1, Paul Primus1, Tobias Morocutti1, Jonathan Greif1, Gerhard Widmer1,2
1Institute of Computational Perception (CP-JKU), Johannes Kepler University Linz, Austria, 2LIT Artificial Intelligence Lab, Johannes Kepler University Linz, Austria

Abstract

This technical report describes the CP-JKU team’s submission for Task 4 Sound Event Detection with Heterogeneous Training Datasets and Potentially Missing Labels of the DCASE 24 Challenge. We fine-tune three large Audio Spectrogram Transformers, PaSST, BEATs, and ATST, on the joint DESED and MAESTRO datasets in a two-stage training procedure. The first stage closely matches the baseline system setup and trains a CRNN model while keeping the large pre-trained transformer model frozen. In the second stage, both CRNN and transformer are fine-tuned using heavily weighted self-supervised losses. After the second stage, we compute strong pseudo-labels for all audio clips in the training set using an ensemble of all three fine-tuned transformers. Then, in a second iteration, we repeat the two-stage training process and include a distillation loss based on the pseudo-labels, boosting single-model performance substantially. Additionally, we pre-train PaSST and ATST on the subset of AudioSet that comes with strong temporal labels, before fine-tuning them on the Task 4 datasets.
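The pseudo-label distillation used in the second iteration can be sketched as an extra frame-level BCE term against the ensemble's soft predictions added to the supervised loss; the loss weight below is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, ensemble_probs, supervised_loss, distill_weight=1.0):
    """student_logits: (batch, time, classes) from the model being trained;
    ensemble_probs: same shape, soft pseudo-labels from the fine-tuned
    transformer ensemble, treated as fixed targets."""
    distill = F.binary_cross_entropy_with_logits(student_logits, ensemble_probs)
    return supervised_loss + distill_weight * distill
```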


SOUND EVENT DETECTION BASED ON AUXILIARY DECODER AND MAXIMUM PROBABILITY AGGREGATION FOR DCASE CHALLENGE 2024 TASK 4

Sang Won Son1, Jongyeon Park1, Hong Kook Kim1,2, Sulaiman Vesal3, Jeong Eun Lim4
1AI Graduate School, Gwangju Institute of Science and Technology, Korea, 2School of EECS, Gwangju Institute of Science and Technology, Korea, 3AI Lab., Innovation Center, Hanwha Vision, USA, 4AI Lab., R&D Center, Hanwha Vision, Korea

Abstract

In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pretrained large models. The auxiliary decoder operates independently from the main decoder, enhancing the performance of the convolutional block during the initial training stages by assigning different weights to the main and auxiliary decoder losses. Next, to address the mismatch in label time resolution between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during training, which aligns the model's output with the 1 s soft labels of the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of log-mel and MFCC features to generate time-frequency patterns. The experimental results demonstrate the efficacy of the proposed methods, improving SED performance with a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in developing more robust and flexible SED models.
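Maximum probability aggregation (MPA) aligns frame-level outputs with MAESTRO's 1-second soft labels by taking, per class, the maximum probability within each 1-second window. A minimal sketch, with the frame rate and tensor shapes as assumptions:

```python
import torch

def max_probability_aggregation(frame_probs, frames_per_second=16):
    """frame_probs: (batch, time, classes) frame-level probabilities.
    Returns (batch, time // frames_per_second, classes) segment-level
    probabilities, one value per 1-second window, comparable to 1 s soft labels."""
    b, t, c = frame_probs.shape
    t_trim = (t // frames_per_second) * frames_per_second       # drop an incomplete tail window
    segments = frame_probs[:, :t_trim].reshape(b, -1, frames_per_second, c)
    return segments.max(dim=2).values

seg = max_probability_aggregation(torch.rand(2, 160, 17))       # -> (2, 10, 17)
```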


FMSG-JLESS SUBMISSION FOR DCASE 2024 TASK4 ON SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS

Yang Xiao1, Han Yin2, Jisheng Bai2, Rohan Kumar Das1
1Fortemedia Singapore, Singapore, 2 Joint Laboratory of Environmental Sound Sensing, School of Marine Science and Technology, Northwestern Polytechnical University, China

Abstract

This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and the Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging to achieve good performance without knowing the source of the audio clips during evaluation. To address this, we propose a sound event detection method using domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers and a convolutional recurrent neural network. We focus on three main strategies to improve our method. First, we apply MixStyle to the frequency dimension to adapt the mel-spectrograms from different domains. Second, we compute the training loss of our model separately for each dataset, using only its corresponding classes; this independent learning framework helps the model extract domain-specific features effectively. Lastly, we use the sound event bounding boxes method for post-processing. Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.
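Freq-MixStyle can be sketched as mixing per-frequency-bin feature statistics between randomly paired clips in a batch; the Beta parameter and the choice to compute statistics over time are assumptions of this simplified version:

```python
import torch

def freq_mixstyle(mel, alpha=0.3, eps=1e-6):
    """mel: (batch, freq, time) log-mel spectrograms.
    Normalizes each frequency bin over time, then mixes the per-bin
    mean/std statistics with those of a shuffled batch member."""
    b = mel.size(0)
    mu = mel.mean(dim=2, keepdim=True)                   # (batch, freq, 1)
    sigma = mel.std(dim=2, keepdim=True) + eps
    normed = (mel - mu) / sigma
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1)).to(mel.device)
    perm = torch.randperm(b, device=mel.device)          # random pairing within the batch
    mixed_mu = lam * mu + (1 - lam) * mu[perm]
    mixed_sigma = lam * sigma + (1 - lam) * sigma[perm]
    return normed * mixed_sigma + mixed_mu

aug = freq_mixstyle(torch.randn(8, 128, 626))
```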


LOCAL AND GLOBAL FEATURES FUSION FOR SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS

Haobo Yue, Zehao Wang, Da Mu, Huamei Sun, Yuanyuan Jiang, Zhicheng Zhang, Jianqin Yin
Beijing University of Posts and Telecommunications, China

Abstract

In this work, we present our submission system for DCASE 2024 Task 4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels, where we introduce BEATs-CRNN interactive systems. Considering that the pretrained BEATs model predominantly captures global features of the data, while the CRNN model focuses on learning local features, this work fuses the intermediate-layer information of the two to enhance the system's feature extraction capabilities. Firstly, we modify the BEATs model and the CRNN model so that both models extract features from the data at the same stages. Secondly, because the CNN and BEATs models have differing numbers of layers, we extract intermediate features from both models at regular intervals, let them interact through cross-attention, and then feed the resulting features back to the respective models for feature extraction in the subsequent layers. Finally, the outputs of the last interaction are used as the final features for learning. Compared to the baseline system using BEATs embeddings, which achieved 48.3% in PSDS-scenario 1, 49.4% in PSDS-scenario 1 (sed score), and 73.7% in mean pAUC, our BEATs-CRNN interactive system achieves 53.2%, 54.1%, and 76.3%, respectively. The ensemble of the BEATs-CRNN interactive system further improves PSDS-scenario 1 to 56.4%, PSDS-scenario 1 (sed score) to 57.4%, and mean pAUC to 75.6%.
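The cross-attention interaction between intermediate BEATs and CRNN features can be pictured with standard multi-head attention blocks in which each model's features query the other's; the dimensions, head count, and residual connections below are assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Let CRNN features attend to BEATs features (and vice versa) so that
    local and global representations exchange information mid-network."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.crnn_to_beats = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.beats_to_crnn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, crnn_feat, beats_feat):            # both: (batch, time, dim)
        crnn_upd, _ = self.crnn_to_beats(query=crnn_feat, key=beats_feat, value=beats_feat)
        beats_upd, _ = self.beats_to_crnn(query=beats_feat, key=crnn_feat, value=crnn_feat)
        return crnn_feat + crnn_upd, beats_feat + beats_upd   # residual connections

fusion = CrossAttentionFusion()
c, b = fusion(torch.randn(2, 156, 256), torch.randn(2, 156, 256))
```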
