Task description
A more detailed task description can be found on the task description page.
All confidence intervals are computed from the three runs per system, with bootstrapping on the evaluation set.
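As a rough illustration of how an interval of this kind can be obtained, the sketch below applies a percentile bootstrap to three runs. It is only a sketch under stated assumptions: the per-clip score lists, the function name `bootstrap_ci`, the number of resamples and the 95% level are all hypothetical, and a simple mean stands in for the real metrics (PSDS and mpAUC are not per-clip averages). The exact resampling protocol used for the tables on this page is not described here.

```python
import random
import statistics


def bootstrap_ci(runs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a mean score pooled over runs.

    runs: list of runs, each a list of per-clip scores on the same
          evaluation clips (hypothetical input layout).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n_clips = len(runs[0])
    estimates = []
    for _ in range(n_resamples):
        # Draw clip indices with replacement and reuse them for every run.
        idx = [rng.randrange(n_clips) for _ in range(n_clips)]
        for run in runs:
            estimates.append(statistics.fmean([run[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    point = statistics.fmean([statistics.fmean(run) for run in runs])
    return point, lower, upper


# Synthetic stand-in data: three runs scored on 200 evaluation clips.
_demo_rng = random.Random(42)
demo_runs = [
    [min(1.0, max(0.0, _demo_rng.gauss(0.6, 0.1))) for _ in range(200)]
    for _ in range(3)
]
point, lower, upper = bootstrap_ci(demo_runs)
print(f"{point:.3f} ({lower:.3f} - {upper:.3f})")
```

The printed string follows the same `value (low - high)` layout as the table cells, which makes it easy to compare a locally computed estimate with the listed figures.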
Team ranking
The tables below include only the best ranking score per submitting team. The first table covers systems without ensembling.
Submission code | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) |
---|---|---|---|---|
Schmid_CPJKU_task4_2 | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | |
Nam_KAIST_task4_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | |
Zhang_BUPT_task4_1 | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | |
Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | |
Kim_GIST-HanwhaVision_task4_1 | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | |
Chen_NCUT_task4_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | |
LEE_KT_task4_1 | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | |
Baseline | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | |
XIAO_FMSG-JLESS_task4_3 | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | |
Lyu_SCUT_task4_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | |
Niu_XJU_task4_1 | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | |
Cai_USTC_task4_2 | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | |
Huang_SJTU_task4_4 | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) |
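The ranking scores in the table above appear to track the sum of the PSDS and mpAUC columns. This is an observation from the listed values, not a definition stated on this page, and the small residuals may come from rounding or from how the three runs are aggregated (an assumption). The following sketch checks the observation; the helper name `score_sum` is hypothetical, and the numbers are copied from the table.

```python
# (submission, listed ranking score, PSDS, mpAUC), copied from the table above.
rows = [
    ("Schmid_CPJKU_task4_2", 1.35, 0.646, 0.711),
    ("Nam_KAIST_task4_2", 1.32, 0.586, 0.738),
    ("Zhang_BUPT_task4_1", 1.23, 0.523, 0.704),
    ("Chen_CHT_task4_1", 1.23, 0.495, 0.733),
    ("Kim_GIST-HanwhaVision_task4_1", 1.23, 0.567, 0.665),
    ("Chen_NCUT_task4_3", 1.20, 0.526, 0.675),
    ("LEE_KT_task4_1", 1.19, 0.506, 0.684),
    ("Baseline", 1.13, 0.475, 0.646),
    ("XIAO_FMSG-JLESS_task4_3", 1.12, 0.574, 0.553),
    ("Lyu_SCUT_task4_2", 1.10, 0.478, 0.612),
    ("Niu_XJU_task4_1", 1.07, 0.465, 0.603),
    ("Cai_USTC_task4_2", 0.63, 0.574, 0.050),
    ("Huang_SJTU_task4_4", 1.20, 0.519, 0.678),
]


def score_sum(psds, mpauc):
    """Hypothetical helper: plain sum of the two headline metrics."""
    return psds + mpauc


for name, listed, psds, mpauc in rows:
    total = score_sum(psds, mpauc)
    print(f"{name:32s} listed={listed:.2f} sum={total:.3f} diff={total - listed:+.3f}")

# Largest absolute gap between the listed score and the plain sum.
max_gap = max(abs(score_sum(p, m) - listed) for _, listed, p, m in rows)
print(f"max gap: {max_gap:.3f}")
```

For these rows the largest gap is about 0.01, which is why a system with a near-zero score on one dataset (for example an mpAUC of 0.050) drops far down the ranking even when its other metric is competitive.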
With ensembling
Submission code | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) |
---|---|---|---|---|
Schmid_CPJKU_task4_4 | Schmid2024 | 1.42 | 0.680 (0.679 - 0.682) | 0.739 (0.736 - 0.742) | |
Nam_KAIST_task4_4 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.745) | |
Zhang_BUPT_task4_2 | Yue2024 | 1.27 | 0.570 (0.566 - 0.573) | 0.691 (0.691 - 0.691) | |
Chen_NCUT_task4_4 | Chen2024a | 1.25 | 0.565 (0.563 - 0.566) | 0.684 (0.684 - 0.684) | |
Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.527 (0.524 - 0.530) | 0.711 (0.709 - 0.712) | |
Kim_GIST-HanwhaVision_task4_1 | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | |
LEE_KT_task4_4 | Lee2024 | 1.20 | 0.509 (0.509 - 0.509) | 0.690 (0.690 - 0.690) | |
XIAO_FMSG-JLESS_task4_4 | Xiao2024 | 1.17 | 0.606 (0.606 - 0.606) | 0.566 (0.566 - 0.566) | |
Baseline | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | |
Lyu_SCUT_task4_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | |
Niu_XJU_task4_1 | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | |
Cai_USTC_task4_2 | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | |
Huang_SJTU_task4_4 | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) |
Systems ranking
Performance obtained without ensembling.
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | PSDS (Development dataset) | mpAUC (Development dataset) |
---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | 0.617 | 0.749 | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | 0.539 | 0.773 | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | 0.617 | 0.749 | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | 0.571 | 0.788 | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | 0.543 | 0.763 | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | 0.498 | 0.726 | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | 0.481 | 0.686 | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | 0.531 | 0.773 | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | 0.514 | 0.697 | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | 0.521 | 0.659 | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | 0.525 | 0.651 | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | 0.467 | 0.734 | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | 0.475 | 0.730 | |
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | 0.491 | 0.695 | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | 0.503 | 0.737 | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | 0.479 | 0.748 | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | 0.508 | 0.693 | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | 0.494 | 0.655 | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | 0.493 | 0.657 | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | 0.506 | 0.734 | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | 0.588 | 0.000 | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | 0.587 | 0.000 | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | 0.527 | 0.737 |
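When comparing two systems in these tables, it can help to read the cells programmatically. The sketch below parses a `value (low - high)` cell and reports whether two intervals overlap. The function names are hypothetical, and interval overlap is only an informal reading aid, not a significance test.

```python
import re

# Matches cells such as "0.646 (0.640 - 0.654)".
CELL = re.compile(r"^\s*([0-9.]+)\s*\(\s*([0-9.]+)\s*-\s*([0-9.]+)\s*\)\s*$")


def parse_cell(cell):
    """Return (value, low, high) as floats from a 'value (low - high)' cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    value, low, high = (float(group) for group in match.groups())
    return value, low, high


def intervals_overlap(cell_a, cell_b):
    """True if the two confidence intervals share at least one point."""
    _, a_low, a_high = parse_cell(cell_a)
    _, b_low, b_high = parse_cell(cell_b)
    return a_low <= b_high and b_low <= a_high


# PSDS cells copied from the table above.
schmid_2 = "0.646 (0.640 - 0.654)"  # Schmid_CPJKU_task4_2
schmid_1 = "0.644 (0.640 - 0.647)"  # Schmid_CPJKU_task4_1
nam_2 = "0.586 (0.585 - 0.589)"     # Nam_KAIST_task4_2

print(intervals_overlap(schmid_2, schmid_1))  # True: the intervals overlap
print(intervals_overlap(schmid_2, nam_2))     # False: the intervals are disjoint
```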
Supplementary metrics
DESED dataset
Submission code | Submission name | Technical Report | PSDS (Development dataset) | PSDS (Evaluation dataset) | PSDS (DESED public evaluation) | PSDS (DESED Vimeo dataset) | Segment-based F1, threshold = 0.5 (DESED evaluation) | Segment-based F1, optimal threshold (DESED evaluation) | Collar-based F1, threshold = 0.5 (DESED evaluation) | Collar-based F1, optimal threshold (DESED evaluation) | Intersection-based F1, threshold = 0.5 (DESED evaluation) | Intersection-based F1, optimal threshold (DESED evaluation) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 0.617 | 0.646 (0.640 - 0.654) | 0.695 (0.692 - 0.698) | 0.525 (0.517 - 0.541) | 0.853 (0.849 - 0.858) | 0.883 (0.882 - 0.884) | 0.642 (0.634 - 0.648) | 0.672 (0.666 - 0.676) | 0.772 (0.768 - 0.778) | 0.803 (0.802 - 0.805) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 0.539 | 0.586 (0.585 - 0.589) | 0.640 (0.637 - 0.644) | 0.446 (0.442 - 0.452) | 0.858 (0.857 - 0.859) | 0.884 (0.883 - 0.886) | 0.654 (0.653 - 0.654) | 0.673 (0.671 - 0.675) | 0.772 (0.771 - 0.775) | 0.788 (0.787 - 0.791) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 0.617 | 0.644 (0.640 - 0.647) | 0.687 (0.684 - 0.688) | 0.545 (0.539 - 0.553) | 0.851 (0.843 - 0.855) | 0.885 (0.881 - 0.889) | 0.638 (0.635 - 0.642) | 0.678 (0.675 - 0.679) | 0.767 (0.765 - 0.769) | 0.805 (0.800 - 0.809) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 0.571 | 0.584 (0.582 - 0.587) | 0.629 (0.624 - 0.636) | 0.470 (0.463 - 0.476) | 0.860 (0.859 - 0.861) | 0.885 (0.884 - 0.886) | 0.652 (0.649 - 0.655) | 0.673 (0.673 - 0.673) | 0.772 (0.771 - 0.774) | 0.788 (0.787 - 0.789) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 0.543 | 0.523 (0.523 - 0.524) | 0.572 (0.571 - 0.573) | 0.425 (0.422 - 0.427) | 0.776 (0.774 - 0.777) | 0.866 (0.866 - 0.866) | 0.527 (0.526 - 0.528) | 0.607 (0.606 - 0.608) | 0.662 (0.662 - 0.663) | 0.749 (0.747 - 0.751) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 0.498 | 0.495 (0.486 - 0.503) | 0.540 (0.532 - 0.548) | 0.412 (0.397 - 0.420) | 0.854 (0.850 - 0.859) | 0.880 (0.878 - 0.885) | 0.561 (0.558 - 0.567) | 0.591 (0.587 - 0.597) | 0.740 (0.738 - 0.745) | 0.763 (0.761 - 0.765) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.481 | 0.567 (0.558 - 0.573) | 0.610 (0.597 - 0.622) | 0.464 (0.460 - 0.470) | 0.846 (0.833 - 0.863) | 0.891 (0.889 - 0.893) | 0.577 (0.568 - 0.589) | 0.630 (0.621 - 0.636) | 0.736 (0.726 - 0.747) | 0.780 (0.778 - 0.781) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 0.531 | 0.527 (0.524 - 0.530) | 0.581 (0.577 - 0.584) | 0.422 (0.421 - 0.423) | 0.879 (0.879 - 0.880) | 0.899 (0.898 - 0.899) | 0.562 (0.560 - 0.563) | 0.616 (0.615 - 0.617) | 0.747 (0.745 - 0.749) | 0.782 (0.781 - 0.782) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 0.514 | 0.526 (0.524 - 0.527) | 0.575 (0.572 - 0.576) | 0.430 (0.429 - 0.430) | 0.878 (0.877 - 0.878) | 0.887 (0.887 - 0.887) | 0.572 (0.572 - 0.573) | 0.615 (0.614 - 0.616) | 0.756 (0.754 - 0.757) | 0.773 (0.773 - 0.774) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 0.521 | 0.525 (0.523 - 0.527) | 0.575 (0.574 - 0.577) | 0.431 (0.430 - 0.432) | 0.861 (0.861 - 0.861) | 0.877 (0.877 - 0.877) | 0.547 (0.547 - 0.547) | 0.590 (0.589 - 0.590) | 0.740 (0.740 - 0.741) | 0.766 (0.765 - 0.766) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 0.525 | 0.519 (0.485 - 0.537) | 0.576 (0.543 - 0.594) | 0.398 (0.358 - 0.419) | 0.857 (0.855 - 0.860) | 0.873 (0.872 - 0.874) | 0.535 (0.518 - 0.546) | 0.592 (0.577 - 0.601) | 0.723 (0.711 - 0.731) | 0.767 (0.758 - 0.772) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 0.467 | 0.506 (0.482 - 0.548) | 0.550 (0.528 - 0.585) | 0.380 (0.340 - 0.441) | 0.816 (0.809 - 0.829) | 0.842 (0.835 - 0.855) | 0.517 (0.479 - 0.575) | 0.558 (0.534 - 0.595) | 0.682 (0.657 - 0.722) | 0.705 (0.685 - 0.739) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 0.475 | 0.474 (0.471 - 0.479) | 0.519 (0.515 - 0.524) | 0.378 (0.374 - 0.383) | 0.814 (0.811 - 0.817) | 0.849 (0.841 - 0.862) | 0.477 (0.463 - 0.495) | 0.537 (0.517 - 0.566) | 0.666 (0.664 - 0.668) | 0.712 (0.702 - 0.723) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 0.491 | 0.475 (0.469 - 0.479) | 0.522 (0.516 - 0.527) | 0.380 (0.365 - 0.389) | 0.858 (0.855 - 0.862) | 0.867 (0.863 - 0.873) | 0.474 (0.470 - 0.480) | 0.545 (0.540 - 0.552) | 0.682 (0.674 - 0.687) | 0.726 (0.722 - 0.733) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 0.503 | 0.574 (0.574 - 0.574) | 0.631 (0.631 - 0.631) | 0.443 (0.443 - 0.443) | 0.869 (0.869 - 0.869) | 0.885 (0.885 - 0.885) | 0.592 (0.592 - 0.592) | 0.611 (0.611 - 0.611) | 0.775 (0.775 - 0.775) | 0.787 (0.787 - 0.787) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 0.479 | 0.597 (0.597 - 0.597) | 0.639 (0.639 - 0.639) | 0.489 (0.489 - 0.489) | 0.869 (0.869 - 0.869) | 0.887 (0.887 - 0.887) | 0.598 (0.598 - 0.598) | 0.621 (0.621 - 0.621) | 0.768 (0.768 - 0.768) | 0.786 (0.786 - 0.786) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 0.508 | 0.478 (0.474 - 0.481) | 0.532 (0.530 - 0.533) | 0.369 (0.367 - 0.371) | 0.856 (0.853 - 0.858) | 0.867 (0.865 - 0.869) | 0.519 (0.515 - 0.523) | 0.558 (0.553 - 0.562) | 0.697 (0.693 - 0.703) | 0.735 (0.733 - 0.736) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 0.494 | 0.474 (0.469 - 0.482) | 0.529 (0.523 - 0.539) | 0.361 (0.357 - 0.363) | 0.845 (0.843 - 0.846) | 0.860 (0.859 - 0.861) | 0.456 (0.454 - 0.458) | 0.494 (0.493 - 0.494) | 0.683 (0.678 - 0.687) | 0.710 (0.709 - 0.711) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 0.493 | 0.465 (0.462 - 0.467) | 0.511 (0.510 - 0.512) | 0.367 (0.363 - 0.369) | 0.863 (0.861 - 0.864) | 0.874 (0.871 - 0.877) | 0.543 (0.537 - 0.547) | 0.565 (0.560 - 0.568) | 0.713 (0.708 - 0.716) | 0.730 (0.727 - 0.731) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 0.506 | 0.575 (0.575 - 0.575) | 0.627 (0.627 - 0.627) | 0.450 (0.450 - 0.450) | 0.866 (0.866 - 0.866) | 0.885 (0.885 - 0.885) | 0.579 (0.579 - 0.579) | 0.604 (0.604 - 0.604) | 0.764 (0.764 - 0.764) | 0.779 (0.779 - 0.779) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.588 | 0.574 (0.573 - 0.574) | 0.632 (0.632 - 0.632) | 0.473 (0.473 - 0.473) | 0.836 (0.836 - 0.836) | 0.870 (0.870 - 0.870) | 0.615 (0.615 - 0.616) | 0.651 (0.650 - 0.651) | 0.757 (0.757 - 0.757) | 0.788 (0.788 - 0.788) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.587 | 0.561 (0.560 - 0.561) | 0.607 (0.606 - 0.607) | 0.478 (0.477 - 0.479) | 0.824 (0.823 - 0.825) | 0.869 (0.868 - 0.869) | 0.590 (0.589 - 0.591) | 0.630 (0.629 - 0.631) | 0.741 (0.739 - 0.742) | 0.769 (0.768 - 0.769) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 0.527 | 0.519 (0.516 - 0.522) | 0.568 (0.561 - 0.575) | 0.417 (0.408 - 0.430) | 0.858 (0.855 - 0.861) | 0.871 (0.870 - 0.874) | 0.550 (0.547 - 0.556) | 0.593 (0.590 - 0.595) | 0.731 (0.730 - 0.732) | 0.757 (0.755 - 0.759) |
MAESTRO dataset
Submission code | Submission name | Technical Report | mpAUC (MAESTRO Development dataset) | mpAUC (MAESTRO Evaluation dataset) | Segment-based F1, threshold = 0.5 (MAESTRO evaluation) | Segment-based F1, optimal threshold (MAESTRO evaluation) |
---|---|---|---|---|---|---|
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 0.749 | 0.711 (0.704 - 0.717) | 0.385 (0.376 - 0.392) | 0.585 (0.581 - 0.592) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 0.773 | 0.738 (0.732 - 0.745) | 0.219 (0.218 - 0.221) | 0.593 (0.592 - 0.594) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 0.749 | 0.672 (0.669 - 0.676) | 0.380 (0.371 - 0.391) | 0.560 (0.556 - 0.566) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 0.788 | 0.726 (0.720 - 0.733) | 0.218 (0.216 - 0.221) | 0.588 (0.582 - 0.593) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 0.763 | 0.704 (0.704 - 0.705) | 0.474 (0.473 - 0.477) | 0.570 (0.570 - 0.571) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 0.726 | 0.733 (0.730 - 0.739) | 0.347 (0.332 - 0.361) | 0.603 (0.598 - 0.609) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.686 | 0.665 (0.646 - 0.677) | 0.129 (0.105 - 0.168) | 0.544 (0.533 - 0.552) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 0.773 | 0.691 (0.663 - 0.708) | 0.366 (0.359 - 0.377) | 0.570 (0.552 - 0.583) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 0.697 | 0.675 (0.675 - 0.675) | 0.344 (0.332 - 0.361) | 0.559 (0.559 - 0.559) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 0.659 | 0.667 (0.667 - 0.667) | 0.422 (0.419 - 0.426) | 0.542 (0.541 - 0.542) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 0.651 | 0.665 (0.659 - 0.669) | 0.478 (0.470 - 0.486) | 0.542 (0.539 - 0.543) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 0.734 | 0.684 (0.672 - 0.693) | 0.258 (0.247 - 0.266) | 0.569 (0.557 - 0.576) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 0.730 | 0.676 (0.666 - 0.690) | 0.230 (0.219 - 0.238) | 0.567 (0.561 - 0.573) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 0.695 | 0.646 (0.641 - 0.653) | 0.459 (0.435 - 0.475) | 0.534 (0.530 - 0.537) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 0.737 | 0.553 (0.553 - 0.553) | 0.113 (0.113 - 0.113) | 0.491 (0.491 - 0.491) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 0.748 | 0.530 (0.530 - 0.530) | 0.096 (0.096 - 0.096) | 0.480 (0.480 - 0.480) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 0.693 | 0.612 (0.596 - 0.624) | 0.370 (0.354 - 0.380) | 0.523 (0.520 - 0.526) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 0.655 | 0.602 (0.586 - 0.619) | 0.368 (0.346 - 0.387) | 0.510 (0.503 - 0.517) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 0.657 | 0.603 (0.599 - 0.610) | 0.261 (0.243 - 0.292) | 0.515 (0.512 - 0.521) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 0.734 | 0.490 (0.490 - 0.490) | 0.096 (0.096 - 0.096) | 0.455 (0.455 - 0.455) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 0.737 | 0.678 (0.669 - 0.685) | 0.410 (0.393 - 0.434) | 0.556 (0.553 - 0.561) |
With ensembling
Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | PSDS (Development dataset) | mpAUC (Development dataset) |
---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 1.42 | 0.680 (0.679 - 0.682) | 0.739 (0.736 - 0.742) | 0.632 | 0.746 | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 1.39 | 0.676 (0.674 - 0.678) | 0.715 (0.714 - 0.718) | 0.632 | 0.743 | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | 0.617 | 0.749 | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.745) | 0.491 | 0.695 | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.744) | 0.575 | 0.788 | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | 0.539 | 0.773 | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | 0.617 | 0.749 | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | 0.571 | 0.788 | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 1.27 | 0.570 (0.566 - 0.573) | 0.691 (0.691 - 0.691) | 0.575 | 0.756 | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 1.25 | 0.565 (0.563 - 0.566) | 0.684 (0.684 - 0.684) | 0.535 | 0.677 | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.527 (0.524 - 0.530) | 0.711 (0.709 - 0.712) | 0.531 | 0.740 | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | 0.543 | 0.763 | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | 0.498 | 0.726 | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | 0.481 | 0.686 | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | 0.531 | 0.773 | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 1.22 | 0.586 (0.578 - 0.597) | 0.638 (0.620 - 0.654) | 0.509 | 0.700 | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.21 | 0.580 (0.560 - 0.599) | 0.629 (0.620 - 0.639) | 0.486 | 0.700 | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | 0.514 | 0.697 | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | 0.521 | 0.659 | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 1.20 | 0.509 (0.509 - 0.509) | 0.690 (0.690 - 0.690) | 0.507 | 0.757 | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.500 (0.498 - 0.504) | 0.691 (0.663 - 0.708) | 0.525 | 0.773 | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | 0.525 | 0.651 | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | 0.467 | 0.734 | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 1.18 | 0.542 (0.525 - 0.560) | 0.637 (0.628 - 0.652) | 0.505 | 0.696 | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 1.17 | 0.468 (0.468 - 0.468) | 0.692 (0.692 - 0.692) | 0.510 | 0.692 | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 1.17 | 0.606 (0.606 - 0.606) | 0.566 (0.566 - 0.566) | 0.519 | 0.762 | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | 0.475 | 0.730 | |
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | 0.491 | 0.695 | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | 0.503 | 0.737 | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | 0.479 | 0.748 | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | 0.508 | 0.693 | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | 0.494 | 0.655 | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | 0.493 | 0.657 | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | 0.506 | 0.734 | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | 0.588 | 0.000 | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | 0.587 | 0.000 | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.56 | 0.506 (0.505 - 0.507) | 0.050 (0.050 - 0.050) | 0.600 | 0.000 | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.47 | 0.417 (0.402 - 0.428) | 0.050 (0.050 - 0.050) | 0.600 | 0.000 | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.20 | 0.000 (0.000 - 0.000) | 0.196 (0.189 - 0.202) | 0.545 | 0.759 | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.17 | 0.000 (0.000 - 0.000) | 0.172 (0.165 - 0.179) | 0.545 | 0.757 | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.15 | 0.000 (0.000 - 0.000) | 0.149 (0.137 - 0.159) | 0.541 | 0.758 | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | 0.527 | 0.737 |
Supplementary metrics
DESED dataset
Submission code | Submission name | Technical Report | PSDS (Development dataset) | PSDS (Evaluation dataset) | PSDS (DESED public evaluation) | PSDS (DESED Vimeo dataset) | Segment-based F1, threshold = 0.5 (DESED evaluation) | Segment-based F1, optimal threshold (DESED evaluation) | Collar-based F1, threshold = 0.5 (DESED evaluation) | Collar-based F1, optimal threshold (DESED evaluation) | Intersection-based F1, threshold = 0.5 (DESED evaluation) | Intersection-based F1, optimal threshold (DESED evaluation) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 0.632 | 0.680 (0.679 - 0.682) | 0.733 (0.730 - 0.737) | 0.555 (0.553 - 0.559) | 0.874 (0.872 - 0.876) | 0.903 (0.902 - 0.904) | 0.677 (0.674 - 0.683) | 0.710 (0.708 - 0.711) | 0.801 (0.798 - 0.803) | 0.829 (0.827 - 0.831) | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 0.632 | 0.676 (0.674 - 0.678) | 0.724 (0.722 - 0.726) | 0.560 (0.555 - 0.565) | 0.875 (0.873 - 0.877) | 0.904 (0.904 - 0.905) | 0.670 (0.666 - 0.673) | 0.703 (0.696 - 0.709) | 0.796 (0.793 - 0.797) | 0.827 (0.826 - 0.827) | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 0.617 | 0.646 (0.640 - 0.654) | 0.695 (0.692 - 0.698) | 0.525 (0.517 - 0.541) | 0.853 (0.849 - 0.858) | 0.883 (0.882 - 0.884) | 0.642 (0.634 - 0.648) | 0.672 (0.666 - 0.676) | 0.772 (0.768 - 0.778) | 0.803 (0.802 - 0.805) | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 0.491 | 0.610 (0.609 - 0.611) | 0.664 (0.663 - 0.665) | 0.468 (0.468 - 0.469) | 0.857 (0.857 - 0.858) | 0.889 (0.888 - 0.889) | 0.664 (0.663 - 0.665) | 0.683 (0.682 - 0.684) | 0.776 (0.775 - 0.776) | 0.796 (0.795 - 0.796) | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 0.575 | 0.610 (0.609 - 0.611) | 0.664 (0.663 - 0.666) | 0.470 (0.469 - 0.470) | 0.858 (0.858 - 0.858) | 0.889 (0.889 - 0.889) | 0.663 (0.662 - 0.665) | 0.683 (0.681 - 0.685) | 0.777 (0.776 - 0.777) | 0.796 (0.796 - 0.797) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 0.539 | 0.586 (0.585 - 0.589) | 0.640 (0.637 - 0.644) | 0.446 (0.442 - 0.452) | 0.858 (0.857 - 0.859) | 0.884 (0.883 - 0.886) | 0.654 (0.653 - 0.654) | 0.673 (0.671 - 0.675) | 0.772 (0.771 - 0.775) | 0.788 (0.787 - 0.791) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 0.617 | 0.644 (0.640 - 0.647) | 0.687 (0.684 - 0.688) | 0.545 (0.539 - 0.553) | 0.851 (0.843 - 0.855) | 0.885 (0.881 - 0.889) | 0.638 (0.635 - 0.642) | 0.678 (0.675 - 0.679) | 0.767 (0.765 - 0.769) | 0.805 (0.800 - 0.809) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 0.571 | 0.584 (0.582 - 0.587) | 0.629 (0.624 - 0.636) | 0.470 (0.463 - 0.476) | 0.860 (0.859 - 0.861) | 0.885 (0.884 - 0.886) | 0.652 (0.649 - 0.655) | 0.673 (0.673 - 0.673) | 0.772 (0.771 - 0.774) | 0.788 (0.787 - 0.789) | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 0.575 | 0.570 (0.566 - 0.573) | 0.626 (0.623 - 0.630) | 0.469 (0.463 - 0.473) | 0.853 (0.852 - 0.854) | 0.877 (0.875 - 0.879) | 0.614 (0.610 - 0.617) | 0.664 (0.661 - 0.666) | 0.769 (0.765 - 0.772) | 0.792 (0.787 - 0.796) | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 0.535 | 0.565 (0.563 - 0.566) | 0.613 (0.612 - 0.614) | 0.460 (0.459 - 0.460) | 0.868 (0.867 - 0.869) | 0.888 (0.888 - 0.888) | 0.592 (0.591 - 0.593) | 0.658 (0.657 - 0.658) | 0.755 (0.754 - 0.756) | 0.792 (0.791 - 0.792) | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 0.531 | 0.527 (0.524 - 0.530) | 0.581 (0.577 - 0.584) | 0.422 (0.421 - 0.423) | 0.879 (0.879 - 0.880) | 0.899 (0.898 - 0.899) | 0.562 (0.560 - 0.563) | 0.616 (0.615 - 0.617) | 0.747 (0.745 - 0.749) | 0.782 (0.781 - 0.782) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 0.543 | 0.523 (0.523 - 0.524) | 0.572 (0.571 - 0.573) | 0.425 (0.422 - 0.427) | 0.776 (0.774 - 0.777) | 0.866 (0.866 - 0.866) | 0.527 (0.526 - 0.528) | 0.607 (0.606 - 0.608) | 0.662 (0.662 - 0.663) | 0.749 (0.747 - 0.751) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 0.498 | 0.495 (0.486 - 0.503) | 0.540 (0.532 - 0.548) | 0.412 (0.397 - 0.420) | 0.854 (0.850 - 0.859) | 0.880 (0.878 - 0.885) | 0.561 (0.558 - 0.567) | 0.591 (0.587 - 0.597) | 0.740 (0.738 - 0.745) | 0.763 (0.761 - 0.765) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.481 | 0.567 (0.558 - 0.573) | 0.610 (0.597 - 0.622) | 0.464 (0.460 - 0.470) | 0.846 (0.833 - 0.863) | 0.891 (0.889 - 0.893) | 0.577 (0.568 - 0.589) | 0.630 (0.621 - 0.636) | 0.736 (0.726 - 0.747) | 0.780 (0.778 - 0.781) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 0.531 | 0.527 (0.524 - 0.530) | 0.581 (0.577 - 0.584) | 0.422 (0.421 - 0.423) | 0.879 (0.879 - 0.880) | 0.899 (0.898 - 0.899) | 0.562 (0.560 - 0.563) | 0.616 (0.615 - 0.617) | 0.747 (0.745 - 0.749) | 0.782 (0.781 - 0.782) | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 0.509 | 0.586 (0.578 - 0.597) | 0.622 (0.609 - 0.639) | 0.496 (0.495 - 0.497) | 0.732 (0.658 - 0.823) | 0.899 (0.897 - 0.902) | 0.557 (0.547 - 0.566) | 0.649 (0.619 - 0.675) | 0.668 (0.618 - 0.731) | 0.792 (0.788 - 0.797) | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.486 | 0.580 (0.560 - 0.599) | 0.617 (0.597 - 0.634) | 0.490 (0.472 - 0.514) | 0.843 (0.826 - 0.863) | 0.898 (0.896 - 0.900) | 0.579 (0.550 - 0.614) | 0.624 (0.608 - 0.648) | 0.745 (0.722 - 0.771) | 0.787 (0.776 - 0.799) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 0.514 | 0.526 (0.524 - 0.527) | 0.575 (0.572 - 0.576) | 0.430 (0.429 - 0.430) | 0.878 (0.877 - 0.878) | 0.887 (0.887 - 0.887) | 0.572 (0.572 - 0.573) | 0.615 (0.614 - 0.616) | 0.756 (0.754 - 0.757) | 0.773 (0.773 - 0.774) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 0.521 | 0.525 (0.523 - 0.527) | 0.575 (0.574 - 0.577) | 0.431 (0.430 - 0.432) | 0.861 (0.861 - 0.861) | 0.877 (0.877 - 0.877) | 0.547 (0.547 - 0.547) | 0.590 (0.589 - 0.590) | 0.740 (0.740 - 0.741) | 0.766 (0.765 - 0.766) | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 0.507 | 0.509 (0.509 - 0.509) | 0.544 (0.544 - 0.544) | 0.412 (0.412 - 0.412) | 0.796 (0.796 - 0.796) | 0.850 (0.850 - 0.850) | 0.524 (0.524 - 0.524) | 0.557 (0.557 - 0.557) | 0.688 (0.688 - 0.688) | 0.718 (0.718 - 0.718) | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 0.525 | 0.500 (0.498 - 0.504) | 0.546 (0.541 - 0.551) | 0.412 (0.411 - 0.415) | 0.872 (0.870 - 0.873) | 0.882 (0.880 - 0.883) | 0.520 (0.520 - 0.521) | 0.570 (0.568 - 0.572) | 0.728 (0.724 - 0.732) | 0.756 (0.753 - 0.760) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 0.525 | 0.519 (0.485 - 0.537) | 0.576 (0.543 - 0.594) | 0.398 (0.358 - 0.419) | 0.857 (0.855 - 0.860) | 0.873 (0.872 - 0.874) | 0.535 (0.518 - 0.546) | 0.592 (0.577 - 0.601) | 0.723 (0.711 - 0.731) | 0.767 (0.758 - 0.772) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 0.467 | 0.506 (0.482 - 0.548) | 0.550 (0.528 - 0.585) | 0.380 (0.340 - 0.441) | 0.816 (0.809 - 0.829) | 0.842 (0.835 - 0.855) | 0.517 (0.479 - 0.575) | 0.558 (0.534 - 0.595) | 0.682 (0.657 - 0.722) | 0.705 (0.685 - 0.739) | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 0.505 | 0.542 (0.525 - 0.560) | 0.572 (0.552 - 0.601) | 0.464 (0.456 - 0.473) | 0.841 (0.832 - 0.847) | 0.894 (0.890 - 0.898) | 0.545 (0.532 - 0.555) | 0.597 (0.576 - 0.613) | 0.723 (0.717 - 0.727) | 0.764 (0.762 - 0.768) | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 0.510 | 0.468 (0.468 - 0.468) | 0.515 (0.515 - 0.515) | 0.386 (0.386 - 0.387) | 0.801 (0.801 - 0.802) | 0.857 (0.857 - 0.857) | 0.481 (0.481 - 0.481) | 0.555 (0.555 - 0.555) | 0.667 (0.667 - 0.667) | 0.724 (0.724 - 0.724) | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 0.519 | 0.606 (0.606 - 0.606) | 0.656 (0.656 - 0.656) | 0.479 (0.479 - 0.479) | 0.875 (0.875 - 0.875) | 0.896 (0.896 - 0.896) | 0.626 (0.626 - 0.626) | 0.645 (0.645 - 0.645) | 0.786 (0.786 - 0.786) | 0.804 (0.804 - 0.804) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 0.475 | 0.474 (0.471 - 0.479) | 0.519 (0.515 - 0.524) | 0.378 (0.374 - 0.383) | 0.814 (0.811 - 0.817) | 0.849 (0.841 - 0.862) | 0.477 (0.463 - 0.495) | 0.537 (0.517 - 0.566) | 0.666 (0.664 - 0.668) | 0.712 (0.702 - 0.723) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 0.491 | 0.475 (0.469 - 0.479) | 0.522 (0.516 - 0.527) | 0.380 (0.365 - 0.389) | 0.858 (0.855 - 0.862) | 0.867 (0.863 - 0.873) | 0.474 (0.470 - 0.480) | 0.545 (0.540 - 0.552) | 0.682 (0.674 - 0.687) | 0.726 (0.722 - 0.733) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 0.503 | 0.574 (0.574 - 0.574) | 0.631 (0.631 - 0.631) | 0.443 (0.443 - 0.443) | 0.869 (0.869 - 0.869) | 0.885 (0.885 - 0.885) | 0.592 (0.592 - 0.592) | 0.611 (0.611 - 0.611) | 0.775 (0.775 - 0.775) | 0.787 (0.787 - 0.787) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 0.479 | 0.597 (0.597 - 0.597) | 0.639 (0.639 - 0.639) | 0.489 (0.489 - 0.489) | 0.869 (0.869 - 0.869) | 0.887 (0.887 - 0.887) | 0.598 (0.598 - 0.598) | 0.621 (0.621 - 0.621) | 0.768 (0.768 - 0.768) | 0.786 (0.786 - 0.786) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 0.508 | 0.478 (0.474 - 0.481) | 0.532 (0.530 - 0.533) | 0.369 (0.367 - 0.371) | 0.856 (0.853 - 0.858) | 0.867 (0.865 - 0.869) | 0.519 (0.515 - 0.523) | 0.558 (0.553 - 0.562) | 0.697 (0.693 - 0.703) | 0.735 (0.733 - 0.736) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 0.494 | 0.474 (0.469 - 0.482) | 0.529 (0.523 - 0.539) | 0.361 (0.357 - 0.363) | 0.845 (0.843 - 0.846) | 0.860 (0.859 - 0.861) | 0.456 (0.454 - 0.458) | 0.494 (0.493 - 0.494) | 0.683 (0.678 - 0.687) | 0.710 (0.709 - 0.711) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 0.493 | 0.465 (0.462 - 0.467) | 0.511 (0.510 - 0.512) | 0.367 (0.363 - 0.369) | 0.863 (0.861 - 0.864) | 0.874 (0.871 - 0.877) | 0.543 (0.537 - 0.547) | 0.565 (0.560 - 0.568) | 0.713 (0.708 - 0.716) | 0.730 (0.727 - 0.731) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 0.506 | 0.575 (0.575 - 0.575) | 0.627 (0.627 - 0.627) | 0.450 (0.450 - 0.450) | 0.866 (0.866 - 0.866) | 0.885 (0.885 - 0.885) | 0.579 (0.579 - 0.579) | 0.604 (0.604 - 0.604) | 0.764 (0.764 - 0.764) | 0.779 (0.779 - 0.779) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.588 | 0.574 (0.573 - 0.574) | 0.632 (0.632 - 0.632) | 0.473 (0.473 - 0.473) | 0.836 (0.836 - 0.836) | 0.870 (0.870 - 0.870) | 0.615 (0.615 - 0.616) | 0.651 (0.650 - 0.651) | 0.757 (0.757 - 0.757) | 0.788 (0.788 - 0.788) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.587 | 0.561 (0.560 - 0.561) | 0.607 (0.606 - 0.607) | 0.478 (0.477 - 0.479) | 0.824 (0.823 - 0.825) | 0.869 (0.868 - 0.869) | 0.590 (0.589 - 0.591) | 0.630 (0.629 - 0.631) | 0.741 (0.739 - 0.742) | 0.769 (0.768 - 0.769) | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.600 | 0.506 (0.505 - 0.507) | 0.557 (0.556 - 0.557) | 0.406 (0.405 - 0.406) | 0.829 (0.829 - 0.830) | 0.852 (0.852 - 0.852) | 0.603 (0.603 - 0.603) | 0.632 (0.631 - 0.632) | 0.747 (0.747 - 0.747) | 0.771 (0.771 - 0.772) | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.600 | 0.417 (0.402 - 0.428) | 0.467 (0.450 - 0.480) | 0.330 (0.321 - 0.338) | 0.752 (0.729 - 0.772) | 0.823 (0.822 - 0.825) | 0.472 (0.424 - 0.510) | 0.536 (0.514 - 0.556) | 0.635 (0.592 - 0.671) | 0.697 (0.673 - 0.715) | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.545 | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.275 (0.268 - 0.280) | 0.000 (0.000 - 0.000) | 0.100 (0.094 - 0.105) | 0.000 (0.000 - 0.000) | 0.181 (0.176 - 0.186) | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.545 | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.001) | 0.298 (0.287 - 0.308) | 0.000 (0.000 - 0.000) | 0.122 (0.109 - 0.130) | 0.000 (0.000 - 0.000) | 0.198 (0.189 - 0.204) | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.541 | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.000 (0.000 - 0.000) | 0.006 (0.000 - 0.015) | 0.279 (0.274 - 0.283) | 0.000 (0.000 - 0.000) | 0.083 (0.058 - 0.101) | 0.000 (0.000 - 0.000) | 0.166 (0.137 - 0.184) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 0.527 | 0.519 (0.516 - 0.522) | 0.568 (0.561 - 0.575) | 0.417 (0.408 - 0.430) | 0.858 (0.855 - 0.861) | 0.871 (0.870 - 0.874) | 0.550 (0.547 - 0.556) | 0.593 (0.590 - 0.595) | 0.731 (0.730 - 0.732) | 0.757 (0.755 - 0.759) |
MAESTRO dataset
Rank | Submission code | Submission name | Technical Report | mpAUC (MAESTRO development dataset) | mpAUC (MAESTRO evaluation dataset) | Segment-based F1, threshold = 0.5 (MAESTRO evaluation dataset) | Segment-based F1, optimal threshold (MAESTRO evaluation dataset) |
---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 0.746 | 0.739 (0.736 - 0.742) | 0.392 (0.387 - 0.394) | 0.600 (0.597 - 0.604) | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 0.743 | 0.715 (0.714 - 0.718) | 0.379 (0.374 - 0.384) | 0.585 (0.583 - 0.587) | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 0.749 | 0.711 (0.704 - 0.717) | 0.385 (0.376 - 0.392) | 0.585 (0.581 - 0.592) | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 0.695 | 0.744 (0.744 - 0.745) | 0.214 (0.214 - 0.215) | 0.601 (0.601 - 0.602) | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 0.788 | 0.744 (0.744 - 0.744) | 0.213 (0.213 - 0.214) | 0.600 (0.600 - 0.601) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 0.773 | 0.738 (0.732 - 0.745) | 0.219 (0.218 - 0.221) | 0.593 (0.592 - 0.594) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 0.749 | 0.672 (0.669 - 0.676) | 0.380 (0.371 - 0.391) | 0.560 (0.556 - 0.566) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 0.788 | 0.726 (0.720 - 0.733) | 0.218 (0.216 - 0.221) | 0.588 (0.582 - 0.593) | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 0.756 | 0.691 (0.691 - 0.691) | 0.485 (0.480 - 0.491) | 0.565 (0.563 - 0.568) | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 0.677 | 0.684 (0.684 - 0.684) | 0.461 (0.419 - 0.489) | 0.560 (0.559 - 0.560) | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 0.740 | 0.711 (0.709 - 0.712) | 0.344 (0.337 - 0.349) | 0.589 (0.588 - 0.591) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 0.763 | 0.704 (0.704 - 0.705) | 0.474 (0.473 - 0.477) | 0.570 (0.570 - 0.571) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 0.726 | 0.733 (0.730 - 0.739) | 0.347 (0.332 - 0.361) | 0.603 (0.598 - 0.609) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.686 | 0.665 (0.646 - 0.677) | 0.129 (0.105 - 0.168) | 0.544 (0.533 - 0.552) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 0.773 | 0.691 (0.663 - 0.708) | 0.366 (0.359 - 0.377) | 0.570 (0.552 - 0.583) | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 0.700 | 0.638 (0.620 - 0.654) | 0.049 (0.012 - 0.111) | 0.542 (0.533 - 0.553) | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 0.700 | 0.629 (0.620 - 0.639) | 0.131 (0.104 - 0.156) | 0.533 (0.530 - 0.537) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 0.697 | 0.675 (0.675 - 0.675) | 0.344 (0.332 - 0.361) | 0.559 (0.559 - 0.559) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 0.659 | 0.667 (0.667 - 0.667) | 0.422 (0.419 - 0.426) | 0.542 (0.541 - 0.542) | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 0.757 | 0.690 (0.690 - 0.690) | 0.232 (0.232 - 0.232) | 0.572 (0.572 - 0.572) | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 0.773 | 0.691 (0.663 - 0.708) | 0.366 (0.359 - 0.377) | 0.570 (0.552 - 0.583) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 0.651 | 0.665 (0.659 - 0.669) | 0.478 (0.470 - 0.486) | 0.542 (0.539 - 0.543) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 0.734 | 0.684 (0.672 - 0.693) | 0.258 (0.247 - 0.266) | 0.569 (0.557 - 0.576) | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 0.696 | 0.637 (0.628 - 0.652) | 0.144 (0.122 - 0.179) | 0.537 (0.528 - 0.550) | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 0.692 | 0.692 (0.692 - 0.692) | 0.211 (0.211 - 0.211) | 0.575 (0.575 - 0.576) | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 0.762 | 0.566 (0.566 - 0.566) | 0.091 (0.091 - 0.091) | 0.517 (0.517 - 0.517) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 0.730 | 0.676 (0.666 - 0.690) | 0.230 (0.219 - 0.238) | 0.567 (0.561 - 0.573) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 0.695 | 0.646 (0.641 - 0.653) | 0.459 (0.435 - 0.475) | 0.534 (0.530 - 0.537) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 0.737 | 0.553 (0.553 - 0.553) | 0.113 (0.113 - 0.113) | 0.491 (0.491 - 0.491) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 0.748 | 0.530 (0.530 - 0.530) | 0.096 (0.096 - 0.096) | 0.480 (0.480 - 0.480) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 0.693 | 0.612 (0.596 - 0.624) | 0.370 (0.354 - 0.380) | 0.523 (0.520 - 0.526) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 0.655 | 0.602 (0.586 - 0.619) | 0.368 (0.346 - 0.387) | 0.510 (0.503 - 0.517) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 0.657 | 0.603 (0.599 - 0.610) | 0.261 (0.243 - 0.292) | 0.515 (0.512 - 0.521) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 0.734 | 0.490 (0.490 - 0.490) | 0.096 (0.096 - 0.096) | 0.455 (0.455 - 0.455) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.000 | 0.050 (0.050 - 0.050) | 0.000 (0.000 - 0.000) | 0.168 (0.168 - 0.168) | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.759 | 0.196 (0.189 - 0.202) | 0.000 (0.000 - 0.000) | 0.270 (0.268 - 0.272) | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.757 | 0.172 (0.165 - 0.179) | 0.000 (0.000 - 0.000) | 0.294 (0.291 - 0.296) | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.758 | 0.149 (0.137 - 0.159) | 0.000 (0.000 - 0.000) | 0.252 (0.240 - 0.262) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 0.737 | 0.678 (0.669 - 0.685) | 0.410 (0.393 - 0.434) | 0.556 (0.553 - 0.561) |
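The two segment-based F1 columns in the table above can differ widely for the same system (for example, Nam_KAIST_task4_4 scores 0.214 at the fixed threshold and 0.601 at the optimal one). The sketch below illustrates why: a system whose scores rank segments well but sit below 0.5 is penalized by a fixed threshold. It is a minimal illustration, not the challenge's evaluation code. The function names, the toy data, the micro-averaging over (segment, class) cells, the binary reference labels, and the candidate threshold grid are all assumptions made for this example; the official evaluation may average per class and treat the reference annotations differently.

```python
def segment_f1(scores, labels, threshold):
    """F1 over all (segment, class) cells after thresholding the scores.

    scores: list of segments, each a list of per-class scores in [0, 1].
    labels: same shape, with 1 for an active class and 0 otherwise.
    Micro-averaging is an assumption made here for brevity.
    """
    tp = fp = fn = 0
    for seg_scores, seg_labels in zip(scores, labels):
        for score, active in zip(seg_scores, seg_labels):
            predicted = score >= threshold
            if predicted and active:
                tp += 1
            elif predicted and not active:
                fp += 1
            elif active:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest segment F1."""
    return max(candidates, key=lambda t: segment_f1(scores, labels, t))


# Toy data (hypothetical): four one-second segments, two classes.
# The scores rank the active class first everywhere but never reach 0.5.
toy_scores = [[0.32, 0.05], [0.36, 0.12], [0.08, 0.41], [0.04, 0.47]]
toy_labels = [[1, 0], [1, 0], [0, 1], [0, 1]]

grid = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9 (assumed grid)
f1_fixed = segment_f1(toy_scores, toy_labels, 0.5)
t_opt = best_threshold(toy_scores, toy_labels, grid)
f1_opt = segment_f1(toy_scores, toy_labels, t_opt)
```

On this toy input the fixed threshold yields no detections at all, while a lower threshold recovers every active segment. Threshold-independent measures such as mpAUC are not affected by this kind of score miscalibration, which is consistent with systems in the table having similar mpAUC but very different F1 at threshold 0.5.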
Class-wise performance
DESED
PSDS
Rank | Submission code | Submission name | Technical Report | Ranking score (DESED evaluation dataset) | Alarm Bell Ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 1.42 | 0.812 (0.797 - 0.830) | 0.948 (0.945 - 0.950) | 0.877 (0.872 - 0.881) | 0.523 (0.517 - 0.527) | 0.704 (0.696 - 0.708) | 0.823 (0.799 - 0.843) | 0.880 (0.879 - 0.881) | 0.710 (0.706 - 0.717) | 0.854 (0.851 - 0.857) | 0.930 (0.922 - 0.939) | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 1.39 | 0.795 (0.793 - 0.797) | 0.964 (0.959 - 0.970) | 0.874 (0.872 - 0.875) | 0.512 (0.508 - 0.515) | 0.717 (0.705 - 0.725) | 0.815 (0.790 - 0.835) | 0.884 (0.880 - 0.889) | 0.708 (0.697 - 0.715) | 0.852 (0.851 - 0.853) | 0.920 (0.914 - 0.924) | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.772 (0.752 - 0.787) | 0.928 (0.921 - 0.936) | 0.881 (0.873 - 0.888) | 0.475 (0.462 - 0.485) | 0.699 (0.691 - 0.712) | 0.776 (0.749 - 0.807) | 0.830 (0.823 - 0.835) | 0.681 (0.668 - 0.698) | 0.841 (0.838 - 0.843) | 0.895 (0.882 - 0.907) | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 1.35 | 0.741 (0.739 - 0.744) | 0.929 (0.928 - 0.930) | 0.829 (0.827 - 0.831) | 0.413 (0.412 - 0.415) | 0.627 (0.626 - 0.627) | 0.864 (0.860 - 0.866) | 0.778 (0.778 - 0.779) | 0.712 (0.711 - 0.713) | 0.779 (0.778 - 0.779) | 0.930 (0.930 - 0.930) | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 1.35 | 0.739 (0.738 - 0.740) | 0.931 (0.931 - 0.932) | 0.833 (0.831 - 0.834) | 0.416 (0.415 - 0.417) | 0.625 (0.622 - 0.627) | 0.869 (0.866 - 0.871) | 0.787 (0.783 - 0.790) | 0.703 (0.702 - 0.705) | 0.779 (0.778 - 0.780) | 0.930 (0.929 - 0.931) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.741 (0.736 - 0.745) | 0.895 (0.888 - 0.908) | 0.831 (0.827 - 0.836) | 0.394 (0.389 - 0.398) | 0.616 (0.609 - 0.621) | 0.825 (0.813 - 0.833) | 0.764 (0.756 - 0.771) | 0.636 (0.633 - 0.640) | 0.762 (0.758 - 0.765) | 0.914 (0.909 - 0.918) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.785 (0.773 - 0.796) | 0.922 (0.913 - 0.930) | 0.893 (0.887 - 0.899) | 0.458 (0.448 - 0.468) | 0.701 (0.688 - 0.719) | 0.809 (0.795 - 0.819) | 0.820 (0.812 - 0.828) | 0.673 (0.659 - 0.685) | 0.843 (0.839 - 0.848) | 0.892 (0.887 - 0.895) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.764 (0.756 - 0.770) | 0.874 (0.864 - 0.884) | 0.829 (0.826 - 0.832) | 0.395 (0.384 - 0.401) | 0.604 (0.596 - 0.610) | 0.820 (0.805 - 0.841) | 0.755 (0.747 - 0.760) | 0.623 (0.599 - 0.639) | 0.771 (0.766 - 0.776) | 0.912 (0.905 - 0.916) | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 1.27 | 0.712 (0.702 - 0.729) | 0.855 (0.852 - 0.858) | 0.840 (0.840 - 0.841) | 0.395 (0.394 - 0.398) | 0.581 (0.578 - 0.584) | 0.737 (0.708 - 0.764) | 0.766 (0.745 - 0.783) | 0.614 (0.606 - 0.621) | 0.827 (0.826 - 0.828) | 0.895 (0.890 - 0.898) | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 1.25 | 0.731 (0.731 - 0.731) | 0.842 (0.838 - 0.846) | 0.792 (0.791 - 0.792) | 0.352 (0.352 - 0.353) | 0.571 (0.570 - 0.571) | 0.780 (0.780 - 0.780) | 0.819 (0.819 - 0.820) | 0.663 (0.663 - 0.664) | 0.815 (0.813 - 0.817) | 0.887 (0.887 - 0.887) | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.659 (0.650 - 0.668) | 0.799 (0.797 - 0.803) | 0.779 (0.777 - 0.781) | 0.330 (0.328 - 0.333) | 0.491 (0.486 - 0.500) | 0.806 (0.800 - 0.812) | 0.780 (0.772 - 0.787) | 0.647 (0.637 - 0.660) | 0.804 (0.801 - 0.807) | 0.898 (0.897 - 0.898) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.610 (0.608 - 0.611) | 0.865 (0.863 - 0.868) | 0.814 (0.813 - 0.814) | 0.366 (0.366 - 0.366) | 0.494 (0.492 - 0.496) | 0.762 (0.761 - 0.762) | 0.802 (0.798 - 0.807) | 0.588 (0.587 - 0.589) | 0.760 (0.757 - 0.762) | 0.870 (0.864 - 0.874) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.648 (0.627 - 0.668) | 0.763 (0.756 - 0.771) | 0.810 (0.801 - 0.822) | 0.305 (0.289 - 0.315) | 0.474 (0.451 - 0.504) | 0.815 (0.804 - 0.821) | 0.763 (0.751 - 0.779) | 0.571 (0.554 - 0.597) | 0.741 (0.728 - 0.757) | 0.877 (0.863 - 0.888) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.764 (0.757 - 0.774) | 0.883 (0.880 - 0.885) | 0.796 (0.793 - 0.798) | 0.346 (0.344 - 0.347) | 0.519 (0.495 - 0.539) | 0.861 (0.858 - 0.865) | 0.884 (0.878 - 0.890) | 0.687 (0.679 - 0.692) | 0.790 (0.770 - 0.806) | 0.898 (0.889 - 0.904) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.659 (0.650 - 0.668) | 0.799 (0.797 - 0.803) | 0.779 (0.777 - 0.781) | 0.330 (0.328 - 0.333) | 0.491 (0.486 - 0.500) | 0.806 (0.800 - 0.812) | 0.780 (0.772 - 0.787) | 0.647 (0.637 - 0.660) | 0.804 (0.801 - 0.807) | 0.898 (0.897 - 0.898) | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 1.22 | 0.774 (0.760 - 0.784) | 0.918 (0.916 - 0.919) | 0.819 (0.803 - 0.831) | 0.361 (0.348 - 0.377) | 0.540 (0.526 - 0.560) | 0.886 (0.879 - 0.895) | 0.902 (0.896 - 0.906) | 0.720 (0.713 - 0.726) | 0.799 (0.791 - 0.806) | 0.916 (0.908 - 0.926) | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.21 | 0.791 (0.763 - 0.832) | 0.902 (0.899 - 0.904) | 0.800 (0.790 - 0.809) | 0.362 (0.341 - 0.383) | 0.518 (0.488 - 0.541) | 0.890 (0.882 - 0.900) | 0.900 (0.892 - 0.910) | 0.713 (0.707 - 0.721) | 0.795 (0.785 - 0.806) | 0.925 (0.921 - 0.928) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.736 (0.735 - 0.736) | 0.849 (0.848 - 0.850) | 0.716 (0.716 - 0.717) | 0.330 (0.329 - 0.330) | 0.455 (0.453 - 0.457) | 0.830 (0.830 - 0.831) | 0.791 (0.790 - 0.791) | 0.657 (0.657 - 0.657) | 0.787 (0.785 - 0.789) | 0.883 (0.883 - 0.883) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.724 (0.724 - 0.725) | 0.829 (0.824 - 0.835) | 0.788 (0.787 - 0.789) | 0.339 (0.339 - 0.339) | 0.562 (0.561 - 0.563) | 0.593 (0.589 - 0.597) | 0.812 (0.812 - 0.812) | 0.582 (0.581 - 0.582) | 0.811 (0.810 - 0.812) | 0.885 (0.883 - 0.886) | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 1.20 | 0.633 (0.633 - 0.633) | 0.835 (0.835 - 0.835) | 0.761 (0.761 - 0.761) | 0.320 (0.320 - 0.320) | 0.455 (0.455 - 0.455) | 0.725 (0.725 - 0.725) | 0.712 (0.712 - 0.712) | 0.659 (0.659 - 0.659) | 0.766 (0.766 - 0.766) | 0.853 (0.853 - 0.853) | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.649 (0.627 - 0.668) | 0.778 (0.778 - 0.778) | 0.802 (0.781 - 0.827) | 0.282 (0.282 - 0.282) | 0.455 (0.455 - 0.455) | 0.724 (0.724 - 0.724) | 0.816 (0.816 - 0.816) | 0.652 (0.652 - 0.652) | 0.783 (0.783 - 0.783) | 0.853 (0.853 - 0.853) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.697 (0.683 - 0.706) | 0.831 (0.818 - 0.841) | 0.763 (0.759 - 0.766) | 0.314 (0.304 - 0.320) | 0.553 (0.546 - 0.558) | 0.721 (0.679 - 0.748) | 0.780 (0.756 - 0.795) | 0.608 (0.575 - 0.627) | 0.750 (0.677 - 0.795) | 0.858 (0.845 - 0.865) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.615 (0.556 - 0.678) | 0.821 (0.786 - 0.871) | 0.714 (0.689 - 0.740) | 0.348 (0.302 - 0.416) | 0.454 (0.437 - 0.467) | 0.748 (0.719 - 0.793) | 0.719 (0.685 - 0.760) | 0.615 (0.572 - 0.667) | 0.756 (0.752 - 0.759) | 0.808 (0.760 - 0.881) | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 1.18 | 0.711 (0.688 - 0.734) | 0.910 (0.906 - 0.914) | 0.804 (0.782 - 0.821) | 0.331 (0.312 - 0.347) | 0.455 (0.409 - 0.504) | 0.877 (0.867 - 0.882) | 0.876 (0.865 - 0.892) | 0.690 (0.680 - 0.700) | 0.780 (0.757 - 0.795) | 0.922 (0.917 - 0.929) | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 1.17 | 0.586 (0.586 - 0.586) | 0.796 (0.796 - 0.796) | 0.744 (0.744 - 0.744) | 0.273 (0.273 - 0.273) | 0.472 (0.472 - 0.472) | 0.605 (0.605 - 0.605) | 0.728 (0.728 - 0.728) | 0.574 (0.574 - 0.574) | 0.792 (0.792 - 0.792) | 0.856 (0.856 - 0.856) | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 1.17 | 0.792 (0.792 - 0.792) | 0.901 (0.901 - 0.901) | 0.779 (0.779 - 0.779) | 0.397 (0.397 - 0.397) | 0.595 (0.595 - 0.595) | 0.879 (0.879 - 0.879) | 0.904 (0.904 - 0.904) | 0.714 (0.714 - 0.714) | 0.792 (0.792 - 0.792) | 0.869 (0.869 - 0.869) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.582 (0.535 - 0.626) | 0.801 (0.761 - 0.854) | 0.698 (0.683 - 0.714) | 0.319 (0.309 - 0.325) | 0.470 (0.382 - 0.540) | 0.577 (0.549 - 0.624) | 0.763 (0.742 - 0.797) | 0.572 (0.547 - 0.601) | 0.778 (0.762 - 0.790) | 0.827 (0.805 - 0.851) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.634 (0.611 - 0.654) | 0.793 (0.781 - 0.808) | 0.720 (0.707 - 0.734) | 0.303 (0.289 - 0.316) | 0.454 (0.442 - 0.466) | 0.576 (0.558 - 0.592) | 0.754 (0.728 - 0.786) | 0.587 (0.578 - 0.598) | 0.779 (0.774 - 0.783) | 0.855 (0.840 - 0.872) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.769 (0.769 - 0.769) | 0.871 (0.871 - 0.871) | 0.764 (0.764 - 0.764) | 0.354 (0.354 - 0.354) | 0.575 (0.575 - 0.575) | 0.861 (0.861 - 0.861) | 0.893 (0.893 - 0.893) | 0.669 (0.669 - 0.669) | 0.760 (0.760 - 0.760) | 0.858 (0.858 - 0.858) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.745 (0.745 - 0.745) | 0.878 (0.878 - 0.878) | 0.764 (0.764 - 0.764) | 0.410 (0.410 - 0.410) | 0.643 (0.643 - 0.643) | 0.857 (0.857 - 0.857) | 0.755 (0.755 - 0.755) | 0.646 (0.646 - 0.646) | 0.798 (0.798 - 0.798) | 0.878 (0.878 - 0.878) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.582 (0.554 - 0.618) | 0.837 (0.834 - 0.840) | 0.757 (0.751 - 0.763) | 0.302 (0.293 - 0.308) | 0.467 (0.450 - 0.485) | 0.781 (0.781 - 0.782) | 0.683 (0.662 - 0.699) | 0.552 (0.542 - 0.562) | 0.748 (0.741 - 0.753) | 0.870 (0.862 - 0.881) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.560 (0.534 - 0.591) | 0.800 (0.796 - 0.807) | 0.674 (0.671 - 0.677) | 0.312 (0.302 - 0.322) | 0.471 (0.465 - 0.481) | 0.685 (0.671 - 0.706) | 0.728 (0.718 - 0.738) | 0.558 (0.549 - 0.571) | 0.740 (0.731 - 0.751) | 0.857 (0.839 - 0.874) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.605 (0.603 - 0.608) | 0.807 (0.800 - 0.819) | 0.765 (0.763 - 0.769) | 0.285 (0.278 - 0.289) | 0.368 (0.361 - 0.380) | 0.761 (0.747 - 0.769) | 0.777 (0.754 - 0.790) | 0.575 (0.555 - 0.586) | 0.758 (0.755 - 0.763) | 0.861 (0.858 - 0.867) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.721 (0.721 - 0.721) | 0.874 (0.874 - 0.874) | 0.748 (0.748 - 0.748) | 0.373 (0.373 - 0.373) | 0.565 (0.565 - 0.565) | 0.866 (0.866 - 0.866) | 0.877 (0.877 - 0.877) | 0.668 (0.668 - 0.668) | 0.820 (0.820 - 0.820) | 0.804 (0.804 - 0.804) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.694 (0.694 - 0.694) | 0.804 (0.804 - 0.804) | 0.850 (0.850 - 0.850) | 0.429 (0.429 - 0.429) | 0.590 (0.590 - 0.590) | 0.830 (0.826 - 0.832) | 0.814 (0.814 - 0.814) | 0.559 (0.557 - 0.561) | 0.804 (0.804 - 0.804) | 0.908 (0.908 - 0.908) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.655 (0.655 - 0.655) | 0.797 (0.786 - 0.812) | 0.821 (0.821 - 0.821) | 0.414 (0.414 - 0.414) | 0.556 (0.552 - 0.560) | 0.744 (0.743 - 0.744) | 0.829 (0.829 - 0.829) | 0.595 (0.595 - 0.595) | 0.802 (0.802 - 0.802) | 0.863 (0.860 - 0.867) | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.56 | 0.637 (0.635 - 0.639) | 0.719 (0.715 - 0.722) | 0.803 (0.803 - 0.804) | 0.386 (0.386 - 0.387) | 0.547 (0.546 - 0.547) | 0.676 (0.676 - 0.676) | 0.795 (0.795 - 0.796) | 0.546 (0.546 - 0.546) | 0.699 (0.699 - 0.700) | 0.842 (0.841 - 0.842) | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.47 | 0.502 (0.468 - 0.525) | 0.625 (0.623 - 0.627) | 0.678 (0.619 - 0.729) | 0.282 (0.270 - 0.292) | 0.483 (0.471 - 0.493) | 0.606 (0.602 - 0.609) | 0.790 (0.790 - 0.791) | 0.498 (0.497 - 0.499) | 0.624 (0.611 - 0.635) | 0.830 (0.829 - 0.830) | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.20 | 0.000 (0.000 - 0.001) | 0.000 (0.000 - 0.000) | 0.018 (0.018 - 0.018) | 0.001 (0.000 - 0.003) | 0.007 (0.001 - 0.014) | 0.005 (0.005 - 0.005) | 0.036 (0.022 - 0.050) | 0.000 (0.000 - 0.000) | 0.001 (0.001 - 0.001) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.17 | 0.005 (0.001 - 0.011) | 0.000 (0.000 - 0.000) | 0.005 (0.001 - 0.007) | 0.036 (0.029 - 0.045) | 0.016 (0.013 - 0.019) | 0.030 (0.016 - 0.040) | 0.000 (0.000 - 0.001) | 0.000 (0.000 - 0.000) | 0.002 (0.001 - 0.002) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.15 | 0.002 (0.000 - 0.006) | 0.000 (0.000 - 0.000) | 0.009 (0.000 - 0.020) | 0.000 (0.000 - 0.001) | 0.010 (0.001 - 0.024) | 0.004 (0.001 - 0.006) | 0.063 (0.034 - 0.085) | 0.001 (0.000 - 0.001) | 0.002 (0.001 - 0.002) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.596 (0.566 - 0.622) | 0.826 (0.817 - 0.837) | 0.759 (0.748 - 0.771) | 0.325 (0.320 - 0.329) | 0.549 (0.545 - 0.554) | 0.767 (0.763 - 0.770) | 0.787 (0.778 - 0.800) | 0.563 (0.555 - 0.574) | 0.818 (0.818 - 0.819) | 0.850 (0.847 - 0.852) |
mpAUC
Rank | Submission code | Submission name | Technical Report | Ranking score (DESED evaluation dataset) | Alarm Bell Ringing | Blender | Cat | Dishes | Dog | Electric shaver/toothbrush | Frying | Running water | Speech | Vacuum cleaner |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 1.42 | 0.977 (0.975 - 0.978) | 0.935 (0.931 - 0.941) | 0.976 (0.975 - 0.976) | 0.888 (0.887 - 0.889) | 0.973 (0.972 - 0.974) | 0.929 (0.926 - 0.931) | 0.905 (0.904 - 0.907) | 0.776 (0.774 - 0.779) | 0.945 (0.944 - 0.945) | 0.949 (0.946 - 0.953) | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 1.39 | 0.975 (0.975 - 0.976) | 0.934 (0.930 - 0.938) | 0.975 (0.975 - 0.976) | 0.886 (0.884 - 0.888) | 0.976 (0.975 - 0.976) | 0.930 (0.930 - 0.931) | 0.906 (0.903 - 0.909) | 0.772 (0.770 - 0.775) | 0.943 (0.942 - 0.944) | 0.949 (0.949 - 0.950) | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.961 (0.959 - 0.963) | 0.908 (0.904 - 0.910) | 0.977 (0.972 - 0.981) | 0.878 (0.872 - 0.883) | 0.969 (0.967 - 0.971) | 0.905 (0.895 - 0.918) | 0.836 (0.825 - 0.845) | 0.720 (0.708 - 0.732) | 0.938 (0.937 - 0.939) | 0.923 (0.918 - 0.928) | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 1.35 | 0.959 (0.958 - 0.960) | 0.885 (0.885 - 0.885) | 0.973 (0.972 - 0.973) | 0.858 (0.857 - 0.858) | 0.967 (0.966 - 0.967) | 0.924 (0.923 - 0.925) | 0.861 (0.861 - 0.861) | 0.783 (0.782 - 0.783) | 0.917 (0.917 - 0.917) | 0.945 (0.945 - 0.946) | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 1.35 | 0.959 (0.959 - 0.959) | 0.887 (0.886 - 0.888) | 0.971 (0.971 - 0.971) | 0.857 (0.857 - 0.859) | 0.967 (0.966 - 0.967) | 0.924 (0.923 - 0.925) | 0.861 (0.861 - 0.862) | 0.779 (0.778 - 0.781) | 0.918 (0.918 - 0.918) | 0.945 (0.945 - 0.946) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.955 (0.953 - 0.956) | 0.872 (0.868 - 0.876) | 0.965 (0.963 - 0.967) | 0.837 (0.834 - 0.839) | 0.962 (0.960 - 0.964) | 0.912 (0.906 - 0.917) | 0.843 (0.839 - 0.846) | 0.742 (0.737 - 0.748) | 0.914 (0.911 - 0.918) | 0.943 (0.940 - 0.947) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.960 (0.957 - 0.963) | 0.913 (0.909 - 0.917) | 0.977 (0.975 - 0.980) | 0.867 (0.863 - 0.871) | 0.974 (0.973 - 0.975) | 0.915 (0.906 - 0.920) | 0.822 (0.814 - 0.827) | 0.709 (0.698 - 0.717) | 0.937 (0.934 - 0.939) | 0.923 (0.921 - 0.926) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.954 (0.953 - 0.956) | 0.869 (0.862 - 0.873) | 0.966 (0.964 - 0.968) | 0.840 (0.836 - 0.843) | 0.961 (0.960 - 0.962) | 0.908 (0.904 - 0.911) | 0.836 (0.832 - 0.839) | 0.745 (0.742 - 0.748) | 0.919 (0.916 - 0.923) | 0.941 (0.936 - 0.944) | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 1.27 | 0.960 (0.959 - 0.961) | 0.888 (0.883 - 0.892) | 0.979 (0.979 - 0.979) | 0.875 (0.874 - 0.876) | 0.971 (0.970 - 0.972) | 0.919 (0.900 - 0.932) | 0.857 (0.848 - 0.863) | 0.737 (0.734 - 0.742) | 0.967 (0.967 - 0.968) | 0.927 (0.924 - 0.929) | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 1.25 | 0.976 (0.976 - 0.976) | 0.883 (0.883 - 0.883) | 0.977 (0.977 - 0.977) | 0.858 (0.858 - 0.858) | 0.976 (0.976 - 0.976) | 0.922 (0.922 - 0.923) | 0.886 (0.886 - 0.887) | 0.821 (0.820 - 0.822) | 0.960 (0.960 - 0.960) | 0.947 (0.947 - 0.947) | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.970 (0.970 - 0.971) | 0.877 (0.875 - 0.879) | 0.978 (0.977 - 0.979) | 0.872 (0.869 - 0.874) | 0.976 (0.975 - 0.976) | 0.941 (0.941 - 0.943) | 0.904 (0.900 - 0.907) | 0.840 (0.838 - 0.842) | 0.958 (0.957 - 0.959) | 0.959 (0.958 - 0.960) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.929 (0.926 - 0.931) | 0.865 (0.865 - 0.866) | 0.966 (0.965 - 0.966) | 0.857 (0.856 - 0.858) | 0.957 (0.957 - 0.958) | 0.898 (0.897 - 0.899) | 0.859 (0.858 - 0.860) | 0.682 (0.679 - 0.686) | 0.938 (0.938 - 0.938) | 0.924 (0.924 - 0.925) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.923 (0.900 - 0.942) | 0.857 (0.851 - 0.862) | 0.970 (0.968 - 0.973) | 0.791 (0.780 - 0.798) | 0.928 (0.926 - 0.930) | 0.924 (0.912 - 0.938) | 0.884 (0.870 - 0.903) | 0.716 (0.688 - 0.754) | 0.939 (0.936 - 0.944) | 0.945 (0.938 - 0.951) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.971 (0.967 - 0.974) | 0.895 (0.890 - 0.902) | 0.970 (0.967 - 0.974) | 0.865 (0.858 - 0.874) | 0.971 (0.968 - 0.973) | 0.935 (0.932 - 0.937) | 0.894 (0.884 - 0.901) | 0.791 (0.785 - 0.796) | 0.926 (0.922 - 0.928) | 0.942 (0.940 - 0.943) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.970 (0.970 - 0.971) | 0.877 (0.875 - 0.879) | 0.978 (0.977 - 0.979) | 0.872 (0.869 - 0.874) | 0.976 (0.975 - 0.976) | 0.941 (0.941 - 0.943) | 0.904 (0.900 - 0.907) | 0.840 (0.838 - 0.842) | 0.958 (0.957 - 0.959) | 0.959 (0.958 - 0.960) | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 1.22 | 0.977 (0.976 - 0.977) | 0.905 (0.901 - 0.908) | 0.977 (0.975 - 0.979) | 0.877 (0.873 - 0.879) | 0.971 (0.966 - 0.974) | 0.941 (0.940 - 0.943) | 0.910 (0.906 - 0.913) | 0.817 (0.814 - 0.819) | 0.936 (0.932 - 0.940) | 0.951 (0.949 - 0.953) | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.21 | 0.976 (0.975 - 0.978) | 0.905 (0.902 - 0.907) | 0.975 (0.974 - 0.978) | 0.874 (0.869 - 0.881) | 0.969 (0.965 - 0.971) | 0.939 (0.937 - 0.941) | 0.907 (0.905 - 0.909) | 0.812 (0.805 - 0.817) | 0.934 (0.931 - 0.936) | 0.952 (0.951 - 0.953) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.970 (0.970 - 0.970) | 0.886 (0.886 - 0.887) | 0.973 (0.973 - 0.973) | 0.851 (0.851 - 0.851) | 0.967 (0.966 - 0.967) | 0.924 (0.924 - 0.924) | 0.867 (0.867 - 0.867) | 0.809 (0.809 - 0.809) | 0.948 (0.948 - 0.948) | 0.945 (0.945 - 0.945) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.962 (0.962 - 0.962) | 0.860 (0.859 - 0.860) | 0.975 (0.975 - 0.975) | 0.839 (0.838 - 0.839) | 0.978 (0.978 - 0.978) | 0.884 (0.882 - 0.885) | 0.902 (0.902 - 0.903) | 0.756 (0.756 - 0.756) | 0.955 (0.955 - 0.955) | 0.943 (0.943 - 0.943) | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 1.20 | 0.917 (0.917 - 0.917) | 0.838 (0.838 - 0.838) | 0.948 (0.948 - 0.948) | 0.820 (0.820 - 0.820) | 0.924 (0.924 - 0.924) | 0.873 (0.873 - 0.873) | 0.836 (0.836 - 0.836) | 0.744 (0.744 - 0.745) | 0.904 (0.904 - 0.904) | 0.874 (0.874 - 0.874) | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.923 (0.900 - 0.942) | 0.835 (0.835 - 0.835) | 0.964 (0.957 - 0.972) | 0.852 (0.852 - 0.852) | 0.974 (0.974 - 0.974) | 0.906 (0.906 - 0.906) | 0.889 (0.889 - 0.889) | 0.809 (0.809 - 0.809) | 0.949 (0.949 - 0.949) | 0.934 (0.934 - 0.934) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.954 (0.949 - 0.958) | 0.851 (0.845 - 0.855) | 0.959 (0.953 - 0.963) | 0.809 (0.795 - 0.818) | 0.964 (0.959 - 0.967) | 0.898 (0.884 - 0.907) | 0.869 (0.861 - 0.874) | 0.765 (0.743 - 0.778) | 0.950 (0.942 - 0.954) | 0.926 (0.923 - 0.928) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.923 (0.913 - 0.935) | 0.840 (0.829 - 0.855) | 0.936 (0.928 - 0.946) | 0.804 (0.781 - 0.837) | 0.918 (0.913 - 0.926) | 0.878 (0.864 - 0.898) | 0.826 (0.802 - 0.854) | 0.715 (0.696 - 0.746) | 0.908 (0.907 - 0.910) | 0.866 (0.844 - 0.894) | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 1.18 | 0.977 (0.975 - 0.978) | 0.901 (0.900 - 0.902) | 0.978 (0.976 - 0.979) | 0.873 (0.868 - 0.878) | 0.962 (0.945 - 0.974) | 0.944 (0.941 - 0.948) | 0.906 (0.902 - 0.910) | 0.810 (0.801 - 0.818) | 0.930 (0.921 - 0.940) | 0.946 (0.943 - 0.950) | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 1.17 | 0.953 (0.953 - 0.953) | 0.865 (0.865 - 0.865) | 0.970 (0.970 - 0.970) | 0.836 (0.836 - 0.836) | 0.942 (0.942 - 0.942) | 0.891 (0.891 - 0.891) | 0.875 (0.875 - 0.875) | 0.747 (0.747 - 0.747) | 0.950 (0.950 - 0.950) | 0.906 (0.906 - 0.906) | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 1.17 | 0.966 (0.966 - 0.966) | 0.884 (0.884 - 0.884) | 0.968 (0.968 - 0.968) | 0.880 (0.880 - 0.880) | 0.972 (0.972 - 0.972) | 0.937 (0.937 - 0.937) | 0.906 (0.906 - 0.906) | 0.804 (0.804 - 0.804) | 0.907 (0.907 - 0.907) | 0.928 (0.928 - 0.928) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.941 (0.927 - 0.960) | 0.868 (0.849 - 0.899) | 0.952 (0.940 - 0.967) | 0.841 (0.828 - 0.861) | 0.939 (0.932 - 0.943) | 0.861 (0.840 - 0.888) | 0.852 (0.826 - 0.886) | 0.733 (0.703 - 0.763) | 0.942 (0.936 - 0.948) | 0.884 (0.869 - 0.908) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.942 (0.937 - 0.948) | 0.853 (0.844 - 0.858) | 0.967 (0.965 - 0.969) | 0.838 (0.829 - 0.846) | 0.967 (0.965 - 0.970) | 0.866 (0.861 - 0.874) | 0.848 (0.843 - 0.855) | 0.749 (0.723 - 0.776) | 0.947 (0.946 - 0.949) | 0.925 (0.918 - 0.933) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.949 (0.949 - 0.949) | 0.870 (0.870 - 0.870) | 0.959 (0.959 - 0.959) | 0.861 (0.861 - 0.861) | 0.970 (0.970 - 0.970) | 0.921 (0.921 - 0.921) | 0.899 (0.899 - 0.899) | 0.766 (0.766 - 0.766) | 0.899 (0.899 - 0.899) | 0.912 (0.912 - 0.912) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.967 (0.967 - 0.967) | 0.868 (0.868 - 0.868) | 0.963 (0.963 - 0.963) | 0.850 (0.850 - 0.850) | 0.971 (0.971 - 0.971) | 0.934 (0.934 - 0.934) | 0.884 (0.884 - 0.884) | 0.808 (0.808 - 0.808) | 0.923 (0.923 - 0.923) | 0.926 (0.926 - 0.926) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.931 (0.928 - 0.937) | 0.867 (0.862 - 0.872) | 0.967 (0.964 - 0.969) | 0.823 (0.818 - 0.829) | 0.962 (0.959 - 0.963) | 0.908 (0.905 - 0.913) | 0.835 (0.826 - 0.844) | 0.702 (0.687 - 0.723) | 0.933 (0.930 - 0.936) | 0.930 (0.925 - 0.935) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.945 (0.936 - 0.951) | 0.860 (0.851 - 0.870) | 0.944 (0.942 - 0.946) | 0.810 (0.800 - 0.820) | 0.958 (0.957 - 0.960) | 0.880 (0.876 - 0.884) | 0.834 (0.827 - 0.840) | 0.733 (0.707 - 0.752) | 0.930 (0.926 - 0.933) | 0.925 (0.920 - 0.931) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.920 (0.917 - 0.924) | 0.846 (0.845 - 0.849) | 0.949 (0.949 - 0.949) | 0.807 (0.804 - 0.812) | 0.941 (0.937 - 0.946) | 0.901 (0.899 - 0.905) | 0.857 (0.852 - 0.861) | 0.778 (0.746 - 0.796) | 0.936 (0.935 - 0.938) | 0.918 (0.914 - 0.923) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.954 (0.954 - 0.954) | 0.874 (0.874 - 0.874) | 0.966 (0.966 - 0.966) | 0.868 (0.868 - 0.868) | 0.972 (0.972 - 0.972) | 0.934 (0.934 - 0.934) | 0.901 (0.901 - 0.901) | 0.796 (0.796 - 0.796) | 0.926 (0.926 - 0.926) | 0.904 (0.904 - 0.904) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.953 (0.953 - 0.953) | 0.847 (0.846 - 0.848) | 0.972 (0.972 - 0.972) | 0.842 (0.842 - 0.842) | 0.948 (0.948 - 0.948) | 0.914 (0.914 - 0.914) | 0.885 (0.885 - 0.885) | 0.702 (0.701 - 0.703) | 0.937 (0.937 - 0.937) | 0.939 (0.939 - 0.939) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.959 (0.959 - 0.959) | 0.830 (0.824 - 0.835) | 0.975 (0.975 - 0.975) | 0.849 (0.849 - 0.849) | 0.964 (0.964 - 0.964) | 0.910 (0.909 - 0.910) | 0.890 (0.890 - 0.891) | 0.769 (0.768 - 0.770) | 0.942 (0.942 - 0.943) | 0.927 (0.925 - 0.929) | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.56 | 0.915 (0.915 - 0.915) | 0.808 (0.807 - 0.808) | 0.954 (0.954 - 0.954) | 0.792 (0.792 - 0.792) | 0.932 (0.932 - 0.932) | 0.868 (0.867 - 0.868) | 0.877 (0.877 - 0.877) | 0.668 (0.668 - 0.668) | 0.932 (0.931 - 0.932) | 0.903 (0.903 - 0.904) | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.47 | 0.906 (0.905 - 0.907) | 0.788 (0.786 - 0.789) | 0.942 (0.939 - 0.946) | 0.760 (0.758 - 0.763) | 0.924 (0.923 - 0.925) | 0.851 (0.850 - 0.852) | 0.874 (0.874 - 0.875) | 0.657 (0.656 - 0.657) | 0.905 (0.901 - 0.908) | 0.891 (0.891 - 0.892) | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.20 | 0.004 (0.003 - 0.005) | 0.009 (0.003 - 0.017) | 0.049 (0.048 - 0.050) | 0.062 (0.047 - 0.081) | 0.034 (0.007 - 0.072) | 0.005 (0.004 - 0.005) | 0.376 (0.352 - 0.397) | 0.020 (0.020 - 0.020) | 0.092 (0.074 - 0.111) | 0.000 (0.000 - 0.001) | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.17 | 0.053 (0.040 - 0.072) | 0.002 (0.000 - 0.003) | 0.015 (0.004 - 0.022) | 0.259 (0.223 - 0.300) | 0.078 (0.065 - 0.085) | 0.021 (0.003 - 0.034) | 0.088 (0.045 - 0.137) | 0.015 (0.014 - 0.015) | 0.161 (0.089 - 0.224) | 0.001 (0.001 - 0.001) | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.15 | 0.007 (0.005 - 0.009) | 0.032 (0.005 - 0.065) | 0.029 (0.005 - 0.048) | 0.014 (0.004 - 0.021) | 0.058 (0.009 - 0.134) | 0.005 (0.004 - 0.006) | 0.352 (0.318 - 0.378) | 0.022 (0.019 - 0.023) | 0.136 (0.096 - 0.175) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.941 (0.938 - 0.943) | 0.855 (0.841 - 0.867) | 0.967 (0.966 - 0.968) | 0.840 (0.836 - 0.844) | 0.972 (0.970 - 0.974) | 0.909 (0.903 - 0.916) | 0.865 (0.855 - 0.878) | 0.748 (0.738 - 0.761) | 0.957 (0.956 - 0.958) | 0.923 (0.922 - 0.924) |
MAESTRO mpAUC
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | Birds singing | Brakes squeaking | Car | Children voices | Cutlery and dishes | Footsteps | Large vehicle | Metro approaching | Metro leaving | People talking | Wind blowing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Ensemble_15 ATST, BEATs, PaSST Devtest | Schmid2024 | 1.42 | 0.921 (0.917 - 0.924) | 0.430 (0.411 - 0.460) | 0.920 (0.918 - 0.923) | 0.688 (0.678 - 0.699) | 0.743 (0.729 - 0.752) | 0.725 (0.722 - 0.726) | 0.638 (0.635 - 0.642) | 0.879 (0.878 - 0.880) | 0.849 (0.844 - 0.855) | 0.850 (0.849 - 0.850) | 0.487 (0.461 - 0.510) | |
Schmid_CPJKU_task4_3 | Ensemble_18 ATST, BEATs, PaSST | Schmid2024 | 1.39 | 0.907 (0.903 - 0.912) | 0.489 (0.481 - 0.496) | 0.916 (0.913 - 0.920) | 0.615 (0.592 - 0.628) | 0.725 (0.720 - 0.733) | 0.718 (0.715 - 0.723) | 0.588 (0.583 - 0.592) | 0.856 (0.852 - 0.859) | 0.828 (0.826 - 0.831) | 0.843 (0.842 - 0.845) | 0.382 (0.361 - 0.401) | |
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.886 (0.872 - 0.902) | 0.499 (0.479 - 0.520) | 0.888 (0.875 - 0.896) | 0.594 (0.539 - 0.640) | 0.635 (0.628 - 0.640) | 0.710 (0.704 - 0.719) | 0.655 (0.643 - 0.670) | 0.853 (0.841 - 0.866) | 0.808 (0.795 - 0.826) | 0.829 (0.823 - 0.837) | 0.466 (0.393 - 0.524) | |
Nam_KAIST_task4_4 | NAM_SED_4 | Nam2024 | 1.35 | 0.916 (0.916 - 0.917) | 0.617 (0.615 - 0.620) | 0.924 (0.924 - 0.924) | 0.748 (0.747 - 0.748) | 0.604 (0.603 - 0.606) | 0.717 (0.717 - 0.717) | 0.561 (0.561 - 0.562) | 0.843 (0.842 - 0.843) | 0.833 (0.833 - 0.833) | 0.868 (0.868 - 0.868) | 0.556 (0.554 - 0.559) | |
Nam_KAIST_task4_3 | NAM_SED_3 | Nam2024 | 1.35 | 0.916 (0.916 - 0.916) | 0.612 (0.609 - 0.616) | 0.924 (0.924 - 0.925) | 0.746 (0.746 - 0.748) | 0.603 (0.601 - 0.605) | 0.718 (0.718 - 0.718) | 0.558 (0.557 - 0.559) | 0.844 (0.844 - 0.845) | 0.833 (0.832 - 0.834) | 0.868 (0.868 - 0.868) | 0.561 (0.559 - 0.564) | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.907 (0.905 - 0.908) | 0.606 (0.594 - 0.618) | 0.916 (0.908 - 0.924) | 0.732 (0.719 - 0.751) | 0.579 (0.565 - 0.592) | 0.686 (0.675 - 0.701) | 0.567 (0.560 - 0.571) | 0.837 (0.830 - 0.847) | 0.834 (0.831 - 0.838) | 0.853 (0.848 - 0.856) | 0.598 (0.583 - 0.618) | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.869 (0.847 - 0.885) | 0.467 (0.453 - 0.487) | 0.885 (0.869 - 0.900) | 0.405 (0.391 - 0.416) | 0.616 (0.571 - 0.669) | 0.723 (0.719 - 0.726) | 0.611 (0.589 - 0.627) | 0.815 (0.809 - 0.822) | 0.795 (0.784 - 0.806) | 0.825 (0.818 - 0.830) | 0.380 (0.378 - 0.382) | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.912 (0.908 - 0.916) | 0.587 (0.552 - 0.637) | 0.925 (0.919 - 0.932) | 0.730 (0.718 - 0.743) | 0.564 (0.530 - 0.598) | 0.684 (0.675 - 0.695) | 0.521 (0.496 - 0.540) | 0.828 (0.821 - 0.834) | 0.819 (0.818 - 0.822) | 0.852 (0.849 - 0.854) | 0.565 (0.519 - 0.593) | |
Zhang_BUPT_task4_2 | ensemble_model | Yue2024 | 1.27 | 0.905 (0.897 - 0.915) | 0.707 (0.693 - 0.717) | 0.903 (0.899 - 0.905) | 0.659 (0.653 - 0.662) | 0.349 (0.331 - 0.366) | 0.642 (0.639 - 0.645) | 0.501 (0.496 - 0.507) | 0.837 (0.835 - 0.840) | 0.802 (0.801 - 0.804) | 0.834 (0.831 - 0.839) | 0.461 (0.435 - 0.480) | |
Chen_NCUT_task4_4 | Chen_NCUT_SED_system_4 | Chen2024a | 1.25 | 0.874 (0.874 - 0.874) | 0.474 (0.474 - 0.474) | 0.907 (0.907 - 0.907) | 0.657 (0.656 - 0.657) | 0.598 (0.597 - 0.598) | 0.663 (0.663 - 0.663) | 0.531 (0.531 - 0.532) | 0.850 (0.850 - 0.851) | 0.821 (0.821 - 0.821) | 0.849 (0.849 - 0.849) | 0.303 (0.302 - 0.305) | |
Chen_CHT_task4_3 | Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.886 (0.886 - 0.887) | 0.504 (0.475 - 0.521) | 0.920 (0.919 - 0.920) | 0.684 (0.678 - 0.690) | 0.672 (0.666 - 0.677) | 0.691 (0.688 - 0.693) | 0.539 (0.525 - 0.555) | 0.872 (0.868 - 0.874) | 0.859 (0.857 - 0.860) | 0.828 (0.826 - 0.828) | 0.365 (0.352 - 0.373) | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.903 (0.902 - 0.903) | 0.727 (0.727 - 0.728) | 0.902 (0.901 - 0.903) | 0.671 (0.671 - 0.671) | 0.417 (0.415 - 0.420) | 0.640 (0.640 - 0.642) | 0.518 (0.517 - 0.519) | 0.833 (0.833 - 0.834) | 0.804 (0.804 - 0.804) | 0.833 (0.832 - 0.833) | 0.498 (0.496 - 0.501) | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.867 (0.859 - 0.873) | 0.691 (0.633 - 0.766) | 0.897 (0.891 - 0.901) | 0.663 (0.637 - 0.690) | 0.765 (0.753 - 0.775) | 0.666 (0.662 - 0.673) | 0.542 (0.496 - 0.595) | 0.871 (0.858 - 0.883) | 0.850 (0.839 - 0.857) | 0.812 (0.807 - 0.816) | 0.442 (0.418 - 0.479) | |
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.871 (0.855 - 0.883) | 0.202 (0.155 - 0.253) | 0.906 (0.889 - 0.920) | 0.633 (0.583 - 0.666) | 0.717 (0.691 - 0.747) | 0.614 (0.554 - 0.665) | 0.592 (0.558 - 0.617) | 0.839 (0.825 - 0.847) | 0.792 (0.779 - 0.803) | 0.805 (0.797 - 0.813) | 0.342 (0.278 - 0.382) | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.872 (0.872 - 0.872) | 0.491 (0.229 - 0.659) | 0.905 (0.905 - 0.905) | 0.680 (0.680 - 0.680) | 0.558 (0.558 - 0.558) | 0.690 (0.690 - 0.690) | 0.542 (0.496 - 0.595) | 0.852 (0.852 - 0.852) | 0.854 (0.854 - 0.854) | 0.842 (0.842 - 0.842) | 0.315 (0.315 - 0.315) | |
Kim_GIST-HanwhaVision_task4_4 | DCASE2024 ensemble model with mix | Son2024 | 1.22 | 0.861 (0.856 - 0.867) | 0.166 (0.145 - 0.186) | 0.902 (0.901 - 0.904) | 0.470 (0.392 - 0.542) | 0.704 (0.685 - 0.734) | 0.623 (0.614 - 0.635) | 0.520 (0.477 - 0.551) | 0.830 (0.828 - 0.833) | 0.791 (0.778 - 0.801) | 0.770 (0.730 - 0.804) | 0.380 (0.347 - 0.431) | |
Kim_GIST-HanwhaVision_task4_2 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.21 | 0.857 (0.850 - 0.867) | 0.161 (0.154 - 0.168) | 0.902 (0.890 - 0.913) | 0.435 (0.348 - 0.500) | 0.684 (0.636 - 0.746) | 0.631 (0.605 - 0.650) | 0.518 (0.476 - 0.548) | 0.834 (0.825 - 0.843) | 0.801 (0.799 - 0.802) | 0.743 (0.726 - 0.765) | 0.358 (0.347 - 0.366) | |
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.862 (0.862 - 0.862) | 0.358 (0.357 - 0.358) | 0.871 (0.871 - 0.871) | 0.717 (0.716 - 0.717) | 0.601 (0.601 - 0.601) | 0.663 (0.663 - 0.663) | 0.537 (0.537 - 0.538) | 0.853 (0.853 - 0.853) | 0.826 (0.826 - 0.827) | 0.822 (0.822 - 0.822) | 0.313 (0.311 - 0.314) | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.813 (0.813 - 0.813) | 0.458 (0.458 - 0.458) | 0.905 (0.905 - 0.905) | 0.596 (0.596 - 0.596) | 0.712 (0.712 - 0.712) | 0.648 (0.648 - 0.648) | 0.523 (0.523 - 0.523) | 0.824 (0.824 - 0.824) | 0.746 (0.746 - 0.746) | 0.820 (0.820 - 0.820) | 0.296 (0.296 - 0.297) | |
LEE_KT_task4_4 | Ensemble_FDY-Con_with_ATST_and_BEATs | Lee2024 | 1.20 | 0.877 (0.877 - 0.877) | 0.689 (0.689 - 0.689) | 0.891 (0.891 - 0.891) | 0.622 (0.622 - 0.622) | 0.621 (0.621 - 0.621) | 0.682 (0.682 - 0.682) | 0.447 (0.447 - 0.447) | 0.822 (0.822 - 0.822) | 0.800 (0.800 - 0.800) | 0.846 (0.846 - 0.846) | 0.292 (0.292 - 0.292) | |
Chen_CHT_task4_4 | Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.872 (0.872 - 0.872) | 0.491 (0.229 - 0.659) | 0.905 (0.905 - 0.905) | 0.680 (0.680 - 0.680) | 0.558 (0.558 - 0.558) | 0.690 (0.690 - 0.690) | 0.542 (0.496 - 0.595) | 0.852 (0.852 - 0.852) | 0.854 (0.854 - 0.854) | 0.842 (0.842 - 0.842) | 0.315 (0.315 - 0.315) | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.871 (0.868 - 0.872) | 0.559 (0.550 - 0.565) | 0.875 (0.871 - 0.878) | 0.622 (0.619 - 0.623) | 0.483 (0.470 - 0.491) | 0.609 (0.607 - 0.611) | 0.513 (0.511 - 0.515) | 0.824 (0.818 - 0.828) | 0.802 (0.801 - 0.804) | 0.838 (0.834 - 0.841) | 0.322 (0.300 - 0.335) | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.868 (0.859 - 0.878) | 0.681 (0.637 - 0.722) | 0.865 (0.863 - 0.868) | 0.589 (0.536 - 0.646) | 0.574 (0.567 - 0.581) | 0.661 (0.650 - 0.672) | 0.460 (0.444 - 0.472) | 0.810 (0.806 - 0.817) | 0.787 (0.777 - 0.797) | 0.843 (0.820 - 0.860) | 0.380 (0.376 - 0.386) | |
Kim_GIST-HanwhaVision_task4_3 | DCASE2024 ensemble model with FDY-LKA CRNN with MPA, auxiliary decoder | Son2024 | 1.18 | 0.864 (0.860 - 0.868) | 0.181 (0.168 - 0.199) | 0.896 (0.891 - 0.899) | 0.457 (0.380 - 0.545) | 0.726 (0.691 - 0.753) | 0.588 (0.586 - 0.591) | 0.514 (0.470 - 0.549) | 0.824 (0.820 - 0.830) | 0.783 (0.764 - 0.796) | 0.778 (0.743 - 0.818) | 0.402 (0.377 - 0.441) | |
LEE_KT_task4_3 | Ensemble_FDY-CON | Lee2024 | 1.17 | 0.883 (0.883 - 0.883) | 0.641 (0.641 - 0.641) | 0.853 (0.853 - 0.853) | 0.684 (0.684 - 0.684) | 0.587 (0.587 - 0.587) | 0.709 (0.709 - 0.709) | 0.461 (0.461 - 0.461) | 0.809 (0.809 - 0.809) | 0.821 (0.821 - 0.821) | 0.854 (0.854 - 0.854) | 0.306 (0.306 - 0.306) | |
XIAO_FMSG-JLESS_task4_4 | XIAO_FMSG-JLESS_task4_4_ENSEMBLE | Xiao2024 | 1.17 | 0.808 (0.808 - 0.808) | 0.056 (0.056 - 0.056) | 0.834 (0.834 - 0.834) | 0.689 (0.689 - 0.689) | 0.576 (0.576 - 0.576) | 0.465 (0.465 - 0.465) | 0.317 (0.317 - 0.317) | 0.709 (0.709 - 0.709) | 0.660 (0.660 - 0.660) | 0.668 (0.668 - 0.668) | 0.441 (0.441 - 0.441) | |
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.877 (0.874 - 0.882) | 0.567 (0.538 - 0.588) | 0.854 (0.841 - 0.863) | 0.635 (0.557 - 0.690) | 0.539 (0.475 - 0.617) | 0.657 (0.624 - 0.684) | 0.471 (0.442 - 0.488) | 0.800 (0.751 - 0.844) | 0.795 (0.783 - 0.805) | 0.847 (0.836 - 0.863) | 0.399 (0.342 - 0.451) | |
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.837 (0.829 - 0.849) | 0.338 (0.288 - 0.405) | 0.902 (0.901 - 0.903) | 0.588 (0.575 - 0.596) | 0.605 (0.585 - 0.618) | 0.640 (0.625 - 0.664) | 0.497 (0.472 - 0.532) | 0.828 (0.824 - 0.831) | 0.792 (0.763 - 0.811) | 0.814 (0.808 - 0.821) | 0.270 (0.266 - 0.275) | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.789 (0.789 - 0.789) | 0.073 (0.073 - 0.073) | 0.773 (0.773 - 0.773) | 0.636 (0.636 - 0.636) | 0.469 (0.469 - 0.469) | 0.405 (0.405 - 0.405) | 0.332 (0.332 - 0.332) | 0.730 (0.730 - 0.730) | 0.702 (0.702 - 0.702) | 0.721 (0.721 - 0.721) | 0.454 (0.454 - 0.454) | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.759 (0.759 - 0.759) | 0.210 (0.210 - 0.210) | 0.784 (0.784 - 0.784) | 0.552 (0.552 - 0.552) | 0.466 (0.466 - 0.466) | 0.407 (0.407 - 0.407) | 0.284 (0.284 - 0.284) | 0.689 (0.689 - 0.689) | 0.608 (0.608 - 0.608) | 0.591 (0.591 - 0.591) | 0.477 (0.477 - 0.477) | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.807 (0.797 - 0.822) | 0.277 (0.217 - 0.341) | 0.885 (0.876 - 0.898) | 0.601 (0.581 - 0.622) | 0.429 (0.378 - 0.497) | 0.631 (0.606 - 0.655) | 0.439 (0.424 - 0.451) | 0.811 (0.807 - 0.817) | 0.803 (0.773 - 0.822) | 0.806 (0.803 - 0.809) | 0.244 (0.180 - 0.291) | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.822 (0.808 - 0.833) | 0.197 (0.139 - 0.274) | 0.884 (0.876 - 0.888) | 0.551 (0.535 - 0.569) | 0.549 (0.506 - 0.583) | 0.607 (0.596 - 0.619) | 0.436 (0.414 - 0.459) | 0.796 (0.793 - 0.798) | 0.802 (0.773 - 0.823) | 0.778 (0.764 - 0.789) | 0.201 (0.149 - 0.243) | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.792 (0.784 - 0.807) | 0.078 (0.062 - 0.105) | 0.880 (0.871 - 0.896) | 0.623 (0.603 - 0.657) | 0.556 (0.543 - 0.578) | 0.595 (0.587 - 0.599) | 0.432 (0.428 - 0.438) | 0.796 (0.795 - 0.797) | 0.745 (0.737 - 0.759) | 0.767 (0.755 - 0.775) | 0.370 (0.327 - 0.395) | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.621 (0.621 - 0.621) | 0.053 (0.053 - 0.053) | 0.784 (0.784 - 0.784) | 0.673 (0.673 - 0.673) | 0.532 (0.532 - 0.532) | 0.409 (0.409 - 0.409) | 0.322 (0.322 - 0.322) | 0.571 (0.571 - 0.571) | 0.532 (0.532 - 0.532) | 0.541 (0.541 - 0.541) | 0.356 (0.356 - 0.356) | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | |
Cai_USTC_task4_4 | MAT-ATST2 | Cai2024 | 0.56 | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | |
Cai_USTC_task4_3 | MAT-ATST | Cai2024 | 0.47 | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | 0.050 (0.050 - 0.050) | |
Huang_SJTU_task4_1 | pl_mtl_ensemble | Huang2024 | 0.20 | 0.002 (0.000 - 0.006) | 0.001 (0.000 - 0.002) | 0.791 (0.725 - 0.855) | 0.005 (0.000 - 0.014) | 0.001 (0.000 - 0.001) | 0.119 (0.046 - 0.196) | 0.020 (0.020 - 0.020) | 0.000 (0.000 - 0.000) | 0.583 (0.568 - 0.594) | 0.633 (0.581 - 0.676) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_3 | pl_mtl_ensemble | Huang2024 | 0.17 | 0.022 (0.010 - 0.043) | 0.125 (0.123 - 0.129) | 0.752 (0.679 - 0.824) | 0.230 (0.109 - 0.388) | 0.000 (0.000 - 0.000) | 0.047 (0.013 - 0.088) | 0.019 (0.014 - 0.021) | 0.000 (0.000 - 0.000) | 0.528 (0.323 - 0.652) | 0.153 (0.096 - 0.228) | 0.018 (0.000 - 0.050) | |
Huang_SJTU_task4_2 | pl_mtl_ensemble | Huang2024 | 0.15 | 0.010 (0.000 - 0.026) | 0.000 (0.000 - 0.000) | 0.532 (0.499 - 0.585) | 0.096 (0.002 - 0.245) | 0.000 (0.000 - 0.000) | 0.007 (0.001 - 0.015) | 0.025 (0.017 - 0.032) | 0.000 (0.000 - 0.000) | 0.246 (0.227 - 0.270) | 0.719 (0.661 - 0.781) | 0.000 (0.000 - 0.000) | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.850 (0.842 - 0.858) | 0.445 (0.403 - 0.485) | 0.889 (0.878 - 0.898) | 0.686 (0.676 - 0.695) | 0.581 (0.561 - 0.598) | 0.657 (0.646 - 0.669) | 0.551 (0.521 - 0.584) | 0.836 (0.823 - 0.851) | 0.806 (0.802 - 0.810) | 0.827 (0.818 - 0.838) | 0.328 (0.300 - 0.368) |
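As a reading aid for the table above: the per-class values in a row appear to average to the mpAUC reported for the same submission in the other tables (for example 0.739 for Schmid_CPJKU_task4_4). The Python sketch below checks this for two rows, assuming mpAUC is the unweighted mean over the 11 MAESTRO classes. The function name `macro_average` is illustrative, and this is not the official evaluation code.

```python
# Per-class point estimates copied from two rows of the table above
# (confidence intervals omitted). Class order follows the table header.
CLASSES = [
    "Birds singing", "Brakes squeaking", "Car", "Children voices",
    "Cutlery and dishes", "Footsteps", "Large vehicle", "Metro approaching",
    "Metro leaving", "People talking", "Wind blowing",
]

ROWS = {
    "Schmid_CPJKU_task4_4": [0.921, 0.430, 0.920, 0.688, 0.743, 0.725,
                             0.638, 0.879, 0.849, 0.850, 0.487],
    "Nam_KAIST_task4_4": [0.916, 0.617, 0.924, 0.748, 0.604, 0.717,
                          0.561, 0.843, 0.833, 0.868, 0.556],
}


def macro_average(values):
    """Unweighted mean over classes (the assumed definition of mpAUC here)."""
    if len(values) != len(CLASSES):
        raise ValueError("expected one value per class")
    return sum(values) / len(values)


for code, values in ROWS.items():
    print(f"{code}: {macro_average(values):.3f}")
```

For these two rows the script prints 0.739 and 0.744, which agree with the mpAUC values listed for the same submissions in the system characteristics tables.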
Energy Consumption
Rank | Submission code | Submission name | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | Energy (kWh) (training, normalized) | Energy (kWh) (GPU, training, normalized) | Energy (kWh) (test, normalized) | Energy (kWh) (GPU, test, normalized) | EW-PSDS (DESED, training energy) | EW-mpAUC (MAESTRO, training energy) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_2 | ATST S2.I2 Devtest | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | 3.461 | 1.302 | 0.059 | 0.014 | 0.517 | 0.572 | |
Nam_KAIST_task4_2 | NAM_SED_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | 12.504 | 10.242 | 0.119 | 0.012 | 0.056 | 0.071 | |
Schmid_CPJKU_task4_1 | ATST S2.I2 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | 3.461 | 1.302 | 0.059 | 0.014 | 0.515 | 0.540 | |
Nam_KAIST_task4_1 | NAM_SED_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | 12.504 | 10.232 | 0.042 | 0.034 | 0.056 | 0.070 | |
Zhang_BUPT_task4_1 | single_model | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | 11.596 | 8.459 | 0.061 | 0.036 | 0.125 | 0.167 | |
Chen_CHT_task4_1 | Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | 8.517 | 0.086 | 0.527 | 0.773 | |||
Kim_GIST-HanwhaVision_task4_1 | DCASE2024 FDY-LKA CRNN with MPA, auxiliary decoder, multi-channel input feature | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | 8.777 | 3.956 | 0.176 | 0.050 | 0.122 | 0.144 | |
Chen_CHT_task4_2 | Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | 49.984 | 1.075 | 0.552 | 0.714 | |||
Chen_NCUT_task4_3 | Chen_NCUT_SED_system_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | 0.155 | 0.085 | 0.016 | 0.009 | 1.403 | 1.790 | |
Chen_NCUT_task4_1 | Chen_NCUT_SED_system_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | 0.260 | 0.175 | 0.011 | 0.005 | 0.838 | 1.056 | |
Chen_NCUT_task4_2 | Chen_NCUT_SED_system_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | 0.788 | 0.621 | 0.018 | 0.010 | 0.273 | 0.347 | |
LEE_KT_task4_1 | CRNN-Con_with_ATST_BEATs | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | 0.069 | 0.126 | 8.999 | 12.221 | |||
LEE_KT_task4_2 | FDY-CRNN_with_ATST_and_BEATs | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | 0.102 | 0.151 | 5.810 | 8.181 | |||
Baseline | DCASE2024 baseline system | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | 0.946 | 0.113 | 0.119 | 0.013 | 0.483 | 0.648 | |
XIAO_FMSG-JLESS_task4_3 | XIAO_FMSG-JLESS_task4_3_FDY | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | 4.140 | 1.910 | 0.100 | 0.040 | 0.188 | 0.182 | |
XIAO_FMSG-JLESS_task4_2 | XIAO_FMSG-JLESS_task4_2_WIDE | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | 2.860 | 1.380 | 0.082 | 0.037 | 0.282 | 0.252 | |
Lyu_SCUT_task4_2 | CCRN_BEATs_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | 11.394 | 3.093 | 0.152 | 0.014 | 0.083 | 0.105 | |
Lyu_SCUT_task4_1 | CCRN_BEATs_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | 10.456 | 2.855 | 0.143 | 0.013 | 0.090 | 0.113 | |
Niu_XJU_task4_1 | DCASE2024 SED system | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | 3.273 | 1.606 | 0.062 | 0.018 | 0.177 | 0.227 | |
XIAO_FMSG-JLESS_task4_1 | XIAO_FMSG-JLESS_task4_1_ORL | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | 1.790 | 0.856 | 0.045 | 0.012 | 0.434 | 0.373 | |
Cai_USTC_task4_2 | MAT-SED-CNN | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | 1.180 | 0.113 | 0.119 | 0.013 | 0.603 | 0.052 | |
Cai_USTC_task4_1 | MAT-SED | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | 1.180 | 0.113 | 0.119 | 0.013 | 0.590 | 0.052 | |
Huang_SJTU_task4_4 | pl_mtl_single | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | 1.358 | 0.664 | 0.039 | 0.016 | 0.475 | 0.616 |
System characteristics
General characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | Data augmentation | Features |
---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Schmid2024 | 1.42 | 0.680 (0.679 - 0.682) | 0.739 (0.736 - 0.742) | Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting | log-mel energies | |
Schmid_CPJKU_task4_3 | Schmid2024 | 1.39 | 0.676 (0.674 - 0.678) | 0.715 (0.714 - 0.718) | Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting | log-mel energies | |
Schmid_CPJKU_task4_2 | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting | log-mel energies | |
Nam_KAIST_task4_4 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.745) | Mixup, Frequency warping, Filter augmentation | log-mel energies | |
Nam_KAIST_task4_3 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.744) | Mixup, Frequency warping, Filter augmentation | log-mel energies | |
Nam_KAIST_task4_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | Mixup, Frequency warping, Filter augmentation | log-mel energies | |
Schmid_CPJKU_task4_1 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | Freq-MixStyle, Filter augmentation, Time shifting, Time masking, WavMix, Mixup, Device-Impulse-Response Augmentation, Frequency warping, Pitch Shifting | log-mel energies | |
Nam_KAIST_task4_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | Mixup, Frequency warping, Filter augmentation | log-mel energies | |
Zhang_BUPT_task4_2 | Yue2024 | 1.27 | 0.570 (0.566 - 0.573) | 0.691 (0.691 - 0.691) | Mixup, Time masking, Frequency masking | log-mel energies | |
Chen_NCUT_task4_4 | Chen2024a | 1.25 | 0.565 (0.563 - 0.566) | 0.684 (0.684 - 0.684) | Mixup | log-mel energies | |
Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.527 (0.524 - 0.530) | 0.711 (0.709 - 0.712) | Mixup, SpecAugment | log-mel energies | |
Zhang_BUPT_task4_1 | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | Mixup, Time masking, Frequency masking | log-mel energies | |
Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | Mixup, SpecAugment | log-mel energies | |
Kim_GIST-HanwhaVision_task4_1 | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | Mixup, Frequency shift, Time shifting, Time masking, Filter augmentation, Adding Gaussian noise | log-mel energies, MFCC | |
Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | Mixup, SpecAugment | log-mel energies | |
Kim_GIST-HanwhaVision_task4_4 | Son2024 | 1.22 | 0.586 (0.578 - 0.597) | 0.638 (0.620 - 0.654) | Mixup, Frequency shift, Time shifting, Time masking, Filter augmentation, Adding Gaussian noise | log-mel energies, MFCC | |
Kim_GIST-HanwhaVision_task4_2 | Son2024 | 1.21 | 0.580 (0.560 - 0.599) | 0.629 (0.620 - 0.639) | Mixup, Frequency shift, Time shifting, Time masking, Filter augmentation, Adding Gaussian noise | log-mel energies, MFCC | |
Chen_NCUT_task4_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | Mixup, Frame shifting, Time masking, Filter augmentation, Frequency masking, Adding noise | log-mel energies | |
Chen_NCUT_task4_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | Mixup, Frame shifting, Time masking, Filter augmentation, Frequency masking, Adding noise | log-mel energies | |
LEE_KT_task4_4 | Lee2024 | 1.20 | 0.509 (0.509 - 0.509) | 0.690 (0.690 - 0.690) | Frequency warping, Filter augmentation | log-mel energies | |
Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.500 (0.498 - 0.504) | 0.691 (0.663 - 0.708) | Mixup, SpecAugment | log-mel energies | |
Chen_NCUT_task4_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | Mixup, Frame shifting, Time masking, Filter augmentation, Frequency masking, Adding noise | log-mel energies | |
LEE_KT_task4_1 | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | Frequency warping, Filter augmentation | log-mel energies | |
Kim_GIST-HanwhaVision_task4_3 | Son2024 | 1.18 | 0.542 (0.525 - 0.560) | 0.637 (0.628 - 0.652) | Mixup, Frequency shift, Time shifting, Time masking, Filter augmentation, Adding Gaussian noise | log-mel energies | |
LEE_KT_task4_3 | Lee2024 | 1.17 | 0.468 (0.468 - 0.468) | 0.692 (0.692 - 0.692) | Frequency warping, Filter augmentation | log-mel energies | |
XIAO_FMSG-JLESS_task4_4 | Xiao2024 | 1.17 | 0.606 (0.606 - 0.606) | 0.566 (0.566 - 0.566) | SpecAugment, Filter augmentation, Mixup, Freq-MixStyle | log-mel energies | |
LEE_KT_task4_2 | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | Frequency warping, Filter augmentation | log-mel energies | |
Baseline | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | | log-mel energies | |
XIAO_FMSG-JLESS_task4_3 | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | SpecAugment, Filter augmentation, Mixup, Freq-MixStyle | log-mel energies | |
XIAO_FMSG-JLESS_task4_2 | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | SpecAugment, Filter augmentation, Mixup, Freq-MixStyle | log-mel energies | |
Lyu_SCUT_task4_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | Mixup | complex spectrogram | |
Lyu_SCUT_task4_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | Mixup | complex spectrogram | |
Niu_XJU_task4_1 | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | Mixup, SpecAugment, Audio cutmix, Random linear fader | log-mel energies | |
XIAO_FMSG-JLESS_task4_1 | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | SpecAugment, Filter augmentation, Mixup, Freq-MixStyle | log-mel energies | |
Cai_USTC_task4_2 | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | Mixup, Frame shifting, Filter augmentation | log-mel energies | |
Cai_USTC_task4_1 | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | Mixup, Frame shifting, Filter augmentation | log-mel energies | |
Cai_USTC_task4_4 | Cai2024 | 0.56 | 0.506 (0.505 - 0.507) | 0.050 (0.050 - 0.050) | Mixup, Frame shifting, Filter augmentation | log-mel energies | |
Cai_USTC_task4_3 | Cai2024 | 0.47 | 0.417 (0.402 - 0.428) | 0.050 (0.050 - 0.050) | Mixup, Frame shifting, Filter augmentation | log-mel energies | |
Huang_SJTU_task4_1 | Huang2024 | 0.20 | 0.000 (0.000 - 0.000) | 0.196 (0.189 - 0.202) | Mixup, SpecAugment | log-mel energies | |
Huang_SJTU_task4_3 | Huang2024 | 0.17 | 0.000 (0.000 - 0.000) | 0.172 (0.165 - 0.179) | Mixup, SpecAugment | log-mel energies | |
Huang_SJTU_task4_2 | Huang2024 | 0.15 | 0.000 (0.000 - 0.000) | 0.149 (0.137 - 0.159) | Mixup, SpecAugment | log-mel energies | |
Huang_SJTU_task4_4 | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | Mixup, SpecAugment | log-mel energies |
Machine learning characteristics
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | Classifier | Semi-supervised approach | Post-processing | Segmentation method | Decision making |
---|---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Schmid2024 | 1.42 | 0.680 (0.679 - 0.682) | 0.739 (0.736 - 0.742) | ATST_CRNN, PaSST_CRNN, BEATs_CRNN | pseudo-labelling, mean-teacher student, interpolation consistency training | Sound Event Bounding Boxes | |||
Schmid_CPJKU_task4_3 | Schmid2024 | 1.39 | 0.676 (0.674 - 0.678) | 0.715 (0.714 - 0.718) | ATST_CRNN, PaSST_CRNN, BEATs_CRNN | pseudo-labelling, mean-teacher student, interpolation consistency training | Sound Event Bounding Boxes | |||
Schmid_CPJKU_task4_2 | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | ATST_CRNN | pseudo-labelling, mean-teacher student, interpolation consistency training | Sound Event Bounding Boxes | |||
Nam_KAIST_task4_4 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.745) | CRNN, ensemble | mean-teacher student, self training | cSEBBs | | mean | |
Nam_KAIST_task4_3 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.744) | CRNN, ensemble | mean-teacher student, self training | cSEBBs | | mean | |
Nam_KAIST_task4_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | CRNN | mean-teacher student, self training | cSEBBs | |||
Schmid_CPJKU_task4_1 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | ATST_CRNN | pseudo-labelling, mean-teacher student, interpolation consistency training | Sound Event Bounding Boxes | |||
Nam_KAIST_task4_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | CRNN | mean-teacher student, self training | cSEBBs | |||
Zhang_BUPT_task4_2 | Yue2024 | 1.27 | 0.570 (0.566 - 0.573) | 0.691 (0.691 - 0.691) | CRNN with pretrained BEATs | pseudo-labelling, mean-teacher student | median filtering | | average | |
Chen_NCUT_task4_4 | Chen2024a | 1.25 | 0.565 (0.563 - 0.566) | 0.684 (0.684 - 0.684) | CRNN, FFDCRNN, RNN | mean-teacher student | median filtering, weak prediction masking | | weighted mean | |
Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.527 (0.524 - 0.530) | 0.711 (0.709 - 0.712) | Transformer, RNN | mean-teacher student | median filtering | | average | |
Zhang_BUPT_task4_1 | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | CRNN with pretrained BEATs | pseudo-labelling, mean-teacher student | median filtering | |||
Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | Transformer, RNN | mean-teacher student | median filtering | |||
Kim_GIST-HanwhaVision_task4_1 | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | CRNN with pretrained transformer | mean-teacher student | median filtering, cSEBBs | |||
Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | Transformer, RNN | mean-teacher student | median filtering | | average, majority vote | |
Kim_GIST-HanwhaVision_task4_4 | Son2024 | 1.22 | 0.586 (0.578 - 0.597) | 0.638 (0.620 - 0.654) | CRNN with pretrained transformer | mean-teacher student | median filtering, cSEBBs | | averaging | |
Kim_GIST-HanwhaVision_task4_2 | Son2024 | 1.21 | 0.580 (0.560 - 0.599) | 0.629 (0.620 - 0.639) | CRNN with pretrained transformer | mean-teacher student | median filtering, cSEBBs | | averaging | |
Chen_NCUT_task4_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | RNN | mean-teacher student | median filtering, weak prediction masking | |||
Chen_NCUT_task4_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | CRNN | mean-teacher student | median filtering, weak prediction masking | |||
LEE_KT_task4_4 | Lee2024 | 1.20 | 0.509 (0.509 - 0.509) | 0.690 (0.690 - 0.690) | CRNN, Conformer, ensemble | mean-teacher student | median filtering, Sound Event Bounding Boxes | |||
Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.500 (0.498 - 0.504) | 0.691 (0.663 - 0.708) | Transformer, RNN | mean-teacher student | median filtering | majority vote | ||
Chen_NCUT_task4_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | FFDCRNN | mean-teacher student, pseudo-labelling | median filtering, weak prediction masking | |||
LEE_KT_task4_1 | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | CRNN, Conformer | mean-teacher student | median filtering, Sound Event Bounding Boxes | |||
Kim_GIST-HanwhaVision_task4_3 | Son2024 | 1.18 | 0.542 (0.525 - 0.560) | 0.637 (0.628 - 0.652) | CRNN with pretrained transformer | mean-teacher student | median filtering, csebbs | averaging | ||
LEE_KT_task4_3 | Lee2024 | 1.17 | 0.468 (0.468 - 0.468) | 0.692 (0.692 - 0.692) | CRNN, Conformer, ensemble | mean-teacher student | median filtering | |||
XIAO_FMSG-JLESS_task4_4 | Xiao2024 | 1.17 | 0.606 (0.606 - 0.606) | 0.566 (0.566 - 0.566) | FDYCRNN, CRNN | mean-teacher student | median filtering, sebbs | |||
LEE_KT_task4_2 | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | CRNN | mean-teacher student | median filtering | |||
Baseline | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | CRNN | mean-teacher student | median filtering | |||
XIAO_FMSG-JLESS_task4_3 | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | FDYCRNN | mean-teacher student | median filtering, sebbs | |||
XIAO_FMSG-JLESS_task4_2 | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | CRNN | mean-teacher student | median filtering, sebbs | |||
Lyu_SCUT_task4_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | CRNN | mean-teacher student | median filtering | |||
Lyu_SCUT_task4_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | CRNN | mean-teacher student | median filtering | |||
Niu_XJU_task4_1 | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | CRNN | mean-teacher student | median filtering | |||
XIAO_FMSG-JLESS_task4_1 | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | CRNN | mean-teacher student | median filtering, sebbs | |||
Cai_USTC_task4_2 | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | MAT-SED | mean-teacher student | median filtering | |||
Cai_USTC_task4_1 | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | MAT-SED | mean-teacher student | median filtering | |||
Cai_USTC_task4_4 | Cai2024 | 0.56 | 0.506 (0.505 - 0.507) | 0.050 (0.050 - 0.050) | MAT-SED, ATST-SED | mean-teacher student | median filtering | |||
Cai_USTC_task4_3 | Cai2024 | 0.47 | 0.417 (0.402 - 0.428) | 0.050 (0.050 - 0.050) | MAT-SED, ATST-SED | mean-teacher student | median filtering | |||
Huang_SJTU_task4_1 | Huang2024 | 0.20 | 0.000 (0.000 - 0.000) | 0.196 (0.189 - 0.202) | CRNN | pseudo-labelling, mean-teacher student | median filtering | averaging | ||
Huang_SJTU_task4_3 | Huang2024 | 0.17 | 0.000 (0.000 - 0.000) | 0.172 (0.165 - 0.179) | CRNN | pseudo-labelling, mean-teacher student | median filtering | averaging | ||
Huang_SJTU_task4_2 | Huang2024 | 0.15 | 0.000 (0.000 - 0.000) | 0.149 (0.137 - 0.159) | CRNN | pseudo-labelling, mean-teacher student | median filtering | averaging | ||
Huang_SJTU_task4_4 | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | CRNN | pseudo-labelling, mean-teacher student | median filtering |
Complexity
Rank | Code | Technical Report | Ranking score (Evaluation dataset) | PSDS (DESED evaluation dataset) | mpAUC (MAESTRO evaluation dataset) | Model complexity | MACS | Ensemble subsystems | Training time |
---|---|---|---|---|---|---|---|---|---|
Schmid_CPJKU_task4_4 | Schmid2024 | 1.42 | 0.680 (0.679 - 0.682) | 0.739 (0.736 - 0.742) | 1342986395 | 450300000000 | 15 | 160h (1 Nvidia A40) | |
Schmid_CPJKU_task4_3 | Schmid2024 | 1.39 | 0.676 (0.674 - 0.678) | 0.715 (0.714 - 0.718) | 1608946202 | 560410000000 | 18 | 199h (1 Nvidia A40) | |
Schmid_CPJKU_task4_2 | Schmid2024 | 1.35 | 0.646 (0.640 - 0.654) | 0.711 (0.704 - 0.717) | 88411541 | 22590000000 | 8h (1 Nvidia A40) | ||
Nam_KAIST_task4_4 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.745) | 181600000 | 26085000000 | 17 | 3h (1 GTX 1080 Ti) | |
Nam_KAIST_task4_3 | Nam2024 | 1.35 | 0.610 (0.609 - 0.611) | 0.744 (0.744 - 0.744) | 181600000 | 26085000000 | 15 | 3h (1 GTX 1080 Ti) | |
Nam_KAIST_task4_2 | Nam2024 | 1.32 | 0.586 (0.585 - 0.589) | 0.738 (0.732 - 0.745) | 181600000 | 26085000000 | 3h (1 GTX 1080 Ti) | ||
Schmid_CPJKU_task4_1 | Schmid2024 | 1.31 | 0.644 (0.640 - 0.647) | 0.672 (0.669 - 0.676) | 88411541 | 22590000000 | 8h (1 Nvidia A40) | ||
Nam_KAIST_task4_1 | Nam2024 | 1.31 | 0.584 (0.582 - 0.587) | 0.726 (0.720 - 0.733) | 181600000 | 26085000000 | 10h (1 RTX A6000) | ||
Zhang_BUPT_task4_2 | Yue2024 | 1.27 | 0.570 (0.566 - 0.573) | 0.691 (0.691 - 0.691) | 63300000 | 4986000000 | 6 | 4h (1 GeForce RTX 3090) | |
Chen_NCUT_task4_4 | Chen2024a | 1.25 | 0.565 (0.563 - 0.566) | 0.684 (0.684 - 0.684) | 40000000 | 4104000000 | 3 | 3h15m (1 GTX 4090) | |
Chen_CHT_task4_3 | Chen2024 | 1.25 | 0.527 (0.524 - 0.530) | 0.711 (0.709 - 0.712) | 389600000 | 202214000000 | 11 | 137h (1 A100) | |
Zhang_BUPT_task4_1 | Yue2024 | 1.23 | 0.523 (0.523 - 0.524) | 0.704 (0.704 - 0.705) | 10500000 | 831000000 | 34h (1 GeForce RTX 3090) | ||
Chen_CHT_task4_1 | Chen2024 | 1.23 | 0.495 (0.486 - 0.503) | 0.733 (0.730 - 0.739) | 92100000 | 45259000000 | 23h (1 A100) | ||
Kim_GIST-HanwhaVision_task4_1 | Son2024 | 1.23 | 0.567 (0.558 - 0.573) | 0.665 (0.646 - 0.677) | 4822398 | 7304359968 | 20~24h (3 A6000) | ||
Chen_CHT_task4_2 | Chen2024 | 1.23 | 0.527 (0.524 - 0.530) | 0.691 (0.663 - 0.708) | 393000000 | 202768000000 | 144h (1 A100) | ||
Kim_GIST-HanwhaVision_task4_4 | Son2024 | 1.22 | 0.586 (0.578 - 0.597) | 0.638 (0.620 - 0.654) | 617266944 | 7304359968 | 128 | 24h (3 A6000) | |
Kim_GIST-HanwhaVision_task4_2 | Son2024 | 1.21 | 0.580 (0.560 - 0.599) | 0.629 (0.620 - 0.639) | 308633472 | 7304359968 | 64 | 24h (3 A6000) | |
Chen_NCUT_task4_3 | Chen2024a | 1.20 | 0.526 (0.524 - 0.527) | 0.675 (0.675 - 0.675) | 17400000 | 1362000000 | 30m (1 GTX 4090) | ||
Chen_NCUT_task4_1 | Chen2024a | 1.20 | 0.525 (0.523 - 0.527) | 0.667 (0.667 - 0.667) | 2500000 | 950200000 | 45m (1 GTX 4090) | ||
LEE_KT_task4_4 | Lee2024 | 1.20 | 0.509 (0.509 - 0.509) | 0.690 (0.690 - 0.690) | 1834547 | 104724000000 | 4 | 2h (A6000) | |
Chen_CHT_task4_4 | Chen2024 | 1.20 | 0.500 (0.498 - 0.504) | 0.691 (0.663 - 0.708) | 207000000 | 111214000000 | 9 | 95h (1 A100) | |
Chen_NCUT_task4_2 | Chen2024a | 1.19 | 0.519 (0.485 - 0.537) | 0.665 (0.659 - 0.669) | 20100000 | 1792000000 | 2h (1 GTX 4090) | ||
LEE_KT_task4_1 | Lee2024 | 1.19 | 0.506 (0.482 - 0.548) | 0.684 (0.672 - 0.693) | 1097897 | 26181000000 | 4h (1 NVIDIA GeForce RTX 4090) | ||
Kim_GIST-HanwhaVision_task4_3 | Son2024 | 1.18 | 0.542 (0.525 - 0.560) | 0.637 (0.628 - 0.652) | 308633472 | 7304359968 | 64 | 24h (3 A6000) | |
LEE_KT_task4_3 | Lee2024 | 1.17 | 0.468 (0.468 - 0.468) | 0.692 (0.692 - 0.692) | 1898428 | 96896000000 | 4 | 2h (A6000) | |
XIAO_FMSG-JLESS_task4_4 | Xiao2024 | 1.17 | 0.606 (0.606 - 0.606) | 0.566 (0.566 - 0.566) | 15658236 | 4380000000 | 6 | 30h (1 RTX A5000) | |
LEE_KT_task4_2 | Lee2024 | 1.16 | 0.474 (0.471 - 0.479) | 0.676 (0.666 - 0.690) | 1161778 | 22267000000 | 4h (1 NVIDIA GeForce RTX 4090) | ||
Baseline | Cornell2024 | 1.13 | 0.475 (0.469 - 0.479) | 0.646 (0.641 - 0.653) | 1800000 | 1036000000 | 3h (1 GTX 1080 Ti) | ||
XIAO_FMSG-JLESS_task4_3 | Xiao2024 | 1.12 | 0.574 (0.574 - 0.574) | 0.553 (0.553 - 0.553) | 3438938 | 345260000 | 5h (1 RTX A5000) | ||
XIAO_FMSG-JLESS_task4_2 | Xiao2024 | 1.12 | 0.597 (0.597 - 0.597) | 0.530 (0.530 - 0.530) | 1780474 | 1659000000 | 6h (1 RTX A5000) | ||
Lyu_SCUT_task4_2 | Lyu2024 | 1.10 | 0.478 (0.474 - 0.481) | 0.612 (0.596 - 0.624) | 1100000 | 20730000000 | 10.5h (1 RTX 4090 D) | ||
Lyu_SCUT_task4_1 | Lyu2024 | 1.08 | 0.474 (0.469 - 0.482) | 0.602 (0.586 - 0.619) | 1400000 | 20822000000 | 10h (1 RTX 4090 D) | ||
Niu_XJU_task4_1 | Niu2024 | 1.07 | 0.465 (0.462 - 0.467) | 0.603 (0.599 - 0.610) | 28 | 1431000000 | 9h (1 GTX TITAN) | ||
XIAO_FMSG-JLESS_task4_1 | Xiao2024 | 1.06 | 0.575 (0.575 - 0.575) | 0.490 (0.490 - 0.490) | 1780474 | 1035000000 | 4h (1 RTX A5000) | ||
Cai_USTC_task4_2 | Cai2024 | 0.63 | 0.574 (0.573 - 0.574) | 0.050 (0.050 - 0.050) | 92608612 | 110175122688 | 12h (NVIDIA GeForce RTX 3090) | ||
Cai_USTC_task4_1 | Cai2024 | 0.61 | 0.561 (0.560 - 0.561) | 0.050 (0.050 - 0.050) | 90592532 | 108500000000 | 12h (NVIDIA GeForce RTX 3090) | ||
Cai_USTC_task4_4 | Cai2024 | 0.56 | 0.506 (0.505 - 0.507) | 0.050 (0.050 - 0.050) | 185217224 | 110175122688 | 2 | 12h (NVIDIA GeForce RTX 3090) | |
Cai_USTC_task4_3 | Cai2024 | 0.47 | 0.417 (0.402 - 0.428) | 0.050 (0.050 - 0.050) | 185217224 | 110175122688 | 2 | 12h (NVIDIA GeForce RTX 3090) | |
Huang_SJTU_task4_1 | Huang2024 | 0.20 | 0.000 (0.000 - 0.000) | 0.196 (0.189 - 0.202) | 9100000 | 1688000000 | 7 | 8h (1 NVIDIA A10) | |
Huang_SJTU_task4_3 | Huang2024 | 0.17 | 0.000 (0.000 - 0.000) | 0.172 (0.165 - 0.179) | 26000000 | 1688000000 | 20 | 8h (1 NVIDIA A10) | |
Huang_SJTU_task4_2 | Huang2024 | 0.15 | 0.000 (0.000 - 0.000) | 0.149 (0.137 - 0.159) | 13000000 | 1688000000 | 10 | 8h (1 NVIDIA A10) | |
Huang_SJTU_task4_4 | Huang2024 | 1.20 | 0.519 (0.516 - 0.522) | 0.678 (0.669 - 0.685) | 1300000 | 1688000000 | 8h (1 NVIDIA A10) |
Technical reports
TRANSFORMER-BASED SOUND EVENT DETECTION SYSTEM FOR DCASE2024 TASK4
Pengfei Cai, Yan Song
University of Science and Technology of China
Cai_USTC_task4_1 Cai_USTC_task4_2 Cai_USTC_task4_3 Cai_USTC_task4_4
Abstract
In this technical report, we describe our systems for DCASE 2024 Challenge Task 4. Our systems are mainly based on MAT-SED, a pure Transformer-based SED model with masked-reconstruction-based pre-training. In MAT-SED, a Transformer with relative positional encoding is used as the context network instead of RNNs. The Transformer-based context network is first pre-trained with the masked-reconstruction task on all available target data in a self-supervised way. The encoder and the context network are then jointly fine-tuned in a semi-supervised manner. Our final systems achieve a PSDS1 of 0.588 (single model) and 0.600 (ensemble) on the validation set of the DESED dataset.
SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS FOR DCASE 2024 TASK 4
Wei-Yu Chen, Chung-Li Lu, Hsiang-Feng Chuang, Yu-Han Cheng, Bo-Cheng Chan
Advanced Technology Laboratory, Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., Taiwan
Chen_CHT_task4_1 Chen_CHT_task4_2 Chen_CHT_task4_3 Chen_CHT_task4_4
Abstract
In this technical report, we briefly describe the system we designed for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 4: Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels. Our best single system employs a two-stage training process. A pretrained BEATs [1] model is used as the front-end feature extractor, with a Bi-GRU module as the back-end classifier for each frame. We employ the mean teacher method for semi-supervised learning, using an exponential moving average (EMA) to update the parameters of the teacher model. Additionally, we generate pseudo-labels using the student model to leverage unlabeled data. For data augmentation, techniques such as mix-up and SpecAugment [2] are employed. A median filter is used for post-processing. The submitted system without ensembling achieves a polyphonic sound event detection score for scenario 1 (PSDS1) [3] of 0.50 and a mean partial AUC (mean pAUC) of 0.73, while with ensembling it achieves a PSDS1 of 0.53 and a mean pAUC of 0.77 on the validation set.
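The EMA teacher update mentioned in the abstract above can be illustrated with a minimal sketch. It assumes parameters stored as plain floats in dictionaries and a hypothetical `decay` value; it is not taken from the report, and a real system would apply the same rule to model tensors.

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter towards the student by an exponential moving average."""
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Toy parameters (assumption: plain floats instead of model tensors).
teacher_params = {"w": 1.0, "b": 0.0}
student_params = {"w": 0.0, "b": 1.0}
ema_update(teacher_params, student_params, decay=0.9)
```

With a decay close to 1, the teacher changes slowly and behaves like a smoothed average of recent student states, which is why it is typically the teacher that provides the consistency targets.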
SEMI-SUPERVISED SOUND EVENT DETECTION BASED ON PRETRAINED MODELS FOR DCASE 2024 TASK 4
Jingxuan Chen, Xichang Cai, Ziyi Liu, Haiyue Zhang, Liangxiao Zuo, Menglong Wu
North China University of Technology, China
Chen_NCUT_task4_1 Chen_NCUT_task4_2 Chen_NCUT_task4_3 Chen_NCUT_task4_4
Abstract
In this technical report, we present our submission system for DCASE 2024 Task 4: Sound Event Detection in Domestic Environments with Heterogeneous Training Dataset and Potentially Missing Labels. Firstly, our proposed system employs a full-frequency dynamic convolution (FFD-Conv) network based on the Mean Teacher semi-supervised learning framework. Secondly, we utilize a two-stage training framework, where in the first stage, a large unlabeled in-domain set is converted into pseudo-weak labels to balance the number of strongly labeled datasets in the second stage. Additionally, we employ various methods such as data augmentation, post-processing, and model ensembling to further enhance the generalization capability of the system. Ultimately, our system achieved a PSDS-scenario1 score of 0.535 and a macro-average pAUC score of 0.697 on the validation set.
DCASE 2024 TASK 4: SOUND EVENT DETECTION WITH HETEROGENEOUS DATA AND MISSING LABELS
Samuele Cornell1, Janek Ebbers2, Constance Douwes3, Irene Martín-Morató4, Manu Harju4, Annamaria Mesaros4, Romain Serizel3
1Carnegie Mellon University, USA, 2Mitsubishi Electric Research Laboratories, USA, 3Université de Lorraine, CNRS, Inria, Loria, France, 4Tampere University, Finland
Cornell_CMU_task4_1
Abstract
The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty. Participants are challenged to explore how best to use training data from different domains and with varying annotation granularity (strong/weak temporal resolution, soft/hard labels) to obtain a robust SED system that can generalize across different scenarios. Crucially, annotation across the available training datasets can be inconsistent, and hence sound events labeled in one dataset may be present but not annotated in the other, and vice versa. As such, systems will have to cope with potentially missing target labels during training. Moreover, as an additional novelty, systems will also be evaluated on labels with different granularity in order to assess their robustness for different applications. To lower the entry barrier for participants, we developed an updated baseline system with several caveats to address these aforementioned problems. Results with our baseline system indicate that this research direction is promising and that it is possible to obtain a stronger SED system by using diverse domain training data with missing labels compared to training a SED system for each domain separately.
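One common way to cope with target labels that are missing for some classes is to exclude those classes from the loss. The following is a minimal sketch under that assumption, not the baseline's actual implementation; the function name `masked_bce` and the per-class `mask` convention (1 = class annotated in the clip's source dataset, 0 = not annotated) are hypothetical.

```python
import math


def masked_bce(probs, targets, mask, eps=1e-7):
    """Mean binary cross-entropy over the classes whose mask entry is 1."""
    total, count = 0.0, 0
    for p, t, m in zip(probs, targets, mask):
        if not m:
            continue  # class not annotated for this clip: contributes no loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# The second class is not annotated in this clip's dataset, so only the first counts.
loss = masked_bce(probs=[0.5, 0.9], targets=[1.0, 0.0], mask=[1, 0])
```

Masking avoids penalising the model for predicting an event that may truly be present but was never annotated in that dataset.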
SOUND EVENT DETECTION ENHANCED BY SCENE INFORMATION FOR DCASE CHALLENGE 2024 TASK4
Wen Huang1, Bing Han1, Xie Chen1, Pingyi Fan2, Cheng Lu3, Zhiqiang Lv4, Jia Liu2,4, Wei-Qiang Zhang2, Yanmin Qian1
1Shanghai Jiao Tong University, Shanghai, China, 2Tsinghua University, Beijing, China, 3North China Electric Power University, Beijing, China, 4Huakong AI Plus Company Limited, Beijing, China
Huang_SJTU_task4_1 Huang_SJTU_task4_2 Huang_SJTU_task4_3 Huang_SJTU_task4_4
Abstract
In this technical report, we describe our submission to the DCASE 2024 Challenge Task 4: Sound Event Detection with Heterogeneous Training Data and Potentially Missing Labels. Our approach leverages a Convolutional Recurrent Neural Network (CRNN) architecture enhanced with pre-trained BEATs embeddings to perform robust sound event detection. To effectively utilize different sources of data, we integrate scene information to enhance event detection performance through multi-task learning. Additionally, we address the challenge of partially missing labels by employing a semi-supervised strategy that combines the mean teacher model with pseudo-labeling to improve performance. Our final ensemble system achieves a PSDS1 score of 0.545 on the DESED validation set and an mpAUC score of 0.759 on the MAESTRO real validation set. These results highlight the efficacy of incorporating scene information and semi-supervised learning strategies in sound event detection tasks with heterogeneous and incomplete datasets.
TECHNICAL REPORT ON LEE SUBMISSION: SOUND EVENT DETECTION USING CONFORMER AND ATST FRAMEWORK FOR DCASE CHALLENGE 2024 TASK 4
Yuna Lee, JaeHoon Jung
KT Corporation, Republic of Korea
LEE_KT_task4_1 LEE_KT_task4_2 LEE_KT_task4_3 LEE_KT_task4_4
Abstract
Sound Event Detection (SED) has shown promising performance in detecting and classifying meaningful events in a given audio signal. Since real-world scenarios do not provide well-labeled data, there has been an urge to extend the research to rather “coarse” labeled datasets. In this report, we propose a novel model that performs robustly on both well-labeled datasets and datasets with potentially missing labels, using large pre-trained audio transformers throughout the training process. Our method improves performance to 0.52 in PSDS1 and 0.77 in pAUCM.
Semi-Supervised Sound Event Detection System Based on Complex Convolutional Recurrent Neural Network
Hong Lyu, Qianhua He
School of Electronic and Information Engineering, South China University of Technology, China
Lyu_SCUT_task4_1 Lyu_SCUT_task4_2
Abstract
This report describes the system we proposed for Task 4 of DCASE 2024. To investigate the impact of complex information on sound event detection tasks, we designed a system based on Complex Convolutional Recurrent Neural Network[1] for semi-supervised Sound Event Detection (CCRN-SED). We utilized the Mean Teacher[2] for semi-supervised learning, which can address the challenge of unlabeled data. In addition, we use BEATs pretrained model[3] to extract information from data outside the development set. The optimal PSDS1 and mean pAUC of CCRN-SED on the development test set are 0.508 and 0.693.
SELF TRAINING AND ENSEMBLING FREQUENCY DEPENDENT NETWORKS WITH COARSE PREDICTION POOLING AND SOUND EVENT BOUNDING BOXES
Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park
Korea Advanced Institute of Science and Technology, South Korea
NAM_SED_1 NAM_SED_2 NAM_SED_3 NAM_SED_4
Abstract
To tackle the sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of three branches: an audio teacher-student transformer (ATST) branch, a BEATs branch, and a CNN branch including either partial dilated frequency dynamic convolution (PDFD) or squeeze-and-excitation (SE) with time-frame frequency-wise SE (tfwSE). To train on MAESTRO labels with coarse temporal resolution, we apply max pooling to the predictions for the MAESTRO dataset. Using the best ensemble model, we apply self training to obtain pseudo labels from the DESED weak set, the DESED unlabeled set, and AudioSet. AudioSet labels are filtered to focus on high-confidence pseudo labels, and AudioSet pseudo labels are used to train on DESED labels only. We used change-detection-based sound event bounding boxes (cSEBBs) as post-processing for the ensemble models in self training and for the submission models.
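Pooling frame-level predictions down to a coarser label resolution, as described in the abstract above, can be sketched as follows. This is a minimal illustration for a single class, assuming a hypothetical fixed number of frames per label segment; the report's actual pooling configuration is not given here.

```python
def max_pool_segments(frame_probs, frames_per_segment):
    """Collapse frame-level probabilities to one value per segment by taking the maximum."""
    return [
        max(frame_probs[i:i + frames_per_segment])
        for i in range(0, len(frame_probs), frames_per_segment)
    ]


# Six frames pooled into two coarse segments of three frames each (assumed sizes).
segment_probs = max_pool_segments([0.1, 0.8, 0.2, 0.3, 0.4, 0.05], frames_per_segment=3)
```

The pooled values can then be compared with the coarse labels in the loss, so the model is only asked to match what the annotation actually resolves. A trailing segment shorter than `frames_per_segment` is pooled over the frames it has.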
A EFFICIENCE SOUND EVENT DETECTION SYSTEM FOR DCASE 2024 TASK 4
ZunXue Niu1,2, Ying Hu1,2, Xin Fan1,2, Jie Liu1,2, Ye Dong1,2, Fujie Xu1,2, ShangKun Tu1,2, KaiMin Cao1,2, JiaBo Jing1,2, Qiong Wu1,2, QingJing Wan1,2
1XinJiang University, School of Information Science and Engineering, China, 2Key Laboratory of Signal Detection and Processing in Xinjiang, China
Niu_XJU_task4_1
Abstract
This technical report describes the system we submitted to DCASE2024 Task4: Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels. Specifically, we apply three main techniques to improve the performance of the official baseline system. First, we use a dual-branch convolutional recurrent neural network (CRNN) structure consisting of a main branch and an auxiliary branch. We adopt an SCT strategy that applies self-consistency regularization in addition to the Mean Teacher loss to maintain consistency between the outputs of the auxiliary and main branches. Second, an HTA module is designed to aggregate information at different temporal resolutions so that the receptive fields of the network can be adjusted according to short-term and long-term correlations. Third, several data augmentation strategies are adopted to improve the robustness of the network. Experiments on the DCASE2024 Task4 validation dataset demonstrate the effectiveness of the techniques used in our system.
IMPROVING AUDIO SPECTROGRAM TRANSFORMERS FOR SOUND EVENT DETECTION THROUGH MULTI-STAGE TRAINING
Florian Schmid1, Paul Primus1, Tobias Morocutti1, Jonathan Greif1, Gerhard Widmer1,2
1Institute of Computational Perception (CP-JKU), Johannes Kepler University Linz, Austria, 2LIT Artificial Intelligence Lab, Johannes Kepler University Linz, Austria
Schmid_CPJKU_task4_1 Schmid_CPJKU_task4_2 Schmid_CPJKU_task4_3 Schmid_CPJKU_task4_4
Abstract
This technical report describes the CP-JKU team’s submission for Task 4 Sound Event Detection with Heterogeneous Training Datasets and Potentially Missing Labels of the DCASE 24 Challenge. We fine-tune three large Audio Spectrogram Transformers, PaSST, BEATs, and ATST, on the joint DESED and MAESTRO datasets in a two-stage training procedure. The first stage closely matches the baseline system setup and trains a CRNN model while keeping the large pre-trained transformer model frozen. In the second stage, both CRNN and transformer are fine-tuned using heavily weighted self-supervised losses. After the second stage, we compute strong pseudo-labels for all audio clips in the training set using an ensemble of all three fine-tuned transformers. Then, in a second iteration, we repeat the two-stage training process and include a distillation loss based on the pseudo-labels, boosting single-model performance substantially. Additionally, we pre-train PaSST and ATST on the subset of AudioSet that comes with strong temporal labels, before fine-tuning them on the Task 4 datasets.
SOUND EVENT DETECTION BASED ON AUXILIARY DECODER AND MAXIMUM PROBABILITY AGGREGATION FOR DCASE CHALLENGE 2024 TASK 4
Sang Won Son1, Jongyeon Park1, Hong Kook Kim1,2, Sulaiman Vesal3, Jeong Eun Lim4
1AI Graduate School, Gwangju Institute of Science and Technology Korea, 2 School of EECSHanwha Vision, Gwangju Institute of Science and Technology Korea, 3AI Lab., Innovation Center, USA, 4AI Lab., R&D Center Hanwha Vision, Korea
Kim_GIST-HanwhaVision_task4_1 Kim_GIST-HanwhaVision_task4_2 Kim_GIST-HanwhaVision_task4_3 Kim_GIST-HanwhaVision_task4_4
Abstract
In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from large pretrained models. The proposed auxiliary decoder operates independently of the main decoder, enhancing the performance of the convolutional block during the initial training stages by assigning different weighting strategies to the main and auxiliary decoder losses. Next, to address the difference in time intervals between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model’s output to be aligned with the 1 s soft labels of the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of log-mel and MFCC features to generate time-frequency patterns. The experimental results demonstrate the efficacy of the proposed methods in improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in developing more robust and flexible SED models.
FMSG-JLESS SUBMISSION FOR DCASE 2024 TASK4 ON SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS
Yang Xiao1, Han Yin2, Jisheng Bai2, Rohan Kumar Das1
1Fortemedia Singapore, Singapore, 2 Joint Laboratory of Environmental Sound Sensing, School of Marine Science and Technology, Northwestern Polytechnical University, China
XIAO_FMSG-JLESS_task4_1 XIAO_FMSG-JLESS_task4_2 XIAO_FMSG-JLESS_task4_3 XIAO_FMSG-JLESS_task4_4
Abstract
This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging to achieve good performance without knowing the source of the audio clips during evaluation. To address this, we propose a sound event detection method using domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers and a convolutional recurrent neural network. We focus on three main strategies to improve our method. First, we apply mixstyle to the frequency dimension to adapt the mel-spectrograms from different domains. Second, we compute the training loss of our model separately for each dataset, restricted to its corresponding classes. This independent learning framework helps the model extract domain-specific features effectively. Lastly, we use the sound event bounding boxes method for post-processing. Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.
LOCAL AND GLOBAL FEATURES FUSION FOR SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS
Haobo Yue, Zehao Wang, Da Mu, Huamei Sun, Yuanyuan Jiang, Zhicheng Zhang, Jianqin Yin
Beijing University of Posts and Telecommunications, China
Zhang_BUPT_task4_1 Zhang_BUPT_task4_2
Abstract
In this work, we present our submission system for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels, where we introduce the BEATs-CRNN interactive systems. Considering that the pretrained BEATs model predominantly captures global features of the dataset, while the CRNN model focuses on learning local features, this work aims to fuse the middle-layer information of the two to enhance the system’s feature extraction capabilities. Firstly, we modify the BEATs model and the CRNN model so that the feature extraction by the two models is performed at the same stage. Secondly, because the CNN and BEATs have different numbers of layers, we extract intermediate features from both models at regular intervals, let them interact through cross-attention, and then feed the resulting features back to the respective models for feature extraction in the subsequent layer. Finally, the final interaction results of the two models are used as the final features for learning. Compared to the baseline system using BEATs embeddings, which achieved 48.3% in PSDS-scenario 1, 49.4% in PSDS-scenario 1 (sed score), and 73.7% in mean-pAUC, our BEATs-CRNN interactive system achieves 53.2%, 54.1%, and 76.3%, respectively. The ensemble of the BEATs-CRNN interactive system further improves the PSDS-scenario 1 to 56.4% and the PSDS-scenario 1 (sed score) to 57.4%, with a mean-pAUC of 75.6%.