Task description

A detailed task description can be found on the task description page.

Teams ranking

The table lists the best-ranked system of each participating team. The DCASE 2026 Task 4 baseline is included as a reference.

Submission Information		Evaluation Set			Test (Development) Set
Submission Code	Technical Report	Official Team Rank	CAPI-SDRi (eval)	Label Prediction Accuracy (mix) (eval)	CAPI-SDRi (test)	Label Prediction Accuracy (mix) (test)
Bando_AIST_task4_3	Bando_AIST2026	1	14.93	65.54	16.32	64.88
Choi_KAIST_task4_4	Choi_KAIST2026	2	12.98	64.88	14.65	66.07
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	3	12.94	76.92	14.95	79.10
Wang_SRCN_task4_2	Wang_SRCN2026	4	10.13	57.80	11.74	62.23
Park_SGU_task4_3	Park_SGU2026	5	10.10	53.17	11.42	56.09
You_PKU_task4_4	You_PKU2026	6	8.24	52.58	11.85	74.87
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	7	6.97	58.53	8.98	63.09
Wang_BUPT_task4_2	Wang_BUPT2026	8	6.90	56.61	8.81	61.44
Deng_WHU_task4_1	Deng_WHU2026	9	6.84	58.13	8.62	61.71
Baseline_Task4_1c		10	6.77	56.55	8.17	57.14
Park_KUBIG_task4_1	Park_KUBIG2026	11	1.18	39.81	1.20	40.28

Systems ranking

The table shows the ranking of all submitted systems. The DCASE 2026 Task 4 baseline systems are included as references.

Submission Information		Evaluation Set			Test (Development) Set
Submission Code	Technical Report	Official System Rank	CAPI-SDRi (eval)	Label Prediction Accuracy (mix) (eval)	CAPI-SDRi (test)	Label Prediction Accuracy (mix) (test)
Bando_AIST_task4_3	Bando_AIST2026	1	14.93	65.54	16.32	64.88
Bando_AIST_task4_4	Bando_AIST2026	2	14.61	59.52	16.36	61.51
Bando_AIST_task4_2	Bando_AIST2026	3	14.34	58.40	16.45	62.30
Bando_AIST_task4_1	Bando_AIST2026	4	14.23	58.33	15.75	57.74
Choi_KAIST_task4_4	Choi_KAIST2026	5	12.98	64.88	14.65	66.07
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	6	12.94	76.92	14.95	79.10
Choi_KAIST_task4_1	Choi_KAIST2026	7	12.88	64.15	15.51	71.10
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	8	12.77	76.92	14.74	79.10
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	9	12.77	75.00	14.84	78.11
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	10	12.54	73.28	14.41	76.06
Choi_KAIST_task4_2	Choi_KAIST2026	11	12.32	59.59	14.80	66.40
Choi_KAIST_task4_3	Choi_KAIST2026	12	12.25	59.13	15.50	71.10
Wang_SRCN_task4_2	Wang_SRCN2026	13	10.13	57.80	11.74	62.23
Wang_SRCN_task4_1	Wang_SRCN2026	14	10.12	57.94	11.73	62.04
Park_SGU_task4_3	Park_SGU2026	15	10.10	53.17	11.42	56.09
Wang_SRCN_task4_3	Wang_SRCN2026	16	10.10	57.94	11.74	62.04
Park_SGU_task4_4	Park_SGU2026	17	10.08	53.64	11.43	54.70
Park_SGU_task4_2	Park_SGU2026	18	9.42	53.17	10.53	56.09
Wang_SRCN_task4_4	Wang_SRCN2026	19	9.20	45.63	11.26	51.98
You_PKU_task4_4	You_PKU2026	20	8.24	52.58	11.85	74.87
You_PKU_task4_1	You_PKU2026	21	8.24	52.58	11.86	74.87
You_PKU_task4_2	You_PKU2026	22	8.20	52.58	11.86	74.87
You_PKU_task4_3	You_PKU2026	23	8.20	52.58	11.86	74.87
Park_SGU_task4_1	Park_SGU2026	24	8.12	53.70	9.05	55.36
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	25	6.97	58.53	8.98	63.09
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	26	6.96	58.20	9.05	63.95
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	27	6.94	58.33	9.06	63.82
Wang_BUPT_task4_2	Wang_BUPT2026	28	6.90	56.61	8.81	61.44
Deng_WHU_task4_1	Deng_WHU2026	29	6.84	58.13	8.62	61.71
Baseline_Task4_1c		30	6.77	56.55	8.17	57.14
Wang_BUPT_task4_1	Wang_BUPT2026	31	6.76	55.56	8.56	60.05
Baseline_Task4_4c		32	6.76	55.16	8.49	60.71
Park_KUBIG_task4_1	Park_KUBIG2026	33	1.18	39.81	1.20	40.28
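
The team ranking appears to follow from the system ranking by keeping, for each team, its highest-ranked entry. The following is a minimal Python sketch of that reduction, not the organizers' procedure. It assumes that the team is identified by the part of the submission code before `_task4_` (matched case-insensitively so that the baseline entries group together), and it reproduces only a few rows of the table above. `team_of` and `best_per_team` are hypothetical helper names.

```python
import re

# (submission code, CAPI-SDRi on the evaluation set), already sorted by system rank.
# Only a subset of the rows in the table above is reproduced here.
RANKED_SYSTEMS = [
    ("Bando_AIST_task4_3", 14.93),
    ("Bando_AIST_task4_4", 14.61),
    ("Choi_KAIST_task4_4", 12.98),
    ("Saijo_Mitsubishi_task4_3", 12.94),
    ("Choi_KAIST_task4_1", 12.88),
    ("Baseline_Task4_1c", 6.77),
    ("Baseline_Task4_4c", 6.76),
]


def team_of(code):
    # Assumption: the team name is everything before "_task4_" (any letter case).
    return re.split(r"_task4_", code, flags=re.IGNORECASE)[0]


def best_per_team(ranked):
    # Because the input is sorted by rank, the first entry seen for a team
    # is that team's best system; later entries are ignored.
    best = {}
    for code, score in ranked:
        best.setdefault(team_of(code), (code, score))
    return list(best.values())


TEAM_RANKING = best_per_team(RANKED_SYSTEMS)
for rank, (code, score) in enumerate(TEAM_RANKING, start=1):
    print(rank, code, score)
```

Keeping the first occurrence rather than taking a maximum over the rounded scores means that ties in the displayed CAPI-SDRi (for example the two You_PKU systems at 8.24) are resolved by the official system rank.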

Supplementary metrics

Detailed analysis of joint scores, separation, and detection performance

All metrics in this table are evaluated on the evaluation set. CAPI-SDRi and CASA-SDRi are joint separation and label prediction scores computed by the official evaluator, while TP-SDRi is a separation-only score computed from matched true-positive source pairs. True Positive (TP), False Positive (FP), and False Negative (FN) are counted over target-source hypotheses after class-aware permutation-invariant matching. Accuracy (mix) is mixture-level label prediction accuracy, while Accuracy (src) is source-level label prediction accuracy.

Submission Information		Joint Separation and Label Prediction Scores		Separation Score	Label Prediction Scores						Counts
Submission Code	Technical Report	CAPI-SDRi	CASA-SDRi	TP-SDRi	Accuracy (mix)	Accuracy (src)	Precision	Recall	F-Score (micro)	F-Score (macro)	TP	FP	FN
Bando_AIST_task4_3	Bando_AIST2026	14.93	14.92	20.15	65.54	73.80	0.91	0.81	0.85	0.85	2242	266	530
Bando_AIST_task4_4	Bando_AIST2026	14.61	14.59	20.90	59.52	69.61	0.91	0.77	0.82	0.81	2116	268	656
Bando_AIST_task4_2	Bando_AIST2026	14.34	14.33	20.91	58.40	67.95	0.90	0.76	0.81	0.80	2093	308	679
Bando_AIST_task4_1	Bando_AIST2026	14.23	14.22	20.87	58.33	67.48	0.92	0.73	0.81	0.79	2017	217	755
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	12.98	17.78	64.88	73.95	0.89	0.82	0.85	0.85	2279	310	493
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	12.94	15.96	76.92	82.84	0.93	0.90	0.91	0.91	2486	229	286
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	12.88	17.76	64.15	73.57	0.89	0.82	0.85	0.85	2277	323	495
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	12.76	15.72	76.92	82.84	0.93	0.90	0.91	0.91	2486	229	286
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	12.76	15.82	75.00	82.31	0.92	0.90	0.90	0.90	2499	264	273
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	12.54	15.90	73.28	80.45	0.90	0.90	0.89	0.89	2493	327	279
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	12.31	17.72	59.59	71.64	0.86	0.84	0.83	0.84	2314	458	458
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	12.24	17.69	59.13	71.47	0.85	0.84	0.83	0.83	2315	467	457
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	10.10	14.48	57.80	69.89	0.86	0.80	0.82	0.82	2221	406	551
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	10.09	14.46	57.94	69.92	0.85	0.81	0.82	0.82	2232	420	540
Park_SGU_task4_3	Park_SGU2026	10.10	10.07	14.78	53.17	68.26	0.90	0.75	0.81	0.81	2047	227	725
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	10.05	14.31	57.94	69.92	0.85	0.81	0.82	0.82	2232	420	540
Park_SGU_task4_4	Park_SGU2026	10.08	10.02	14.74	53.64	68.28	0.90	0.75	0.81	0.81	2051	232	721
Park_SGU_task4_2	Park_SGU2026	9.42	9.38	13.75	53.17	68.26	0.90	0.75	0.81	0.81	2047	227	725
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	9.18	14.49	45.63	62.11	0.75	0.81	0.77	0.77	2226	812	546
You_PKU_task4_4	You_PKU2026	8.24	8.16	12.08	52.58	68.08	0.83	0.82	0.81	0.81	2244	524	528
You_PKU_task4_1	You_PKU2026	8.24	8.16	12.01	52.58	68.08	0.83	0.82	0.81	0.81	2244	524	528
You_PKU_task4_2	You_PKU2026	8.20	8.13	11.97	52.58	68.08	0.83	0.82	0.81	0.81	2244	524	528
You_PKU_task4_3	You_PKU2026	8.20	8.13	11.96	52.58	68.08	0.83	0.82	0.81	0.81	2244	524	528
Park_SGU_task4_1	Park_SGU2026	8.12	8.06	11.82	53.70	68.68	0.91	0.75	0.81	0.81	2057	223	715
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	6.90	9.78	58.53	70.00	0.86	0.80	0.82	0.82	2196	365	576
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	6.89	9.79	58.20	69.87	0.86	0.80	0.82	0.82	2201	378	571
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	6.87	9.76	58.33	70.07	0.86	0.80	0.82	0.82	2201	369	571
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	6.84	9.92	56.61	68.95	0.84	0.81	0.82	0.82	2221	449	551
Deng_WHU_task4_1	Deng_WHU2026	6.84	6.80	9.70	58.13	69.58	0.85	0.80	0.82	0.82	2210	404	562
Baseline_Task4_1c		6.77	6.72	9.74	56.55	68.00	0.83	0.80	0.81	0.81	2193	453	579
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	6.72	9.97	55.56	66.57	0.84	0.78	0.80	0.80	2147	453	625
Baseline_Task4_4c		6.76	6.71	9.81	55.16	66.66	0.83	0.79	0.80	0.80	2175	491	597
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	0.98	2.84	39.81	56.68	0.70	0.77	0.72	0.73	2133	991	639
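
The count columns can be related to pooled (micro-averaged) scores with the standard definitions. The sketch below is illustrative only and is not the official evaluator; `micro_scores` is a hypothetical helper name. It uses the TP, FP, and FN counts of Bando_AIST_task4_3 from the table above.

```python
def micro_scores(tp, fp, fn):
    """Pooled precision, recall, and F-score from hypothesis counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_micro = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f_micro


# Counts for Bando_AIST_task4_3 taken from the table above.
precision, recall, f_micro = micro_scores(tp=2242, fp=266, fn=530)
print(f"precision={precision:.2f} recall={recall:.2f} f_micro={f_micro:.2f}")
```

For this row, the pooled recall and F-score round to the tabulated 0.81 and 0.85, while the pooled precision rounds to 0.89 against the tabulated 0.91. The official evaluator therefore presumably averages at least some of these columns differently; this is an inference from the numbers, as the table description does not state the averaging used.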

Detailed analysis focused on signal quality

The table shows the quality of the separated speech and non-speech sources in the evaluation dataset. Specifically, the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) are calculated on the TP speech predictions, while the Perceptual Evaluation of Audio Quality (PEAQ) is calculated on the TP non-speech predictions.

Submission Information			PESQ				STOI				PEAQ
Submission Code	Technical Report	CAPI-SDRi	PESQ mean	PESQ std	PESQ min	PESQ max	STOI mean	STOI std	STOI min	STOI max	PEAQ mean	PEAQ std	PEAQ min	PEAQ max
Bando_AIST_task4_3	Bando_AIST2026	14.93	3.26	0.60	1.72	4.38	0.94	0.05	0.62	1.00	-2.37	0.85	-3.91	-0.12
Bando_AIST_task4_4	Bando_AIST2026	14.61	3.29	0.70	1.07	4.40	0.92	0.19	-0.05	1.00	-2.27	0.86	-3.91	-0.07
Bando_AIST_task4_2	Bando_AIST2026	14.34	3.18	0.68	1.37	4.39	0.95	0.05	0.64	1.00	-2.35	0.87	-3.91	-0.09
Bando_AIST_task4_1	Bando_AIST2026	14.23	3.23	0.68	1.30	4.37	0.95	0.04	0.76	1.00	-2.33	0.87	-3.91	-0.09
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	2.98	0.57	1.65	4.18	0.93	0.05	0.77	0.99	-2.93	0.78	-3.91	-0.33
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	3.07	0.51	1.85	4.23	0.93	0.04	0.76	0.99	-3.01	0.71	-3.91	-0.51
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	3.01	0.54	1.67	4.17	0.94	0.04	0.77	0.99	-2.93	0.78	-3.91	-0.34
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	2.92	0.58	1.63	4.23	0.92	0.06	0.72	0.99	-3.03	0.71	-3.91	-0.40
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	2.88	0.56	1.55	4.14	0.92	0.05	0.71	0.99	-3.02	0.71	-3.91	-0.47
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	3.06	0.52	1.85	4.23	0.93	0.05	0.62	0.99	-3.01	0.72	-3.91	-0.51
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	2.97	0.58	1.43	4.18	0.93	0.05	0.66	0.99	-2.94	0.78	-3.91	-0.33
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	2.97	0.57	1.67	4.18	0.93	0.05	0.77	0.99	-2.94	0.78	-3.91	-0.33
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	2.70	0.56	1.47	4.07	0.91	0.05	0.68	0.99	-3.07	0.71	-3.91	-0.54
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	2.70	0.56	1.47	4.07	0.91	0.05	0.68	0.99	-3.07	0.71	-3.91	-0.54
Park_SGU_task4_3	Park_SGU2026	10.10	2.57	0.70	1.08	4.07	0.86	0.18	0.13	0.99	-3.08	0.83	-3.91	-0.31
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	2.42	0.66	1.06	4.00	0.88	0.12	-0.03	0.99	-3.40	0.68	-3.91	-0.46
Park_SGU_task4_4	Park_SGU2026	10.08	2.57	0.70	1.07	4.08	0.86	0.17	0.08	0.99	-3.08	0.82	-3.91	-0.36
Park_SGU_task4_2	Park_SGU2026	9.42	2.53	0.70	1.08	4.03	0.86	0.19	0.14	0.99	-3.15	0.80	-3.91	-0.31
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	2.59	0.60	1.18	4.05	0.90	0.06	0.66	0.99	-3.14	0.71	-3.91	-0.54
You_PKU_task4_4	You_PKU2026	8.24	2.47	0.77	1.15	4.16	0.86	0.14	0.33	0.99	-3.27	0.73	-3.91	-0.50
You_PKU_task4_1	You_PKU2026	8.24	2.37	0.84	1.09	4.16	0.81	0.19	0.29	0.99	-3.28	0.72	-3.91	-0.50
You_PKU_task4_2	You_PKU2026	8.20	2.47	0.75	1.15	4.16	0.87	0.12	0.40	0.99	-3.28	0.72	-3.91	-0.50
You_PKU_task4_3	You_PKU2026	8.20	2.47	0.75	1.15	4.16	0.87	0.12	0.40	0.99	-3.28	0.72	-3.91	-0.50
Park_SGU_task4_1	Park_SGU2026	8.12	2.29	0.74	1.08	3.92	0.82	0.21	0.11	0.99	-3.36	0.72	-3.91	-0.39
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	1.89	0.66	1.09	3.91	0.75	0.18	0.30	0.99	-3.54	0.61	-3.91	-0.60
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	1.89	0.66	1.09	3.91	0.75	0.18	0.30	0.99	-3.54	0.61	-3.91	-0.60
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	1.90	0.66	1.08	3.93	0.75	0.18	0.31	0.99	-3.53	0.61	-3.91	-0.47
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	1.90	0.66	1.09	3.91	0.75	0.18	0.30	0.99	-3.54	0.61	-3.91	-0.60
Deng_WHU_task4_1	Deng_WHU2026	6.84	1.79	0.60	1.09	4.00	0.74	0.18	0.27	0.98	-3.53	0.61	-3.91	-0.63
Baseline_Task4_1c		6.77	1.88	0.65	1.11	3.91	0.75	0.18	0.30	0.99	-3.54	0.60	-3.91	-0.61
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	1.89	0.65	1.09	3.91	0.75	0.18	0.30	0.99	-3.53	0.61	-3.91	-0.60
Baseline_Task4_4c		6.76	1.90	0.65	1.11	3.93	0.76	0.18	0.29	0.99	-3.53	0.61	-3.91	-0.57
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	1.91	0.55	1.11	3.40	0.84	0.09	0.57	0.98	-2.73	0.60	-3.91	-1.21

System performance under partially known conditions

The table shows the separation and mixture-level label prediction performance of each system under partially known conditions. The Known IR condition uses evaluation mixtures synthesized with room impulse responses included in the training data. The Known Target condition uses evaluation mixtures synthesized with target sound event samples included in the training data. The Known Noise condition uses evaluation mixtures synthesized with background noise included in the training data. The Known Interference condition uses evaluation mixtures synthesized with interference sound samples included in the training data.

Submission Information		Evaluation Set		Known IR Condition		Known Target Condition		Known Noise Condition		Known Interference Condition
Submission Code	Technical Report	CAPI-SDRi	Accuracy (mix)	Known IR CAPI-SDRi	Known IR Accuracy (mix)	Known Target CAPI-SDRi	Known Target Accuracy (mix)	Known Noise CAPI-SDRi	Known Noise Accuracy (mix)	Known Interference CAPI-SDRi	Known Interference Accuracy (mix)
Bando_AIST_task4_3	Bando_AIST2026	14.93	65.54	16.21	69.44	15.73	70.37	15.39	67.59	15.62	70.37
Bando_AIST_task4_4	Bando_AIST2026	14.61	59.52	15.62	60.32	15.13	62.04	15.08	61.11	16.14	68.98
Bando_AIST_task4_2	Bando_AIST2026	14.34	58.40	15.62	61.90	15.70	65.74	15.55	65.28	15.91	68.52
Bando_AIST_task4_1	Bando_AIST2026	14.23	58.33	16.12	64.68	15.30	62.50	15.27	61.11	15.45	65.28
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	64.88	13.97	63.89	13.82	68.98	13.63	68.06	13.71	70.83
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	76.92	13.69	77.38	14.52	86.11	13.70	80.09	14.10	85.65
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	64.15	13.75	62.30	13.53	67.59	13.80	68.98	14.13	72.69
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	76.92	13.55	77.38	14.28	86.11	13.56	80.09	13.90	85.65
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	75.00	13.51	76.19	14.29	83.80	13.50	77.78	13.90	85.19
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	73.28	13.19	73.41	13.67	79.17	13.06	73.61	13.62	81.48
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	59.59	13.51	60.71	12.77	62.96	12.96	62.04	13.75	71.76
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	59.13	13.68	61.11	13.02	64.81	12.84	61.57	13.73	72.22
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	57.80	10.63	56.35	10.91	65.28	10.48	61.11	11.03	65.74
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	57.94	10.51	55.95	10.93	65.74	10.40	60.19	10.93	64.81
Park_SGU_task4_3	Park_SGU2026	10.10	53.17	10.86	53.97	11.00	68.06	10.59	54.17	11.27	62.04
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	57.94	10.49	55.95	10.99	65.74	10.33	60.19	11.18	64.81
Park_SGU_task4_4	Park_SGU2026	10.08	53.64	4.10	29.37	11.22	66.20	10.45	53.24	10.92	60.19
Park_SGU_task4_2	Park_SGU2026	9.42	53.17	9.94	53.97	10.26	68.06	9.97	54.17	10.48	62.04
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	45.63	9.29	43.25	10.79	52.78	10.06	49.54	9.79	47.69
You_PKU_task4_4	You_PKU2026	8.24	52.58	8.78	51.59	8.97	50.93	8.39	53.24	8.70	57.41
You_PKU_task4_1	You_PKU2026	8.24	52.58	8.91	51.59	8.96	50.93	8.35	53.24	8.65	57.41
You_PKU_task4_2	You_PKU2026	8.20	52.58	8.94	51.59	8.82	50.93	8.38	53.24	8.60	57.41
You_PKU_task4_3	You_PKU2026	8.20	52.58	8.94	51.59	8.81	50.93	8.38	53.24	8.60	57.41
Park_SGU_task4_1	Park_SGU2026	8.12	53.70	8.76	54.76	8.90	68.52	8.28	54.17	9.03	65.74
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	58.53	6.94	56.75	9.23	72.22	7.31	57.87	7.79	64.35
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	58.20	6.93	57.54	9.24	72.22	7.32	59.26	7.81	63.89
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	58.33	7.00	57.14	9.20	72.22	7.35	59.26	7.79	63.89
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	56.61	6.94	53.17	9.26	71.30	7.37	59.72	7.51	63.89
Deng_WHU_task4_1	Deng_WHU2026	6.84	58.13	6.68	58.33	8.95	70.83	7.33	59.26	7.34	63.89
Baseline_Task4_1c		6.77	56.55	6.76	52.38	8.98	70.37	7.10	56.02	7.43	62.04
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	55.56	6.63	52.78	9.22	70.83	7.40	59.72	7.84	65.74
Baseline_Task4_4c		6.76	55.16	6.65	55.16	9.01	70.37	6.79	52.78	7.81	65.74
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	39.81	0.67	39.29	1.08	37.50	1.37	45.83	1.23	42.59
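
The four conditions can be read as subset selections over the evaluation mixtures. The sketch below shows one way such a selection could be expressed; it is not the organizers' code. The `Mixture` fields, the asset identifiers, and the rule that every target or interference sample of a mixture must come from the training data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mixture:
    # Hypothetical metadata fields; the real dataset layout may differ.
    ir_id: str
    noise_id: str
    target_ids: List[str] = field(default_factory=list)
    interference_ids: List[str] = field(default_factory=list)


def known_conditions(mix, train_irs, train_targets, train_noises, train_interferences):
    """Return which partially known conditions a mixture satisfies."""
    return {
        "known_ir": mix.ir_id in train_irs,
        # Assumption: all target samples must be present in the training data.
        "known_target": bool(mix.target_ids) and set(mix.target_ids) <= train_targets,
        "known_noise": mix.noise_id in train_noises,
        # Assumption: all interference samples must be present in the training data.
        "known_interference": bool(mix.interference_ids)
        and set(mix.interference_ids) <= train_interferences,
    }


# Made-up identifiers, used only to exercise the function.
example = Mixture(
    ir_id="ir_001",
    noise_id="noise_new",
    target_ids=["t_01", "t_02"],
    interference_ids=["i_09"],
)
flags = known_conditions(
    example,
    train_irs={"ir_001"},
    train_targets={"t_01", "t_02", "t_03"},
    train_noises={"noise_007"},
    train_interferences={"i_01"},
)
print(flags)
```

Each per-condition score would then be computed over the mixtures whose flag is true.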

System performance by target-source overlap condition

This table shows performance by target-source overlap condition. Each condition is denoted as (N, M), where N is the number of active target sound sources in the mixture and M is the number of active target sound sources involved in same-class overlap. M=0 means that no same-class target overlap is present. (0,0) denotes mixtures with no target sound source, for which CAPI-SDRi is not defined; the (0,0) column therefore reports the number of false positives (FP) in zero-target mixtures.

Submission Information		Evaluation Set		Target-Source Overlap Conditions
Submission Code	Technical Report	CAPI-SDRi	Accuracy (mix)	FP (0,0)	CAPI-SDRi (1,0)	CAPI-SDRi (2,0)	CAPI-SDRi (2,2)	CAPI-SDRi (3,0)	CAPI-SDRi (3,2)	CAPI-SDRi (3,3)
Bando_AIST_task4_3	Bando_AIST2026	14.93	65.54	33	12.26	15.23	16.86	15.51	16.37	17.08
Bando_AIST_task4_4	Bando_AIST2026	14.61	59.52	32	12.01	14.37	16.77	15.02	15.88	17.25
Bando_AIST_task4_2	Bando_AIST2026	14.34	58.40	43	11.73	14.64	16.52	14.87	16.01	16.58
Bando_AIST_task4_1	Bando_AIST2026	14.23	58.33	26	11.96	13.84	16.80	13.81	15.56	16.61
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	64.88	51	10.66	13.02	14.00	15.34	14.89	14.09
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	76.92	30	9.71	13.27	13.51	15.21	14.87	14.14
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	64.15	57	10.81	12.91	13.72	15.39	14.89	14.02
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	76.92	30	9.67	13.13	13.36	14.99	14.55	13.82
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	75.00	30	9.65	13.15	13.50	14.91	14.66	13.53
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	73.28	47	9.36	12.99	13.17	15.13	14.81	13.84
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	59.59	115	10.34	12.56	13.77	15.53	15.07	14.20
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	59.13	119	10.36	12.69	13.67	15.41	15.08	14.13
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	57.80	55	8.28	10.80	11.10	11.29	11.08	11.21
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	57.94	58	8.19	10.90	11.06	11.29	11.11	11.34
Park_SGU_task4_3	Park_SGU2026	10.10	53.17	24	9.33	11.72	8.46	13.20	10.00	7.57
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	57.94	58	8.31	10.87	11.27	10.94	10.94	11.42
Park_SGU_task4_4	Park_SGU2026	10.08	53.64	25	9.05	12.15	8.50	12.99	9.64	7.79
Park_SGU_task4_2	Park_SGU2026	9.42	53.17	24	8.98	11.44	7.07	12.81	9.22	6.23
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	45.63	131	6.71	9.36	10.29	11.04	11.38	11.35
You_PKU_task4_4	You_PKU2026	8.24	52.58	61	6.77	10.35	6.29	12.38	8.78	5.71
You_PKU_task4_1	You_PKU2026	8.24	52.58	61	7.00	10.40	6.12	12.35	8.62	5.67
You_PKU_task4_2	You_PKU2026	8.20	52.58	61	6.91	10.33	6.09	12.33	8.64	5.71
You_PKU_task4_3	You_PKU2026	8.20	52.58	61	6.91	10.33	6.09	12.32	8.65	5.70
Park_SGU_task4_1	Park_SGU2026	8.12	53.70	27	7.97	10.71	4.73	11.54	8.03	5.05
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	58.53	15	6.37	8.70	4.19	10.11	7.09	4.64
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	58.20	18	6.44	8.71	4.21	10.15	7.08	4.51
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	58.33	18	6.32	8.75	4.19	10.20	7.04	4.41
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	56.61	48	6.30	8.90	4.46	10.10	7.10	5.02
Deng_WHU_task4_1	Deng_WHU2026	6.84	58.13	17	5.96	8.66	3.91	10.40	6.85	4.66
Baseline_Task4_1c		6.77	56.55	21	6.13	8.28	4.21	9.94	7.10	4.54
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	55.56	34	6.23	8.66	4.07	10.12	6.80	4.40
Baseline_Task4_4c		6.76	55.16	20	6.27	8.59	3.92	9.96	6.91	4.26
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	39.81	68	-2.81	0.90	1.77	2.59	3.04	4.38
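
The (N, M) notation can be illustrated with a short Python sketch. It assumes that a mixture is described by the list of class labels of its target sources and that any two targets sharing a label count as same-class overlap, ignoring timing; the official definition may be stricter. `overlap_condition` is a hypothetical helper name, and the labels are illustrative.

```python
from collections import Counter


def overlap_condition(target_labels):
    """Return (N, M) for the target class labels of one mixture."""
    counts = Counter(target_labels)
    n = len(target_labels)
    # M counts every target whose class occurs at least twice in the mixture.
    m = sum(c for c in counts.values() if c >= 2)
    return n, m


examples = [
    [],
    ["Speech"],
    ["Speech", "Speech"],
    ["Speech", "Blender", "VacuumCleaner"],
    ["Speech", "Speech", "Blender"],
    ["Speech", "Speech", "Speech"],
]
conditions = [overlap_condition(labels) for labels in examples]
print(conditions)
```

Under these assumptions the six example mixtures map to (0,0), (1,0), (2,2), (3,0), (3,2), and (3,3). Together with (2,0), these are the conditions reported in the table.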

System characteristics

General characteristics

Submission Code	Technical Report	CAPI-SDRi	Label Prediction Accuracy (mix)	Input Sampling Rate	Input Acoustic Features
Bando_AIST_task4_3	Bando_AIST2026	14.93	65.54	32kHz	spectrogram
Bando_AIST_task4_4	Bando_AIST2026	14.61	59.52	32kHz	spectrogram
Bando_AIST_task4_2	Bando_AIST2026	14.34	58.40	32kHz	spectrogram
Bando_AIST_task4_1	Bando_AIST2026	14.23	58.33	32kHz	spectrogram
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	64.88	32kHz	waveform, spectrogram
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	76.92	32kHz for separation, 16kHz for three tagging models, 48 kHz for two tagging models	spectrogram for separation, Kaldi fbank features for two tagging models, log-Mel spectrogram for one tagging model, and DAC-VAE features for two tagging models
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	64.15	32kHz	waveform, spectrogram
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	76.92	32kHz for separation, 16kHz for three tagging models, 48 kHz for two tagging models	spectrogram for separation, Kaldi fbank features for two tagging models, log-Mel spectrogram for one tagging model, and DAC-VAE features for two tagging models
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	75.00	32kHz for separation, 16kHz for three tagging models, 48 kHz for two tagging models	spectrogram for separation, Kaldi fbank features for two tagging models, log-Mel spectrogram for one tagging model, and DAC-VAE features for two tagging models
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	73.28	32kHz for separation, 16kHz for three tagging models, 48 kHz for two tagging models	spectrogram for separation, Kaldi fbank features for two tagging models
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	59.59	32kHz	waveform, spectrogram
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	59.13	32kHz	waveform, spectrogram
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	57.80	32kHz	waveform, spectrogram
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	57.94	32kHz	waveform, spectrogram
Park_SGU_task4_3	Park_SGU2026	10.10	53.17	32kHz	spectrogram
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	57.94	32kHz	waveform, spectrogram
Park_SGU_task4_4	Park_SGU2026	10.08	53.64	32kHz	spectrogram
Park_SGU_task4_2	Park_SGU2026	9.42	53.17	32kHz	spectrogram
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	45.63	32kHz	waveform, spectrogram
You_PKU_task4_4	You_PKU2026	8.24	52.58	32kHz	FOA waveform, log-mel spectrogram, M2D audio-tagging embeddings, TUSS all-label candidate sources
You_PKU_task4_1	You_PKU2026	8.24	52.58	32kHz	FOA waveform, log-mel spectrogram, M2D audio-tagging embeddings
You_PKU_task4_2	You_PKU2026	8.20	52.58	32kHz	FOA waveform, log-mel spectrogram, M2D audio-tagging embeddings, TUSS query-conditioned separated sources
You_PKU_task4_3	You_PKU2026	8.20	52.58	32kHz	FOA waveform, log-mel spectrogram, M2D audio-tagging embeddings, TUSS query-conditioned separated sources
Park_SGU_task4_1	Park_SGU2026	8.12	53.70	32kHz	spectrogram
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	58.53	32kHz	waveform, log-mel spectrogram, STFT spectrogram
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	58.20	32kHz	waveform, log-mel spectrogram, STFT spectrogram
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	58.33	32kHz	waveform, log-mel spectrogram, STFT spectrogram
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	56.61	32kHz	four-channel FOA waveform, channel-wise log-mel spectrogram
Deng_WHU_task4_1	Deng_WHU2026	6.84	58.13	32kHz	waveform, spectrogram
Baseline_Task4_1c		6.77	56.55	32kHz	waveform, spectrogram
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	55.56	32kHz	waveform, log-mel spectrogram
Baseline_Task4_4c		6.76	55.16	32kHz	waveform, spectrogram
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	39.81	32kHz	waveform

Machine learning characteristics

Submission Code	Technical Report	CAPI-SDRi	Label Prediction Accuracy (mix)	Machine Learning Method	Loss Function	Training Dataset	Data Augmentation	Pretrained Models
Bando_AIST_task4_3	Bando_AIST2026	14.93	65.54	TF-Locoformer-based separation model	BCE, SNR	DCASE2026Task4Dataset		ATST-Frame
Bando_AIST_task4_4	Bando_AIST2026	14.61	59.52	TF-Locoformer-based separation model	BCE, SNR	DCASE2026Task4Dataset		ATST-Frame
Bando_AIST_task4_2	Bando_AIST2026	14.34	58.40	TF-Locoformer-based separation model	BCE, SNR	DCASE2026Task4Dataset
Bando_AIST_task4_1	Bando_AIST2026	14.23	58.33	TF-Locoformer-based separation model	BCE, SNR	DCASE2026Task4Dataset
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	64.88	Transformer-Mamba-based separation/extraction model, CRNN-based audio classification model	SA-SDR loss, ArcFace loss, KL-divergence loss, BCE loss	DCASE2026Task4Dataset; AudioSet-2M-VacuumCleaner	difficulty-based mixup	M2D; Audio-Flamingo 3
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	76.92	BEATs-based, M2D-based, ATST-based, PE-A-Frame-small-based, and PE-A-Frame-base-based tagging models, TF-Locoformer-based separation model	SNR (separation), CE (counting and classification)	DCASE2026Task4Dataset; AudioSet; SINS database; NIGENS; STARSS23	MixUp, frequency warping, and filter augmentation for three 16-kHz tagging models	BEATs (BEATs_strong_1.pt); M2D (M2D_strong_1.pt); ATST-Frame (ATST-F_strong_1.pt); PE-A-Frame-small; PE-A-Frame-base
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	64.15	Transformer-Mamba-based separation/extraction model, CRNN-based audio classification model	SA-SDR loss, ArcFace loss, KL-divergence loss, BCE loss	DCASE2026Task4Dataset; AudioSet-2M-VacuumCleaner	difficulty-based mixup	M2D; Audio-Flamingo 3
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	76.92	BEATs-based, M2D-based, ATST-based, PE-A-Frame-small-based, and PE-A-Frame-base-based tagging models, TF-Locoformer-based separation model	SNR (separation), CE (counting and classification)	DCASE2026Task4Dataset; AudioSet; SINS database; NIGENS; STARSS23	MixUp, frequency warping, and filter augmentation for three 16-kHz tagging models	BEATs (BEATs_strong_1.pt); M2D (M2D_strong_1.pt); ATST-Frame (ATST-F_strong_1.pt); PE-A-Frame-small; PE-A-Frame-base
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	75.00	BEATs-based, M2D-based, ATST-based, PE-A-Frame-small-based, and PE-A-Frame-base-based tagging models, TF-Locoformer-based separation model	SNR (separation), CE (counting and classification)	DCASE2026Task4Dataset; AudioSet; SINS database; NIGENS; STARSS23	MixUp, frequency warping, and filter augmentation for three 16-kHz tagging models	BEATs (BEATs_strong_1.pt); M2D (M2D_strong_1.pt); ATST-Frame (ATST-F_strong_1.pt); PE-A-Frame-small; PE-A-Frame-base
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	73.28	BEATs-based tagging model, TF-Locoformer-based separation model	SNR (separation), CE (counting and classification)	DCASE2026Task4Dataset; AudioSet; SINS database; NIGENS; STARSS23	MixUp, frequency warping, and filter augmentation for three 16-kHz tagging models	BEATs (BEATs_strong_1.pt)
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	59.59	Transformer-Mamba-based separation/extraction model, CRNN-based audio classification model	SA-SDR loss, ArcFace loss, KL-divergence loss, BCE loss	DCASE2026Task4Dataset; AudioSet-2M-VacuumCleaner	difficulty-based mixup	M2D; Audio-Flamingo 3
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	59.13	Transformer-Mamba-based separation/extraction model, CRNN-based audio classification model	SA-SDR loss, ArcFace loss, KL-divergence loss, BCE loss	DCASE2026Task4Dataset; AudioSet-2M-VacuumCleaner	difficulty-based mixup	M2D; Audio-Flamingo 3
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	57.80	Transformer-based separation model, PretrainedSED-based audio tagging model	BCE, SA-SDR loss, KL-divergence	DCASE2026Task4Dataset; AudioSet	Frame Shift, SpecAugmentation	PretrainedSED
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	57.94	Transformer-based separation model, PretrainedSED-based audio tagging model	BCE, SA-SDR loss, KL-divergence	DCASE2026Task4Dataset; AudioSet	Frame Shift, SpecAugmentation	PretrainedSED
Park_SGU_task4_3	Park_SGU2026	10.10	53.17	SRCorrNet-based separation model, M2D-fPaSST based audio tagging model	cross entropy, PIT-SNR	DCASE2026Task4Dataset	spec augmentation, angle rotation, random gain filter	M2D; fPaSST
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	57.94	Transformer-based separation model, PretrainedSED-based audio tagging model	BCE, SA-SDR loss, KL-divergence	DCASE2026Task4Dataset; AudioSet	Frame Shift, SpecAugmentation	PretrainedSED
Park_SGU_task4_4	Park_SGU2026	10.08	53.64	SRCorrNet-based separation model, M2D-fPaSST based audio tagging model	cross entropy, PIT-SNR	DCASE2026Task4Dataset	spec augmentation, angle rotation, random gain filter	M2D; fPaSST
Park_SGU_task4_2	Park_SGU2026	9.42	53.17	SRCorrNet-based separation model, M2D-fPaSST based audio tagging model	cross entropy, PIT-SNR	DCASE2026Task4Dataset	spec augmentation, angle rotation, random gain filter	M2D; fPaSST
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	45.63	Transformer-based separation model, PretrainedSED-based audio tagging model	BCE, SA-SDR loss, KL-divergence	DCASE2026Task4Dataset; AudioSet	Frame Shift, SpecAugmentation	PretrainedSED
You_PKU_task4_4	You_PKU2026	8.24	52.58	global candidate selector over raw TUSS source hypotheses and baseline separation outputs	binary cross entropy for audio tagging, SDR-style separation objectives for source separation	DCASE2026 Task 4 development dataset	all-label TUSS candidate pool with ExtraTrees threshold selection
You_PKU_task4_1	You_PKU2026	8.24	52.58	ResUNet/ResUNetK source separation with M2D audio tagging and duplicate-label-aware meta-selection	binary cross entropy for audio tagging, SDR-style separation objectives for source separation	DCASE2026 Task 4 development dataset	class-balanced candidate generation, source-level overlay selection, duplicate-label-aware filtering
You_PKU_task4_2	You_PKU2026	8.20	52.58	duplicate-label weighted selector with label-specific TUSS overlay grafting	binary cross entropy for audio tagging, SDR-style separation objectives for source separation	DCASE2026 Task 4 development dataset	source-level replacement using labels with positive development-set overlay evidence
You_PKU_task4_3	You_PKU2026	8.20	52.58	duplicate-label weighted selector with label-specific TUSS overlay grafting and small all-label Blender-line refinement	binary cross entropy for audio tagging, SDR-style separation objectives for source separation	DCASE2026 Task 4 development dataset	source-level replacement using labels with positive development-set overlay evidence, plus a small Blender-focused overlay
Park_SGU_task4_1	Park_SGU2026	8.12	53.70	SRCorrNet-based separation model, M2D-fPaSST based audio tagging model	cross entropy, PIT-SNR	DCASE2026Task4Dataset	spec augmentation, angle rotation, random gain filter	M2D; fPaSST
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	58.53	ResUNet-based label-queried separation; dual single/4-channel M2D audio tagging with weighted-confidence fusion; verify-and-refine stem re-tagging; energy/probability silence gating	BCE (audio tagging), CAPI-SDR (separation)	DCASE2026Task4Dataset; FSD50K; EARS; Semantic Hearing (BinauralCuratedDataset)	spatial soundscape synthesis, zero-target oversampling (40%), duplicate-source-event (DUPSE) curriculum, interference mixing	M2D; ResUNetK (DCASE2026 Task 4 baseline)
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	58.20	As submission 1 plus per-class verification/drop/probability-gate thresholds tuned offline against exact CAPI-SDRi semantics	BCE (audio tagging), CAPI-SDR (separation)	DCASE2026Task4Dataset; FSD50K; EARS; Semantic Hearing (BinauralCuratedDataset)	spatial soundscape synthesis, zero-target oversampling (40%), duplicate-source-event (DUPSE) curriculum, interference mixing	M2D; ResUNetK (DCASE2026 Task 4 baseline)
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	58.33	As submission 2 plus dual-separator stem selection: per stem keep whichever of two ResUNet separators its single-channel re-tagging verifies more strongly	BCE (audio tagging), CAPI-SDR (separation)	DCASE2026Task4Dataset; FSD50K; EARS; Semantic Hearing (BinauralCuratedDataset)	spatial soundscape synthesis, zero-target oversampling (40%), duplicate-source-event (DUPSE) curriculum, interference mixing	M2D; ResUNetK (DCASE2026 Task 4 baseline)
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	56.61	four-channel M2D audio tagging with Qwen2-Audio semantic distillation, followed by baseline ResUNetK label-queried separation	permutation-invariant classification loss, linear CKA loss, cosine loss, CAPI-SDR loss	DCASE2026Task4Dataset	on-the-fly spatial sound-scene synthesis	M2D; Qwen2-Audio-7B-Instruct; DCASE2026 baseline ResUNetK
Deng_WHU_task4_1	Deng_WHU2026	6.84	58.13	Label-guided ensemble with multi-model audio tagging fusion and fine-tuned ResUNetK or TF-GridNet separation result selection or fusion	BCE, CAPI-SDR	DCASE2026Task4Dataset	Random mixture synthesis with same-class duplication, random SNR/angle sampling, background/interference mixing, and label shuffling	M2D; CLAP
Baseline_Task4_1c		6.77	56.55	ResUNet-based separation model, M2D-based audio tagging model	BCE, CAPI-SDR	DCASE2026Task4Dataset		M2D
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	55.56	single-channel M2D audio tagging with Qwen2-Audio semantic distillation, followed by baseline ResUNetK label-queried separation	permutation-invariant classification loss, linear CKA loss, cosine loss, CAPI-SDR loss	DCASE2026Task4Dataset	on-the-fly spatial sound-scene synthesis	M2D; Qwen2-Audio-7B-Instruct; DCASE2026 baseline ResUNetK
Baseline_Task4_4c		6.76	55.16	ResUNet-based separation model, M2D-based audio tagging model	BCE, CAPI-SDR	DCASE2026Task4Dataset		M2D
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	39.81	Joint separation-classification-DoA model (SpatialSeparatorModel). BEATs encoder (frozen) for feature extraction, followed by mask-based waveform separation and class prediction heads. Trained end-to-end with joint PIT loss (SI-SDR + CE + DoA).	SI-SDR, Cross-Entropy, DoA regression (joint PIT)	DCASE2026Task4Dataset; FSD50K; EARS	on-the-fly spatial audio synthesis using SpAudSyn	BEATs

Complexity

Submission Code	Technical Report	CAPI-SDRi	Label Prediction Accuracy (mix)	Ensemble subsystems	Number of Parameters
Bando_AIST_task4_3	Bando_AIST2026	14.93	65.54	1	100917678
Bando_AIST_task4_4	Bando_AIST2026	14.61	59.52	1	108401934
Bando_AIST_task4_2	Bando_AIST2026	14.34	58.40	1	22569614
Bando_AIST_task4_1	Bando_AIST2026	14.23	58.33	1	15085358
Choi_KAIST_task4_4	Choi_KAIST2026	12.98	64.88	1	745744243
Saijo_Mitsubishi_task4_3	Saijo_Mitsubishi2026	12.94	76.92	5 for tagging, 10 for source counting	687300000
Choi_KAIST_task4_1	Choi_KAIST2026	12.88	64.15	2	760292329
Saijo_Mitsubishi_task4_2	Saijo_Mitsubishi2026	12.77	76.92	5 for tagging, 10 for source counting	687300000
Saijo_Mitsubishi_task4_1	Saijo_Mitsubishi2026	12.77	75.00	5 for tagging, 10 for source counting	681300000
Saijo_Mitsubishi_task4_4	Saijo_Mitsubishi2026	12.54	73.28	1	104840000
Choi_KAIST_task4_2	Choi_KAIST2026	12.32	59.59	2	760292329
Choi_KAIST_task4_3	Choi_KAIST2026	12.25	59.13	1	745744243
Wang_SRCN_task4_2	Wang_SRCN2026	10.13	57.80	1	375724043
Wang_SRCN_task4_1	Wang_SRCN2026	10.12	57.94	1	375724043
Park_SGU_task4_3	Park_SGU2026	10.10	53.17	1	197210110
Wang_SRCN_task4_3	Wang_SRCN2026	10.10	57.94	1	375724043
Park_SGU_task4_4	Park_SGU2026	10.08	53.64	1	197210110
Park_SGU_task4_2	Park_SGU2026	9.42	53.17	1	197210110
Wang_SRCN_task4_4	Wang_SRCN2026	9.20	45.63	1	375524106
You_PKU_task4_4	You_PKU2026	8.24	52.58	5	115400000
You_PKU_task4_1	You_PKU2026	8.24	52.58	4	115400000
You_PKU_task4_2	You_PKU2026	8.20	52.58	5	115400000
You_PKU_task4_3	You_PKU2026	8.20	52.58	5	115400000
Park_SGU_task4_1	Park_SGU2026	8.12	53.70	1	197210110
Jeong_Medisensing_task4_1	Jeong_Medisensing2026	6.97	58.53	1	215900000
Jeong_Medisensing_task4_2	Jeong_Medisensing2026	6.96	58.20	1	215900000
Jeong_Medisensing_task4_3	Jeong_Medisensing2026	6.94	58.33	2	245790000
Wang_BUPT_task4_2	Wang_BUPT2026	6.90	56.61	1	126434854
Deng_WHU_task4_1	Deng_WHU2026	6.84	58.13	3	29900000
Baseline_Task4_1c		6.77	56.55	1	119356966
Wang_BUPT_task4_1	Wang_BUPT2026	6.76	55.56	1	119356966
Baseline_Task4_4c		6.76	55.16	1	126434854
Park_KUBIG_task4_1	Park_KUBIG2026	1.18	39.81	1	91815245

Representative examples of separated audio samples

Evaluation set

The following table shows separated sound samples from the evaluation set. Representative outputs from teams ranked 1 to 3 and the baseline are selected. The mixture column uses pseudo-stereo mixture files, and each score row reports clip-level CAPI-SDRi for the corresponding system.

Condition (evaluation set)	Mixture*	Oracle (azimuth, elevation)	Bando_AIST_task4_3 Rank 1	Choi_KAIST_task4_4 Rank 2	Saijo_Mitsubishi_task4_3 Rank 3	Baseline_Task4_1c Baseline
Success case (3 overlapping target events, including 2 same-class events)	(audio)	Speech (160°, -20°); BicycleBell (-20°, 0°); BicycleBell (80°, -20°)	Speech; BicycleBell; BicycleBell; CAPI-SDRi (this sample) = 25.89 dB	Speech; BicycleBell; BicycleBell; CAPI-SDRi (this sample) = 23.78 dB	Speech; BicycleBell; BicycleBell; CAPI-SDRi (this sample) = 21.06 dB	Speech; --; BicycleBell; CAPI-SDRi (this sample) = 7.56 dB
Challenging case (3 overlapping target events, including 2 same-class events)	(audio)	Percussion (-40°, -20°); Percussion (60°, -20°); Blender (80°, -20°)	Percussion; Percussion; --; CAPI-SDRi (this sample) = 10.04 dB	Percussion; Percussion; --; CAPI-SDRi (this sample) = 9.30 dB	Percussion; Percussion; Blender; CAPI-SDRi (this sample) = 10.72 dB	Percussion; --; --; CAPI-SDRi (this sample) = 1.22 dB
3 Speech	(audio)	Speech (-140°, -20°); Speech (80°, -20°); Speech (160°, 0°)	Speech; Speech; Speech; CAPI-SDRi (this sample) = 23.56 dB	Speech; Speech; Speech; CAPI-SDRi (this sample) = 21.17 dB	Speech; Speech; Speech; CAPI-SDRi (this sample) = 20.00 dB	Speech; Speech; Speech; CAPI-SDRi (this sample) = 8.04 dB

* A pseudo-stereo signal extracted from the ambisonic input signal. Directional components toward azimuth -90° and 90° are extracted and assigned to the left and right channels, respectively.
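The pseudo-stereo extraction described in the footnote can be sketched as follows. This is only an illustration under stated assumptions, not the organizers' actual procedure: it assumes first-order ambisonic frames in W, Y, Z, X order with SN3D-style gains, and it uses a first-order cardioid beam as the directional extractor. The function names (`cardioid_weights`, `steer`, `pseudo_stereo`, `encode_plane_wave`) are hypothetical, and the azimuth sign convention simply follows the footnote (-90° to the left channel, 90° to the right channel).

```python
import math


def cardioid_weights(azimuth_deg, elevation_deg=0.0):
    """Gains (W, Y, Z, X) of a first-order cardioid aimed at the given direction.

    ASSUMPTION: SN3D-style first-order encoding, channel order W, Y, Z, X.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        0.5,
        0.5 * math.sin(az) * math.cos(el),
        0.5 * math.sin(el),
        0.5 * math.cos(az) * math.cos(el),
    )


def steer(frames, azimuth_deg, elevation_deg=0.0):
    """Apply the cardioid beam to a list of (W, Y, Z, X) sample tuples."""
    gw, gy, gz, gx = cardioid_weights(azimuth_deg, elevation_deg)
    return [gw * w + gy * y + gz * z + gx * x for (w, y, z, x) in frames]


def pseudo_stereo(frames):
    """Return (left, right) sample pairs.

    Following the footnote: the beam toward azimuth -90 degrees feeds the
    left channel, the beam toward 90 degrees feeds the right channel.
    """
    left = steer(frames, -90.0)
    right = steer(frames, 90.0)
    return list(zip(left, right))


def encode_plane_wave(signal, azimuth_deg, elevation_deg=0.0):
    """Hypothetical helper: encode a mono signal as a single plane wave."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        (
            s,
            s * math.sin(az) * math.cos(el),
            s * math.sin(el),
            s * math.cos(az) * math.cos(el),
        )
        for s in signal
    ]


if __name__ == "__main__":
    tone = [math.sin(0.1 * n) for n in range(8)]
    stereo = pseudo_stereo(encode_plane_wave(tone, 90.0))
    for left, right in stereo:
        print(f"L={left:+.3f}  R={right:+.3f}")
```

With these assumptions, a source encoded at azimuth 90° lands entirely in the right channel and is cancelled in the left one, while a source at 0° is split equally between both channels at half amplitude.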

Technical reports

END-TO-END ITERATIVE S5 SYSTEM BASED ON TF-LOCOFORMER AND ATST-FRAME

Yoshiaki Bando, Shun Sakurai, Yuto Nozaki, Keisuke Imoto, Masaki Onishi

National Institute of Advanced Industrial Science and Technology, Koto, Tokyo, Japan; Kyoto University, Kyoto, Kyoto, Japan

Bando_AIST_task4_3 Bando_AIST_task4_4 Bando_AIST_task4_2 Bando_AIST_task4_1

A MULTI-STAGE SEPARATION-AND-CLASSIFICATION FRAMEWORK GUIDED BY COMPLEMENTARY ACOUSTIC-TO-SEMANTIC CLUES

A LABEL-GUIDED ENSEMBLE SYSTEM FOR SPATIAL SEMANTIC SEGMENTATION OF SAME-CLASS SOUND SOURCES

TAGGING-DRIVEN INFERENCE REFINEMENT AND DUAL-SEPARATOR SELECTION FOR SPATIAL SEMANTIC SEGMENTATION OF SOUND SCENES

END-TO-END SPATIAL SEMANTIC SEPARATOR WITH DOA MODULE

EXTENDING SR-CORRNET TO LABEL-QUERIED TARGET SOUND EXTRACTION

THE MERL SYSTEMS FOR DCASE 2026 CHALLENGE TASK 4

SEMANTIC DISTILLATION FOR SPATIAL SEMANTIC SEGMENTATION OF SOUND SCENES

Local-Global Transformer with Iterative Refinement for Multi-Channel Sound Source Separation and Extraction

DCASE 2026 Task 4 Submission: Duplicate-Label-Aware Source Selection and TUSS Overlay Grafting
