Task description
The goal of the acoustic scene classification task is to classify recordings into one of the ten predefined acoustic scene classes. This task continues the Acoustic Scene Classification tasks from previous editions of the DCASE Challenge, with a slight shift of focus. This year, the task concentrates on three challenging aspects: (1) a recording device mismatch, (2) low-complexity constraints, and (3) the limited availability of labeled data for training.
A more detailed task description can be found on the task description page.
Teams ranking
| Team | Submission label | Technical Report | Official rank | Accuracy (maximum among entries) |
| --- | --- | --- | --- | --- |
| Chang_HYU | mel | Han2025 | 5 | 58.98 |
| Chen_GXU | CD | Chen2025 | 9 | 56.63 |
| Han_CSU | KDTF-SepN | Han2025a | 13 | 32.58 |
| Jeong_SEOULTECH | DAFA-TE | Jeong2025 | 8 | 57.86 |
| Karasin_JKU | MALACH25_4 | Karasin2025 | 1 | 61.47 |
| Krishna_SRIB | SRIB-Team | Gurugubelli2025 | 10 | 56.06 |
| Li_NTU | S2 | Li2025 | 6 | 58.85 |
| Luo_CQUPT | DynaCP | Luo2025 | 3 | 59.58 |
| Ramezanee_SUT | Sharif | Ramezanee2025 | 7 | 57.92 |
| DCASE2025 baseline | Baseline | | 12 | 53.24 |
| Tan_SNTLNTU | SNTLNTU_T1_1 | Tan2025 | 2 | 59.94 |
| Zhang_AITHU-SJTU | Agp_c96_s1 | Zhang2025 | 4 | 59.28 |
| Zhou_XJTLU | Baseline | Ziyang2025 | 11 | 55.52 |
Systems ranking
| Submission label | Name | Technical Report | Official system rank | Memory rank | MACs rank | Accuracy with 95% CI (eval) | Accuracy, known devices (eval) | Accuracy, unknown devices (eval) | Logloss (eval) | Accuracy (dev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | base | Han2025 | 17 | 7 | 4 | 58.1 (57.9 - 58.4) | 60.6 | 53.3 | 3.182 | 57.4 |
| Chang_HYU_task1_2 | mel | Han2025 | 10 | 7 | 10 | 59.0 (58.7 - 59.2) | 61.4 | 54.1 | 3.095 | 58.2 |
| Chang_HYU_task1_3 | hop | Han2025 | 15 | 7 | 13 | 58.7 (58.4 - 58.9) | 60.7 | 54.5 | 3.016 | 58.3 |
| Chang_HYU_task1_4 | hop_mel | Han2025 | 16 | 7 | 13 | 58.6 (58.4 - 58.8) | 60.8 | 54.2 | 3.210 | 57.6 |
| Chen_GXU_task1_1 | CD | Chen2025 | 21 | 3 | 11 | 56.6 (56.4 - 56.9) | 58.5 | 52.8 | 1.566 | 56.5 |
| Chen_GXU_task1_2 | CD | Chen2025 | 22 | 3 | 11 | 56.5 (56.3 - 56.8) | 58.4 | 52.9 | 1.411 | 57.7 |
| Chen_GXU_task1_3 | CD | Chen2025 | 26 | 3 | 11 | 55.3 (55.0 - 55.5) | 57.0 | 51.7 | 1.994 | 56.1 |
| Han_CSU_task1_1 | TF-SepN | Han2025a | 31 | 3 | 1 | 26.4 (26.2 - 26.7) | 26.6 | 26.2 | 2.796 | 55.0 |
| Han_CSU_task1_2 | KDTF-SepN | Han2025a | 32 | 3 | 1 | 25.1 (24.9 - 25.3) | 25.3 | 24.7 | 2.289 | 51.3 |
| Han_CSU_task1_3 | KDTF-SepN | Han2025a | 29 | 3 | 1 | 32.6 (32.3 - 32.8) | 33.2 | 31.4 | 1.879 | 54.1 |
| Han_CSU_task1_4 | KDTF-SepN | Han2025a | 30 | 3 | 1 | 30.9 (30.7 - 31.1) | 31.5 | 29.6 | 1.923 | 51.2 |
| Jeong_SEOULTECH_task1_1 | DAFA-TE | Jeong2025 | 20 | 3 | 5 | 56.9 (56.6 - 57.1) | 60.1 | 50.3 | 1.379 | 54.6 |
| Jeong_SEOULTECH_task1_2 | DAFA-TE | Jeong2025 | 19 | 3 | 5 | 57.9 (57.6 - 58.1) | 60.3 | 53.0 | 1.362 | 56.0 |
| Karasin_JKU_task1_1 | MALACH25_1 | Karasin2025 | 2 | 3 | 11 | 61.4 (61.1 - 61.6) | 64.0 | 56.2 | 1.105 | 60.5 |
| Karasin_JKU_task1_2 | MALACH25_2 | Karasin2025 | 4 | 3 | 11 | 60.1 (59.9 - 60.4) | 62.1 | 56.1 | 1.177 | 57.5 |
| Karasin_JKU_task1_3 | MALACH25_3 | Karasin2025 | 3 | 3 | 11 | 60.3 (60.0 - 60.5) | 62.8 | 55.2 | 1.155 | 59.0 |
| Karasin_JKU_task1_4 | MALACH25_4 | Karasin2025 | 1 | 3 | 11 | 61.5 (61.2 - 61.7) | 64.1 | 56.2 | 1.102 | 60.5 |
| Krishna_SRIB_task1_1 | SRIB-Team | Gurugubelli2025 | 23 | 4 | 6 | 56.1 (55.8 - 56.3) | 58.2 | 51.8 | 2.515 | 56.4 |
| Li_NTU_task1_1 | S1 | Li2025 | 14 | 3 | 3 | 58.7 (58.5 - 59.0) | 60.6 | 55.1 | 1.133 | 58.8 |
| Li_NTU_task1_2 | S2 | Li2025 | 12 | 3 | 3 | 58.8 (58.6 - 59.1) | 60.5 | 55.5 | 1.128 | 59.3 |
| Luo_CQUPT_task1_1 | DynaCP | Luo2025 | 6 | 5 | 8 | 59.6 (59.3 - 59.8) | 61.9 | 55.0 | 1.616 | 59.0 |
| Ramezanee_SUT_task1_1 | Sharif | Ramezanee2025 | 27 | 6 | 7 | 54.6 (54.3 - 54.8) | 54.8 | 54.1 | 1.318 | 58.2 |
| Ramezanee_SUT_task1_2 | SUT | Ramezanee2025 | 25 | 6 | 7 | 55.5 (55.3 - 55.7) | 56.2 | 54.1 | 1.262 | 58.2 |
| Ramezanee_SUT_task1_3 | Sharif | Ramezanee2025 | 18 | 6 | 7 | 57.9 (57.7 - 58.2) | 59.8 | 54.1 | 1.176 | 58.2 |
| DCASE2025 baseline | Baseline | | 28 | 3 | 11 | 53.2 (53.0 - 53.5) | 55.4 | 49.0 | 1.686 | 50.7 |
| Tan_SNTLNTU_task1_1 | SNTLNTU_T1_1 | Tan2025 | 5 | 1 | 2 | 59.9 (59.7 - 60.2) | 62.2 | 55.4 | 1.136 | 60.4 |
| Tan_SNTLNTU_task1_2 | SNTLNTU_T1_2 | Tan2025 | 9 | 2 | 2 | 59.0 (58.8 - 59.3) | 61.6 | 53.9 | 1.179 | 60.2 |
| Zhang_AITHU-SJTU_task1_1 | Agp_c64_s1 | Zhang2025 | 13 | 10 | 14 | 58.8 (58.5 - 59.0) | 60.7 | 54.8 | 1.118 | 59.0 |
| Zhang_AITHU-SJTU_task1_2 | Agp_c64_s2 | Zhang2025 | 11 | 10 | 14 | 58.9 (58.7 - 59.2) | 60.9 | 55.0 | 1.110 | 58.5 |
| Zhang_AITHU-SJTU_task1_3 | Agp_c96_s1 | Zhang2025 | 7 | 8 | 9 | 59.3 (59.0 - 59.5) | 60.8 | 56.2 | 1.108 | 59.0 |
| Zhang_AITHU-SJTU_task1_4 | Agp_c96_s2 | Zhang2025 | 8 | 8 | 9 | 59.3 (59.0 - 59.5) | 60.8 | 56.1 | 1.105 | 58.7 |
| Zhou_XJTLU_task1_1 | Baseline | Ziyang2025 | 24 | 9 | 12 | 55.5 (55.3 - 55.8) | 60.0 | 46.6 | 1.232 | 58.5 |
System complexity
| Submission label | Technical Report | System rank | Overall accuracy (eval) | Accuracy, known devices (eval) | Accuracy, unknown devices (eval) | Model size (bytes) | MACs | Parameters | Complexity management |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | 60.6 | 53.3 | 125156 | 18758084 | 62578 | precision_16, network design, knowledge distillation |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | 61.4 | 54.1 | 125156 | 29302844 | 62578 | precision_16, network design, knowledge distillation |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | 60.7 | 54.5 | 125156 | 29512940 | 62578 | precision_16, network design, knowledge distillation |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | 60.8 | 54.2 | 125156 | 29512940 | 62578 | precision_16, network design, knowledge distillation |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | 58.5 | 52.8 | 122296 | 29419156 | 61148 | knowledge distillation, precision_16 |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | 58.4 | 52.9 | 122296 | 29419156 | 61148 | knowledge distillation, precision_16 |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | 57.0 | 51.7 | 122296 | 29419156 | 61148 | knowledge distillation, precision_16 |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | 26.6 | 26.2 | 122296 | 298637 | 61148 | network design |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | 25.3 | 24.7 | 122296 | 298637 | 61148 | knowledge distillation |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | 33.2 | 31.4 | 122296 | 298637 | 61148 | knowledge distillation |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | 31.5 | 29.6 | 122296 | 298637 | 61148 | knowledge distillation |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | 60.1 | 50.3 | 122296 | 26059412 | 61148 | knowledge distillation |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | 60.3 | 53.0 | 122296 | 26059412 | 61148 | knowledge distillation |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | 64.0 | 56.2 | 122296 | 29419156 | 61148 | precision_16, network design, knowledge distillation |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | 62.1 | 56.1 | 122296 | 29419156 | 61148 | precision_16, network design, knowledge distillation |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | 62.8 | 55.2 | 122296 | 29419156 | 61148 | precision_16, network design, knowledge distillation |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | 64.1 | 56.2 | 122296 | 29419156 | 61148 | precision_16, network design, knowledge distillation |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | 58.2 | 51.8 | 122320 | 27862676 | 61160 | precision_16, network design |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | 60.6 | 55.1 | 122296 | 17050260 | 61160 | knowledge distillation, network design, precision_16 |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | 60.5 | 55.5 | 122296 | 17050260 | 61160 | knowledge distillation, network design |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | 61.9 | 55.0 | 123300 | 28938900 | 61650 | knowledge distillation, precision_16, network design |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | 54.8 | 54.1 | 125040 | 28642220 | 31260 | network design, knowledge distillation, pruning, reparametrization |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | 56.2 | 54.1 | 125040 | 28642220 | 31260 | network design, knowledge distillation, pruning, reparametrization |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | 59.8 | 54.1 | 125040 | 28642220 | 31260 | network design, knowledge distillation, pruning, reparametrization |
| DCASE2025 baseline | | 28 | 53.2 | 55.4 | 49.0 | 122296 | 29419156 | 61148 | precision_16, network design |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | 62.2 | 55.4 | 116342 | 10902300 | 116342 | precision_16, network design |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | 61.6 | 53.9 | 117210 | 10902300 | 117210 | precision_16, network design |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | 60.7 | 54.8 | 127496 | 29982132 | 63748 | precision_16, network design, knowledge distillation, pruning |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | 60.9 | 55.0 | 127496 | 29982132 | 63748 | precision_16, network design, knowledge distillation, pruning |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | 60.8 | 56.2 | 126430 | 29221122 | 63215 | precision_16, network design, knowledge distillation, pruning |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | 60.8 | 56.1 | 126430 | 29221122 | 63215 | precision_16, network design, knowledge distillation, pruning |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | 60.0 | 46.6 | 126858 | 29419648 | 126858 | network design, weight quantization, knowledge distillation |
Generalization performance
All results in this section are computed on the evaluation dataset.
Class-wise performance
| Submission label | Technical Report | System rank | Accuracy | Airport | Bus | Metro | Metro station | Park | Public square | Shopping mall | Street pedestrian | Street traffic | Tram |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | 42.6 | 76.8 | 59.3 | 52.1 | 80.4 | 34.2 | 59.3 | 38.2 | 74.5 | 64.1 |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | 43.7 | 78.4 | 57.6 | 53.8 | 81.0 | 37.3 | 61.9 | 36.2 | 75.7 | 64.2 |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | 43.9 | 74.4 | 59.1 | 54.5 | 80.6 | 35.6 | 61.4 | 35.5 | 75.6 | 65.9 |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | 44.9 | 77.5 | 59.3 | 50.6 | 80.3 | 35.6 | 62.9 | 38.8 | 74.5 | 61.5 |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | 36.8 | 77.9 | 54.9 | 50.4 | 80.1 | 36.0 | 63.1 | 33.5 | 74.3 | 59.3 |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | 39.9 | 72.4 | 54.7 | 50.9 | 87.5 | 33.4 | 58.7 | 30.6 | 72.8 | 64.5 |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | 37.3 | 72.0 | 54.1 | 44.9 | 82.5 | 28.8 | 59.4 | 38.8 | 71.6 | 63.4 |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | 4.6 | 38.8 | 3.2 | 25.8 | 16.7 | 10.7 | 17.8 | 28.2 | 78.2 | 40.4 |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | 19.2 | 61.2 | 52.3 | 9.9 | 14.6 | 8.0 | 9.5 | 28.6 | 26.3 | 21.4 |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | 12.0 | 26.9 | 18.0 | 32.6 | 30.5 | 25.5 | 49.8 | 35.9 | 66.6 | 28.0 |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | 37.4 | 24.6 | 45.9 | 34.7 | 25.8 | 11.3 | 50.7 | 41.5 | 11.1 | 25.9 |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | 44.8 | 68.8 | 53.3 | 48.8 | 85.9 | 39.0 | 67.5 | 24.7 | 76.6 | 59.0 |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | 48.1 | 68.2 | 50.6 | 49.4 | 84.0 | 41.5 | 66.8 | 30.3 | 77.3 | 62.3 |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | 51.9 | 83.5 | 61.5 | 50.7 | 87.6 | 35.7 | 68.0 | 33.5 | 77.4 | 64.2 |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | 50.0 | 81.5 | 58.3 | 48.0 | 85.4 | 37.8 | 63.8 | 33.2 | 77.6 | 65.8 |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | 45.3 | 76.7 | 59.9 | 49.0 | 85.7 | 33.6 | 70.9 | 33.2 | 79.6 | 68.7 |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | 52.6 | 83.2 | 61.8 | 50.4 | 87.6 | 35.9 | 67.6 | 33.5 | 77.5 | 64.4 |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | 40.0 | 76.4 | 55.1 | 46.7 | 81.5 | 32.4 | 57.2 | 35.6 | 75.3 | 60.3 |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | 44.6 | 70.2 | 59.8 | 52.2 | 85.7 | 36.6 | 66.6 | 31.4 | 75.1 | 65.2 |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | 39.8 | 73.7 | 58.0 | 53.4 | 84.0 | 39.9 | 69.6 | 31.3 | 74.5 | 64.3 |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | 44.8 | 77.3 | 58.8 | 54.5 | 84.9 | 37.7 | 61.1 | 37.6 | 76.5 | 62.6 |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | 46.4 | 82.3 | 48.6 | 45.9 | 85.4 | 31.8 | 54.9 | 28.4 | 69.2 | 52.7 |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | 44.6 | 83.0 | 47.8 | 51.6 | 84.3 | 34.1 | 57.2 | 26.3 | 70.6 | 55.5 |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | 43.0 | 82.8 | 49.7 | 51.9 | 83.7 | 38.6 | 61.1 | 34.6 | 68.3 | 65.4 |
| DCASE2025 baseline | | 28 | 53.2 | 40.5 | 69.7 | 47.2 | 42.1 | 79.8 | 36.1 | 53.5 | 34.8 | 74.8 | 53.9 |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | 50.6 | 83.4 | 54.8 | 47.0 | 85.4 | 37.7 | 63.9 | 35.9 | 70.6 | 70.0 |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | 49.0 | 78.9 | 58.7 | 50.6 | 82.6 | 36.1 | 59.4 | 38.7 | 70.0 | 66.6 |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | 46.5 | 70.3 | 52.5 | 53.3 | 84.2 | 34.9 | 65.1 | 36.1 | 75.7 | 68.9 |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | 43.6 | 72.1 | 53.3 | 49.8 | 83.2 | 36.4 | 65.5 | 40.0 | 78.9 | 66.7 |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | 50.7 | 66.4 | 54.0 | 51.1 | 85.2 | 35.3 | 66.3 | 37.7 | 77.2 | 68.9 |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | 47.8 | 68.3 | 53.2 | 52.0 | 87.0 | 32.8 | 70.6 | 34.1 | 78.3 | 68.4 |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | 41.1 | 68.8 | 60.1 | 48.4 | 65.6 | 34.9 | 67.1 | 33.9 | 74.9 | 60.5 |
Device-wise performance
Unseen devices: D, S7, S8, S9, S10. Seen devices: A, B, C, S1, S2, S3.

| Submission label | Technical Report | System rank | Accuracy | Accuracy, unseen devices | Accuracy, seen devices | D | S7 | S8 | S9 | S10 | A | B | C | S1 | S2 | S3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | 53.3 | 60.6 | 44.3 | 57.9 | 57.3 | 52.4 | 54.6 | 67.3 | 60.9 | 61.7 | 57.1 | 56.8 | 59.5 |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | 54.1 | 61.4 | 46.8 | 58.5 | 56.1 | 53.9 | 55.3 | 67.0 | 61.9 | 62.2 | 57.3 | 59.1 | 61.1 |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | 54.5 | 60.7 | 48.1 | 58.8 | 57.7 | 54.0 | 53.8 | 67.3 | 60.4 | 61.8 | 57.2 | 57.8 | 59.9 |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | 54.2 | 60.8 | 46.3 | 58.2 | 57.6 | 53.2 | 55.5 | 67.2 | 60.0 | 61.8 | 58.0 | 57.8 | 60.0 |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | 52.8 | 58.5 | 48.8 | 57.7 | 56.9 | 46.6 | 54.2 | 67.9 | 60.6 | 61.5 | 53.5 | 50.9 | 56.8 |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | 52.9 | 58.4 | 46.4 | 57.0 | 55.6 | 49.3 | 56.1 | 68.2 | 60.6 | 61.1 | 53.2 | 50.1 | 57.1 |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | 51.7 | 57.0 | 48.4 | 54.6 | 56.6 | 46.5 | 52.6 | 67.8 | 59.3 | 59.1 | 52.4 | 49.8 | 53.9 |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | 26.2 | 26.6 | 27.0 | 27.7 | 25.5 | 24.1 | 26.5 | 29.5 | 29.4 | 27.1 | 25.6 | 22.4 | 25.5 |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | 24.7 | 25.3 | 26.6 | 24.6 | 24.6 | 23.4 | 24.5 | 28.2 | 27.2 | 26.7 | 22.3 | 22.8 | 24.7 |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | 31.4 | 33.2 | 36.5 | 31.6 | 29.3 | 29.4 | 30.3 | 41.1 | 34.1 | 38.9 | 30.4 | 26.1 | 28.3 |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | 29.6 | 31.5 | 34.2 | 28.9 | 27.2 | 29.3 | 28.5 | 39.7 | 33.8 | 38.1 | 27.5 | 23.1 | 26.8 |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | 50.3 | 60.1 | 39.0 | 54.2 | 53.6 | 53.1 | 51.3 | 68.8 | 60.0 | 63.0 | 54.5 | 55.5 | 59.2 |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | 53.0 | 60.3 | 44.7 | 57.2 | 54.2 | 54.8 | 54.2 | 68.7 | 60.5 | 63.7 | 55.4 | 54.6 | 58.8 |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | 56.2 | 64.0 | 47.5 | 60.9 | 58.0 | 58.3 | 56.1 | 70.6 | 64.6 | 65.4 | 59.7 | 59.8 | 63.9 |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | 56.1 | 62.1 | 49.6 | 59.3 | 56.9 | 58.0 | 56.9 | 70.0 | 62.0 | 63.1 | 58.1 | 57.7 | 61.7 |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | 55.2 | 62.8 | 45.3 | 59.6 | 55.5 | 58.2 | 57.1 | 69.1 | 62.3 | 64.8 | 59.6 | 58.2 | 63.0 |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | 56.2 | 64.1 | 47.5 | 60.9 | 58.0 | 58.3 | 56.1 | 70.6 | 64.6 | 65.7 | 60.2 | 59.8 | 63.9 |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | 51.8 | 58.2 | 45.0 | 56.0 | 54.9 | 48.6 | 54.4 | 66.8 | 59.4 | 61.8 | 52.4 | 52.0 | 56.9 |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | 55.1 | 60.6 | 51.3 | 58.6 | 58.6 | 52.2 | 54.7 | 67.0 | 59.7 | 62.2 | 57.5 | 56.8 | 60.3 |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | 55.5 | 60.5 | 53.0 | 58.6 | 58.2 | 52.7 | 54.7 | 66.4 | 59.9 | 62.1 | 57.9 | 56.7 | 60.2 |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | 55.0 | 61.9 | 42.1 | 59.7 | 60.1 | 56.4 | 56.7 | 70.7 | 62.1 | 64.7 | 57.3 | 56.7 | 59.6 |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | 54.1 | 54.8 | 41.4 | 59.6 | 58.0 | 56.1 | 55.3 | 64.2 | 53.9 | 53.0 | 52.5 | 50.4 | 54.9 |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | 54.1 | 56.2 | 41.4 | 59.6 | 58.0 | 56.1 | 55.3 | 63.7 | 54.8 | 55.4 | 54.4 | 52.3 | 56.8 |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | 54.1 | 59.8 | 41.4 | 59.6 | 58.0 | 56.1 | 55.3 | 65.6 | 59.2 | 58.8 | 57.3 | 58.4 | 59.8 |
| DCASE2025 baseline | | 28 | 53.2 | 49.0 | 55.4 | 47.5 | 51.6 | 48.8 | 45.3 | 51.7 | 64.8 | 57.2 | 59.9 | 48.9 | 48.7 | 52.7 |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | 55.4 | 62.2 | 49.3 | 58.6 | 59.2 | 52.2 | 57.5 | 67.8 | 61.3 | 64.5 | 59.9 | 59.2 | 60.7 |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | 53.9 | 61.6 | 44.6 | 58.7 | 59.2 | 50.0 | 56.9 | 67.7 | 60.6 | 63.4 | 59.5 | 57.4 | 61.1 |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | 54.8 | 60.7 | 46.7 | 58.6 | 57.4 | 55.2 | 56.4 | 69.2 | 60.0 | 62.7 | 56.6 | 55.0 | 61.0 |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | 55.0 | 60.9 | 48.7 | 58.5 | 57.5 | 53.5 | 56.8 | 69.6 | 60.8 | 62.8 | 55.9 | 55.7 | 60.8 |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | 56.2 | 60.8 | 51.7 | 58.9 | 57.4 | 55.6 | 57.2 | 69.3 | 59.8 | 62.4 | 56.5 | 55.9 | 61.0 |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | 56.1 | 60.8 | 52.2 | 59.0 | 58.4 | 54.8 | 56.0 | 69.4 | 60.8 | 62.6 | 55.9 | 56.0 | 60.4 |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | 46.6 | 60.0 | 49.3 | 51.5 | 50.0 | 42.7 | 39.7 | 67.5 | 60.7 | 60.5 | 55.3 | 57.7 | 58.1 |
Cities
| Submission label | Technical Report | System rank | Accuracy | Accuracy, unseen cities | Accuracy, seen cities |
| --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | 56.06 | 58.59 |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | 58.13 | 59.18 |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | 57.80 | 58.86 |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | 56.79 | 58.98 |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | 54.71 | 57.05 |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | 54.38 | 57.01 |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | 53.89 | 55.59 |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | 28.12 | 26.11 |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | 25.57 | 25.02 |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | 33.91 | 32.31 |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | 32.42 | 30.57 |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | 56.03 | 57.06 |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | 57.03 | 58.07 |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | 61.95 | 61.30 |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | 60.10 | 60.17 |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | 60.95 | 60.16 |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | 62.06 | 61.38 |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | 54.69 | 56.37 |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | 57.92 | 58.94 |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | 58.52 | 58.94 |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | 57.91 | 59.96 |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | 53.90 | 54.74 |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | 54.72 | 55.69 |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | 56.96 | 58.15 |
| DCASE2025 baseline | | 28 | 53.2 | 52.95 | 53.33 |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | 58.45 | 60.28 |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | 58.85 | 59.11 |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | 57.87 | 58.98 |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | 58.39 | 59.09 |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | 58.32 | 59.51 |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | 58.18 | 59.51 |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | 55.60 | 55.53 |
System characteristics
General characteristics
| Submission label | Technical Report | Rank | Accuracy | Sampling rate | Data augmentation | Features |
| --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | 32kHz | freq-mixstyle, frequency masking, time rolling, DIR | log-mel energies |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | 32kHz | freq-mixstyle, frequency masking, time rolling, DIR | log-mel energies |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | 32kHz | freq-mixstyle, frequency masking, time rolling, DIR | log-mel energies |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | 32kHz | freq-mixstyle, frequency masking, time rolling, DIR | log-mel energies |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | 32kHz | freq-mixstyle, time rolling | log-mel energies |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | 32kHz | freq-mixstyle, time rolling | log-mel energies |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | 32kHz | freq-mixstyle, time rolling | log-mel energies |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | 44.1kHz | MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR | log-mel spectrogram |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | 44.1kHz | MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR | log-mel spectrogram |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | 44.1kHz | MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR | log-mel spectrogram |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | 44.1kHz | MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR | log-mel spectrogram |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | 44.1kHz | freq-mixstyle, mixup | log-mel energies |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | 44.1kHz | freq-mixstyle, mixup | log-mel energies |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | 32kHz | freq-mixstyle, DIR, time masking, frequency masking, time rolling | log-mel energies |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | 32kHz | freq-mixstyle, DIR, time masking, frequency masking, time rolling | log-mel energies |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | 32kHz | freq-mixstyle, DIR, time masking, frequency masking, time rolling | log-mel energies |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | 32kHz | freq-mixstyle, DIR, time masking, frequency masking, time rolling | log-mel energies |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | 32kHz | freq-mixstyle, frequency masking, time rolling | log-mel energies |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | 32kHz | freq-mixstyle, time rolling, DIR | log-mel energies |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | 32kHz | freq-mixstyle, time rolling, DIR | log-mel energies |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | 44.1kHz | freq-mixstyle, pitch shifting, time rolling | log-mel energies |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | 32kHz | freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR | log-mel energies |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | 32kHz | freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR | log-mel energies |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | 32kHz | freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR | log-mel energies |
| DCASE2025 baseline | | 28 | 53.2 | 32kHz | freq-mixstyle, pitch shifting, time rolling | log-mel energies |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | 44.1kHz | freq-mixstyle, DIR, SpecAug | log-mel energies |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | 44.1kHz | freq-mixstyle, DIR, SpecAug | log-mel energies |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | 32kHz | freq-mixstyle, frequency masking, time masking, time rolling | log-mel energies |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | 32kHz | freq-mixstyle, frequency masking, time masking, time rolling | log-mel energies |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | 32kHz | freq-mixstyle, frequency masking, time masking, time rolling | log-mel energies |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | 32kHz | freq-mixstyle, frequency masking, time masking, time rolling | log-mel energies |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | 32kHz | mixup, freq-mixstyle, DIR | log-mel spectrogram |
Machine learning characteristics
| Submission label | Technical Report | Rank | Accuracy | External data usage | External data sources | Model complexity (parameters) | Model MACs | Classifier | Framework | Pipeline | Device information | Number of models | Model weight sharing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chang_HYU_task1_1 | Han2025 | 17 | 58.1 | pre-trained model | MicIRP, PaSST | 62578 | 18758084 | RF-regularized CNN, CTFAttention | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Chang_HYU_task1_2 | Han2025 | 10 | 59.0 | pre-trained model | MicIRP, PaSST | 62578 | 29302844 | RF-regularized CNN, CTFAttention | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Chang_HYU_task1_3 | Han2025 | 15 | 58.7 | pre-trained model | MicIRP, PaSST | 62578 | 29512940 | RF-regularized CNN, CTFAttention | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Chang_HYU_task1_4 | Han2025 | 16 | 58.6 | pre-trained model | MicIRP, PaSST | 62578 | 29512940 | RF-regularized CNN, CTFAttention | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Chen_GXU_task1_1 | Chen2025 | 21 | 56.6 | pre-trained model | AudioSet | 61148 | 29419156 | CP-Mobile | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation | | 1 | fully device-specific |
| Chen_GXU_task1_2 | Chen2025 | 22 | 56.5 | pre-trained model | | 61148 | 29419156 | CP-Mobile | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation | | 1 | fully device-specific |
| Chen_GXU_task1_3 | Chen2025 | 26 | 55.3 | pre-trained model | AudioSet | 61148 | 29419156 | CP-Mobile | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation | | 1 | fully device-specific |
| Han_CSU_task1_1 | Han2025a | 31 | 26.4 | None | | 61148 | 298637 | CNN (SepNet) | pytorch | data augmentation, train baseline model | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Han_CSU_task1_2 | Han2025a | 32 | 25.1 | pre-trained model | BEATs | 61148 | 298637 | CNN | pytorch | train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Han_CSU_task1_3 | Han2025a | 29 | 32.6 | pre-trained model | BEATs, EfficientAT | 61148 | 298637 | CNN | pytorch | train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Han_CSU_task1_4 | Han2025a | 30 | 30.9 | pre-trained model | BEATs, EfficientAT | 61148 | 298637 | CNN | pytorch | train transformer teachers, knowledge distillation to SepNet student, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Jeong_SEOULTECH_task1_1 | Jeong2025 | 20 | 56.9 | pre-trained model | | 61148 | 26059412 | CNN, Transformer | pytorch | train general teacher model, ensemble teachers, device-specific fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Jeong_SEOULTECH_task1_2 | Jeong2025 | 19 | 57.9 | pre-trained model | | 61148 | 26059412 | CNN, Transformer | pytorch | train general teacher model, ensemble teachers, device-specific fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Karasin_JKU_task1_1 | Karasin2025 | 2 | 61.4 | dataset, pre-trained model | PretrainedSED, MicIRP, CochlScene | 61148 | 29419156 | RF-regularized CNN | pytorch | pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet; BEATs) on TAU22, device-specific end-to-end fine-tuning of the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning with knowledge distillation | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Karasin_JKU_task1_2 | Karasin2025 | 4 | 60.1 | dataset, pre-trained model | PretrainedSED, MicIRP, CochlScene | 61148 | 29419156 | RF-regularized CNN | pytorch | pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet; BEATs) on TAU22, device-specific end-to-end fine-tuning of the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning with knowledge distillation | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Karasin_JKU_task1_3 | Karasin2025 | 3 | 60.3 | dataset, pre-trained model | MicIRP, CochlScene, PaSST | 61148 | 29419156 | RF-regularized CNN | pytorch | pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet; PaSST) on TAU22, device-specific end-to-end fine-tuning of the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Karasin_JKU_task1_4 | Karasin2025 | 1 | 61.5 | dataset, pre-trained model | PretrainedSED, MicIRP, CochlScene | 61148 | 29419156 | RF-regularized CNN | pytorch | pretrain CP-ResNet teacher on CochlScene, train teachers (CP-ResNet; BEATs) on TAU22, device-specific end-to-end fine-tuning of the CP-ResNet teacher, pretrain student model on CochlScene, train general student model with knowledge distillation, device-specific end-to-end fine-tuning for device s1, device-specific end-to-end fine-tuning with knowledge distillation for the rest of the devices | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Krishna_SRIB_task1_1 | Gurugubelli2025 | 23 | 56.1 | | | 61160 | 27862676 | RF-regularized CNN | pytorch | train general model, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Li_NTU_task1_1 | Li2025 | 14 | 58.7 | dataset, micIRP, pre-trained model, PaSST | | 61160 | 17050260 | RF-regularized CNN | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation (both stage-wise and output-wise), model soup, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning, device-IR augmentation | 1 | fully shared |
| Li_NTU_task1_2 | Li2025 | 12 | 58.8 | dataset, micIRP, pre-trained model, PaSST | MicIRP | 61160 | 17050260 | RF-regularized CNN | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation (both stage-wise and output-wise), model soup, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning, DIR | 1 | fully shared |
| Luo_CQUPT_task1_1 | Luo2025 | 6 | 59.6 | pre-trained model | | 61650 | 28938900 | RF-regularized CNN | pytorch | train teachers, ensemble teachers, train general student model with knowledge distillation, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Ramezanee_SUT_task1_1 | Ramezanee2025 | 27 | 54.6 | dataset | MicIRP | 31260 | 28642220 | CNN | pytorch | train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Ramezanee_SUT_task1_2 | Ramezanee2025 | 25 | 55.5 | dataset | MicIRP | 31260 | 28642220 | CNN | pytorch | train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Ramezanee_SUT_task1_3 | Ramezanee2025 | 18 | 57.9 | dataset | MicIRP | 31260 | 28642220 | CNN | pytorch | train teachers, ensemble teachers, train general model, device-specific end-to-end fine-tuning, train student models with knowledge distillation | per-device end-to-end fine-tuning | 7 | fully device-specific |
| DCASE2025 baseline | | 28 | 53.2 | | | 61148 | 29419156 | RF-regularized CNN | pytorch | training | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Tan_SNTLNTU_task1_1 | Tan2025 | 5 | 59.9 | | | 116342 | 10902300 | GRU-CNN | pytorch | train general model, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Tan_SNTLNTU_task1_2 | Tan2025 | 9 | 59.0 | | | 117210 | 10902300 | GRU-CNN | pytorch | train general model, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
| Zhang_AITHU-SJTU_task1_1 | Zhang2025 | 13 | 58.8 | pre-trained model | EfficientAT | 63748 | 29982132 | CNN | pytorch | train teachers, ensemble teachers, train student using knowledge distillation, pruning | | 1 | fully shared |
| Zhang_AITHU-SJTU_task1_2 | Zhang2025 | 11 | 58.9 | pre-trained model | EfficientAT | 63748 | 29982132 | CNN | pytorch | train teachers, ensemble teachers, train student using knowledge distillation, pruning | | 1 | fully shared |
| Zhang_AITHU-SJTU_task1_3 | Zhang2025 | 7 | 59.3 | pre-trained model | EfficientAT | 63215 | 29221122 | CNN | pytorch | train teachers, ensemble teachers, train student using knowledge distillation, pruning | | 1 | fully shared |
| Zhang_AITHU-SJTU_task1_4 | Zhang2025 | 8 | 59.3 | pre-trained model | EfficientAT | 63215 | 29221122 | CNN | pytorch | train teachers, ensemble teachers, train student using knowledge distillation, pruning | | 1 | fully shared |
| Zhou_XJTLU_task1_1 | Ziyang2025 | 24 | 55.5 | dataset, embeddings, pre-trained model | AudioSet_balanced | 126858 | 29419648 | CNN (TF-SepNet) | pytorch_lightning | train general model, device-specific end-to-end fine-tuning | per-device end-to-end fine-tuning | 7 | fully device-specific |
Technical reports
McCi Submission to DCASE 2025: Training Low-Complexity Acoustic Scene Classification System with Knowledge Distillation and Curriculum
Xuanyan Chen and Wei Xie
School of Computer, Electronics and Information, Guangxi University, Guangxi, China
Chen_GXU_task1_1 Chen_GXU_task1_2 Chen_GXU_task1_3
Abstract
Task 1 of DCASE 2025 focuses on several aspects of acoustic scene classification (ASC), including recording-device mismatch, low-complexity constraints, data efficiency, and the development of recording-device-specific models. This technical report describes the system we submitted. We first trained several teacher models on the ASC dataset using self-distillation and curriculum learning; these teacher models included a model pre-trained on AudioSet. We then distilled the knowledge from the teacher models into the student model via curriculum learning. We used the same inference model (i.e., the student model) and data augmentation settings as the baseline system. In experiments, our best system achieved an accuracy of 57.66%.
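The distillation objective at the core of this and several other submissions can be summarized in a few lines. Below is a minimal sketch assuming the usual temperature-softened KL formulation; the temperature and mixing weight are illustrative, not the authors' values.

```python
# Hedged sketch of temperature-scaled knowledge distillation from an
# (ensemble) teacher to a compact student. T and alpha are assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on hard labels plus KL divergence to softened teacher outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * T * T  # rescale gradients by T^2, following standard practice
    return alpha * ce + (1.0 - alpha) * kl
```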
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, time rolling |
| Features | log-mel energies |
| Classifier | CP-Mobile |
| Complexity management | knowledge distillation, precision_16 |
| Number of models at inference | 1 |
| Model weight sharing | fully device-specific |
SRIB Submission for DCASE 2025 Challenge Task-1: Low-Complexity Acoustic Scene Classification with Device Information
Krishna Gurugubelli, Ravi Solanki, Sujith Viswanathan, Madhu Rayappa Kamble, Aditi Deo, Abhinandan Udupa, Ramya Viswanathan and Rajesh Krishna K S
Audio AI Team, Samsung R&D Institute India-Bangalore, Bangalore, India
Krishna_SRIB_task1_1
Abstract
This report details our submission for Task 1: Low-Complexity Acoustic Scene Classification with Device Information in the DCASE 2025 challenge [1]. Our method builds upon the leading system from the DCASE 2023 competition; specifically, we explore the CP-Mobile architecture. To improve generalization across devices, we incorporate several data augmentation strategies, including Freq-MixStyle, frequency masking, and time rolling. To meet the model complexity requirements of the competition, we evaluate the model with 16-bit precision and use mixed-precision training to achieve better performance at inference with the 16-bit model. Our results show significant improvements in test accuracy over the baseline, confirming the effectiveness of our approach across all subsets.
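Freq-MixStyle, the first augmentation listed, normalizes per-frequency statistics of the log-mel input and remixes them between samples so the model sees device-like spectral shifts during training. A minimal sketch of the idea; the application probability and Beta parameter below are assumptions.

```python
# Hedged sketch of Freq-MixStyle-style augmentation on log-mel batches.
import torch

def freq_mixstyle(x, alpha=0.3, p=0.7):
    """x: (batch, channels, freq, time). Mixes per-frequency statistics
    between random pairs of samples in the batch."""
    if torch.rand(1).item() > p or x.size(0) < 2:
        return x
    B = x.size(0)
    mu = x.mean(dim=3, keepdim=True)            # per-bin mean over time
    sig = x.std(dim=3, keepdim=True) + 1e-6     # per-bin std over time
    x_norm = (x - mu) / sig
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1)).to(x.device)
    perm = torch.randperm(B, device=x.device)   # random pairing within batch
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```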
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, frequency masking, time rolling |
| Features | log-mel energies |
| Classifier | RF-regularized CNN |
| Complexity management | precision_16, network design |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
HYU Submission for DCASE 2025 Task 1: Low-Complexity Acoustic Scene Classification Using Reparameterizable CNN with Channel-Time-Frequency Attention
Seung-Gyu Han1, Pil Moo Byun2 and Joon-Hyuk Chang1,2
1Artificial Intelligence Semiconductor Engineering, Hanyang University, Seoul, Republic of Korea, 2Artificial Intelligence, Hanyang University, Seoul, Republic of Korea
Chang_HYU_task1_1 Chang_HYU_task1_2 Chang_HYU_task1_3 Chang_HYU_task1_4
Abstract
This paper presents the Hanyang University team’s submission for the DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. The task focuses on developing compact and efficient models that generalize well across both seen and unseen recording devices, under strict constraints on model size and computational cost. To address these challenges, we propose Rep-CTFA, a lightweight convolutional neural network that integrates two key design elements: (1) reparameterizable convolutional blocks with learnable branch scaling coefficients, and (2) a Channel-Time-Frequency Attention (CTFA) module. In addition, we explore input resolution variation by adjusting the hop length and number of mel bins to control time-frequency granularity. Knowledge distillation from a PaSST-based teacher ensemble is used to guide the training of the student model, improving generalization. Finally, we adopt a device-aware fine-tuning scheme that updates lightweight classification heads per device while keeping the shared backbone intact.
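The input-resolution variation described above comes down to the STFT hop length and the number of mel bins of the log-mel frontend. A hedged sketch using torchaudio; all parameter values are illustrative, not the submitted configurations.

```python
# Sketch of a configurable log-mel frontend: a smaller hop length gives finer
# time resolution and more mel bins give finer frequency resolution, at the
# cost of more downstream MACs. Values here are assumptions for illustration.
import torch
import torchaudio

def make_frontend(sample_rate=32000, n_fft=1024, hop_length=500, n_mels=256):
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=n_fft,
        hop_length=hop_length, n_mels=n_mels,
    )
    db = torchaudio.transforms.AmplitudeToDB()
    # waveform (batch, samples) -> log-mel (batch, n_mels, frames)
    return torch.nn.Sequential(mel, db)
```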
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, frequency masking, time rolling, DIR |
| Features | log-mel energies |
| Classifier | RF-regularized CNN, CTFAttention |
| Complexity management | precision_16, network design, knowledge distillation |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Confidence-Aware Ensemble Knowledge Distillation for Low-Complexity Acoustic Scene Classification
Sarang Han1, Dong Ho Lee2, Min Sik Jo1, Eun Seo Ha1, Min Ju Chae1 and Geon Woo Lee1
1Intelligence Speech and Processing Language, ChoSun University (CSU) Gwangju, Gwangju, South Korea, 2Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria
Han_CSU_task1_1 Han_CSU_task1_2 Han_CSU_task1_3 Han_CSU_task1_4
Abstract
We propose a confidence-aware ensemble knowledge distillation method for acoustic scene classification under low-complexity and limited-data settings. Our approach utilizes heterogeneous teacher models (BEATs and EfficientAT), fine-tuned on the DCASE 2025 Task 1 dataset, to guide the training of a lightweight student model, TF-SepNet. To improve over naive ensemble distillation, we introduce a confidence-weighted strategy that emphasizes reliable teacher outputs. Experimental results show improved generalization on unseen devices and domains, outperforming single-teacher and uniform-ensemble baselines.
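One plausible reading of the confidence-weighted strategy is sketched below: each teacher's softened output is weighted per example by its own prediction confidence, taken here as the maximum softmax probability. The report does not spell out the exact weighting rule, so treat this as an assumption.

```python
# Hedged sketch of confidence-weighted ensemble soft targets for distillation.
import torch
import torch.nn.functional as F

def confidence_weighted_targets(teacher_logits_list, T=2.0):
    """teacher_logits_list: list of (batch, classes) logits from the teachers."""
    probs = [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    # per-example confidence of each teacher: max softmax probability
    conf = torch.stack([p.max(dim=-1).values for p in probs], dim=0)  # (K, B)
    w = conf / conf.sum(dim=0, keepdim=True)                          # normalize over teachers
    mixed = sum(w_k.unsqueeze(-1) * p_k for w_k, p_k in zip(w, probs))
    return mixed  # (batch, classes) soft target distribution
```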
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 44.1kHz |
| Data augmentation | MixUp, MixStyle, SpecAug, FiltAug, AddNoise, FrameShift, TimeMask, FreqMask, DIR |
| Features | log-mel spectrogram |
| Classifier | CNN (SepNet); CNN |
| Complexity management | network design; knowledge distillation |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Adaptive Knowledge Distillation Using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification
Seunggyu Jeong and Seongeon Kim
Department of Artificial Intelligence, Seoul National University of Science and Technology, Seoul, South Korea
Jeong_SEOULTECH_task1_1 Jeong_SEOULTECH_task1_2
Abstract
In this technical report, we describe our submission for Task 1, Low-Complexity Device-Robust Acoustic Scene Classification, of the DCASE 2025 Challenge. Our work tackles the dual challenges of strict complexity constraints and robust generalization to both seen and unseen devices, while also leveraging the new rule allowing the use of device labels at test time. Our proposed system is based on a knowledge distillation framework in which an efficient CP-MobileNet student learns from a compact, specialized two-teacher ensemble. This ensemble combines a baseline PaSST teacher, trained with standard cross-entropy, and a "generalization expert" teacher. The expert is trained using our novel Device-Aware Feature Alignment (DAFA) loss, adapted from prior work, which explicitly structures the feature space for device robustness. To capitalize on the availability of test-time device labels, the distilled student model then undergoes a final device-specific fine-tuning stage. Our proposed system achieves a final accuracy of 57.93% on the development set, a significant improvement over the official baseline, particularly on unseen devices.
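The final device-specific fine-tuning stage, shared by many systems in this task, can be sketched as cloning the distilled student once per known device and fine-tuning each copy only on that device's recordings. The function names, epoch count, and learning rate below are placeholders, not the authors' settings.

```python
# Hedged sketch of per-device end-to-end fine-tuning of a distilled student.
import copy
import torch
import torch.nn.functional as F

def finetune_per_device(student, loaders_by_device, epochs=5, lr=1e-4):
    """loaders_by_device: dict mapping device label (e.g. "a", "s1") to a
    DataLoader of that device's labeled clips. Returns one model per device."""
    device_models = {}
    for dev, loader in loaders_by_device.items():
        model = copy.deepcopy(student)          # start from the general student
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        device_models[dev] = model
    return device_models  # at test time, pick the model matching the device label
```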
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 44.1kHz |
| Data augmentation | freq-mixstyle, mixup |
| Features | log-mel energies |
| Classifier | CNN, Transformer |
| Complexity management | knowledge distillation |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Domain-Specific External Data Pre-Training and Device-Aware Distillation for Data-Efficient Acoustic Scene Classification
Dominik Karasin, Ioan-Cristian Olariu, Michael Schöpf and Anna Szymańska
Institute of Computational Perception (CP), Johannes Kepler University (JKU) Linz, Linz, Austria
Karasin_JKU_task1_1 Karasin_JKU_task1_2 Karasin_JKU_task1_3 Karasin_JKU_task1_4
Abstract
In this technical report, we present our submission to the DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. Our approach centers on a compact CP-Mobile student model distilled via Bayesian ensemble averaging from different combinations of three teacher architectures: CP-ResNet, BEATs, and PaSST, using AudioSet-pretrained checkpoints for the last two. We then fine-tune the student on each recording device to improve per-device classification accuracy. To compensate for the limited 25% train split, we pre-train both teacher and student on CochlScene and apply data augmentation, of which Device Impulse Response augmentation was particularly effective.
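Device Impulse Response (DIR) augmentation, highlighted above as particularly effective, convolves a training waveform with a microphone impulse response (for example from MicIRP) to simulate recording on a different device. A minimal sketch; the application probability is an assumption.

```python
# Hedged sketch of DIR augmentation: time-domain convolution of a waveform
# batch with a single device impulse response.
import torch
import torch.nn.functional as F

def apply_dir(waveform, ir, p=0.4):
    """waveform: (batch, samples); ir: (ir_len,) device impulse response."""
    if torch.rand(1).item() > p:
        return waveform
    kernel = ir.flip(0).view(1, 1, -1)   # flip: conv1d computes correlation
    x = waveform.unsqueeze(1)            # (batch, 1, samples)
    y = F.conv1d(x, kernel, padding=ir.numel() - 1)
    return y[..., : waveform.size(1)].squeeze(1)  # crop back to input length
```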
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, DIR, time masking, frequency masking, time rolling |
| Features | log-mel energies |
| Classifier | RF-regularized CNN |
| Complexity management | precision_16, network design, knowledge distillation |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Joint Feature and Output Distillation for Low-Complexity Acoustic Scene Classification
Haowen Li1, Ziyi Yang1, Mou Wang2, Ee-Leng Tan1, Junwei Yeow1, Santi Peksi1 and Woon-Seng Gan1
1Smart Nation TRANS Lab, Nanyang Technological University, Singapore, 2Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Li_NTU_task1_1 Li_NTU_task1_2
Abstract
This report presents a dual-level knowledge distillation framework with multi-teacher guidance for low-complexity acoustic scene classification (ASC) in DCASE2025 Task 1. We propose a distillation strategy that jointly transfers both soft logits and intermediate feature representations. Specifically, we pre-trained PaSST and CP-ResNet models as teacher models. Logits from teachers are averaged to generate soft targets, while one CP-ResNet is selected for feature-level distillation. This enables the compact student model (CP-Mobile) to capture both semantic distribution and structural information from teacher guidance. Experiments on the TAU Urban Acoustic Scenes 2022 Mobile dataset (development set) demonstrate that our submitted systems achieve up to 59.30% accuracy.
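A hedged sketch of the dual-level objective described above: a KL term against the averaged teacher logits plus a feature-matching term against the selected CP-ResNet teacher. The projection layer, MSE feature loss, and loss weights are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of joint output-level and feature-level distillation.
import torch
import torch.nn.functional as F

def joint_kd_loss(s_logits, s_feat, t_logits_avg, t_feat, proj, labels,
                  T=2.0, w_kl=1.0, w_feat=0.1):
    """proj: small module mapping student feature dims to teacher feature dims."""
    ce = F.cross_entropy(s_logits, labels)
    kl = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits_avg / T, dim=-1),
                  reduction="batchmean") * T * T
    feat = F.mse_loss(proj(s_feat), t_feat)   # intermediate feature matching
    return ce + w_kl * kl + w_feat * feat
```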
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, time rolling, DIR |
| Features | log-mel energies |
| Classifier | RF-regularized CNN |
| Complexity management | knowledge distillation, network design, precision_16; knowledge distillation, network design |
| Device information | per-device end-to-end fine-tuning, device-IR augmentation; per-device end-to-end fine-tuning, DIR |
| Number of models at inference | 1 |
| Model weight sharing | fully shared |
DynaCP: Dynamic Parallel Selective Convolution in CP-Mobile under Multi-Teacher Distillation for Acoustic Scene Classification
Yuandong Luo1, Hongqing Liu1, Liming Shi2 and Lu Gan3
1Chongqing Key Lab of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing, China, 2School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China, 3College of Engineering, Design and Physical Science, Brunel University, London, U.K.
Luo_CQUPT_task1_1
Abstract
This report introduces the acoustic scene classification (ASC) architecture submitted by the Chongqing University of Posts and Telecommunications Audio Lab (CQUPT-AUL) for DCASE 2025 Task 1. The architecture is a lightweight and efficient network, termed DynaCP. Built upon CP-Mobile, DynaCP dynamically selects between dilated convolutions with pooling and depth-wise convolutions with pooling at different network layers, enhancing multi-scale feature representation with minimal computational overhead while alleviating the information sparsity caused by dilated convolutions. To improve classification accuracy, a multi-teacher knowledge distillation approach is employed using pre-trained DYMN and MN models. Experimental results demonstrate that DynaCP achieves competitive performance while maintaining low computational complexity.
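A hedged sketch of the parallel selective idea: a dilated depthwise branch and a plain depthwise branch run side by side and are blended by a learned gate. The sigmoid gating below is an assumption; the report does not give the exact selection mechanism or where pooling is applied.

```python
# Hedged sketch of a dynamic parallel branch mixing dilated and plain
# depthwise convolutions with a learned scalar gate.
import torch
import torch.nn as nn

class DynaBranch(nn.Module):
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, groups=channels)
        self.plain = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.gate = nn.Parameter(torch.zeros(1))  # learned selection weight

    def forward(self, x):
        g = torch.sigmoid(self.gate)              # soft choice between branches
        return g * self.dilated(x) + (1 - g) * self.plain(x)
```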
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 44.1kHz |
| Data augmentation | freq-mixstyle, pitch shifting, time rolling |
| Features | log-mel energies |
| Classifier | RF-regularized CNN |
| Complexity management | knowledge distillation, precision_16, network design |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Acoustic Scene Classification with Knowledge Distillation and Device-Specific Fine-Tuning for DCASE 2025
Mohamad Mahdee Ramezanee, Hossein Sharify, Amir Mohamad Mehrani Kia and Behnam Raoufi
Electrical Engineering, Sharif University of Technology, Tehran, Iran
Ramezanee_SUT_task1_1 Ramezanee_SUT_task1_2 Ramezanee_SUT_task1_3
Abstract
The objective of the acoustic scene classification task is to categorize audio recordings into one of ten predetermined environmental sound categories, such as urban parks or metro stations. This report describes our submission to Task 1 of the DCASE 2025 Challenge, which emphasizes developing data-efficient, low-complexity systems for acoustic scene classification under real-world constraints such as limited training data and device mismatches [1]. Our model is designed with a reparameterizable convolutional structure that unifies multiple asymmetric kernels into a single efficient layer during inference, enabling both rich spatial representation and computational efficiency. It further integrates a novel attention-guided pooling strategy and a hybrid normalization scheme to enhance feature discrimination and stability throughout the network. Finally, we use an ensemble of the newly defined teacher models and minimize the KL divergence between student and teacher outputs to improve the results.
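The reparameterization described above can be illustrated as follows: parallel 3x3, 1x3, and 3x1 branches used during training collapse into one 3x3 convolution for inference by zero-padding the asymmetric kernels and summing the weights. A sketch under the assumption of RepVGG-style linear branches with no intermediate nonlinearity.

```python
# Hedged sketch of merging asymmetric conv branches into one 3x3 kernel.
import torch
import torch.nn.functional as F

def merge_branches(w3x3, w1x3, w3x1, b3x3, b1x3, b3x1):
    """w3x3: (O, I, 3, 3), w1x3: (O, I, 1, 3), w3x1: (O, I, 3, 1).
    Zero-pad the asymmetric kernels to 3x3 and sum; biases add directly."""
    w = (w3x3
         + F.pad(w1x3, (0, 0, 1, 1))   # pad height: (O, I, 1, 3) -> (O, I, 3, 3)
         + F.pad(w3x1, (1, 1, 0, 0)))  # pad width:  (O, I, 3, 1) -> (O, I, 3, 3)
    return w, b3x3 + b1x3 + b3x1       # single fused kernel and bias
```

The fused kernel produces exactly the same output as the sum of the three branches, so the extra capacity is free at inference time.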
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, frequency masking, time masking, random noise, random gain, DIR |
| Features | log-mel energies |
| Classifier | CNN |
| Complexity management | network design, knowledge distillation, pruning, reparametrization |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
SNTL-NTU DCASE25 Submission: Acoustic Scene Classification Using CNN-GRU Model Without Knowledge Distillation
Ee-Leng Tan1, Jun Wei Yeow2, Santi Peksi2, Haowen Li2, Ziyi Yang2 and Woon-Seng Gan2
1Smart Nation TRANS Lab, Nanyang Technological University, Singapore, 2Smart Nation TRANS Lab, Nanyang Technological University, Singapore, Singapore
Tan_SNTLNTU_task1_1 Tan_SNTLNTU_task1_2
Abstract
In this technical report, we present the SNTL-NTU team’s Task 1 submission for the Low-Complexity Acoustic Scene Classification of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 challenge [1]. This submission departs from the typical application of knowledge distillation from a teacher to a student model, aiming to achieve high performance with limited complexity. The proposed model is based on a CNN-GRU model and is trained solely using the TAU Urban Acoustic Scene 2022 Mobile development dataset [2], without utilizing any external datasets, except for MicIRP [3], which is used for device impulse response (DIR) augmentation. Two models have been submitted to this challenge with memory usage not more than 117 KB and requiring 10.9M multiply-and-accumulate (MAC) operations. Using the development dataset, the proposed model achieved an accuracy of 60.25%.
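A hedged sketch of a CNN-GRU layout in the spirit of this submission: a small CNN extracts a per-frame embedding from the log-mel input, a GRU models the frame sequence, and mean-pooled GRU states feed the classifier. All layer sizes are illustrative, not the submitted 117 KB configuration.

```python
# Hedged sketch of a CNN-GRU acoustic scene classifier.
import torch
import torch.nn as nn

class CnnGru(nn.Module):
    def __init__(self, n_mels=64, n_classes=10, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d((2, 2)),                     # halve freq and time
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                     # halve freq only
        )
        self.gru = nn.GRU(32 * (n_mels // 4), hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                             # x: (B, 1, n_mels, T)
        z = self.cnn(x)                               # (B, 32, n_mels//4, T//2)
        z = z.permute(0, 3, 1, 2).flatten(2)          # (B, T//2, 32 * n_mels//4)
        out, _ = self.gru(z)                          # model the frame sequence
        return self.head(out.mean(dim=1))             # mean-pool over time
```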
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 44.1kHz |
| Data augmentation | freq-mixstyle, DIR, SpecAug |
| Features | log-mel energies |
| Classifier | GRU-CNN |
| Complexity management | precision_16, network design |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |
Data-Efficient Acoustic Scene Classification via Ensemble Teachers Distillation and Pruning
Shuwei Zhang1, Bing Han2, Anbai Jiang3, Xinhu Zheng2, Wei-Qiang Zhang3, Xie Chen2, Pingyi Fan3, Cheng Lu4, Jia Liu1,3 and Yanmin Qian2
1Huakong AI, Beijing, China, 2Shanghai Jiao Tong University, Shanghai, China, 3Tsinghua University, Beijing, China, 4North China Electric Power University, Beijing, China
Zhang_AITHU-SJTU_task1_1 Zhang_AITHU-SJTU_task1_2 Zhang_AITHU-SJTU_task1_3 Zhang_AITHU-SJTU_task1_4
Abstract
The goal of the acoustic scene classification task is to classify recordings into one of ten predefined acoustic scene classes. In this report, we describe the submission of the THU-SJTU team for Task 1, Data-Efficient Low-Complexity Acoustic Scene Classification, of the DCASE 2025 challenge. Our methods are consistent with those of last year. First, we use an architecture named SSCP-Mobile, which enhances CP-Mobile with a spatially separable convolution structure, achieving lower computational cost and better performance. We then adopt several pre-trained PaSST models as ensemble teachers to teach CP-Mobile with knowledge distillation. Next, we apply model pruning to trim the model to meet the computational and parameter requirements of the competition. Finally, we use knowledge distillation again to fine-tune the pruned model and further improve its performance. Our submissions comprise four systems containing only general models, although we also attempted to use device-type information to improve the performance of system S1.
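The spatially separable convolution that gives SSCP-Mobile its name factors a k x k kernel into k x 1 followed by 1 x k, cutting per-position multiplies from k*k to 2k. A minimal sketch; the depthwise (grouped) variant is an assumption consistent with CP-Mobile-style blocks.

```python
# Hedged sketch of a spatially separable depthwise convolution.
import torch.nn as nn

def spatially_separable(channels, k=3):
    """Replaces one k x k depthwise conv with k x 1 followed by 1 x k."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
        nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
    )
```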
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | freq-mixstyle, frequency masking, time masking, time rolling |
| Features | log-mel energies |
| Classifier | CNN |
| Complexity management | precision_16, network design, knowledge distillation, pruning |
| Number of models at inference | 1 |
| Model weight sharing | fully shared |
AdapTF-SepNet: AudioSet-Driven Adaptive Pre-Training of TF-SepNet for Multi-Device Acoustic Scene Classification
Zhou Ziyang1, Yin Zeyu1, Cai Yiqiang1, Li Shengchen1 and Shao Xi2
1School of Advanced Technology, Xi'an Jiaotong Liverpool University, Suzhou, China, 2Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
Zhou_XJTLU_task1_1
Abstract
This technical report presents our submission to DCASE 2025 Challenge Task 1: Low-Complexity Acoustic Scene Classification with Device Information. We propose a multi-device framework that leverages device-specific models trained with knowledge distillation techniques and enhanced through AudioSet pre-training. Our approach utilizes TF-SepNet as the backbone architecture, pre-trained on the large-scale AudioSet dataset to learn robust acoustic representations. For each of the known devices, a dedicated model is trained. At inference time, the system identifies the device source of the audio clip and selects the corresponding pre-trained model for classification. Evaluated on the test set, our device-specific system achieves an overall accuracy of 59.5%.
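The inference-time dispatch described above reduces to a lookup keyed by the device label, which the 2025 rules make available at test time, with a fallback for devices that have no dedicated model. A minimal sketch; all names are placeholders.

```python
# Hedged sketch of device-specific model selection at inference time.
import torch

@torch.no_grad()
def classify(x, device_label, device_models, general_model):
    """device_models: dict mapping known device labels to fine-tuned models;
    unknown devices fall back to the general model."""
    model = device_models.get(device_label, general_model)
    model.eval()
    return model(x).argmax(dim=-1)
```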
System characteristics
| Characteristic | Value |
| --- | --- |
| Sampling rate | 32kHz |
| Data augmentation | mixup, freq-mixstyle, DIR |
| Features | log-mel spectrogram |
| Classifier | CNN (TF-SepNet) |
| Complexity management | network design, weight quantization, knowledge distillation |
| Device information | per-device end-to-end fine-tuning |
| Number of models at inference | 7 |
| Model weight sharing | fully device-specific |