Task description
This task aims to build a Foley sound synthesis system that can generate plausible audio signals fitting given categories of Foley sound. The Foley sound categories comprise sound events and environmental background sounds. The challenge has two subproblems: the development of models with and without external resources. Participants are expected to submit a system for one of the two problems, and each problem is evaluated independently. Submissions are evaluated by Fréchet Audio Distance (FAD), followed by a subjective test.
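For reference, FAD fits one Gaussian to embeddings of the reference audio and another to embeddings of the generated audio, then computes the Fréchet distance between the two. The sketch below is illustrative only: it assumes clip-level embeddings from a pretrained audio model (the metric is typically computed with VGGish embeddings) and uses random vectors just to stay self-contained.

```python
# Illustrative FAD sketch: Frechet distance between Gaussians fitted to
# embeddings of reference and generated audio.
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """emb_*: (n_clips, emb_dim) arrays of clip-level embeddings."""
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # sqrtm may leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

rng = np.random.default_rng(0)
print(frechet_audio_distance(rng.normal(size=(200, 128)),
                             rng.normal(0.3, 1.1, size=(200, 128))))
```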
Systems ranking
Track A
A big THANK YOU to the DCASE community members and the contestants who spent several hours rating other teams' anonymized sounds for the perceptual evaluation stage (see column '# Categories Rated by Team Members' in the FAD table).
Perceptual Evaluation Score
The weighted average of the three ratings was based on an audio quality : category fit : diversity ratio of 2:2:1 (a worked check of this weighting appears after the table).
All ratings are MOS scores on a 10-step scale. WA = weighted average score of audio quality, category fit, and diversity (2:2:1); AQ = audio quality; CF = category fit; Div = diversity (weighted 0.5).

| Submission Code | Technical Report | Official Rank | WA: Avg | WA: Dog Bark | WA: Footstep | WA: Gun Shot | WA: Keyboard | WA: Moving Motor Vehicle | WA: Rain | WA: Sneeze/Cough | AQ: Avg | AQ: Dog Bark | AQ: Footstep | AQ: Gun Shot | AQ: Keyboard | AQ: Moving Motor Vehicle | AQ: Rain | AQ: Sneeze/Cough | CF: Avg | CF: Dog Bark | CF: Footstep | CF: Gun Shot | CF: Keyboard | CF: Moving Motor Vehicle | CF: Rain | CF: Sneeze/Cough | Div: Avg | Div: Dog Bark | Div: Footstep | Div: Gun Shot | Div: Keyboard | Div: Moving Motor Vehicle | Div: Rain | Div: Sneeze/Cough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DCASE2023_baseline_task7 | DCASE2023baseline2023 | 6 | 3.810 | 2.688 | 4.160 | 3.237 | 5.150 | 3.862 | 4.175 | 3.400 | 3.831 | 2.930 | 4.158 | 3.504 | 5.137 | 3.543 | 4.115 | 3.432 | 3.789 | 2.447 | 4.162 | 2.969 | 5.163 | 4.182 | 4.235 | 3.368 | | | | | | | | |
| Chon_Gaudio_task7_trackA_1 | ChonGLI2023 | 2 | 6.967 | 7.984 | 6.865 | 7.255 | 6.989 | 6.881 | 6.243 | 6.553 | 6.657 | 7.612 | 6.455 | 6.814 | 6.814 | 6.446 | 5.928 | 6.528 | 7.154 | 8.223 | 7.082 | 7.573 | 7.157 | 7.131 | 6.306 | 6.606 | 7.214 | 8.250 | 7.250 | 7.500 | 7.000 | 7.250 | 6.750 | 6.500 |
| Yi_SURREY_task7_trackA_1 | YiSURREY2023 | 1 | 7.056 | 7.742 | 6.466 | 6.189 | 7.433 | 7.448 | 6.441 | 7.675 | 6.723 | 7.309 | 6.143 | 5.532 | 7.243 | 7.315 | 6.067 | 7.454 | 7.578 | 8.297 | 6.646 | 6.689 | 8.089 | 8.181 | 6.911 | 8.233 | 6.679 | 7.500 | 6.750 | 6.500 | 6.500 | 6.250 | 6.250 | 7.000 |
| Guan_HEU_task7_trackA_2 | GuanHEU2023 | 4 | 5.157 | 4.877 | 4.450 | 6.413 | 5.479 | 5.822 | 5.201 | 3.856 | 4.670 | 3.800 | 4.164 | 5.800 | 5.339 | 5.365 | 4.972 | 3.250 | 5.293 | 5.142 | 3.836 | 7.482 | 5.232 | 6.315 | 5.656 | 3.389 | 5.857 | 6.500 | 6.250 | 5.500 | 6.250 | 5.750 | 4.750 | 6.000 |
| Scheibler_LINE_task7_trackA_1 | ScheiblerLINE2023 | 3 | 6.887 | 7.333 | 6.832 | 7.317 | 7.199 | 6.474 | 5.222 | 7.834 | 6.355 | 6.479 | 6.263 | 6.771 | 6.886 | 6.131 | 4.780 | 7.180 | 7.327 | 7.479 | 7.192 | 7.896 | 7.861 | 7.054 | 5.150 | 8.655 | 7.071 | 8.750 | 7.250 | 7.250 | 6.500 | 6.000 | 6.250 | 7.500 |
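As a quick sanity check of the 2:2:1 weighting, the per-metric averages from the Yi_SURREY_task7_trackA_1 row above reproduce the reported weighted score exactly:

```python
# Recompute the weighted average from the Yi_SURREY_task7_trackA_1 row:
# audio quality 6.723, category fit 7.578, diversity 6.679 -> 7.056.
def weighted_score(audio_quality: float, category_fit: float, diversity: float) -> float:
    return (2 * audio_quality + 2 * category_fit + 1 * diversity) / 5

print(round(weighted_score(6.723, 7.578, 6.679), 3))  # 7.056
```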
FAD Score
Eval = evaluation dataset; Dev = development dataset. All values are FAD (lower is better).

| Submission Code | Technical Report | # Categories Rated by Team Members | Official Rank | FAD Rank | Eval: Avg FAD | Eval: Dog Bark | Eval: Footstep | Eval: Gun Shot | Eval: Keyboard | Eval: Moving Motor Vehicle | Eval: Rain | Eval: Sneeze/Cough | Dev: Avg FAD | Dev: Dog Bark | Dev: Footstep | Dev: Gun Shot | Dev: Keyboard | Dev: Moving Motor Vehicle | Dev: Rain | Dev: Sneeze/Cough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DCASE2023_baseline_task7 | DCASE2023baseline2023 | | 6 | 6 | 9.702 | 13.412 | 8.108 | 7.952 | 5.230 | 16.107 | 13.338 | 3.771 | 8.701 | 13.614 | 6.826 | 6.152 | 5.065 | 11.239 | 14.449 | 3.563 |
| Chon_Gaudio_task7_trackA_1 | ChonGLI2023 | 7 | 2 | 3 | 5.540 | 11.456 | 5.959 | 3.021 | 4.090 | 6.173 | 5.738 | 2.340 | 5.522 | 11.464 | 4.575 | 3.782 | 6.190 | 5.814 | 4.746 | 2.083 |
| Lee_maum_task7_trackA_1 | Leemaum2023 | 4 | 9 | 9 | 12.937 | 9.265 | 6.924 | 10.451 | 6.488 | 37.748 | 7.778 | 11.903 | 11.331 | 9.716 | 4.858 | 8.672 | 5.227 | 29.206 | 10.450 | 11.187 |
| Lee_maum_task7_trackA_2 | Leemaum2023 | 4 | 10 | 10 | 12.946 | 10.549 | 7.747 | 7.643 | 9.922 | 38.558 | 6.585 | 9.620 | 10.900 | 10.854 | 5.751 | 5.588 | 7.413 | 29.562 | 8.140 | 8.992 |
| Lee_maum_task7_trackA_3 | Leemaum2023 | 4 | 8 | 8 | 12.429 | 11.719 | 6.903 | 7.287 | 9.292 | 35.209 | 6.787 | 9.804 | 10.586 | 12.056 | 5.742 | 5.420 | 7.242 | 26.474 | 8.043 | 9.126 |
| Lee_maum_task7_trackA_4 | Leemaum2023 | 4 | 7 | 7 | 9.883 | 9.287 | 6.910 | 7.881 | 6.603 | 22.310 | 6.750 | 9.436 | 8.964 | 9.700 | 5.566 | 6.037 | 5.370 | 19.305 | 7.946 | 8.827 |
| Yi_SURREY_task7_trackA_1 | YiSURREY2023 | 7 | 1 | 2 | 5.025 | 3.621 | 5.104 | 5.748 | 3.038 | 9.801 | 5.964 | 1.901 | 4.051 | 3.355 | 3.434 | 5.796 | 3.483 | 4.674 | 5.994 | 1.621 |
| Guan_HEU_task7_trackA_1 | GuanHEU2023 | 7 | 5 | 5 | 8.623 | 5.583 | 10.143 | 8.428 | 5.403 | 17.984 | 7.561 | 5.258 | 7.941 | 5.893 | 9.118 | 7.485 | 7.706 | 12.818 | 7.874 | 4.692 |
| Guan_HEU_task7_trackA_2 | GuanHEU2023 | 7 | 4 | 4 | 7.799 | 5.685 | 7.685 | 8.532 | 4.165 | 17.258 | 7.795 | 3.475 | 7.015 | 6.020 | 7.297 | 7.628 | 4.049 | 12.216 | 8.446 | 3.452 |
| Scheibler_LINE_task7_trackA_1 | ScheiblerLINE2023 | 6 | 3 | 1 | 4.777 | 3.679 | 8.073 | 3.655 | 2.775 | 7.422 | 5.225 | 2.609 | 4.156 | 3.726 | 5.713 | 3.226 | 3.415 | 5.453 | 5.308 | 2.253 |
Track B
A big THANK YOU to the DCASE community members and the contestants who spent several hours rating other teams' anonymized sounds for the perceptual evaluation stage (see column '# Categories Rated by Team Members' in the FAD table).
Perceptual Evaluation Score
In the case that multiple systems were submitted by one team, only the system with the best (lowest) FAD score per team was perceptually evaluated. The weighted average of the three ratings was based on an audio quality : category fit : diversity ratio of 2:2:1.
All ratings are MOS scores on a 10-step scale. WA = weighted average score of audio quality, category fit, and diversity (2:2:1); AQ = audio quality; CF = category fit; Div = diversity (weighted 0.5).

| Submission Code | Technical Report | Official Rank | WA: Avg | WA: Dog Bark | WA: Footstep | WA: Gun Shot | WA: Keyboard | WA: Moving Motor Vehicle | WA: Rain | WA: Sneeze/Cough | AQ: Avg | AQ: Dog Bark | AQ: Footstep | AQ: Gun Shot | AQ: Keyboard | AQ: Moving Motor Vehicle | AQ: Rain | AQ: Sneeze/Cough | CF: Avg | CF: Dog Bark | CF: Footstep | CF: Gun Shot | CF: Keyboard | CF: Moving Motor Vehicle | CF: Rain | CF: Sneeze/Cough | Div: Avg | Div: Dog Bark | Div: Footstep | Div: Gun Shot | Div: Keyboard | Div: Moving Motor Vehicle | Div: Rain | Div: Sneeze/Cough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DCASE2023_baseline_task7 | DCASE2023baseline2023 | 18 | 3.810 | 2.688 | 4.160 | 3.237 | 5.150 | 3.862 | 4.175 | 3.400 | 3.831 | 2.930 | 4.158 | 3.504 | 5.137 | 3.543 | 4.115 | 3.432 | 3.789 | 2.447 | 4.162 | 2.969 | 5.163 | 4.182 | 4.235 | 3.368 | | | | | | | | |
| Kamath_NUS_task7_trackB_2 | KamathNUS2023 | 3 | 4.647 | 4.807 | 4.073 | 5.010 | 4.276 | 5.248 | 4.013 | 5.102 | 3.988 | 3.789 | 3.554 | 4.346 | 3.911 | 4.642 | 3.378 | 4.295 | 4.612 | 4.979 | 3.629 | 5.054 | 4.029 | 5.727 | 3.406 | 5.459 | 6.036 | 6.500 | 6.000 | 6.250 | 5.500 | 5.500 | 6.500 | 6.000 |
| Chang_HYU_task7_trackB_1 | ChangHYU2023 | 1 | 6.515 | 5.659 | 7.111 | 6.557 | 7.384 | 6.155 | 7.042 | 5.699 | 6.085 | 4.882 | 6.738 | 5.879 | 7.296 | 6.069 | 6.860 | 4.873 | 6.845 | 6.014 | 7.288 | 7.013 | 7.789 | 6.442 | 7.370 | 6.000 | 6.714 | 6.500 | 7.500 | 7.000 | 6.750 | 5.750 | 6.750 | 6.750 |
| Jung_KT_task7_trackB_2 | JungKT2023 | 2 | 5.534 | 5.321 | 5.033 | 6.022 | 5.614 | 6.021 | 5.902 | 4.826 | 5.082 | 4.432 | 4.933 | 5.579 | 5.139 | 5.623 | 5.600 | 4.270 | 5.610 | 5.371 | 4.775 | 6.100 | 5.646 | 6.554 | 5.906 | 4.920 | 6.286 | 7.000 | 5.750 | 6.750 | 6.500 | 5.750 | 6.500 | 5.750 |
| Lee_MARG_task7_trackB_1 | LeeMARG2023 | 4 | 4.427 | 3.273 | 4.843 | 3.941 | 5.409 | 4.942 | 4.210 | 4.374 | 3.929 | 2.530 | 4.204 | 3.531 | 5.311 | 4.542 | 3.735 | 3.650 | 4.443 | 3.153 | 4.654 | 3.696 | 5.336 | 5.312 | 4.040 | 4.910 | 5.393 | 5.000 | 6.500 | 5.250 | 5.750 | 5.000 | 5.500 | 4.750 |
FAD Score
Eval = evaluation dataset; Dev = development dataset. All values are FAD (lower is better).

| Submission Code | Technical Report | # Categories Rated by Team Members | Official Rank | FAD Rank | Eval: Avg FAD | Eval: Dog Bark | Eval: Footstep | Eval: Gun Shot | Eval: Keyboard | Eval: Moving Motor Vehicle | Eval: Rain | Eval: Sneeze/Cough | Dev: Avg FAD | Dev: Dog Bark | Dev: Footstep | Dev: Gun Shot | Dev: Keyboard | Dev: Moving Motor Vehicle | Dev: Rain | Dev: Sneeze/Cough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DCASE2023_baseline_task7 | DCASE2023baseline2023 | | 18 | 18 | 9.702 | 13.412 | 8.108 | 7.952 | 5.230 | 16.107 | 13.338 | 3.771 | 8.701 | 13.614 | 6.826 | 6.152 | 5.065 | 11.239 | 14.449 | 3.563 |
| Kamath_NUS_task7_trackB_1 | KamathNUS2023 | 7 | 15 | 15 | 9.081 | 6.468 | 6.348 | 10.665 | 5.656 | 24.674 | 6.498 | 3.259 | 7.341 | 6.455 | 4.875 | 7.922 | 4.521 | 16.567 | 8.023 | 3.026 |
| Kamath_NUS_task7_trackB_2 | KamathNUS2023 | 7 | 3 | 6 | 6.754 | 3.870 | 7.223 | 7.561 | 3.884 | 13.564 | 7.045 | 4.129 | 5.348 | 3.438 | 5.906 | 5.648 | 4.192 | 7.234 | 7.383 | 3.632 |
| Pillay_CMU_task7_trackB_1 | PillayCMU2023 | 4 | 22 | 22 | 12.034 | 14.607 | 6.656 | 16.268 | 5.279 | 16.471 | 9.451 | 15.506 | 11.257 | 14.436 | 5.505 | 12.523 | 5.355 | 13.252 | 12.766 | 14.964 |
| Qianbin_BIT_task7_trackB_1 | QianbinBIT2023 | 4 | 10 | 10 | 7.154 | 10.681 | 5.679 | 6.960 | 4.283 | 11.485 | 9.502 | 1.489 | 6.280 | 10.729 | 3.106 | 5.613 | 3.269 | 8.837 | 10.854 | 1.555 |
| Lee_maum_task7_trackB_1 | Leemaum2023 | 4 | 26 | 26 | 12.862 | 9.692 | 6.948 | 9.263 | 6.341 | 37.965 | 8.098 | 11.729 | 11.267 | 10.197 | 4.868 | 7.472 | 5.289 | 29.329 | 10.744 | 10.973 |
| Lee_maum_task7_trackB_2 | Leemaum2023 | 4 | 25 | 25 | 12.858 | 9.754 | 7.411 | 7.458 | 9.687 | 38.361 | 6.905 | 10.429 | 10.849 | 10.057 | 5.335 | 5.516 | 7.141 | 29.502 | 8.564 | 9.829 |
| Lee_maum_task7_trackB_3 | Leemaum2023 | 4 | 23 | 23 | 12.276 | 11.651 | 7.373 | 7.606 | 9.407 | 34.061 | 6.267 | 9.566 | 10.366 | 11.890 | 6.002 | 5.418 | 7.333 | 25.451 | 7.558 | 8.913 |
| Lee_maum_task7_trackB_4 | Leemaum2023 | 4 | 20 | 20 | 9.964 | 9.701 | 6.837 | 7.789 | 6.591 | 22.998 | 6.825 | 9.008 | 9.143 | 10.218 | 5.634 | 5.868 | 5.353 | 20.414 | 8.190 | 8.323 |
| Chang_HYU_task7_trackB_1 | ChangHYU2023 | 4 | 1 | 7 | 6.898 | 4.677 | 5.736 | 6.407 | 4.753 | 18.859 | 5.892 | 1.965 | 4.422 | 4.317 | 3.597 | 5.311 | 2.432 | 10.177 | 3.398 | 1.722 |
| Chang_HYU_task7_trackB_2 | ChangHYU2023 | 4 | 12 | 12 | 7.356 | 5.098 | 5.877 | 8.000 | 4.623 | 19.926 | 5.796 | 2.169 | 4.871 | 4.948 | 3.448 | 6.538 | 2.457 | 11.320 | 3.502 | 1.885 |
| Xie_SJTU_task7_trackB_1 | XieSJTU2023 | 6 | 13 | 13 | 7.407 | 8.035 | 6.987 | 8.185 | 3.495 | 13.565 | 9.267 | 2.315 | 6.050 | 7.564 | 4.761 | 6.237 | 2.176 | 9.853 | 9.592 | 2.167 |
| Xie_SJTU_task7_trackB_2 | XieSJTU2023 | 6 | 9 | 9 | 6.998 | 6.817 | 6.894 | 7.815 | 3.495 | 12.536 | 9.265 | 2.164 | 6.232 | 6.809 | 5.236 | 6.877 | 2.176 | 9.587 | 10.983 | 1.958 |
| Xie_SJTU_task7_trackB_3 | XieSJTU2023 | 6 | 8 | 8 | 6.992 | 7.017 | 6.949 | 7.913 | 3.600 | 11.621 | 9.350 | 2.492 | 6.458 | 6.991 | 5.300 | 7.286 | 2.569 | 9.716 | 11.071 | 2.271 |
| Xie_SJTU_task7_trackB_4 | XieSJTU2023 | 6 | 11 | 11 | 7.177 | 6.660 | 7.763 | 8.199 | 3.703 | 11.443 | 9.817 | 2.654 | 6.904 | 6.598 | 6.079 | 7.992 | 3.718 | 9.456 | 11.941 | 2.546 |
| QianXu_BIT_NUDT_task7_trackB_1 | QianXuBIT2023 | 4 | 21 | 21 | 10.644 | 17.956 | 6.526 | 10.180 | 4.901 | 14.348 | 5.616 | 14.979 | 8.817 | 18.385 | 6.301 | 6.729 | 3.130 | 7.759 | 5.229 | 14.186 |
| QianXu_BIT_NUDT_task7_trackB_2 | QianXuBIT2023 | 4 | 17 | 17 | 9.645 | 12.148 | 5.899 | 10.771 | 5.380 | 14.004 | 5.534 | 13.777 | 7.705 | 12.672 | 5.735 | 6.473 | 3.083 | 7.751 | 5.205 | 13.018 |
| QianXu_BIT_NUDT_task7_trackB_3 | QianXuBIT2023 | 4 | 19 | 19 | 9.959 | 13.526 | 6.064 | 10.615 | 5.574 | 16.127 | 5.864 | 11.944 | 7.857 | 14.248 | 5.662 | 6.395 | 3.400 | 8.971 | 5.184 | 11.136 |
| QianXu_BIT_NUDT_task7_trackB_4 | QianXuBIT2023 | 4 | 24 | 24 | 12.319 | 12.883 | 12.139 | 8.442 | 8.173 | 22.671 | 14.680 | 7.243 | 12.601 | 13.295 | 12.775 | 7.077 | 10.839 | 18.117 | 19.511 | 6.595 |
| Bai_JLESS_task7_trackB_1 | BaiJLESS2023 | 0 | 27 | 27 | 13.583 | 15.958 | 8.663 | 18.485 | 6.728 | 24.094 | 15.193 | 5.958 | 12.437 | 17.510 | 8.497 | 16.824 | 6.956 | 18.737 | 12.874 | 5.662 |
| Chun_Chosun_task7_trackB_2 | ChunChosun2023 | 5 | 14 | 14 | 8.351 | 8.690 | 7.265 | 10.764 | 5.602 | 13.941 | 9.512 | 2.684 | 7.376 | 8.382 | 6.203 | 8.294 | 3.748 | 10.974 | 11.562 | 2.467 |
| Wendner_JKU_task7_trackB_1 | WendnerJKU2023 | 5 | 28 | 28 | 15.736 | 8.979 | 9.950 | 15.354 | 12.564 | 31.160 | 21.753 | 10.388 | 15.669 | 10.093 | 9.682 | 11.984 | 13.334 | 26.435 | 28.391 | 9.763 |
| Jung_KT_task7_trackB_1 | JungKT2023 | 7 | 7 | 4 | 5.480 | 2.784 | 4.370 | 4.667 | 3.555 | 17.511 | 3.899 | 1.577 | 3.373 | 2.771 | 2.514 | 2.960 | 2.246 | 8.776 | 2.947 | 1.397 |
| Jung_KT_task7_trackB_2 | JungKT2023 | 7 | 2 | 1 | 5.023 | 3.348 | 3.990 | 3.495 | 4.074 | 14.861 | 3.529 | 1.865 | 3.181 | 3.087 | 2.580 | 2.560 | 2.255 | 7.540 | 2.626 | 1.617 |
| Jung_KT_task7_trackB_3 | JungKT2023 | 7 | 6 | 3 | 5.230 | 2.616 | 3.739 | 6.322 | 4.089 | 14.172 | 4.304 | 1.371 | 3.088 | 2.477 | 2.588 | 3.722 | 2.220 | 6.867 | 2.349 | 1.395 |
| Jung_KT_task7_trackB_4 | JungKT2023 | 7 | 5 | 2 | 5.026 | 4.854 | 3.103 | 4.790 | 3.665 | 13.604 | 3.727 | 1.435 | 3.215 | 4.673 | 2.045 | 3.614 | 2.450 | 6.018 | 2.322 | 1.380 |
| Lee_MARG_task7_trackB_1 | LeeMARG2023 | 4 | 4 | 5 | 6.409 | 6.947 | 4.563 | 10.657 | 3.900 | 11.602 | 5.491 | 1.699 | 4.766 | 7.778 | 3.712 | 8.208 | 3.584 | 4.359 | 4.386 | 1.332 |
| Chung_KAIST_task7_trackB_1 | ChungKAIST2023 | 5 | 16 | 16 | 9.192 | 10.389 | 6.832 | 7.572 | 5.188 | 15.653 | 13.348 | 5.359 | 7.841 | 11.783 | 6.283 | 6.668 | 5.168 | 10.830 | 9.498 | 4.655 |
System characteristics
Summary of the submitted system characteristics.
Track A
Rank | Submission Code | Technical Report | System input | ML method | Phase reconstruction | Acoustic feature | System Complexity (parameters) | Data Augmentation | Subsystem Count
---|---|---|---|---|---|---|---|---|---
6 | DCASE2023_baseline_task7 | DCASE2023baseline2023 | sound event label | VQ-VAE, PixelSNAIL | HiFi-GAN | spectrogram | 269992 | ||
2 | Chon_Gaudio_task7_trackA_1 | ChonGLI2023 | sound event label | diffusion model | modified HiFi-GAN | spectrogram | 642000000 | mixup, time stretching | |
9 | Lee_maum_task7_trackA_1 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
10 | Lee_maum_task7_trackA_2 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
8 | Lee_maum_task7_trackA_3 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
7 | Lee_maum_task7_trackA_4 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo, ensemble | HiFi-GAN | Gaussian latent variables | 369279688 | PhaseAug | 4 |
1 | Yi_SURREY_task7_trackA_1 | YiSURREY2023 | sound event label | diffusion model, VQ-VAE | HiFi-GAN | spectrogram | 1173847474 | 2 | |
5 | Guan_HEU_task7_trackA_1 | GuanHEU2023 | sound event label, caption | AudioLDM | 421000000 | ||||
4 | Guan_HEU_task7_trackA_2 | GuanHEU2023 | sound event label, caption | AudioLDM, Baseline | 421269992 | ||||
3 | Scheibler_LINE_task7_trackA_1 | ScheiblerLINE2023 | sound event label | VQ-VAE, diffusion model | HiFi-GAN | log-mel spectrogram | 977116210 |
Track B
Rank | Submission Code | Technical Report | System input | ML method | Phase reconstruction | Acoustic feature | System Complexity (parameters) | Data Augmentation | Subsystem Count
---|---|---|---|---|---|---|---|---|---
18 | DCASE2023_baseline_task7 | DCASE2023baseline2023 | sound event label | VQ-VAE, PixelSNAIL | HiFi-GAN | spectrogram | 269992 | ||
15 | Kamath_NUS_task7_trackB_1 | KamathNUS2023 | sound event label | StyleGAN2 | phase gradient heap integration | log-magnitude spectrogram | 62010138 | ||
3 | Kamath_NUS_task7_trackB_2 | KamathNUS2023 | sound event label | StyleGAN2 | phase gradient heap integration | log-magnitude spectrogram | 376959933 | time shifting, sound wrapping | 7 |
22 | Pillay_CMU_task7_trackB_1 | PillayCMU2023 | sound event label | VQ-VAE, PixelSNAIL | HiFi-GAN | spectrogram | 103316216 | time masking, frequency masking | 3 |
10 | Qianbin_BIT_task7_trackB_1 | QianbinBIT2023 | sound event label | VQ-VAE, PixelSNAIL, Bit-diffusion | HiFi-GAN | spectrogram | 112857385 | 2 | |
26 | Lee_maum_task7_trackB_1 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
25 | Lee_maum_task7_trackB_2 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
23 | Lee_maum_task7_trackB_3 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo | HiFi-GAN | Gaussian latent variables | 92319922 | PhaseAug | |
20 | Lee_maum_task7_trackB_4 | Leemaum2023 | sound event label | VAE, GAN, flow, VITS, PhaseAug, Avocodo, ensemble | HiFi-GAN | Gaussian latent variables | 369279688 | PhaseAug | 4 |
1 | Chang_HYU_task7_trackB_1 | ChangHYU2023 | sound event label | diffusion model | HiFi-GAN | log-mel spectrogram | 23374056 | ||
12 | Chang_HYU_task7_trackB_2 | ChangHYU2023 | sound event label | diffusion model | HiFi-GAN | log-mel spectrogram | 23374056 | ||
13 | Xie_SJTU_task7_trackB_1 | XieSJTU2023 | sound event label | VQ-VAE, Transformer | HiFi-GAN | spectrogram | 28224194 | ||
9 | Xie_SJTU_task7_trackB_2 | XieSJTU2023 | sound event label | VQ-VAE, Transformer, TransformerDecoder | HiFi-GAN | spectrogram | 40843458 | mixup | 3 |
8 | Xie_SJTU_task7_trackB_3 | XieSJTU2023 | sound event label | VQ-VAE, Transformer, TransformerDecoder, TransformerEncoder Discriminator | HiFi-GAN | spectrogram | 44037827 | mixup | 3
11 | Xie_SJTU_task7_trackB_4 | XieSJTU2023 | sound event label | VQ-VAE, Transformer, TransformerDecoder, TransformerEncoder Discriminator | HiFi-GAN | spectrogram | 44037827 | mixup | 3
21 | QianXu_BIT_NUDT_task7_trackB_1 | QianXuBIT2023 | sound | diffusion model | spectrogram | 113668609 | |||
17 | QianXu_BIT_NUDT_task7_trackB_2 | QianXuBIT2023 | sound | diffusion model | spectrogram | 113668609 | |||
19 | QianXu_BIT_NUDT_task7_trackB_3 | QianXuBIT2023 | sound | diffusion model | spectrogram | 113668609 | |||
24 | QianXu_BIT_NUDT_task7_trackB_4 | QianXuBIT2023 | sound | diffusion model | spectrogram | 113668609 | wavelet domain denoise | ||
27 | Bai_JLESS_task7_trackB_1 | BaiJLESS2023 | sound event label | CVAE-GAN | HiFi-GAN | spectrogram | 8760000 | gain, pitch shifting, time shifting, peak normalization | 7 |
14 | Chun_Chosun_task7_trackB_2 | ChunChosun2023 | sound event label | VQ-VAE, PixelSNAIL | HiFi-GAN | spectrogram | 386598842 | 2 | |
28 | Wendner_JKU_task7_trackB_1 | WendnerJKU2023 | sound event label | diffusion model, ensemble | 7167405 | gain reduction, time shifting | 7 | ||
7 | Jung_KT_task7_trackB_1 | JungKT2023 | sound event label, random noise | C-SupConGAN | HiFi-GAN | mel spectrogram | 21398259 | fade in/out, time masking | |
2 | Jung_KT_task7_trackB_2 | JungKT2023 | sound event label, random noise | C-SupConGAN | HiFi-GAN | mel spectrogram | 21398259 | fade in/out, time masking | |
6 | Jung_KT_task7_trackB_3 | JungKT2023 | sound event label, random noise | C-SupConGAN | HiFi-GAN | mel spectrogram | 21398259 | fade in/out, time masking | |
5 | Jung_KT_task7_trackB_4 | JungKT2023 | sound event label, random noise | C-SupConGAN | HiFi-GAN | mel spectrogram | 21398259 | fade in/out, time masking | |
4 | Lee_MARG_task7_trackB_1 | LeeMARG2023 | sound event label | VQ-VAE, PixelSNAIL, StyleGAN2-ADA | HiFi-GAN, Griffin-Lim | spectrogram | 116202572 | time stretching, time shifting, RoomSimulator, TanhDistortion, resample, time masking, pitch shift | 6 |
16 | Chung_KAIST_task7_trackB_1 | ChungKAIST2023 | sound event label | diffusion model | 87330433 |
WAV files used for the evaluation experiment
Technical reports
JLESS Submission to DCASE2023 Task7: Foley Sound Synthesis Using Non-Autoregressive Generative Model
Siwei Huang, Jisheng Bai, Yafei Jia, Jianfeng Chen
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China, LianFeng Acoustic Technologies Co., Ltd. Xi'an, China
Bai_JLESS_task7_trackB_1
Abstract
This technical report describes our proposed system for DCASE2023 Task 7: Foley Sound Synthesis. We propose a GAN-based mel-spectrogram synthesis system that takes a conditional variational autoencoder (CVAE) as the generator, built from densely connected dilated convolution blocks, and a simple CNN as the discriminator. The CVAE decoder synthesizes fake mel-spectrograms by sampling from prior noise and a class condition, and the discriminator determines whether they are real. We also train a classifier to help the CVAE preserve the class-wise distribution. Finally, the audio is rendered with the HiFi-GAN vocoder.
System characteristics
System input | sound event label |
Machine learning method | CVAE-GAN |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Data augmentation | gain, pitch shifting, time shifting, peak normalization |
Subsystem count | 7 |
System complexity | 8760000 parameters |
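A minimal sketch of the adversarial setup the abstract describes, assuming a class-conditioned decoder as generator and a small CNN discriminator; module sizes and names are illustrative stand-ins, not the authors' implementation.

```python
# Hypothetical CVAE-GAN sketch: a class-conditioned decoder generates
# mel-spectrograms from noise, and a small CNN discriminates real vs. fake.
import torch
import torch.nn as nn

N_CLASSES, Z_DIM, MEL = 7, 128, 80

class Decoder(nn.Module):  # CVAE decoder used as the generator
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(N_CLASSES, Z_DIM)
        self.net = nn.Sequential(nn.Linear(2 * Z_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, MEL * 32))
    def forward(self, z, y):
        h = torch.cat([z, self.emb(y)], dim=-1)
        return self.net(h).view(-1, 1, MEL, 32)  # (B, 1, mel, frames)

disc = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                     nn.Conv2d(16, 1, 3, 2, 1),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
gen, bce = Decoder(), nn.BCEWithLogitsLoss()

z = torch.randn(4, Z_DIM)
y = torch.randint(0, N_CLASSES, (4,))
fake = gen(z, y)
# The generator tries to make the discriminator output "real" (1) on fakes.
g_loss = bce(disc(fake), torch.ones(4, 1))
g_loss.backward()
```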
HYU Submission For The DCASE 2023 Task 7: Diffusion Probabilistic Model With Adversarial Training For Foley Sound Synthesis
Won-Gook Choi, Joon-Hyuk Chang
Department of Electronic Engineering, Hanyang University, Seoul, Republic of Korea, Department of Electronic Engineering, Hanyang University, Seoul, Republic of Korea
Chang_HYU_task7_trackB_1 Chang_HYU_task7_trackB_2
Abstract
This report describes the Hanyang University team's submission for the DCASE 2023 challenge Task 7, Foley Sound Synthesis. The goal of the task is to build a generative model that can synthesize high-quality and varied Foley sounds: dog barking, footsteps, gunshots, keyboards, moving motor vehicles, rainy scenes, and sneezing. The core of the submissions is a diffusion probabilistic model-based acoustic model, and we adopted adversarial training on the evidence lower bound (ELBO) of the diffusion model for higher quality. The submissions did not use any external dataset and achieved lower Fréchet audio distance (FAD) scores than the DCASE baseline, except for the sounds of moving motor vehicles.
System characteristics
System input | sound event label |
Machine learning method | diffusion model |
Phase reconstruction method | HiFi-GAN |
Acoustic features | log-mel spectrogram |
System complexity | 23374056, 23374056 parameters |
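A minimal sketch of a DDPM-style noise-prediction training step of the kind the abstract describes; the adversarial ELBO term is omitted, and the tiny MLP stands in for the actual acoustic model.

```python
# Simplified diffusion training step: corrupt clean data with noise at a
# random timestep, then train the network to predict the added noise.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative noise schedule

model = nn.Sequential(nn.Linear(80 + 1, 256), nn.ReLU(), nn.Linear(256, 80))

x0 = torch.randn(16, 80)                        # stand-in for mel frames
t = torch.randint(0, T, (16,))
noise = torch.randn_like(x0)
xt = (alpha_bar[t].sqrt().unsqueeze(1) * x0
      + (1 - alpha_bar[t]).sqrt().unsqueeze(1) * noise)

# The network predicts the added noise, conditioned (crudely) on t.
pred = model(torch.cat([xt, t.float().unsqueeze(1) / T], dim=1))
loss = ((pred - noise) ** 2).mean()             # simplified ELBO (epsilon-MSE)
loss.backward()
```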
FALL-E: Gaudio Foley Synthesis System
Minsung Kang, Sangshin Oh, Hyeongi Moon, Kyungyun Lee, Ben Sangbae Chon
Gaudio Lab, Inc., Seoul, South Korea
Chon_Gaudio_task7_trackA_1
Abstract
This paper introduces FALL-E, Gaudio's Foley Synthesis System, which is submitted to the DCASE 2023 Task 7 Foley Synthesis Challenge (Track A). The system employs a cascaded approach comprising low-resolution spectrogram generation, spectrogram super-resolution, and a vocoder. We trained every sound-related model from scratch using our extensive datasets, and we utilized a pre-trained language model. We conditioned the model with dataset-specific texts, enabling it to learn sound quality and recording environment based on the text input. Moreover, we leveraged external language models to improve text descriptions of our datasets and performed prompt engineering for quality, coherence, and diversity. We report the objective measure with respect to the official evaluation set, although our focus is on developing generally working sound generation models beyond the challenge.
System characteristics
System input | sound event label |
Machine learning method | diffusion model |
Phase reconstruction method | modified HiFi-GAN |
Acoustic features | spectrogram |
Data augmentation | mixup, time stretching |
System complexity | 642000000 parameters |
High-Quality Foley Sound Synthesis Using Monte Carlo Dropout
Chae-Woon Bang, Nam Kyun Kim, Chanjun Chun
Chosun University, Gwangju, South Korea, Korea Automotive Technology Institute, Gwangju, South Korea
Chun_Chosun_task7_trackB_2
Abstract
This technical report describes our Foley sound synthesis system for DCASE2023 Task 7, which aims to create Foley sound, widely used as sound effects in multimedia content. The system generates a 4-second audio clip for one of seven classes. Specifically, we fine-tuned the baseline model to improve its performance, and then ensembled models using Monte Carlo Dropout. The performance of the proposed system was compared with the baseline using Fréchet Audio Distance (FAD) as the audio evaluation metric. The results confirm that both the single model and the ensemble outperform the baseline system.
System characteristics
System input | sound event label |
Machine learning method | PixelSNAIL,VQ-VAE |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Subsystem count | 2 |
System complexity | 386598842 parameters |
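A minimal sketch of Monte Carlo Dropout as used for the ensemble above: dropout is kept active at inference time and several stochastic forward passes are averaged. The model here is a placeholder, not the fine-tuned baseline.

```python
# Monte Carlo Dropout: average multiple stochastic passes of a dropout model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                      nn.Dropout(0.2), nn.Linear(128, 64))

def mc_dropout_forward(model, x, n_samples=8):
    model.train()  # .train() keeps Dropout stochastic at inference time
    with torch.no_grad():
        return torch.stack([model(x) for _ in range(n_samples)]).mean(dim=0)

out = mc_dropout_forward(model, torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 64])
```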
Foley Sound Synthesis In Waveform Domain With Diffusion Model
Yoonjin Chung, Junwon Lee, Juhan Nam
Graduate School of AI, KAIST, Graduate School of Culture Technology, KAIST
Chung_KAIST_task7_trackB_1
Abstract
Foley sound synthesis has become an important task due to the growing popularity of multimedia content, and it is an industrial use case of general audio synthesis. As participants of DCASE 2023 challenge Task 7 [1], we propose a diffusion-based model that generates class-conditioned general audio with classifier-free guidance. Our model follows a UNet-like structure while incorporating an LSTM [2] inside the encoder block. We report the FAD (Fréchet Audio Distance) scores of the generated results for each of the 7 sound classes.
System characteristics
System input | sound event label |
Machine learning method | diffusion model |
System complexity | 87330433 parameters |
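A minimal sketch of classifier-free guidance at sampling time, as mentioned in the abstract; the network, the null-label convention, and the guidance scale are illustrative assumptions, not the authors' configuration.

```python
# Classifier-free guidance: combine conditional and unconditional noise
# predictions from the same network.
import torch
import torch.nn as nn

N_CLASSES = 7
NULL = N_CLASSES                      # reserved "no condition" label index

class Eps(nn.Module):                 # toy noise-prediction network
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(N_CLASSES + 1, 32)
        self.net = nn.Linear(80 + 32, 80)
    def forward(self, x, y):
        return self.net(torch.cat([x, self.emb(y)], dim=-1))

model = Eps()
x = torch.randn(4, 80)
y = torch.full((4,), 2)               # e.g. class "gun shot"

w = 3.0                               # guidance scale (assumed)
eps_cond = model(x, y)
eps_uncond = model(x, torch.full((4,), NULL))
eps = eps_uncond + w * (eps_cond - eps_uncond)  # guided prediction
```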
Foley Sound Synthesis With AudioLDM For DCASE2023 Task 7
Shitong Fan, Qiaoxi Zhu, Feiyang Xiao, Haiyan Lan, Wenwu Wang, Jian Guan
Group of Intelligent Signal Processing (GISP), College of Computer Science and Technology, Harbin Engineering University, Harbin, China, Centre for Audio, Acoustic and Vibration (CAAV), University of Technology Sydney, Ultimo, Australia, Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK
Guan_HEU_task7_trackA_1 Guan_HEU_task7_trackA_2
Abstract
This report describes our submission for DCASE2023 Challenge Task 7, a system for Foley sound synthesis. Our system is based on AudioLDM, which offers high generation quality and computational efficiency for the text-to-audio task. Experiments are conducted on the dataset of DCASE2023 Challenge Task 7. The Fréchet audio distance (FAD) between the sound generated by our system and the actual sound samples is 5.120 in the category "DogBark" and 8.102 in the category "Rain", better than the baseline, being an FAD of 7.256 and 4.901 closer to the actual samples, respectively.
System characteristics
System input | sound event label, caption |
Machine learning method | AudioLDM,Baseline |
System complexity | 421000000, 421269992 parameters |
Foley Sound Synthesis Based On GAN Using Contrastive Learning Without Label Information
Hae Chun Chung, Yuna Lee, Jae Hoon Jung
KT Corporation, Republic of Korea
Jung_KT_task7_trackB_1 Jung_KT_task7_trackB_2 Jung_KT_task7_trackB_3 Jung_KT_task7_trackB_4
Abstract
Sound effects such as Foley sounds, used in radio or movies, have been difficult to create without the help of experts. Furthermore, while audio synthesis for speech has progressed actively, there has been little research on audio sounds that can be obtained in real life. In this technical report, we present our submission system for DCASE2023 Task 7: Foley sound synthesis. We participate in Track B, which forbids the use of external resources. We propose a framework that employs the loss functions of ContraGAN and C-SupConGAN on top of the Self-Attention GAN (SAGAN) structure. Our final system outperforms the baseline by a large margin.
System characteristics
System input | sound event label, random noise |
Machine learning method | C-SupConGAN |
Phase reconstruction method | HiFi-GAN |
Acoustic features | mel spectrogram |
Data augmentation | fade in/out, time masking |
System complexity | 21398259, 21398259, 21398259, 21398259 parameters |
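For context, a minimal supervised contrastive (SupCon) loss of the kind C-SupConGAN builds on; this is the generic loss, not the authors' exact GAN objective.

```python
# Supervised contrastive loss: pull together embeddings that share a label,
# push apart the rest, measured with temperature-scaled cosine similarity.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """features: (B, D) L2-normalized embeddings; labels: (B,) class ids."""
    sim = features @ features.T / temperature            # (B, B) similarities
    mask_self = torch.eye(len(labels), dtype=torch.bool)
    sim.masked_fill_(mask_self, float('-inf'))           # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = torch.where(pos, log_prob, torch.zeros_like(log_prob)).sum(1)
    per_anchor = per_anchor / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

feats = F.normalize(torch.randn(8, 16), dim=1)
print(supcon_loss(feats, torch.randint(0, 7, (8,))))
```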
DCASE Task-7: StyleGAN2-Based Foley Sound Synthesis
Purnima Kamath, Tasnim Nishat Islam, Chitralekha Gupta, Lonce Wyse, Suranga Nanayakkara
National University of Singapore, Singapore and Bangladesh University of Engineering and Technology, Bangladesh and Universitat Pompeu Fabra, Barcelona, Spain
Kamath_NUS_task7_trackB_1 Kamath_NUS_task7_trackB_2
Abstract
For the DCASE 2023 Task 7 (Track B), Foley Sound Synthesis, we submit two systems: (1) a StyleGAN conditioned on the class ID, and (2) an ensemble of StyleGANs, each trained unconditionally on a single class. We quantitatively find that both systems outperform the Task 7 baseline models in terms of FAD scores. Given the high inter-class and intra-class variance in the development datasets, the system conditioned on class ID is able to generate a smooth and homogeneous latent space, indicated by the subjective quality of its generated samples. The unconditionally trained ensemble generates more categorically recognizable samples than system 1, but tends to generate more instances of out-of-distribution or noisy samples.
System characteristics
System input | sound event label |
Machine learning method | StyleGAN2 |
Phase reconstruction method | phase gradient heap integration |
Acoustic features | log-magnitude spectrogram |
Data augmentation | time shifting, sound wrapping |
Subsystem count | 7 |
System complexity | 62010138, 376959933 parameters |
Foley Sound Synthesis at the DCASE 2023 Challenge
Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, and Shinnosuke Takamichi
Gaudio Lab, Inc., Seoul, South Korea; KAIST, Daejeon, South Korea; Carnegie Mellon University, Pennsylvania, USA; New York University, New York, USA; Doshisha University, Kyoto, Japan; Ritsumeikan University, Kyoto, Japan; CNRS, Ecole Centrale Nantes, Nantes Universite, Nantes, France; The University of Tokyo, Tokyo, Japan
DCASE2023_baseline_task7
Abstract
The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic Foley synthesis techniques. To promote further research in this area, we have organized a challenge in DCASE 2023: Task 7 - Foley Sound Synthesis. Our challenge aims to provide a standardized evaluation framework that is both rigorous and efficient, allowing for the evaluation of different Foley synthesis systems. Through this challenge, we hope to encourage active participation from the research community and advance the state-of-the-art in automatic Foley synthesis. In this technical report, we provide a detailed overview of the Foley sound synthesis challenge, including task definition, dataset, baseline, evaluation scheme and criteria, and discussion.
System characteristics
System input | sound event label |
Machine learning method | VQ-VAE, PixelSNAIL |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
System complexity | 269992 parameters |
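A minimal sketch of the VQ-VAE quantization step at the core of the baseline pipeline: encoder outputs are snapped to their nearest codebook entries, and PixelSNAIL is trained over the resulting discrete token grid. Sizes are illustrative.

```python
# VQ-VAE quantization: map each encoder vector to its nearest codebook entry.
import torch

codebook = torch.randn(512, 64)            # (n_codes, code_dim)
z_e = torch.randn(1, 20, 64)               # encoder output: (B, positions, dim)

# Nearest codebook entry per position (Euclidean distance).
dists = torch.cdist(z_e, codebook.unsqueeze(0))    # (1, 20, 512)
tokens = dists.argmin(dim=-1)                      # discrete indices for PixelSNAIL
z_q = codebook[tokens]                             # quantized vectors, (1, 20, 64)
print(tokens.shape, z_q.shape)
```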
Conditional Foley Sound Synthesis With Limited Data: Two-Stage Data Augmentation Approach With StyleGAN2-ADA
Kyungsu Kim, Jinwoo Lee, Hayoon Kim, Kyogu Lee
Seoul National University Department of Intelligence and Information
Lee_MARG_task7_trackB_1
Abstract
This report introduces an audio synthesis system designed to tackle the task of Foley Sound Synthesis in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge. Our proposed system comprises an ensemble of a baseline model and StyleGAN2-ADA. To optimize the system with limited data, without relying on external datasets and pretrained systems, we propose a two-stage data augmentation strategy. This approach involves augmenting input waveforms to expand the training dataset, as well as employing adaptive discriminator augmentation (ADA) to alleviate overfitting of the discriminator and ensure stable training. Experimental results demonstrate that our proposed ensemble system achieves an FAD (Fréchet Audio Distance) of 5.84 on the evaluation dataset.
System characteristics
System input | sound event label |
Machine learning method | PixelSNAIL,StyleGAN2-ADA,VQ-VAE |
Phase reconstruction method | HiFi-GAN, Griffin-Lim |
Acoustic features | spectrogram |
Data augmentation | time stretching, time shifting, RoomSimulator, TanhDistortion, resample, time masking, pitch shifting |
Subsystem count | 6 |
System complexity | 116202572 parameters |
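A minimal sketch of waveform-level augmentations of the kind listed above (time shifting, gain, time masking); parameter ranges are assumptions, and the authors' full chain (RoomSimulator, TanhDistortion, etc.) is not reproduced.

```python
# Simple waveform augmentation chain: circular time shift, random gain,
# and a short time mask, followed by clipping to [-1, 1].
import numpy as np

rng = np.random.default_rng(0)

def augment(wav: np.ndarray) -> np.ndarray:
    wav = np.roll(wav, rng.integers(-len(wav) // 10, len(wav) // 10))  # time shift
    wav = wav * rng.uniform(0.7, 1.3)                                  # random gain
    i = rng.integers(0, len(wav) - len(wav) // 20)                     # time mask
    wav[i : i + len(wav) // 20] = 0.0
    return np.clip(wav, -1.0, 1.0)

wav = rng.uniform(-0.5, 0.5, 16000 * 4)   # 4 s of audio at 16 kHz
print(augment(wav).shape)
```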
VIFS: An End-To-End Variational Inference For Foley Sound Synthesis
Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park
maum.ai Inc., Republic of Korea and Korea Advanced Institute of Science and Technology, Republic of Korea
Lee_maum_task7_trackA_1 Lee_maum_task7_trackA_2 Lee_maum_task7_trackA_3 Lee_maum_task7_trackA_4 Lee_maum_task7_trackB_1 Lee_maum_task7_trackB_2 Lee_maum_task7_trackB_3 Lee_maum_task7_trackB_4
Abstract
Foley sound synthesis (FSS) is a task to generate a sound for specific conditions. In this work, FSS is defined as a "category-to-sound" problem: generating various sounds for a given category. To address this diversity problem, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply various techniques from speech synthesis, including PhaseAug and Avocodo. Unlike TTS models, which generate short pronunciations from phonemes and a speaker identity, the category-to-sound problem requires generating diverse sounds from just a category class. To compensate for this difference between TTS and category-to-sound while maintaining consistency within each inference, we heavily modified the prior encoder to enhance consistency with posterior latent variables. This introduces an additional Gaussian on the prior encoder, which promotes variance within the category. With these modifications, we propose VIFS, variational inference for end-to-end Foley sound synthesis, which is able to generate high-quality sounds with diversity.
System characteristics
System input | sound event label |
Machine learning method | Avocodo,GAN,PhaseAug,VAE,VITS,ensemble,flow |
Phase reconstruction method | HiFi-GAN |
Acoustic features | Gaussian latent variables |
Data augmentation | PhaseAug |
Subsystem count | 4 |
System complexity | 92319922, 92319922, 92319922, 369279688, 92319922, 92319922, 92319922, 369279688 parameters |
DCASE Task 7: Foley Sound Synthesis
Ashwin Pillay, Sage Betko, Ari Liloia, Hao Chen, Ankit Shah
Carnegie Mellon University, Pittsburgh, USA
Pillay_CMU_task7_trackB_1
Abstract
Foley sound synthesis refers to the creation of realistic, diegetic sound effects for a piece of media, such as film or radio. We propose a deep learning system for Task 7 of the DCASE 2023 challenge that can generate original mono audio clips belonging to one of seven Foley sound categories. Our training dataset consists of 4,850 sound clips from the UrbanSound8K, FSD50K, and BBC Sound Effects datasets. We aim to improve the subjective and objective quality of generated sounds by passing as much meaningful information about the input data into latent representations as possible. The primary innovation in our submission is the change from using mel-spectrograms to using CEmbeddings (combined embeddings), which are input to the VQ-VAE and consist of mel-spectrograms concatenated with latent representations of audio produced by a pre-trained MERT model. Our submission to Track A utilizes the pre-trained MERT model; as such, PixelSNAIL was trained on CEmbeddings. Our submission to Track B uses PixelSNAIL retrained only on mel-spectrograms. Our code can be found here: https://github.com/ankitshah009/foley-sound-synthesis_DCASE_2023.
System characteristics
System input | sound event label |
Machine learning method | PixelSNAIL,VQ-VAE |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Data augmentation | time masking, frequency masking |
Subsystem count | 3 |
System complexity | 103316216 parameters |
Auto-Bit for DCASE2023 Task7 Technical Reports: Assemble System of BitDiffusion and PixelSNAIL
Anbin Qi
School Information and Electronics, Beijing Institute of Technology, Beijing, China
Qianbin_BIT_task7_trackB_1
Abstract
This technical report for DCASE 2023 Task 7 proposes using different methods and models for sound synthesis in different scene events. For the dog bark and sneeze/cough categories, a non-autoregressive model based on conditional Bit Diffusion was used for sound synthesis. For the other five types of sounds, an autoregressive model based on PixelSNAIL was used.
System characteristics
System input | sound event label |
Machine learning method | VQ-VAE, PixelSNAIL, Bit-diffusion |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Subsystem count | 2 |
System complexity | 112857385 parameters |
From Noise To Sound: Audio Synthesis Via Diffusion Models
Haojie Zhang, Kun Qian, Lin Shen, Lujundong Li, Kele Xu, Bin Hu
Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), P. R. China, School of Medical Technology, Beijing Institute of Technology, P. R. China, National University of Defense Technology, P. R. China
QianXu_BIT_NUDT_task7_trackB_1 QianXu_BIT_NUDT_task7_trackB_2 QianXu_BIT_NUDT_task7_trackB_3 QianXu_BIT_NUDT_task7_trackB_4
Abstract
In this technical report, we describe our submission system for DCASE2023 Task 7: Foley Sound Synthesis (Track B). A Sound Pixelate Diffuse model is proposed to realize Foley sound synthesis. The model includes data format conversion and audio synthesis through the diffusion model. The synthesised audio is evaluated on the DCASE2023 Task 7 FAD evaluation set, and the best FAD score among all categories is 8.429.
System characteristics
System input | sound |
Machine learning method | diffusion model |
Acoustic features | spectrogram |
Data augmentation | wavelet domain denoise |
System complexity | 113668609, 113668609, 113668609, 113668609 parameters |
Class-Conditioned Latent Diffusion Model For DCASE 2023 Foley Sound Synthesis Challenge
Robin Scheibler, Takuya Hasumi, Yusuke Fujita, Tatsuya Komatsu, Ryuichi Yamamoto, Kentaro Tachibana
LINE Corporation, Tokyo, Japan
Scheibler_LINE_task7_trackA_1
Abstract
This report describes our submission to the DCASE 2023 Task 7: Foley sound synthesis challenge. We use a latent diffusion model (LDM) that generates a latent representation of audio conditioned on a specified audio class, a variational autoencoder that converts the latent representation to a mel-spectrogram, and a universal neural vocoder based on HiFi-GAN that reconstructs a natural waveform from the mel-spectrogram. We trained the LDM on the development set, with its audio class indices as conditioners for generating class-specific latent representations.
System characteristics
System input | sound event label |
Machine learning method | VQ-VAE,diffusion model |
Phase reconstruction method | HiFi-GAN |
Acoustic features | log-mel spectrogram |
System complexity | 977116210 parameters |
Audio Diffusion For Foley Sound Synthesis
Timo Wendner, Patricia Hu, Tara Jadidi, Alexander Neuhauser
Johannes Kepler University, Linz, Austria
Wendner_JKU_task7_trackB_1
Abstract
This technical report describes our approach for Task 7 (Foley Sound Synthesis), Track B (using no external resources other than the ones provided) of the DCASE2023 Challenge. This work was carried out as part of an elective course in the Artificial Intelligence curriculum at Johannes Kepler University Linz by a student group. We use an ensemble of U-Net based diffusion models for waveform generation in seven predefined sound categories. We apply gain reduction to normalize and time shifting to augment the provided training data and test different noise schedulers and U-Net architectures. Applying different training strategies, we achieve competitive results for the majority of the sound classes while being more parameter efficient and allowing end-to-end generation on audio waveforms. Evaluated on the task's evaluation metric, i.e., the mean FAD score over all classes, we achieve a final score of 12.42 as compared to the score of the challenge baseline model of 9.68.
System characteristics
System input | sound event label |
Machine learning method | diffusion model,ensemble |
Data augmentation | gain reduction, time shifting |
Subsystem count | 7 |
System complexity | 7167405 parameters |
The X-LANCE System For DCASE2023 Challenge Task 7: Foley Sound Synthesis Track B
Zeyu Xie, Xuenan Xu, Baihan Li, Mengyue Wu, Kai Yu
MoE Key Lab of Artificial Intelligence X-LANCE Lab, Department of Computer Science and Engineering AI Institute, Shanghai Jiao Tong University, Shanghai, China
Xie_SJTU_task7_trackB_1 Xie_SJTU_task7_trackB_2 Xie_SJTU_task7_trackB_3 Xie_SJTU_task7_trackB_4
Abstract
This report describes the system submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge Task 7: Foley sound synthesis, Track B. We first train a VQ-VAE model to learn a discrete representation of the audio spectrogram. An autoregressive model is then trained to predict discrete tokens based on input conditions. Finally, a trained vocoder converts the generated spectrogram, restored from the predicted tokens by the VQ-VAE decoder, into a waveform. To achieve higher accuracy, fidelity, and diversity, we introduce several training schemes: (1) a discriminator model to filter audio; (2) the mixup method for data augmentation; (3) clustering methods for better training. Our best system achieved an FAD score of 6.99 averaged over all categories.
System characteristics
System input | sound event label |
Machine learning method | Transformer,TransformerDecoder,TransformerEncoder Discriminator,VQ-VAE |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Data augmentation | mixup |
Subsystem count | 3 |
System complexity | 28224194, 40843458, 44037827, 44037827 parameters |
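A minimal sketch of the autoregressive stage the abstract describes: given a class condition, tokens are sampled one at a time and later decoded back to a spectrogram. A GRU stands in for the authors' Transformer, and all sizes are illustrative.

```python
# Autoregressive VQ-token sampling conditioned on a class id.
import torch
import torch.nn as nn

N_TOKENS, N_CLASSES, SEQ = 512, 7, 20

class ARStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(N_TOKENS + N_CLASSES, 64)  # class ids share the table
        self.rnn = nn.GRU(64, 64, batch_first=True)        # stand-in for a Transformer
        self.out = nn.Linear(64, N_TOKENS)
    def forward(self, seq):
        h, _ = self.rnn(self.tok(seq))
        return self.out(h[:, -1])                          # logits for the next token

model = ARStub()
seq = torch.tensor([[N_TOKENS + 3]])                       # condition = class 3
for _ in range(SEQ):                                       # sample token by token
    probs = model(seq).softmax(dim=-1)
    nxt = torch.multinomial(probs, 1)
    seq = torch.cat([seq, nxt], dim=1)
print(seq[:, 1:])                                          # generated VQ tokens
```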
Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7
Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang
University of Surrey, Guildford, United Kingdom
Yi_SURREY_task7_trackA_1
Abstract
Foley sound generation aims to synthesise the background sound for multimedia content, which involves computationally modelling sound effects with specialized techniques. In this work, we propose a diffusion-based generative model for DCASE 2023 challenge Task 7: Foley Sound Synthesis. The proposed system is based on AudioLDM, a diffusion-based text-to-audio generation model. To alleviate the data scarcity of the Task 7 training set, our model is initially trained with large-scale datasets and then adapted to this DCASE task via transfer learning. We have observed that the features extracted by the encoder can significantly affect the performance of the generation model. Hence, we improve the results by pairing the input label with related text embedding features obtained from contrastive language-audio pretraining (CLAP). In addition, we utilize a filtering strategy to further refine the output, i.e., by selecting the best results from the generated candidate clips in terms of the similarity score between the sound and target labels. The overall system achieves a Fréchet audio distance (FAD) score of 4.765 on average among all seven classes, outperforming the baseline system, which achieves an FAD score of 9.7.
System characteristics
System input | sound event label |
Machine learning method | VQ-VAE,diffusion model |
Phase reconstruction method | HiFi-GAN |
Acoustic features | spectrogram |
Subsystem count | 2 |
System complexity | 1173847474 parameters |
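A minimal sketch of the filtering strategy the abstract describes: several candidate clips are generated per category, and the one most similar to the label embedding is kept. The embedding functions are stubs standing in for a pretrained CLAP model.

```python
# Candidate filtering by label similarity: keep the generated clip whose
# embedding is closest (cosine similarity) to the label's text embedding.
import torch
import torch.nn.functional as F

def embed_audio(clips):           # stub: would call CLAP's audio encoder
    return F.normalize(torch.randn(len(clips), 512), dim=1)

def embed_text(label):            # stub: would call CLAP's text encoder
    return F.normalize(torch.randn(1, 512), dim=1)

candidates = [f"candidate_{i}.wav" for i in range(16)]
sims = embed_audio(candidates) @ embed_text("dog bark").T   # cosine similarity
best = candidates[int(sims.argmax())]
print("kept:", best)
```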