The submission deadline is May 15th 2023 23:59 Anywhere on Earth (AoE)
Introduction
The challenge submission consists of a submission package (one zip file) containing the system outputs, the system meta information, and a technical report (pdf file).
The submission process, in short:
- Participants run their system on the evaluation dataset and produce the system output in the specified format. Participants are allowed to submit 4 different system outputs per task or subtask.
- Participants create a meta information file to go along with the system output, describing the system used to produce this particular output. The meta information file has a predefined format to help the automatic handling of the challenge submissions. Information provided in the meta file will later be used to produce the challenge results. Participants should fill in all meta information and make sure the meta information file follows the defined formatting.
- Participants describe their system in a technical report in sufficient detail. There is a template provided for the technical report.
- Participants prepare the submission package (zip-file). The submission package contains the system outputs (a maximum of 4 per task), the system meta information, and the technical report.
- Participants submit the submission package and the technical report to DCASE2023 Challenge.
Please read carefully the requirements for the files included in the submission package!
Submission system
The submission system is now available:
- Create a user account and login
- Go to the "All Conferences" tab in the system and type DCASE to filter the list
- Select "2023 Challenge on Detection and Classification of Acoustic Scenes and Events"
- Create a new submission
The technical report in the submission package must contain at least the title, authors, and abstract. An updated camera-ready version of the technical report can be submitted separately until 22 May 2023 (AOE).
Note: the submission system does not send a confirmation email. You can check that your submission has been taken into account in your author console. A confirmation email will be sent to all participants once the submissions are closed.
By submitting to the challenge, participants agree that the system output will be evaluated and published together with the results and the technical report on the DCASE Challenge website under the CC-BY license.
Submission package
Participants are instructed to pack their system output(s), system meta information, and technical report into one zip-package. Example package:
Please prepare your submission zip-file like the provided example. Follow the same file structure and fill in the meta information with the same structure as in the *.meta.yaml files. The zip-file should contain system outputs for all tasks/subtasks (maximum of 4 submissions per task/subtask), separate meta information for each system, and technical report(s) covering all submitted systems.
If you submit similar systems for multiple tasks, you can describe everything in one technical report. If your approaches for different tasks are significantly different, prepare one technical report for each and include it in the corresponding task folder.
More detailed instructions for constructing the package can be found in the following sections. The technical report template is available here.
Scripts for checking the content of the submission package are provided for selected tasks; please validate your submission package accordingly.
- For task 1, use the validator code from the repository.
- For task 4, use the validator script task4/validate_submissions.py from the example submission package.
For task 3, you can submit up to 4 systems per track: up to 4 systems for models using audio-only input, and up to 4 systems for models using audio and video input. To make the distinction between the two tracks easier, please use task3a for audio-only systems and task3b for audiovisual systems. If you submit systems of both types, you can describe them in a single report or, even better, in a separate report for the systems of each type.
Submission label
A submission label is used to index all your submissions (systems per tasks). To avoid overlapping labels among all submitted systems, use the following way to form your label:
[Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number][subtask letter (optional)]_[index number of your submission (1-4)]
For example, the baseline systems would have the following labels:
Martin_TAU_task1_1
Dohi_HIT_task2_1
Politis_TAU_task3a_1
Shimada_SONY_task3b_1
Ronchini_INR_task4a_1
Martin_TAU_task4b_1
Morfi_QMUL_task5_1
Gontier_INR_task6a_1
Xie_TAU_task6b_1
Choi_GLI_task7a_1
Choi_GLI_task7b_1
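For reference, below is a small, unofficial Python sketch (not part of the challenge tooling) showing how a label can be checked against this convention with a regular expression before packaging. The exact pattern (allowed characters, task numbers 1-7, optional subtask letter) is an assumption based on the examples above.

import re

# Hypothetical helper: checks that a label follows
# [LastName]_[InstituteAbbreviation]_task[number][optional subtask letter]_[1-4]
LABEL_PATTERN = re.compile(r"^[A-Za-z\-]+_[A-Za-z0-9\-]+_task[1-7][ab]?_[1-4]$")

def is_valid_label(label: str) -> bool:
    """Return True if the label matches the submission label convention."""
    return LABEL_PATTERN.match(label) is not None

# Example usage with some of the baseline labels listed above
for label in ["Martin_TAU_task1_1", "Politis_TAU_task3a_1", "Choi_GLI_task7b_1"]:
    print(label, is_valid_label(label))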
A script for checking the content of the submission package will be provided for selected tasks. In that case, please validate your submission package accordingly.
Package structure
Make sure your zip-package follows provided file naming convention and directory structure:
Zip-package root
│
└───task1                                          Task 1 submissions
│   │   Martin_TAU_task1.technical_report.pdf      Technical report covering all subtasks
│   │
│   └───Martin_TAU_task1_1                         System 1 submission files
│   │       Martin_TAU_task1_1.meta.yaml           System 1 meta information
│   │       Martin_TAU_task1_1.output.csv          System 1 output
│   :
│   └───Martin_TAU_task1_4                         System 4 submission files
│           Martin_TAU_task1_4.meta.yaml           System 4 meta information
│           Martin_TAU_task1_4.output.csv          System 4 output
│
└───task2                                          Task 2 submissions
│   │   Dohi_HIT_task2_1.technical_report.pdf      Technical report
│   │
│   └───Dohi_HIT_task2_1                           System 1 submission files
│   │       Dohi_HIT_task2_1.meta.yaml             System 1 meta information
│   │       anomaly_score_bandsaw_section_00_test.csv    System 1 output for each section and domain in the evaluation dataset
│   │       anomaly_score_grinder_section_00_test.csv
│   │       anomaly_score_shaker_section_00_test.csv
│   :       :
│   │       anomaly_score_Vacuum_section_00_test.csv
│   │       decision_result_bandsaw_section_00_test.csv
│   │       decision_result_grinder_section_00_test.csv
│   │       decision_result_shaker_section_00_test.csv
│   :       :
│   │       decision_result_Vacuum_section_00_test.csv
│   │
│   └───Dohi_HIT_task2_4                           System 4 submission files
│           Dohi_HIT_task2_4.meta.yaml             System 4 meta information
│           anomaly_score_bandsaw_section_00_test.csv    System 4 output for each section and domain in the evaluation dataset
│           anomaly_score_grinder_section_00_test.csv
│           anomaly_score_shaker_section_00_test.csv
│           :
│           anomaly_score_Vacuum_section_00_test.csv
│           decision_result_bandsaw_section_00_test.csv
│           decision_result_grinder_section_00_test.csv
│           decision_result_shaker_section_00_test.csv
│           :
│           decision_result_Vacuum_section_00_test.csv
│
└───task3                                          Task 3 submissions
│   │   Politis-Shimada_TAU-SONY_task3.technical_report.pdf    Technical report
│   │   Politis_TAU_task3a.technical_report.pdf    (Optional) Technical report only for audio-only system (Track A)
│   │   Shimada_SONY_task3b.technical_report.pdf   (Optional) Technical report only for audiovisual system (Track B)
│   │
│   └───Politis_TAU_task3a_1                       Track A (audio-only) System 1 submission files
│   │   │   Politis_TAU_task3_1.meta.yaml          Track A (audio-only) System 1 meta information
│   │   └───Politis_TAU_task3_1                    Track A (audio-only) System 1 output files in a folder
│   │           mix001.csv
│   │           ...
│   :
│   └───Politis_TAU_task3a_4                       Track A (audio-only) System 4 submission files
│   │   │   Politis_TAU_task3_4.meta.yaml          Track A (audio-only) System 4 meta information
│   │   └───Politis_TAU_task3_4                    Track A (audio-only) System 4 output files in a folder
│   │           mix001.csv
│   │           ...
│   │
│   └───Shimada_SONY_task3b_1                      Track B (audiovisual) System 1 submission files
│   │   │   Shimada_SONY_task3b_1.meta.yaml        Track B (audiovisual) System 1 meta information
│   │   └───Shimada_SONY_task3b_1                  Track B (audiovisual) System 1 output files in a folder
│   │           mix001.csv
│   │           ...
│   :
│   └───Shimada_SONY_task3b_4                      Track B (audiovisual) System 4 submission files
│       │   Shimada_SONY_task3b_4.meta.yaml        Track B (audiovisual) System 4 meta information
│       └───Shimada_SONY_task3b_4                  Track B (audiovisual) System 4 output files in a folder
│               mix001.csv
│               ...
│
└───task4                                          Task 4 submissions
│   │   Ronchini-Martin_PM-TAU_task4.technical_report.pdf    Technical report (joint report for subtasks A and B)
│   │   Ronchini_PM_task4a.technical_report.pdf    (optional) Technical report for subtask A only
│   │   Martin_TAU_task4b.technical_report.pdf     (optional) Technical report for subtask B only
│   │   validate_submissions.py                    Submission validation code
│   │   readme.md                                  Instructions how to use the submission validation code
│   │
│   └───Ronchini_PM_task4a_1                       Subtask A System 1 submission files
│   │       Ronchini_PM_task4a_1.meta.yaml         Subtask A System 1 meta information
│   │       Ronchini_PM_task4a_1.output.csv        Subtask A System 1 output
│   :
│   └───Ronchini_PM_task4a_4                       Subtask A System 4 submission files
│   │       Ronchini_PM_task4a_4.meta.yaml         Subtask A System 4 meta information
│   │       Ronchini_PM_task4a_4.output.csv        Subtask A System 4 output
│   │
│   └───Martin_TAU_task4b_1                        Subtask B System 1 submission files
│   │       Martin_TAU_task4b_1.meta.yaml          Subtask B System 1 meta information
│   │       Martin_TAU_task4b_1.output.csv         Subtask B System 1 output
│   :
│   └───Martin_TAU_task4b_4                        Subtask B System 4 submission files
│           Martin_TAU_task4b_4.meta.yaml          Subtask B System 4 meta information
│           Martin_TAU_task4b_4.output.csv         Subtask B System 4 output
│
└───task5                                          Task 5 submissions
│   │   Morfi_QMUL_task5.technical_report.pdf      Technical report
│   │
│   └───Morfi_QMUL_task5_1                         System 1 submission files
│   │       Morfi_QMUL_task5_1.meta.yaml           System 1 meta information
│   │       Morfi_QMUL_task5_1.output.csv          System 1 output
│   :
│   └───Morfi_QMUL_task5_4                         System 4 submission files
│           Morfi_QMUL_task5_4.meta.yaml           System 4 meta information
│           Morfi_QMUL_task5_4.output.csv          System 4 output
│
└───task6                                          Task 6 submissions
│   │   Gontier_INR_task6_1.technical_report.pdf   Technical report (joint report for subtasks A and B)
│   │   Gontier_INR_task6a_1.technical_report.pdf  (optional) Technical report for subtask A system only
│   │   Xie_TAU_task6b_1.technical_report.pdf      (optional) Technical report for subtask B system only
│   │
│   └───Gontier_INR_task6a_1                       Subtask A System 1 submission files
│   │       Gontier_INR_task6a_1.meta.yaml         Subtask A System 1 meta information
│   │       Gontier_INR_task6a_1.output.csv        Subtask A System 1 output
│   :
│   └───Gontier_INR_task6a_4                       Subtask A System 4 submission files
│   │       Gontier_INR_task6a_4.meta.yaml         Subtask A System 4 meta information
│   │       Gontier_INR_task6a_4.output.csv        Subtask A System 4 output
│   │
│   └───Xie_TAU_task6b_1                           Subtask B System 1 submission files
│   │       Xie_TAU_task6b_1.meta.yaml             Subtask B System 1 meta information
│   │       Xie_TAU_task6b_1.output.csv            Subtask B System 1 output
│   :
│   └───Xie_TAU_task6b_4                           Subtask B System 4 submission files
│           Xie_TAU_task6b_4.meta.yaml             Subtask B System 4 meta information
│           Xie_TAU_task6b_4.output.csv            Subtask B System 4 output
│
└───task7                                          Task 7 submissions
│   │   Choi_GLI_task7.technical_report.pdf        Technical report
│   │
│   └───Choi_GLI_task7a_1                          Track A (with external resources) System 1 submission files
│   │       Choi_GLI_task7a_1.meta.yaml            Track A (with external resources) System 1 meta information
│   :
│   └───Choi_GLI_task7a_4                          Track A (with external resources) System 4 submission files
│   │       Choi_GLI_task7a_4.meta.yaml            Track A (with external resources) System 4 meta information
│   │
│   └───Choi_GLI_task7b_1                          Track B (without external resources) System 1 submission files
│   │       Choi_GLI_task7b_1.meta.yaml            Track B (without external resources) System 1 meta information
│   :
│   └───Choi_GLI_task7b_4                          Track B (without external resources) System 4 submission files
│           Choi_GLI_task7b_4.meta.yaml            Track B (without external resources) System 4 meta information
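As a rough local check before uploading, the layout of the zip can be inspected programmatically. The following is an unofficial sketch (it is not the provided validator and only verifies that every entry lives under a taskN folder, an assumption based on the structure above).

import re
import zipfile

# Unofficial sketch: flag zip entries whose top-level folder is not task1..task7.
def check_package_layout(zip_path: str) -> None:
    task_dir = re.compile(r"^task[1-7]/")
    with zipfile.ZipFile(zip_path) as package:
        for entry in package.namelist():
            if not task_dir.match(entry):
                print("Unexpected top-level entry:", entry)

# Example usage with a hypothetical package name
check_package_layout("my_dcase2023_submission.zip")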
System outputs
- Participants must submit the results for the provided evaluation datasets.
- Follow the system output format specified in the task description.
- Tasks are independent. You can participate in a single task or in multiple tasks.
- Multiple submissions for the same task are allowed (maximum 4 per task). Use a running index in the submission label, and give more descriptive names to the submitted systems in the system meta information files. Please mark clearly the connection between the submitted systems and the system descriptions in the technical report (for example, by referring to the systems using the submission label or the system name given in the system meta information file).
- Submitted system outputs will be published online on the DCASE2023 website later to allow future evaluations.
Meta information
To enable fast processing of the submissions and meta analysis of the submitted systems, participants should provide the meta information in a structured and correctly formatted YAML file. Participants are advised to fill in the meta information carefully, making sure all requested information is correctly provided.
A complete meta file will help us notice possible errors before officially publishing the results (for example, an unexpectedly large difference in performance between the development and evaluation sets) and allows us to contact the authors if necessary. Please note that task organizers may ask you to update the meta file after the challenge submission deadline.
See the example meta files below for each baseline system. These examples are also available in the example submission package. The meta file structure is mostly the same for all tasks; only the metrics collected in the results->development_dataset section differ per challenge task.
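Because the meta files are processed automatically, it is worth parsing them locally before packaging. Below is a minimal, unofficial sketch using PyYAML; the required top-level sections it checks for are assumptions based on the baseline examples that follow, not an official schema.

import yaml  # PyYAML

# Unofficial sanity check: the file must parse as YAML and contain the
# top-level sections seen in the baseline examples below.
def check_meta_file(path: str) -> None:
    with open(path, "r", encoding="utf-8") as f:
        meta = yaml.safe_load(f) or {}
    for key in ("submission", "system", "results"):
        if key not in meta:
            print(f"{path}: missing top-level section '{key}'")
    if not meta.get("submission", {}).get("label"):
        print(f"{path}: submission label is empty")

# Example usage with the Task 1 baseline meta file
check_meta_file("Martin_TAU_task1_1.meta.yaml")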
Example meta information file for Task 1 baseline system, task1/Martin_TAU_task1_1/Martin_TAU_task1_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid
# overlapping codes among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Martin_TAU_task1_1
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2022 baseline system
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight.
# Use maximum 10 characters.
abbreviation: Baseline
# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Martín Morató
firstname: Irene
email: irene.martinmorato@tuni.fi # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# Second author
- lastname: Heittola
firstname: Toni
email: toni.heittola@tuni.fi # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# Third author
- lastname: Mesaros
firstname: Annamaria
email: annamaria.mesaros@tuni.fi
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences
location: Tampere, Finland
# Fourth author
- lastname: Virtanen
firstname: Tuomas
email: tuomas.virtanen@tuni.fi
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences
location: Tampere, Finland
# System information
system:
# System description, meta data provided here will be used to do
# meta analysis of the submitted system.
# Use general level tags, when possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input / sampling rate
# e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
input_sampling_rate: 44.1kHz
# Acoustic representation
# one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
acoustic_features: log-mel energies
# Embeddings
# e.g. VGGish, OpenL3, ...
embeddings: !!null
# Data augmentation methods
# e.g. mixup, time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Machine learning
# In case using ensemble methods, please specify all methods used (comma separated list).
# one or multiple, e.g. GMM, HMM, SVM, MLP, CNN, RNN, CRNN, ResNet, ensemble, ...
machine_learning_method: CNN
# Ensemble method subsystem count
# In case ensemble method is not used, mark !!null.
# e.g. 2, 3, 4, 5, ...
ensemble_method_subsystem_count: !!null
# Decision making methods
# e.g. "average", "majority vote", "maximum likelihood", ...
decision_making: !!null
# External data usage method
# e.g. "directly", "embeddings", "pre-trained model", ...
external_data_usage: embeddings
# Method for handling the complexity restrictions
# e.g. "weight quantization", "sparsity", "pruning", ...
complexity_management: weight quantization
# System training/processing pipeline stages
# e.g. "pretraining", "training" (from scratch), "pruning", "weight quantization", ...
pipeline: pretraining, training, adaptation, pruning, weight quantization
# Machine learning framework
# e.g. keras/tensorflow, pytorch, matlab, ...
framework: keras/tensorflow
# System complexity, meta data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total model size in bytes. Calculated as [parameter count]*[bit per parameter]/8
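# (Illustrative example, assuming 8-bit quantized weights as in this baseline:
# 46512 parameters * 8 bits / 8 = 46512 B.)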
total_model_size: 46512 # B
# Total amount of parameters used in the acoustic model.
# For neural networks, this information is usually given before training process
# in the network summary.
# For other than neural networks, if parameter count information is not directly
# available, try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
# In case embeddings are used, add up parameter count of the embedding
# extraction networks and classification network
# Use numerical value.
total_parameters: 46512
# Total amount of non-zero parameters in the acoustic model.
# Calculated with same principles as "total_parameters".
# Use numerical value.
total_parameters_non_zero: 46512
# Model size calculated using NeSsi, as instructed in task description page.
# Use numerical value
memory_use: 65280 # B
# MACS
# Required for the submission ranking!
macs: 29234920
energy_consumption:
# Energy consumption while training the model. Unit is kWh.
training: 0.302 #kWh
# Energy consumption while producing output for all files in the evaluation dataset. Unit is kWh.
inference: 0.292 #kWh
# Baseline system's energy consumption while producing output for all files in the evaluation dataset. Unit is kWh.
# Run baseline code to get this value. Value is used to normalize training and inference values from your system.
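# (Illustrative example, assuming the normalization described above: a system with
# training = 0.302 kWh and baseline_inference = 0.292 kWh would be compared
# as 0.302 / 0.292 ≈ 1.03.)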
baseline_inference: 0.292 #kWh
# List of external datasets used in the submission.
# Development dataset is used here only as example, list only external datasets
external_datasets:
# Below an example how to fill the field as a list of datasets
# Dataset name
#- name: TAU Urban Acoustic Scenes 2022 Mobile, Development dataset
# # Dataset access url
# url: https://zenodo.org/record/6337421
# # Total audio length in minutes
# total_audio_length: 2400 # minutes
# URL to the source code of the system [optional]
source_code: https://github.com/marmoi/dcase2022_task1_baseline
# System results
results:
development_dataset:
# System results for the development dataset with the provided cross-validation setup.
# Full results are not mandatory, however, they are highly recommended
# as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, also incomplete
# results can be reported.
# Overall metrics
overall:
logloss: 1.575
accuracy: 42.9 # mean of class-wise accuracies
# Class-wise metrics
class_wise:
airport:
logloss: 1.534
accuracy: 39.4
bus:
logloss: 1.758
accuracy: 29.3
metro:
logloss: 1.382
accuracy: 47.9
metro_station:
logloss: 1.672
accuracy: 36.0
park:
logloss: 1.448
accuracy: 58.9
public_square:
logloss: 2.265
accuracy: 20.8
shopping_mall:
logloss: 1.385
accuracy: 51.4
street_pedestrian:
logloss: 1.822
accuracy: 30.1
street_traffic:
logloss: 1.025
accuracy: 70.6
tram:
logloss: 1.462
accuracy: 44.6
# Device-wise
device_wise:
a:
logloss: 1.109
accuracy: !!null
b:
logloss: 1.439
accuracy: !!null
c:
logloss: 1.374
accuracy: !!null
s1:
logloss: 1.621
accuracy: !!null
s2:
logloss: 1.559
accuracy: !!null
s3:
logloss: 1.531
accuracy: !!null
s4:
logloss: 1.813
accuracy: !!null
s5:
logloss: 1.800
accuracy: !!null
s6:
logloss: 1.931
accuracy: !!null
Example meta information file for Task 2 baseline system, task2/Dohi_HIT_task2_1/Dohi_HIT_task2_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid overlapping codes among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Dohi_HIT_task2_1
# Submission name
# This name will be used in the results tables when space permits.
name: DCASE2023 baseline system
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight.
# Use a maximum of 10 characters.
abbreviation: Baseline
# Authors of the submitted system.
# Mark authors in the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author, this will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Dohi
firstname: Kota
email: kota.dohi.gr@hitachi.com # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
institution: Hitachi, Ltd.
department: Research and Development Group # Optional
location: Tokyo, Japan
# Second author
- lastname: Imoto
firstname: Keisuke
email: keisuke.imoto@ieee.org
# Affiliation information for the author
affiliation:
institution: Doshisha University
location: Kyoto, Japan
# Third author
- lastname: Koizumi
firstname: Yuma
email: koizumi.yuma@ieee.org
# Affiliation information for the author
affiliation:
institution: Google LLC
location: Tokyo, Japan
# System information
system:
# System description, metadata provided here will be used to do a meta-analysis of the submitted system.
# Use general level tags, when possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input
# Please specify all sampling rates (comma-separated list).
# e.g. 16kHz, 22.05kHz, 44.1kHz
input_sampling_rate: 16kHz
# Data augmentation methods
# Please specify all methods used (comma-separated list).
# e.g. mixup, time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Front-end (preprocessing) methods
# Please specify all methods used (comma-separated list).
# e.g. HPSS, WPE, NMF, NN filter, RPCA, ...
front_end: !!null
# Acoustic representation
# one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
acoustic_features: log-mel energies
# Embeddings
# Please specify all pre-trained embeddings used (comma-separated list).
# one or multiple, e.g. VGGish, OpenL3, ...
embeddings: !!null
# Machine learning
# In case using ensemble methods, please specify all methods used (comma-separated list).
# e.g. AE, VAE, GAN, GMM, k-means, OCSVM, normalizing flow, CNN, LSTM, random forest, ensemble, ...
machine_learning_method: AE
# Method for aggregating predictions over time
# Please specify all methods used (comma-separated list).
# e.g. average, median, maximum, minimum, ...
aggregation_method: average
# Method for domain generalization and domain adaptation
# Please specify all methods used (comma-separated list).
# e.g. fine-tuning, invariant feature extraction, ...
domain_adaptation_method: !!null
domain_generalization_method: !!null
# Ensemble method subsystem count
# In case ensemble method is not used, mark !!null.
# e.g. 2, 3, 4, 5, ...
ensemble_method_subsystem_count: !!null
# Decision making in ensemble
# e.g. average, median, maximum, minimum, ...
decision_making: !!null
# Usage of the attribute information in the file names and attribute csv files
# Please specify all usages (comma-separated list).
# e.g. interpolation, extrapolation, condition ...
attribute_usage: !!null
# External data usage method
# Please specify all usages (comma-separated list).
# e.g. simulation of anomalous samples, embeddings, pre-trained model, ...
external_data_usage: !!null
# Usage of the development dataset
# Please specify all usages (comma-separated list).
# e.g. development, pre-training, fine-tuning
development_data_usage: development
# System complexity, metadata provided here may be used to evaluate submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model.
# For neural networks, this information is usually given before training process in the network summary.
# For other than neural networks, if parameter count information is not directly available, try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
# In case embeddings are used, add up parameter count of the embedding extraction networks and classification network.
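# (Illustrative example: an ensemble of two subsystems with 200000 and 70000
# parameters would report total_parameters: 270000.)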
# Use numerical value.
total_parameters: 269992
# List of external datasets used in the submission.
# Development dataset is used here only as an example, list only external datasets
external_datasets:
# Dataset name
- name: DCASE 2023 Challenge Task 2 Development Dataset
# Dataset access URL
url: https://zenodo.org/record/7690157
# URL to the source code of the system [optional, highly recommended]
# Reproducibility will be used to evaluate submitted systems.
source_code: https://github.com/nttcslab/dcase2023_task2_baseline_ae
# System results
results:
development_dataset:
# System results for development dataset.
# Full results are not mandatory, however, they are highly recommended as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, also incomplete results can be reported.
# AUC for all domains [%]
# No need to round numbers
ToyCar:
auc_source: 70.10
auc_target: 46.89
pauc: 52.47
ToyTrain:
auc_source: 57.93
auc_target: 57.02
pauc: 48.57
fan:
auc_source: 80.19
auc_target: 36.18
pauc: 59.04
gearbox:
auc_source: 60.31
auc_target: 60.69
pauc: 53.22
bearing:
auc_source: 65.92
auc_target: 55.75
pauc: 50.42
slider:
auc_source: 70.31
auc_target: 48.77
pauc: 56.37
valve:
auc_source: 55.35
auc_target: 50.69
pauc: 51.18
Example meta information file for Task 3 baseline system, task3/Politis_TAU_task3a_1/Politis_TAU_task3a_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions, to avoid overlapping codes among submissions
# use the following way to form your label:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Politis_TAU_task3a_1
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2023 Audio-only Ambisonic baseline
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight, maximum 10 characters
abbreviation: FOA_AO_base
# Submission authors in order, mark one of the authors as corresponding author.
authors:
# First author
- lastname: Politis
firstname: Archontis
email: archontis.politis@tuni.fi # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Audio Research Group
location: Tampere, Finland
# Second author
- lastname: Shimada
firstname: Kazuki
email: kazuki.shimada@sony.com # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: SONY
institute: SONY
department:
location: Tokyo, Japan
# System information
system:
# System description, meta data provided here will be used to do
# meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Model type (audio-only or audiovisual track)
model_type: Audio # Audio or Audiovisual
# Audio input
input_format: Ambisonic # Ambisonic or Microphone Array or both
input_sampling_rate: 24kHz
# Acoustic representation
acoustic_features: mel spectra, intensity vector # e.g one or multiple [phase and magnitude spectra, mel spectra, GCC-PHAT, TDOA, intensity vector ...]
visual_features: !!null
# Data augmentation methods
data_augmentation: !!null # [time stretching, block mixing, pitch shifting, ...]
# Machine learning
# In case of using ensemble methods, please specify all methods used (comma separated list).
machine_learning_method: CRNN, MHSA # e.g one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, MHSA, random forest, ensemble, ...]
#List external datasets in case of use for training
external_datasets: !!null #AudioSet, ImageNet...
#List here pre-trained models in case of use
pre_trained_models: !!null #AST, PANNs...
# System complexity, meta data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model. For neural networks, this
# information is usually given before training process in the network summary.
# For other than neural networks, if parameter count information is not directly available,
# try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
total_parameters: 500000
# URL to the source code of the system [optional]
source_code: https://github.com/sharathadavanne/seld-dcase2023
# System results
results:
development_dataset:
# System result for development dataset on the provided testing split.
# Overall score
overall:
ER_20: 0.71
F_20: 21.0
LE_CD: 29.3
LR_CD: 46.0
Example meta information file for Task 4 baseline system, task4/Ronchini_PM_task4a_1/Ronchini_PM_task4a_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions, to avoid overlapping codes among submissions
# use the following way to form your label:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Turpault_INR_task4_1
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2022 baseline system
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight, maximum 10 characters
abbreviation: Baseline
# Submission authors in order, mark one of the authors as corresponding author.
authors:
# First author
- lastname: Turpault
firstname: Nicolas
email: nicolas.turpault@inria.fr # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: INR
institute: Inria Nancy Grand-Est
department: Department of Natural Language Processing & Knowledge Discovery
location: Nancy, France
# Second author
- lastname: Serizel
firstname: Romain
email: romain.serizel@loria.fr # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: ULO
institute: University of Lorraine, Loria
department: Department of Natural Language Processing & Knowledge Discovery
location: Nancy, France
#...
# System information
system:
# System description, meta data provided here will be used to do
# meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input
input_channels: mono # e.g. one or multiple [mono, binaural, left, right, mixed, ...]
input_sampling_rate: 16 # In kHz
# Acoustic representation
acoustic_features: log-mel energies # e.g one or multiple [MFCC, log-mel energies, spectrogram, CQT, ...]
# Data augmentation methods
data_augmentation: !!null # [time stretching, block mixing, pitch shifting, ...]
# Machine learning
# In case using ensemble methods, please specify all methods used (comma separated list).
machine_learning_method: CRNN # e.g one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]
# Ensemble method subsystem count
# In case ensemble method is not used, mark !!null.
ensemble_method_subsystem_count: !!null # [2, 3, 4, 5, ... ]
# Decision making methods
decision_making: !!null # [majority vote, ...]
# Semi-supervised method used to exploit both labelled and unlabelled data
machine_learning_semi_supervised: mean-teacher student # e.g one or multiple [pseudo-labelling, mean-teacher student...]
# Segmentation method
segmentation_method: !!null # E.g. [RBM, attention layers...]
# Post-processing, followed by the time span (in ms) in case of smoothing
post-processing: median filtering (93ms) # [median filtering, time aggregation...]
# System complexity, meta data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model. For neural networks, this
# information is usually given before training process in the network summary.
# For other than neural networks, if parameter count information is not directly available,
# try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
total_parameters: 1112420
MACS: 44.683 G
# Approximate training time followed by the hardware used
trainining_time: 3h (1 GTX 1080 Ti)
# Model size in MB
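# (Illustrative example, assuming 32-bit float weights:
# 1112420 parameters * 4 B ≈ 4.45 MB, reported here as 4.5.)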
model_size: 4.5
#Report here the energy consumption measured with e.g. codecarbon
energy_consumption:
training: 1.717
test: 0.030
#Energy consumption of the baseline (10 epochs) on your hardware
baseline: 0.02
# The training subsets used to train the model, followed by the amount of data (number of clips) used per subset.
subsets: # [weak (xx), unlabel_in_domain (xx), synthetic (xx), FUSS (xx)...]
#List here the external datasets you used for training
external_datasets: #AudioSet, ImageNet...
#List here the pre-trained models you used
pre_trained_models: #AST, PANNs...
# URL to the source code of the system [optional, highly recommended]
source_code: https://github.com/turpaultn/dcase20_task4/tree/public_branch/baseline
# System results
results:
# Full results are not mandatory, but they are recommended for a thorough analysis of the challenge submissions.
# If you cannot provide all results, also incomplete results can be reported.
development_dataset:
# System results for the development dataset with the provided cross-validation setup.
overall:
PSDS1: 0.420
PSDS2: 0.610
Example meta information file for Task 4 baseline system, task4/Martin_TAU_task4b_1/Martin_TAU_task4b_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid
# overlapping codes among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Martin_TAU_task4b_1
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2023 task4b baseline system
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight.
# Use maximum 10 characters.
abbreviation: Baseline_task4b
# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Martín Morató
firstname: Irene
email: irene.martinmorato@tuni.fi # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# Second author
- lastname: Mesaros
firstname: Annamaria
email: annamaria.mesaros@tuni.fi
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences
location: Tampere, Finland
# Third author
- lastname: Heittola
firstname: Toni
email: toni.heittola@tuni.fi # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# System information
system:
# System description, meta data provided here will be used to do
# meta analysis of the submitted system.
# Use general level tags, when possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input / sampling rate
input_channels: mono
# e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
input_sampling_rate: 44.1kHz
# Acoustic representation
# one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
acoustic_features: mel energies
# Data augmentation methods
# time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Machine learning
# In case using ensemble methods, please specify all methods used (comma separated list).
# one or multiple, e.g. GMM, HMM, SVM, MLP, CNN, RNN, CRNN, ResNet, ensemble, ...
machine_learning_method: CRNN
# Ensemble method subsystem count
# In case ensemble method is not used, mark !!null.
# e.g. 2, 3, 4, 5, ...
ensemble_method_subsystem_count: !!null
# Decision making methods
# e.g. average, majority vote, maximum likelihood, ...
decision_making: !!null
# Semi-supervised method used to exploit both labelled and unlabelled data
# e.g one or multiple [pseudo-labelling, mean-teacher student...]
machine_learning_semi_supervised: !!null
# Segmentation method
# E.g. [RBM, attention layers...]
segmentation_method: !!null
# Post-processing, followed by the time span (in ms) in case of smoothing
# [median filtering, time aggregation...]
post-processing: !!null
# The training subsets used to train the model, followed by the amount of data (number of clips) used per subset.
# [weak (xx), unlabel_in_domain (xx), synthetic (xx), FUSS (xx)...]
subsets: !!null
#List here the external datasets you used for training
#AudioSet, ImageNet...
external_datasets: !!null
#List here the pre-trained models you used
#AST, PANNs..
pre_trained_models: !!null
# URL to the source code of the system [optional]
source_code: https://github.com/marmoi/dcase2023_task4b_baseline
# System results
results:
development_dataset:
# System results for the development dataset with the provided cross-validation setup.
# Full results are not mandatory, however, they are highly recommended
# as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, also incomplete
# results can be reported.
# Overall metrics
overall:
ER_m: 0.487
F1_m: 70.34 # segment-based 1 second for all the test folds
F1_M: 35.83
F1_MO: 42.87
Example meta information file for Task 5 baseline system, task5/Morfi_QMUL_task5_1/Morfi_QMUL_task5_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions, to avoid overlapping codes among submissions
# use the following way to form your label:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: Morfi_QMUL_task5_1
# Submission name
# This name will be used in the results tables when space permits
name: Cross-correlation baseline
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight, maximum 10 characters
abbreviation: xcorr_base
# Submission authors in order, mark one of the authors as corresponding author.
authors:
# First author
- lastname: Morfi
firstname: Veronica
email: g.v.morfi@qmul.ac.uk # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: QMUL
institute: Queen Mary University of London
department: Centre for Digital Music
location: London, UK
# Second author
- lastname: Stowell
firstname: Dan
email: dan.stowell@qmul.ac.uk # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: QMUL
institute: Queen Mary University of London
department: Centre for Digital Music
location: London, UK
#...
# System information
system:
# SED system description, meta data provided here will be used to do
# meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input
input_sampling_rate: any # In kHz
# Acoustic representation
acoustic_features: spectrogram # e.g one or multiple [MFCC, log-mel energies, spectrogram, CQT, PCEN, ...]
# Data augmentation methods
data_augmentation: !!null # [time stretching, block mixing, pitch shifting, ...]
# Embeddings
# e.g. VGGish, OpenL3, ...
embeddings: !!null
# Machine learning
# In case using ensemble methods, please specify all methods used (comma separated list).
machine_learning_method: template matching # e.g one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, transformer, ...]
# the system adaptation for "few shot" scenario.
# For example, if machine_learning_method is "CNN", the few_shot_method might use one of [fine tuning, prototypical, MAML] in addition to the standard CNN architecture.
few_shot_method: template matching # e.g [fine tuning, prototypical, MAML, nearest neighbours...]
# External data usage method
# e.g. directly, embeddings, pre-trained model, ...
external_data_usage: !!null
# Ensemble method subsystem count
# In case ensemble method is not used, mark !!null.
ensemble_method_subsystem_count: !!null # [2, 3, 4, 5, ... ]
# Decision making methods (for ensemble)
decision_making: !!null # [majority vote, ...]
# Post-processing, followed by the time span (in ms) in case of smoothing
post-processing: peak picking, threshold # [median filtering, time aggregation...]
# System complexity, meta data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model. For neural networks, this
# information is usually given before training process in the network summary.
# For other than neural networks, if parameter count information is not directly available,
# try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
total_parameters: !!null # note that for simple template matching, the "parameters"==the pixel count of the templates, plus 1 for each param such as thresholding.
# Approximate training time followed by the hardware used
trainining_time: !!null
# Model size in MB
model_size: !!null
# URL to the source code of the system [optional, highly recommended]
source_code:
# List of external datasets used in the submission.
# A previous DCASE development dataset is used here only as example! List only external datasets
external_datasets:
# Dataset name
- name: !!null
# Dataset access url
url: !!null
# Total audio length in minutes
total_audio_length: !!null # minutes
# System results
results:
# Full results are not mandatory, but they are recommended for a thorough analysis of the challenge submissions.
# If you cannot provide all result details, also incomplete results can be reported.
validation_set:
overall:
F-score: 2.01 # percentile
# Per-dataset
dataset_wise:
HV:
F-score: 1.22 #percentile
PB:
F-score: 5.84 #percentile
Example meta information file for Task 6 baseline system, task6/Gontier_INR_task6a_1/Gontier_INR_task6a_1.meta.yaml:
# Submission information for task 6, subtask A
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid
# overlapping codes among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: gontier_inr_task6a_1
#
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2022 baseline system
#
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight.
# Use maximum 10 characters.
abbreviation: Baseline
# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Gontier
firstname: Felix
email: felix.gontier@inria.fr # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: INRIA
institute: INRIA
department: Multispeech # Optional
location: Nancy, France
# Second author...
# System information
system:
# System description, meta data provided here will be used to do
# meta analysis of the submitted system.
# Use general level tags, when possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input / sampling rate
# e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
input_sampling_rate: 16kHz
# Acoustic representation
# Here you should indicate what kind of audio representation
# you used. If your system used hand-crafted features (e.g.
# mel band energies), then you can do:
#
# `acoustic_features: mel energies`
#
# Else, if you used some pre-trained audio feature extractor,
# you can indicate the name of the system, for example:
#
# `acoustic_features: audioset`
acoustic_features: VGGish
# Word embeddings
# Here you can indicate how you treated word embeddings.
# If your method learned its own word embeddings (i.e. you
# did not use any pre-trained word embeddings) then you can
# do:
#
# `word_embeddings: learned`
#
# Else, specify the pre-trained word embeddings that you used
# (e.g. Word2Vec, BERT, etc).
word_embeddings: BART
# Data augmentation methods
# e.g. mixup, time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Method scheme
# Here you should indicate the scheme of the method that you
# used. For example:
machine_learning_method: encoder-decoder
# Learning scheme
# Here you should indicate the learning scheme.
# For example, you could specify either
# supervised, self-supervised, or even
# reinforcement learning.
learning_scheme: supervised
# Ensemble
# Here you should indicate if you used ensemble
# of systems or not.
ensemble: No
# Audio modelling
# Here you should indicate the type of system used for
# audio modelling. For example, if you used some stacked CNNs, then
# you could do:
#
# audio_modelling: cnn
#
# If you used some pre-trained system for audio modelling,
# then you should indicate the system used (e.g. COALA, COLA,
# transformer).
audio_modelling: transformer
# Word modelling
# Similarly, here you should indicate the type of system used
# for word modelling. For example, if you used some RNNs,
# then you could do:
#
# word_modelling: rnn
#
# If you used some pre-trained system for word modelling,
# then you should indicate the system used (e.g. transformer).
word_modelling: transformer
# Loss function
# Here you should indicate the loss function that you employed.
loss_function: crossentropy
# Optimizer
# Here you should indicate the name of the optimizer that you
# used.
optimizer: adamw
# Learning rate
# Here you should indicate the learning rate of the optimizer
# that you used.
learning_rate: 1e-5
# Gradient clipping
# Here you should indicate if you used any gradient clipping.
# You do this by indicating the value used for clipping. Use
# 0 for no clipping.
gradient_clipping: 0
# Gradient norm
# Here you should indicate the norm of the gradient that you
# used for gradient clipping. This field is used only when
# gradient clipping has been employed.
gradient_norm: !!null
# Metric monitored
# Here you should report the monitored metric
# for optimizing your method. For example, did you
# monitor the loss on the validation data (i.e. validation
# loss)? Or did you monitor the SPIDEr metric? Maybe the training
# loss?
metric_monitored: validation_loss
# System complexity, meta data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model.
# For neural networks, this information is usually given before training process
# in the network summary.
# For other than neural networks, if parameter count information is not directly
# available, try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
# In case embeddings are used, add up parameter count of the embedding
# extraction networks and classification network
# Use numerical value (do not use comma for thousands-separator).
total_parameters: 140000000
# List of external datasets used in the submission.
# Development dataset is used here only as example, list only external datasets
external_datasets:
# Dataset name
- name: Clotho
# Dataset access url
url: https://doi.org/10.5281/zenodo.3490683
# Has audio:
has_audio: Yes
# Has images
has_images: No
# Has video
has_video: No
# Has captions
has_captions: Yes
# Number of captions per audio
nb_captions_per_audio: 5
# Total amount of examples used
total_audio_length: 24430
# Used for (e.g. audio_modelling, word_modelling, audio_and_word_modelling)
used_for: audio_and_word_modelling
# URL to the source code of the system [optional]
source_code: https://github.com/audio-captioning/dcase-2021-baseline
# System results
results:
development_evaluation:
# System results for development evaluation split.
# Full results are not mandatory, however, they are highly recommended
# as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, also incomplete
# results can be reported.
bleu1: 0.555
bleu2: 0.358
bleu3: 0.239
bleu4: 0.156
rougel: 0.364
meteor: 0.164
cider: 0.358
spice: 0.109
spider: 0.233
Example meta information file for Task 6 baseline system, task6/Xie_TAU_task6b_1/Xie_TAU_task6b_1.meta.yaml:
# Submission information for task 6 - subtask B
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid
# overlapping codes among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
label: xie_tau_task6b_1
#
# Submission name
# This name will be used in the results tables when space permits
name: DCASE2022 baseline system
#
# Submission name abbreviated
# This abbreviated name will be used in the result table when space is tight.
# Use maximum 10 characters.
abbreviation: Baseline
# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Xie
firstname: Huang
email: huang.xie@tuni.fi # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# Second author
- lastname: Lipping
firstname: Samuel
email: samuel.lipping@tuni.fi # Contact email address
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences # Optional
location: Tampere, Finland
# Third author
- lastname: Virtanen
firstname: Tuomas
email: tuomas.virtanen@tuni.fi
# Affiliation information for the author
affiliation:
abbreviation: TAU
institute: Tampere University
department: Computing Sciences
location: Tampere, Finland
# System information
system:
# System description, meta-data provided here will be used to do
# meta analysis of the submitted system.
# Use general level tags, when possible use the tags provided in comments.
# If information field is not applicable to the system, use "!!null".
description:
# Audio input / sampling rate
# e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
input_sampling_rate: 44.1kHz
# Acoustic representation
# Here you should indicate what kind of audio representation
# you used. If your system used hand-crafted features (e.g.
# mel band energies), then you can do:
#
# `acoustic_features: mel energies`
#
# Else, if you used some pre-trained audio feature extractor,
# you can indicate the name of the system, for example:
#
# `acoustic_features: audioset`
acoustic_features: log-mel energies
# Word embeddings
# Here you can indicate how you treated word embeddings.
# If your method learned its own word embeddings (i.e. you
# did not use any pre-trained word embeddings) then you can
# do:
#
# `word_embeddings: learned`
#
# Else, specify the pre-trained word embeddings that you used
# (e.g. Word2Vec, BERT, etc).
word_embeddings: Word2Vec
# Data augmentation methods
# e.g. mixup, time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Method scheme
# Here you should indicate the scheme of the method that you
# used. For example:
machine_learning_method: cross-modal alignment
# Learning scheme
# Here you should indicate the learning scheme.
# For example, you could specify either
# supervised, self-supervised, or even
# reinforcement learning.
learning_scheme: self-supervised
# Ensemble
# Here you should indicate if you used ensemble
# of systems or not.
ensemble: No
# Audio modelling
# Here you should indicate the type of system used for
# audio modelling. For example, if you used some stacked CNNs, then
# you could do:
#
# audio_modelling: cnn
#
# If you used some pre-trained system for audio modelling,
# then you should indicate the system used (e.g. COALA, COLA,
# transformer).
audio_modelling: crnn
# Word modelling
# Similarly, here you should indicate the type of system used
# for word modelling. For example, if you used some RNNs,
# then you could do:
#
# word_modelling: rnn
#
# If you used some pre-trained system for word modelling,
# then you should indicate the system used (e.g. transformer).
word_modelling: word2vec
# Loss function
# Here you should indicate the loss function that you employed.
loss_function: triplet loss
# Optimizer
# Here you should indicate the name of the optimizer that you
# used.
optimizer: adam
# Learning rate
# Here you should indicate the learning rate of the optimizer
# that you used.
learning_rate: 1e-3
# Gradient clipping
# Here you should indicate if you used any gradient clipping.
# You do this by indicating the value used for clipping. Use
# 0 for no clipping.
gradient_clipping: 0
# Gradient norm
# Here you should indicate the norm of the gradient that you
# used for gradient clipping. This field is used only when
# gradient clipping has been employed.
gradient_norm: !!null
# Metric monitored
# Here you should report the monitored metric
# for optimizing your method. For example, did you
# monitor the loss on the validation data (i.e. validation
# loss)? Or did you monitor the SPIDEr metric? Maybe the training
# loss?
metric_monitored: validation_loss
# System complexity, meta-data provided here will be used to evaluate
# submitted systems from the computational load perspective.
complexity:
# Total amount of parameters used in the acoustic model.
# For neural networks, this information is usually given before training process
# in the network summary.
# For other than neural networks, if parameter count information is not directly
# available, try estimating the count as accurately as possible.
# In case of ensemble approaches, add up parameters for all subsystems.
# In case embeddings are used, add up parameter count of the embedding
# extraction networks and classification network
# Use numerical value (do not use comma for thousands-separator).
total_parameters: 732354
# List of datasets used for training the system.
# Development-training data is used here only as an example.
training_datasets:
- name: Clotho-development
# Dataset access url
url: https://doi.org/10.5281/zenodo.4783391
# Has audio:
has_audio: Yes
# Has images
has_images: No
# Has video
has_video: No
# Has captions
has_captions: Yes
# Number of captions per audio
nb_captions_per_audio: 5
# Number of audio clips per caption
nb_clips_per_caption: 1
# Total duration (in seconds) of the audio used
total_audio_length: 86353
# Total amount of captions used
total_captions: 3839
# List of datasets used for validating the system (for example, for hyperparameter optimization).
# Development-validation data is used here only as an example.
validation_datasets:
- name: Clotho-validation
# Dataset access url
url: https://doi.org/10.5281/zenodo.4783391
# Has audio:
has_audio: Yes
# Has images
has_images: No
# Has video
has_video: No
# Has captions
has_captions: Yes
# Number of captions per audio
nb_captions_per_audio: 5
# Number of audio clips per caption
nb_clips_per_caption: 1
# Total duration (in seconds) of the audio used
total_audio_length: 23636
# Total amount of captions used
total_captions: 1045
# List of external datasets used in the submission.
# The development dataset is used here only as an example; list only external datasets.
external_datasets:
# Dataset name
- name: Clotho
# Dataset access url
url: https://doi.org/10.5281/zenodo.4783391
# Has audio:
has_audio: Yes
# Has images
has_images: No
# Has video
has_video: No
# Has captions
has_captions: Yes
# Number of captions per audio
nb_captions_per_audio: 5
# Number of audio clips per caption
nb_clips_per_caption: 1
# Total duration (in seconds) of the audio used
total_audio_length: 133442
# Total amount of captions used
total_captions: 29645
# Used for (e.g. audio_modelling, word_modelling, audio_and_word_modelling)
used_for: audio_and_word_modelling
# URL to the source code of the system [optional]
source_code: https://github.com/xieh97/dcase2022-audio-retrieval
# System results
results:
development_testing:
# System results for the development-testing split.
# Full results are not mandatory; however, they are highly recommended
# as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, incomplete results
# can also be reported.
R@1: 0.03
R@5: 0.11
R@10: 0.19
mAP@10: 0.07
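The total_parameters field above is easiest to fill in directly from your training code. The following is a minimal sketch, assuming a PyTorch-based system; the module names audio_encoder and text_encoder are hypothetical placeholders for your own components, and for ensembles you would pass every subsystem (including any embedding-extraction networks).

# Hypothetical sketch for obtaining `total_parameters`, assuming PyTorch.
# `audio_encoder` and `text_encoder` are placeholders; replace them with
# your own components, and include all subsystems when using ensembles.
import torch.nn as nn

def count_parameters(*modules: nn.Module) -> int:
    """Total number of parameters across all given modules."""
    return sum(p.numel() for module in modules for p in module.parameters())

# Stand-in sub-networks used only to make the sketch runnable:
audio_encoder = nn.GRU(input_size=64, hidden_size=256, num_layers=2, batch_first=True)
text_encoder = nn.Embedding(num_embeddings=5000, embedding_dim=300)

total_parameters = count_parameters(audio_encoder, text_encoder)
print(total_parameters)  # copy this number into the meta file, without thousands separators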
Example meta information file for Task 7 baseline system task7/Choi_GLI_task7a_1/Choi_GLI_task7a_1.meta.yaml:
# Submission information
submission:
# Submission label
# Label is used to index submissions.
# Generate your label in the following way to avoid overlapping labels among submissions:
# [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task7_track[track]_[index number of your submission (1-4)]
label: Choi_GLI_task7_trackA_1
# Submission name
# This name will be used in the results tables when space permits.
name: DCASE2023 baseline system
# Submission name abbreviated
# This abbreviated name will be used in the results table when space is tight.
# Use a maximum of 10 characters.
abbreviation: Baseline
# Authors of the submitted system.
# Mark authors in the order you want them to appear in submission lists.
# One of the authors has to be marked as the corresponding author; this author will be listed next to the submission in the results tables.
authors:
# First author
- lastname: Choi
firstname: Keunwoo
email: keunwoo@gaudiolab.com # Contact email address
corresponding: true # Mark true for one of the authors
# Affiliation information for the author
affiliation:
institution: Gaudio Lab, Inc.
department: AI Research # Optional
location: Seoul, Korea
# Second author
- lastname: Im
firstname: Jaekwon
email: jaekwon@gaudiolab.com
# Affiliation information for the author
affiliation:
institution: Gaudio Lab, Inc./Korea Advanced Institute of Science & Technology (KAIST)
department: AI Research/Graduate School of Culture Technology # Optional
location: Seoul, Korea/Daejeon, Korea
# Third author
- lastname: Heller
firstname: Laurie
email: laurieheller@cmu.edu
# Affiliation information for the author
affiliation:
institution: Carnegie Mellon University
department: Psychology # Optional
location: Pittsburgh, US
# System results
results:
# Google Colab URL to generate sounds for evaluation [mandatory]
# The sounds must be unique and must be generated by the code supplied in the colab.
colab_url: https://colab.research.google.com/drive/1FzbBf_FqWKu59i97ITibJbdPAqSzeMD4?usp=sharing
development_dataset:
# System results for the development dataset
# Full results are not mandatory; however, they are highly recommended as they are needed for a thorough analysis of the challenge submissions.
# If you are unable to provide all results, incomplete results can also be reported.
# Average FAD
average:
FAD: 9.702
# Class-wise FAD
class_wise:
DogBark:
FAD: 13.411
Footstep:
FAD: 8.109
GunShot:
FAD: 7.951
Keyboard:
FAD: 5.230
MovingMotorVehicle:
FAD: 16.108
Rain:
FAD: 13.337
Sneeze/Cough:
FAD: 3.770
# URL to the source code of the system [optional]
source_code: https://github.com/DCASE2023-Task7-Foley-Sound-Synthesis/dcase2023_task7_baseline
# System information
system:
# System description; the metadata provided here will be used for a meta-analysis of the submitted systems.
# Use general-level tags; when possible, use the tags provided in the comments.
# If information field is not applicable to the system, use "!!null".
description:
# System input
# Please specify all system input used (comma-separated list).
input: sound event label
# Machine learning methods
# If ensemble methods are used, please specify all methods used (comma-separated list).
# e.g. AE, VAE, GAN, Transformer, diffusion model, ensemble...
machine_learning_method: VQ-VAE, PixelSNAIL
phase_reconstruction: HiFi-GAN
# Generated acoustic feature input to phase reconstructor
# One or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, ...
acoustic_feature: spectrogram
# System training/processing pipeline stages
# e.g. "pretraining", "encoding" (from scratch), ,"weight quantization", "decoding", "phase reconstruction", ...
pipeline: pretraining, encoding, weight quantization, decoding, phase reconstruction
# Data augmentation methods
# Please specify all methods used (comma-separated list).
# e.g. mixup, time stretching, block mixing, pitch shifting, ...
data_augmentation: !!null
# Ensemble method subsystem count
# If an ensemble method is not used, mark !!null.
# e.g. 2, 3, 4, 5, ...
ensemble_method_subsystem_count: !!null
# System complexity
complexity:
# Total amount of parameters used in the acoustic model(s) and phase reconstruction method(s).
# For neural networks, this information is usually given in the network summary before the training process.
# For methods other than neural networks, if the parameter count is not directly available, try to estimate it as accurately as possible.
# In case of ensemble approaches, add up the parameters of all subsystems.
# In case embeddings are used, add up the parameter counts of the embedding extraction networks and the phase reconstruction methods.
# Use a numerical value.
total_parameters: 269992
# List of ALL external audio datasets used in the submission. (only for track A)
# The development dataset is used here only as an example; list only external datasets.
# If multiple external audio datasets are used, please copy the lines after [# Dataset name] and list information on all the audio datasets.
# e.g. AudioSet, ESC-50, URBAN-SED, Clotho, ...
external_audio_datasets:
# Dataset name
- name: DCASE2023 Challenge Task 7 Development Dataset
# Dataset access URL
url: https://drive.google.com/drive/folders/1GzfZvYVdbgDXnykOR93C3LCchPYBPh5I
# Total audio length in minutes
total_audio_length: 100
# List of ALL external pre-trained models used in the submission. (only for track A)
# If multiple external pre-trained models are used, please copy the lines after [# Model name] and list information on all the pre-trained models.
# e.g. PANNs, VGGish, AST, BYOL-A, ...
external_models:
# Model name
- name: HiFi-GAN
# Access URL for pre-trained model
url: https://drive.google.com/drive/folders/1-eEYTB5Av9jNql0WGBlRoi-WH2J7bp5Y
# How to use pre-trained model
# e.g. encoder, decoder, weight quantization, vocoder, ... (comma-separated list)
usage: vocoder
# URL to the source code of the system [optional, highly recommended]
# Reproducibility will be used to evaluate submitted systems.
source_code: https://github.com/DCASE2023-Task7-Foley-Sound-Synthesis/dcase2023_task7_baseline
# Questionnaire
questionnaire:
# Do you agree to allow DCASE to distribute 700 audio samples (100 samples * 7 audio categories) to the evaluator(s) for the subjective evaluation? [mandatory]
# The audio samples will not be distributed for any purpose other than subjective evaluation without other explicit permissions.
distribute_audio_samples: Yes
# Do you give permission to publish 700 audio samples (100 samples * 7 audio categories) used in the evaluation on the challenge result page?
# This is very important from the perspective of reproducible research, and we strongly encourage you to allow it.
# This does not mean that the copyright of audio samples is transferred to the DCASE community or task 7 organizers.
publish_audio_samples: Yes
# Do you agree to allow DCASE to use 100 audio samples per category in a future version of this DCASE competition? (optional; not required for competition entry).
# This may be used in future baseline comparisons or classification challenges related to this Foley challenge.
# This does not mean that the copyright of audio samples is transferred to the DCASE community or task 7 organizers.
use_audio_samples: Yes
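Before packaging, it can help to verify that each *.meta.yaml file parses and contains the fields shown in the examples above. The snippet below is a minimal, unofficial sketch assuming PyYAML is installed; the expected keys and the file path are taken from the Task 7 example above and are illustrative only, not an official validator.

# Minimal, unofficial sanity check for a *.meta.yaml file, assuming PyYAML.
# The expected keys are inferred from the examples in this document.
import yaml

def check_meta(path: str) -> None:
    with open(path, "r", encoding="utf-8") as f:
        meta = yaml.safe_load(f)

    submission = meta.get("submission", {})
    for key in ("label", "name", "abbreviation", "authors"):
        if key not in submission:
            raise ValueError(f"{path}: missing submission field '{key}'")

    corresponding = [a for a in submission["authors"] if a.get("corresponding")]
    if len(corresponding) != 1:
        raise ValueError(f"{path}: exactly one corresponding author is expected")

    print(f"{path}: OK (label: {submission['label']})")

# File path taken from the Task 7 example above:
check_meta("task7/Choi_GLI_task7a_1/Choi_GLI_task7a_1.meta.yaml")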
Technical report
All participants are expected to submit a technical report about the submitted system, to help the DCASE community better understand how the algorithm works.
Technical reports are not peer-reviewed. They will be published on the challenge website together with all other information about the submitted system. The technical report does not need to follow the structure of a scientific publication closely (for example, there is no need for an extensive literature review); it should, however, contain a sufficient description of the system.
Please report the system performance using the provided cross-validation setup or development set, according to the task. For participants taking part in multiple tasks, one technical report covering all tasks is sufficient, if the systems have only small differences. Describe the task-specific parameters in the report.
Participants can also submit the same report as a scientific paper to the DCASE2023 Workshop. In this case, the paper must follow the structure of a scientific publication and be prepared according to the provided Workshop paper instructions and template. Please note that the template is slightly different, and you will have to create a separate submission to the DCASE2023 Workshop track in the submission system. Please refer to the workshop webpage for more details. DCASE2023 Workshop papers will be peer-reviewed.
Template
Reports use a 4+1 page format: a maximum of 5 pages, including all text, figures, and references, with the 5th page containing only references. The templates for the technical report are available here: