The submission deadline is June 15th 2021 23:59 Anywhere on Earth (AoE)

Introduction

A challenge submission consists of a single submission package (one zip package) containing the system outputs, the system meta information, and a technical report (pdf file).

The submission process in brief:

Participants run their system with an evaluation dataset, and produce the system output in the specified format. Participants are allowed to submit 4 different system outputs per task or subtask.
Participants create a meta information file to go along with the system output and describe the system used to produce this particular output. The meta information file has a predefined format to help the automatic handling of the challenge submissions. Information provided in the meta file will later be used to produce the challenge results. Participants should fill in all meta information and make sure the meta information file follows the defined formatting.
Participants describe their system in a technical report in sufficient detail. There is a template provided for the technical report.
Participants prepare the submission package (zip-file). The submission package contains the system outputs (a maximum of 4 per task), the system meta information, and the technical report.
Participants submit the submission package and the technical report to DCASE2021 Challenge.

Please read the requirements for the files included in the submission package carefully!

Submission system

The submission system is now available:

Submission system

Create a user account and login
Go to the "All Conferences" tab in the system and type DCASE to filter the list
Select "2021 Challenge on Detection and Classification of Acoustic Scenes and Events"
Create a new submission

The challenge deadline is 15 June 2021 (AOE).

The technical report in the submission package must contain at least the title, authors, and abstract. An updated camera-ready version of the technical report can be submitted separately until 17 June 2021 (AOE).

Note: the submission system does not send any confirmation email. You can check that your submission has been taken into account in your author console. A confirmation email will be sent to all participants once the submissions are closed.

By submitting to the challenge, participants agree for the system output to be evaluated and to be published together with the results and the technical report on the DCASE Challenge website under CC-BY license.

Submission package

Participants are instructed to pack their system output(s), system meta information, and technical report into one zip-package. Example package:

DCASE2021 challenge submission example package (.zip, 228 kB)

Please prepare your submission zip-file following the provided example. Follow the same file structure and fill in the meta information with a structure similar to the one in the *.meta.yaml files. The zip-file should contain system outputs for all tasks/subtasks (a maximum of 4 submissions per task/subtask), separate meta information for each system, and technical report(s) covering all submitted systems.

If you submit similar systems for multiple tasks, you can describe everything in one technical report. If your approaches for different tasks are significantly different, prepare one technical report for each and include it in the corresponding task folder.

More detailed instructions for constructing the package can be found in the following sections. The technical report template is available here.

A script for checking the content of the submission package is provided for selected tasks. In that case, please validate your submission package accordingly.

For Task 1, use the validator code from the repository:

DCASE2021 Task 1 submission validator

Submission label

A submission label is used to index all your submissions (systems per task). To avoid overlapping labels among the submitted systems, form your label as follows:

[Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number][subtask letter (optional)]_[index number of your submission (1-4)]

For example, the baseline systems would have the following labels:

Martin_TAU_task1a_1
Wang_TAU_task1b_1
Kawaguchi_HIT_task2_1
Politis_TAU_task3_1
Turpault_INR_task4_1
Morfi_QMUL_task5_1
Drossos_TAU_task6_1
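
The label template above can be checked mechanically before packaging. The following is a minimal sketch, not an official validator: the function names `make_label` and `parse_label` are hypothetical, and the character classes allowed for the name and institute parts are an assumption (the instructions do not specify them). It covers the plain template only, not the scenario-specific task 4 names shown in the package structure.

```python
import re

# Regular expression derived from the label template. The allowed characters
# for last name and institute are an assumption, not an official rule.
LABEL_RE = re.compile(
    r"^(?P<lastname>[A-Za-z]+)"
    r"_(?P<institute>[A-Za-z0-9]+)"
    r"_task(?P<task>[1-6])(?P<subtask>[a-z]?)"
    r"_(?P<index>[1-4])$"
)


def make_label(lastname, institute, task, index, subtask=""):
    """Build a label such as 'Martin_TAU_task1a_1' and verify its shape."""
    label = f"{lastname}_{institute}_task{task}{subtask}_{index}"
    if LABEL_RE.match(label) is None:
        raise ValueError(f"label does not follow the template: {label}")
    return label


def parse_label(label):
    """Split a label into its parts, or raise ValueError if it is malformed."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"label does not follow the template: {label}")
    return match.groupdict()


print(make_label("Martin", "TAU", 1, 1, subtask="a"))
print(parse_label("Kawaguchi_HIT_task2_1"))
```

Restricting the index to 1-4 in the pattern catches a fifth submission for the same task at label-construction time.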

Package structure

Make sure your zip-package follows the provided file naming convention and directory structure:

Zip-package root
│  
└───task1                                           Task 1 submissions
│   │   Heittola_TAU_task1a.technical_report.pdf    Technical report covering all subtasks
│   │   Martin_TAU_task1a.technical_report.pdf      (optional) Technical report for subtask A system only
│   │   Wang_TAU_task1b.technical_report.pdf        (optional) Technical report for subtask B system only
│   │
│   └───Martin_TAU_task1a_1                         Subtask A System 1 submission files
│   │       Martin_TAU_task1a_1.meta.yaml           Subtask A System 1 meta information
│   │       Martin_TAU_task1a_1.output.csv          Subtask A System 1 output
│   :
│   └───Martin_TAU_task1a_4                         Subtask A System 4 submission files
│   │       Martin_TAU_task1a_4.meta.yaml           Subtask A System 4 meta information
│   │       Martin_TAU_task1a_4.output.csv          Subtask A System 4 output
│   │            
│   └───Wang_TAU_task1b_1                           Subtask B System 1 submission files
│   │       Wang_TAU_task1b_1.meta.yaml             Subtask B System 1 meta information
│   │       Wang_TAU_task1b_1.output.csv            Subtask B System 1 output
│   :
│   └───Wang_TAU_task1b_4                           Subtask B System 4 submission files
│           Wang_TAU_task1b_4.meta.yaml             Subtask B System 4 meta information
│           Wang_TAU_task1b_4.output.csv            Subtask B System 4 output
│                    
└───task2                                           Task 2 submissions
│   │   Kawaguchi_HIT_task2.technical_report.pdf      Technical report                       
│   │
│   └───Kawaguchi_HIT_task2_1                         System 1 submission files
│   │     Kawaguchi_HIT_task2_1.meta.yaml             System 1 meta information
│   │     anomaly_score_fan_section_03_source_test.csv            System 1 output for each section and domain in the evaluation dataset   
│   │     anomaly_score_fan_section_03_target_test.csv        
│   │     anomaly_score_fan_section_04_source_test.csv        
│   │     anomaly_score_fan_section_04_target_test.csv           
│   :     :
│   │     anomaly_score_valve_section_05_target_test.csv           
│   │     decision_result_fan_section_03_source_test.csv           
│   │     decision_result_fan_section_03_target_test.csv           
│   │     decision_result_fan_section_04_source_test.csv           
│   │     decision_result_fan_section_04_target_test.csv           
│   :     :
│   │     decision_result_valve_section_05_target_test.csv           
│   │
│   └───Kawaguchi_HIT_task2_4                         System 4 submission files
│         Kawaguchi_HIT_task2_4.meta.yaml             System 4 meta information
│         anomaly_score_fan_section_03_source_test.csv            System 4 output for each section and domain in the evaluation dataset   
│         anomaly_score_fan_section_03_target_test.csv        
│         anomaly_score_fan_section_04_source_test.csv        
│         anomaly_score_fan_section_04_target_test.csv           
│         :
│         anomaly_score_valve_section_05_target_test.csv           
│         decision_result_fan_section_03_source_test.csv           
│         decision_result_fan_section_03_target_test.csv           
│         decision_result_fan_section_04_source_test.csv           
│         decision_result_fan_section_04_target_test.csv           
│         :
│         decision_result_valve_section_05_target_test.csv    
│   
└───task3                                           Task 3 submissions
│   │   Politis_TAU_task3.technical_report.pdf      Technical report
│   │
│   └───Politis_TAU_task3_1                         System 1 submission files
│   │     Politis_TAU_task3_1.meta.yaml             System 1 meta information
│   │     Politis_TAU_task3_1                       System 1 output files in a folder (200 files in total)
│   :
│   │
│   └───Politis_TAU_task3_4                         System 4 submission files
│         Politis_TAU_task3_4.meta.yaml             System 4 meta information
│         Politis_TAU_task3_4                       System 4 output files in a folder (200 files in total)
│
└───task4                                           Task 4 submissions (not all the 3 scenarios are needed)
│   │   Turpault_task4_SED.technical_report.pdf     Technical report for a SED submission
│   │   Turpault_task4_SS_SED.technical_report.pdf  Technical report for a SS+SED submission
│   │   Wisdom_task4_SS.technical_report.pdf        Technical report for a SS submission  
│   │   validate_submissions.py                     Submission validation code           
│   │   readme.md                                   Instructions how to use the submission validation code
│   │
│   └───Turpault_INR_SED_task4_1                    SED System 1 submission files
│   │     Turpault_INR_task4_SED_1.meta.yaml        SED System 1 meta information
│   │     Turpault_INR_task4_SED_1.output.csv       SED System 1 output
│   :
│   │
│   └───Turpault_INR_SED_task4_4                    SED System 4 submission files
│   │     Turpault_INR_task4_SED_4.meta.yaml        SED System 4 meta information
│   │     Turpault_INR_task4_SED_4.output.csv       SED System 4 output
│   │
│   └───Turpault_INR_SS_SED_task4_1                 SS+SED System 1 submission files
│   │     Turpault_INR_task4_SS_SED_1.meta.yaml     SS+SED System 1 meta information
│   │     Turpault_INR_task4_SS_SED_1.output.csv    SS+SED System 1 output
│   :
│   │
│   └───Turpault_INR_SS_SED_task4_4                 SS+SED System 4 submission files
│   │     Turpault_INR_task4_SS_SED_4.meta.yaml     SS+SED System 4 meta information
│   │     Turpault_INR_task4_SS_SED_4.output.csv    SS+SED System 4 output
│   │
│   └───Wisdom_GOO_SS_task4_1                       SS System 1 submission files
│   │     Wisdom_GOO_task4_SS_1.meta.yaml           SS System 1 meta information
│   │     Wisdom_GOO_task4_SS_1.output.csv          SS System 1 output
│   :
│   │
│   └───Wisdom_GOO_SS_task4_4                       SS System 4 submission files
│   │     Wisdom_GOO_task4_SS_4.meta.yaml           SS System 4 meta information
│   │     Wisdom_GOO_task4_SS_4.output.csv          SS System 4 output
│
└───task5                                           Task 5 submissions
│   │   Morfi_QMUL_task5.technical_report.pdf       Technical report
│   │
│   └───Morfi_QMUL_task5_1                          System 1 submission files
│   │     Morfi_QMUL_task5_1.meta.yaml              System 1 meta information
│   │     Morfi_QMUL_task5_1.output.csv             System 1 output
│   :
│   │
│   └───Morfi_QMUL_task5_4                          System 4 submission files
│         Morfi_QMUL_task5_4.meta.yaml              System 4 meta information
│         Morfi_QMUL_task5_4.output.csv             System 4 output
│
└───task6                                           Task 6 submissions
    │   Drossos_TAU_task6_1.technical_report.pdf    Technical report
    │
    └───Drossos_TAU_task6_1                         System 1 submission files
    │     Drossos_TAU_task6_1.meta.yaml             System 1 meta information
    │     Drossos_TAU_task6_1.output.csv            System 1 output
    :
    │
    └───Drossos_TAU_task6_4                         System 4 submission files
          Drossos_TAU_task6_4.meta.yaml             System 4 meta information
          Drossos_TAU_task6_4.output.csv            System 4 output
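
For the tasks whose system folder holds one meta file and one output CSV (tasks 1, 5 and 6 in the tree above), the layout can be checked with a few lines of standard-library Python. This is a minimal sketch under that assumption; the function name `missing_files` is hypothetical, it does not replace the task-specific validators, and tasks 2, 3 and 4 use different output layouts that it does not cover.

```python
import io
import zipfile


def missing_files(zip_file, task_dir, label, output_suffix=".output.csv"):
    """Return expected archive members for one system that are absent.

    zip_file may be a path or a file-like object. The expected paths follow
    the <task>/<label>/<label>.meta.yaml pattern shown in the tree.
    """
    expected = [
        f"{task_dir}/{label}/{label}.meta.yaml",
        f"{task_dir}/{label}/{label}{output_suffix}",
    ]
    with zipfile.ZipFile(zip_file) as archive:
        names = set(archive.namelist())
    return [path for path in expected if path not in names]


# Usage example with an in-memory archive that lacks the output file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "task1/Martin_TAU_task1a_1/Martin_TAU_task1a_1.meta.yaml",
        "submission: {}\n",
    )
buffer.seek(0)
print(missing_files(buffer, "task1", "Martin_TAU_task1a_1"))
```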

System outputs

Participants must submit the results for the provided evaluation datasets.

Follow the system output format specified in the task description.
Tasks are independent. You can participate in a single task or multiple tasks.
Multiple submissions for the same task are allowed (maximum 4 per task). Use a running index in the submission label, and give more detailed names for the submitted systems in the system meta information files. Please mark carefully the connection between the submitted systems and system parameters description in the technical report (for example by referring to the systems by using the submission label or system name given in the system meta information file).
Submitted system outputs will be published online on the DCASE2021 website later to allow future evaluations.

Meta information

In order to enable fast processing of the submissions and meta analysis of submitted systems, participants should provide meta information presented in a structured and correctly formatted YAML-file. Participants are advised to fill in the meta information carefully while making sure all asked information is correctly provided.

A complete meta file will help us notice possible errors before officially publishing the results (for example unexpectedly large difference in performance between development and evaluation set) and allow contacting the authors in case we consider it necessary. Please note that task organizers may ask you to update the meta file after the challenge submission deadline.

See the example meta files below for each baseline system. These examples are also available in the example submission package. Meta file structure is mostly the same for all tasks, only the metrics collected in results->development_dataset-section differ per challenge task.
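
Some of the rules stated in the comments of the example meta files (label, abbreviation of at most 10 characters, a corresponding author) lend themselves to an automatic pre-check. The sketch below is illustrative only: `check_submission_meta` is a hypothetical helper, it works on an already parsed meta file (a nested dict, for instance as returned by a YAML parser such as PyYAML's `yaml.safe_load`), and reading "one of the authors" as "exactly one" is an assumption.

```python
def check_submission_meta(meta, expected_label):
    """Return a list of human-readable problems found in a parsed meta file."""
    problems = []
    submission = meta.get("submission", {})

    # The label should match the folder and file name of the system.
    if submission.get("label") != expected_label:
        problems.append("label does not match the expected submission label")

    # The example files ask for an abbreviation of at most 10 characters.
    if len(str(submission.get("abbreviation", ""))) > 10:
        problems.append("abbreviation is longer than 10 characters")

    # Assumption: exactly one author carries 'corresponding: true'.
    authors = submission.get("authors", [])
    count = sum(1 for author in authors if author.get("corresponding") is True)
    if count != 1:
        problems.append(f"expected one corresponding author, found {count}")

    return problems


example = {
    "submission": {
        "label": "Martin_TAU_task1a_1",
        "abbreviation": "Baseline",
        "authors": [
            {"lastname": "Martín Morató", "corresponding": True},
            {"lastname": "Heittola"},
        ],
    }
}
print(check_submission_meta(example, "Martin_TAU_task1a_1"))
```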

Task 1A - Low-Complexity Acoustic Scene Classification with Multiple Devices

Example meta information file for Task 1 baseline system task1/Martin_TAU_task1a_1/Martin_TAU_task1a_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid
  # overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Martin_TAU_task1a_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2021 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use maximum 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system. Mark authors in
  # the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author,
  # this will be listed next to the submission in the results tables.
  authors:
    # First author
    - lastname: Martín Morató
      firstname: Irene
      email: irene.martinmorato@tuni.fi           # Contact email address
      corresponding: true                         # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Second author
    - lastname: Heittola
      firstname: Toni
      email: toni.heittola@tuni.fi                # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Third author
    - lastname: Mesaros
      firstname: Annamaria
      email: annamaria.mesaros@tuni.fi

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences
        location: Tampere, Finland

    # Fourth author
    - lastname: Virtanen
      firstname: Tuomas
      email: tuomas.virtanen@tuni.fi

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences
        location: Tampere, Finland

# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input / sampling rate
    # e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    # one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
    acoustic_features: log-mel energies

    # Embeddings
    # e.g. VGGish, OpenL3, ...
    embeddings: !!null

    # Data augmentation methods
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null

    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma separated list).
    # one or multiple, e.g. GMM, HMM, SVM, MLP, CNN, RNN, CRNN, ResNet, ensemble, ...
    machine_learning_method: CNN

    # Ensemble method subsystem count
    # In case ensemble method is not used, mark !!null.
    # e.g. 2, 3, 4, 5, ...
    ensemble_method_subsystem_count: !!null

    # Decision making methods
    # e.g. average, majority vote, maximum likelihood, ...
    decision_making: !!null

    # External data usage method
    # e.g. directly, embeddings, pre-trained model, ...
    external_data_usage: embeddings

    # Method for handling the complexity restrictions
    # e.g. weight quantization, sparsity, ...
    complexity_management: weight quantization

  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:
    # Total amount of parameters used in the acoustic model.
    # For neural networks, this information is usually given before training process
    # in the network summary.
    # For other than neural networks, if parameter count information is not directly
    # available, try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding
    # extraction networks and classification network
    # Use numerical value (do not use comma for thousands-separator).
    total_parameters: 46246

    # Total amount of non-zero parameters in the acoustic model.
    # Calculated with same principles as "total_parameters".
    # Use numerical value (do not use comma for thousands-separator).
    total_parameters_non_zero: 46246

    # Model size calculated as instructed in task description page.
    # Use numerical value, unit is KB
    model_size: 90.3 # KB

  # List of external datasets used in the submission.
  # Development dataset is used here only as example, list only external datasets
  external_datasets:
    # Dataset name
    - name: TAU Urban Acoustic Scenes 2020 Mobile, Development dataset

      # Dataset access url
      url: https://doi.org/10.5281/zenodo.3819968

      # Total audio length in minutes
      total_audio_length: 3840            # minutes

  # URL to the source code of the system [optional]
  source_code: https://github.com/marmoi/dcase2021_task1a_baseline

# System results
results:
  development_dataset:
    # System results for the development dataset with the provided cross-validation setup.
    # Full results are not mandatory, however, they are highly recommended
    # as they are needed for a thorough analysis of the challenge submissions.
    # If you are unable to provide all results, also incomplete
    # results can be reported.

    # Overall metrics
    overall:
      logloss: 1.461
      accuracy: 46.9    # mean of class-wise accuracies

    # Class-wise metrics
    class_wise:
      airport:
        logloss: 1.497
        accuracy: 31.1
      bus:
        logloss: 1.475
        accuracy: 40.1
      metro:
        logloss: 1.457
        accuracy: 48.1
      metro_station:
        logloss: 2.060
        accuracy: 29.6
      park:
        logloss: 1.217
        accuracy: 63.6
      public_square:
        logloss: 1.738
        accuracy: 36.0
      shopping_mall:
        logloss: 1.136
        accuracy: 61.3
      street_pedestrian:
        logloss: 1.522
        accuracy: 47.1
      street_traffic:
        logloss: 1.145
        accuracy: 68.0
      tram:
        logloss: 1.360
        accuracy: 44.3

    # Device-wise
    device_wise:
      a:
        logloss: !!null
        accuracy: 63.9
      b:
        logloss: !!null
        accuracy: 52.2
      c:
        logloss: !!null
        accuracy: 56.3
      s1:
        logloss: !!null
        accuracy: 44.2
      s2:
        logloss: !!null
        accuracy: 43.9
      s3:
        logloss: !!null
        accuracy: 44.5
      s4:
        logloss: !!null
        accuracy: 38.5
      s5:
        logloss: !!null
        accuracy: 40.6
      s6:
        logloss: !!null
        accuracy: 38.2
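
The overall values in the example above are the unweighted means of the class-wise values, as the comment next to the overall accuracy states. A short sketch that reproduces them from the class-wise numbers of this example (the helper name `macro_average` is hypothetical):

```python
def macro_average(values, ndigits):
    """Unweighted mean over classes, rounded for reporting."""
    values = list(values)
    return round(sum(values) / len(values), ndigits)


# Class-wise results copied from the Task 1A baseline meta file above.
class_wise_accuracy = {
    "airport": 31.1, "bus": 40.1, "metro": 48.1, "metro_station": 29.6,
    "park": 63.6, "public_square": 36.0, "shopping_mall": 61.3,
    "street_pedestrian": 47.1, "street_traffic": 68.0, "tram": 44.3,
}
class_wise_logloss = {
    "airport": 1.497, "bus": 1.475, "metro": 1.457, "metro_station": 2.060,
    "park": 1.217, "public_square": 1.738, "shopping_mall": 1.136,
    "street_pedestrian": 1.522, "street_traffic": 1.145, "tram": 1.360,
}

print(macro_average(class_wise_accuracy.values(), 1))  # 46.9
print(macro_average(class_wise_logloss.values(), 3))   # 1.461
```

Running the same check on your own meta file before submission is a cheap way to catch copy-paste errors in the results section.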

Task 1B - Audio-Visual Scene Classification

Example meta information file for Task 1 baseline system task1/Wang_TAU_task1b_1/Wang_TAU_task1b_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid
  # overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Wang_TAU_task1b_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2021 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use maximum 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system. Mark authors in
  # the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author,
  # this will be listed next to the submission in the results tables.
  authors:
    # First author
    - lastname: Wang
      firstname: Shanshan
      email: shanshan.wang@tuni.fi                # Contact email address
      corresponding: true                         # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Second author
    - lastname: Heittola
      firstname: Toni
      email: toni.heittola@tuni.fi                # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Third author
    - lastname: Mesaros
      firstname: Annamaria
      email: annamaria.mesaros@tuni.fi

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences
        location: Tampere, Finland

    # Fourth author
    - lastname: Virtanen
      firstname: Tuomas
      email: tuomas.virtanen@tuni.fi

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences
        location: Tampere, Finland

# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:
    # Audio input / channels
    # one or multiple: e.g. mono, binaural, left, right, mixed, ...
    input_channels: mono

    # Audio input / sampling rate
    # e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
    input_sampling_rate: 48.0kHz

    # Acoustic representation
    # one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
    acoustic_features: log-mel energies

    # Embeddings
    # e.g. VGGish, OpenL3, ...
    audio_embeddings: OpenL3
    visual_embeddings: OpenL3

    # Data augmentation methods
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null

    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma separated list).
    # one or multiple, e.g. GMM, HMM, SVM, MLP, CNN, RNN, CRNN, ResNet, ensemble, ...
    machine_learning_method: CNN

    # Ensemble method subsystem count
    # In case ensemble method is not used, mark !!null.
    # e.g. 2, 3, 4, 5, ...
    ensemble_method_subsystem_count: !!null

    # How information from modalities are combined
    # e.g. audio only, video only, early fusion, late fusion
    modality_combination: early fusion

    # Decision making methods
    # e.g. average, majority vote, maximum likelihood, ...
    decision_making: maximum likelihood

    # External data usage method
    # e.g. directly, embeddings, pre-trained model, ...
    external_data_usage: embeddings

  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:
    # Total amount of parameters used in the model.
    # For neural networks, this information is usually given before training process
    # in the network summary.
    # For other than neural networks, if parameter count information is not directly
    # available, try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding
    # extraction networks and classification network
    # Use numerical value.
    total_parameters: 14553134
    # Breakdown of the total (14553134):
    # audio-visual model parameters: 34186
    # audio model parameters: 338634
    # video model parameters: 338634
    # audio embedding extraction from OpenL3: 9150660
    # video embedding extraction from OpenL3: 4691020


    # Amount of parameters used in the acoustic model. Calculated the same way as total_parameters.
    # Use numerical value (do not use comma for thousands-separator).
    total_parameters_audio: 9489294

    # Amount of parameters used in the visual model. Calculated the same way as total_parameters.
    # Use numerical value (do not use comma for thousands-separator).
    total_parameters_visual: 5029654

  # List of external datasets used in the submission.
  # Development dataset is used here only as example, list only external datasets
  external_datasets:
    # Dataset name
    - name: TAU Urban Audio-Visual Scenes 2021, Development dataset

      # Dataset access url
      url: https://zenodo.org/record/4477542#.YK3yipMza3A

      # Total audio length in minutes
      total_audio_length: 2040            # minutes

  # URL to the source code of the system [optional]
  source_code: https://github.com/shanwangshan/TAU-urban-audio-visual-scenes

# System results
results:
  development_dataset:
    # System results for the development dataset with the provided cross-validation setup.
    # Full results are not mandatory, however, they are highly recommended
    # as they are needed for a thorough analysis of the challenge submissions.
    # If you are unable to provide all results, also incomplete
    # results can be reported.

    # Overall metrics
    overall:
      logloss: 0.658
      accuracy: 77.0    # mean of class-wise accuracies

    # Class-wise metrics
    class_wise:
      airport:
        logloss: 0.963
        accuracy: 66.8
      bus:
        logloss: 0.396
        accuracy: 85.9
      metro:
        logloss: 0.541
        accuracy: 80.4
      metro_station:
        logloss: 0.565
        accuracy: 80.8
      park:
        logloss: 0.710
        accuracy: 77.2
      public_square:
        logloss: 0.732
        accuracy: 71.1
      shopping_mall:
        logloss: 0.839
        accuracy: 72.6
      street_pedestrian:
        logloss: 0.877
        accuracy: 72.7
      street_traffic:
        logloss: 0.296
        accuracy: 89.6
      tram:
        logloss: 0.659
        accuracy: 73.1
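
The three parameter counts in the complexity section of this example are consistent with the breakdown given in its comments. The sketch below shows the arithmetic; the grouping of terms into the audio and visual totals is inferred from the numbers matching, not stated explicitly in the meta file.

```python
# Figures copied from the comments of the Task 1B baseline meta file above.
audio_embedding_extractor = 9150660   # OpenL3 audio embedding network
video_embedding_extractor = 4691020   # OpenL3 video embedding network
audio_model = 338634
video_model = 338634
audio_visual_model = 34186

# Embedding networks count towards the totals, as the meta file instructs.
total_parameters_audio = audio_embedding_extractor + audio_model
total_parameters_visual = video_embedding_extractor + video_model
total_parameters = (
    total_parameters_audio + total_parameters_visual + audio_visual_model
)

print(total_parameters_audio)   # 9489294
print(total_parameters_visual)  # 5029654
print(total_parameters)         # 14553134
```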

Task 2 - Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

Example meta information file for Task 2 baseline system task2/Kawaguchi_HIT_task2_1/Kawaguchi_HIT_task2_1.meta.yaml:

# Submission information
submission:

  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Kawaguchi_HIT_task2_1

  # Submission name
  # This name will be used in the results tables when space permits.
  name: DCASE2021 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use a maximum of 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system. 
  # Mark authors in the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author, this will be listed next to the submission in the results tables.
  authors:

    # First author
    - lastname: Kawaguchi
      firstname: Yohei
      email: yohei.kawaguchi.xk@hitachi.com                # Contact email address
      corresponding: true                         # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        institution: Hitachi, Ltd.
        department: Research and Development Group            # Optional
        location: Tokyo, Japan

    # Second author
    - lastname: Imoto
      firstname: Keisuke
      email: keisuke.imoto@ieee.org

      # Affiliation information for the author
      affiliation:
        institution: Doshisha University
        location: Kyoto, Japan

    # Third author
    - lastname: Koizumi
      firstname: Yuma
      email: koizumi.yuma@ieee.org

      # Affiliation information for the author
      affiliation:
        institution: Google LLC
        location: Tokyo, Japan

# System information
system:

  # System description, metadata provided here will be used to do a meta-analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input
    # Please specify all sampling rates (comma-separated list).
    # e.g. 16kHz, 22.05kHz, 44.1kHz
    input_sampling_rate: 16kHz

    # Data augmentation methods
    # Please specify all methods used (comma-separated list).
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null

    # Front-end (preprocessing) methods
    # Please specify all methods used (comma-separated list).
    # e.g. HPSS, WPE, NMF, NN filter, RPCA, ...
    front_end: !!null

    # Acoustic representation
    # one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
    acoustic_features: log-mel energies

    # Embeddings
    # Please specify all pre-trained embeddings used (comma-separated list).
    # one or multiple, e.g. VGGish, OpenL3, ...
    embeddings: !!null
    
    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma-separated list).
    # e.g. AE, VAE, GAN, GMM, k-means, OCSVM, normalizing flow, CNN, LSTM, random forest, ensemble, ...
    machine_learning_method: AE

    # Method for aggregating predictions over time
    # Please specify all methods used (comma-separated list).
    # e.g. average, median, maximum, minimum, ...
    aggregation_method: average

    # Method for domain adaptation
    # Please specify all methods used (comma-separated list).
    # e.g. fine-tuning, AdaFlow, ...
    domain_adaptation_method: !!null

    # Ensemble method subsystem count
    # In case ensemble method is not used, mark !!null.
    # e.g. 2, 3, 4, 5, ...
    ensemble_method_subsystem_count: !!null

    # Decision making in ensemble
    # e.g. average, median, maximum, minimum, ...
    decision_making: !!null

    # External data usage method
    # Please specify all usages (comma-separated list).
    # e.g. simulation of anomalous samples, embeddings, pre-trained model, ...
    external_data_usage: !!null

    # Usage of the development dataset
    # Please specify all usages (comma-separated list).
    # e.g. development, pre-training, fine-tuning
    development_data_usage: development

  # System complexity, metadata provided here may be used to evaluate submitted systems from the computational load perspective.
  complexity:

    # Total amount of parameters used in the acoustic model.
    # For neural networks, this information is usually given before training process in the network summary.
    # For other than neural networks, if parameter count information is not directly available, try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding extraction networks and classification network.
    # Use numerical value.
    total_parameters: 269992

  # List of external datasets used in the submission.
  # Development dataset is used here only as an example, list only external datasets
  external_datasets:
  
    # Dataset name
    - name: DCASE 2021 Challenge Task 2 Development Dataset

      # Dataset access URL
      url: https://zenodo.org/record/4562016

  # URL to the source code of the system [optional, highly recommended]
  # Reproducibility will be used to evaluate submitted systems.
  source_code: https://github.com/y-kawagu/dcase2021_task2_baseline_ae

# System results
results:
  development_dataset:

    # System results for development dataset.
    # Full results are not mandatory, however, they are highly recommended as they are needed for a thorough analysis of the challenge submissions.
    # If you are unable to provide all results, also incomplete results can be reported.

    # Harmonic mean of AUCs over all sections (00, 01, and 02) and domains [%]
    # No need to round numbers
    ToyCar:
      harmonic_mean_auc: 62.49
      harmonic_mean_pauc: 52.36

    ToyConveyor:
      harmonic_mean_auc: 61.71
      harmonic_mean_pauc: 53.81

    fan:
      harmonic_mean_auc: 63.24
      harmonic_mean_pauc: 53.38

    gearbox:
      harmonic_mean_auc: 65.97
      harmonic_mean_pauc: 52.76

    pump:
      harmonic_mean_auc: 61.92
      harmonic_mean_pauc: 54.41

    slider:
      harmonic_mean_auc: 66.74
      harmonic_mean_pauc: 55.94

    valve:
      harmonic_mean_auc: 53.41
      harmonic_mean_pauc: 50.54
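
As a rough illustration of how a `harmonic_mean_auc` entry like the ones above could be derived, the following Python sketch takes the harmonic mean over per-section, per-domain AUCs. The section and domain keys and all score values are hypothetical placeholders, not baseline results, and the official evaluation script remains the reference for the reported numbers.

```python
from statistics import harmonic_mean

# Hypothetical per-section, per-domain AUCs [%] for one machine type.
# These values are illustrative only and are NOT baseline results.
auc_by_section_and_domain = {
    ("00", "source"): 67.0,
    ("00", "target"): 55.0,
    ("01", "source"): 61.0,
    ("01", "target"): 60.0,
    ("02", "source"): 72.0,
    ("02", "target"): 58.0,
}


def summarize(scores):
    """Return the harmonic mean of all (section, domain) scores, in percent."""
    return harmonic_mean(list(scores.values()))


harmonic_mean_auc = summarize(auc_by_section_and_domain)
print(f"harmonic_mean_auc: {harmonic_mean_auc}")
```

The harmonic mean is never larger than the arithmetic mean, so a single weak section or domain pulls the summary score down noticeably.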

Task 3 - Sound Event Localization and Detection with Directional Interference

Example meta information file for Task 3 baseline system task3/Politis_TAU_task3_1/Politis_TAU_task3_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions, to avoid overlapping codes among submissions
  # use following way to form your label:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Politis_TAU_task3_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2021 Ambisonic example

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight, maximum 10 characters
  abbreviation: FOA_base

  # Submission authors in order, mark one of the authors as corresponding author.
  authors:
    # First author
    - lastname: Politis
      firstname: Archontis
      email: archontis.politis@tuni.fi                  # Contact email address
      corresponding: true                               # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Audio Research Group
        location: Tampere, Finland

    # Second author
    - lastname: Adavanne
      firstname: Sharath
      email: sharath.adavanne@tuni.fi                   # Contact email address


      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Audio Research Group
        location: Tampere, Finland

    # Third author
    - lastname: Virtanen
      firstname: Tuomas
      email: tuomas.virtanen@tuni.fi                   # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Audio Research Group
        location: Tampere, Finland



# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input
    input_format: Ambisonic                   # e.g. Ambisonic or Microphone Array or both
    input_sampling_rate: 24kHz

    # Acoustic representation
    acoustic_features: mel spectra, intensity vector   # e.g. one or multiple [phase and magnitude spectra, mel spectra, GCC-PHAT, TDOA, intensity vector, ...]

    # Data augmentation methods
    data_augmentation: !!null                 # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma separated list).
    machine_learning_method: CRNN             # e.g. one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]


  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:

    # Total amount of parameters used in the acoustic model. For neural networks, this
    # information is usually given before training process in the network summary.
    # For other than neural networks, if parameter count information is not directly available,
    # try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    total_parameters: 116118

  # URL to the source code of the system [optional]
  source_code: https://github.com/sharathadavanne/seld-dcase2021

# Evaluation setup used for training your final submitted model
evaluation_setup:
  # List the folds used for training and validating your submitted evaluation model. For instance the baseline SELDnet was trained and validated with the following folds from the dataset
  training_folds: 2, 3, 4, 5, 6
  validation_folds: 1

# System results
results:

  development_dataset:
    # System result for development dataset with the provided evaluation setup.

    # Overall score 
    overall:
      ER_20: 0.69
      F_20: 33.9
      LE_CD: 24.1
      LR_CD: 43.9
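
Before packaging, it can help to check the submission label and abbreviation against the rules stated in the comments above. The following Python sketch is one possible check; the regular expression (letters-only name and institute abbreviation, an optional subtask letter, an optional extra tag before the index) and the helper name `check_submission` are assumptions made for illustration, not an official validator.

```python
import re

# Assumed pattern for
# "[Last name]_[Institute abbreviation]_task[task number]_[index 1-4]".
# The optional subtask letter and optional extra tag are assumptions of this sketch.
LABEL_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+_task\d+[a-z]?(_[A-Za-z]+)?_[1-4]$")

MAX_ABBREVIATION_LENGTH = 10  # "maximum 10 characters" per the meta file comments


def check_submission(label, abbreviation):
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    if not LABEL_PATTERN.match(label):
        problems.append(f"label '{label}' does not follow the expected pattern")
    if len(abbreviation) > MAX_ABBREVIATION_LENGTH:
        problems.append(
            f"abbreviation '{abbreviation}' has {len(abbreviation)} characters "
            f"(maximum {MAX_ABBREVIATION_LENGTH})"
        )
    return problems


print(check_submission("Politis_TAU_task3_1", "FOA_base"))
print(check_submission("Politis_TAU_task3_5", "A much too long name"))
```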

Task 4 - Sound Event Detection and Separation in Domestic Environments

Example meta information file for Task 4 baseline system task4/Turpault_INR_task4_SED_1/Turpault_INR_task4_SED_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions, to avoid overlapping codes among submissions
  # use following way to form your label:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Turpault_INR_task4_SED_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2020 SED baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight, maximum 10 characters
  abbreviation: SED Baseline

  # Submission authors in order, mark one of the authors as corresponding author.
  authors:
    # First author
    - lastname: Turpault
      firstname: Nicolas
      email: nicolas.turpault@inria.fr                # Contact email address
      corresponding: true                             # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: INR
        institute: Inria Nancy Grand-Est
        department: Department of Natural Language Processing & Knowledge Discovery
        location: Nancy, France

    # Second author
    - lastname: Serizel
      firstname: Romain
      email: romain.serizel@loria.fr                  # Contact email address


      # Affiliation information for the author
      affiliation:
        abbreviation: ULO
        institute: University of Lorraine, Loria
        department: Department of Natural Language Processing & Knowledge Discovery
        location: Nancy, France

    # Third author
    - firstname: John
      lastname: Hershey

      # Affiliation information for the author
      affiliation:
        abbreviation: GOO
        institute: Google, Inc.
        department: AI Perception
        location: Cambridge, United States

    # Fourth author
    - firstname: Scott
      lastname: Wisdom

      # Affiliation information for the author
      affiliation:
        abbreviation: GOO
        institute: Google, Inc.
        department: AI Perception
        location: Cambridge, United States

    # Fifth author
    - firstname: Hakan
      lastname: Erdogan

      # Affiliation information for the author
      affiliation:
        abbreviation: GOO
        institute: Google, Inc.
        department: AI Perception
        location: Cambridge, United States

        #...



# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input
    input_channels: mono                  # e.g. one or multiple [mono, binaural, left, right, mixed, ...]
    input_sampling_rate: 16               # In kHz

    # Acoustic representation
    acoustic_features: log-mel energies   # e.g. one or multiple [MFCC, log-mel energies, spectrogram, CQT, ...]

    # Data augmentation methods
    data_augmentation: !!null             # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma separated list).
    machine_learning_method: CRNN      # e.g. one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]

    # Ensemble method subsystem count
    # In case ensemble method is not used, mark !!null.
    ensemble_method_subsystem_count: !!null # [2, 3, 4, 5, ... ]

    # Decision making methods
    decision_making: !!null                 # [majority vote, ...]

    # Semi-supervised method used to exploit both labelled and unlabelled data
    machine_learning_semi_supervised: mean-teacher student         # e.g. one or multiple [pseudo-labelling, mean-teacher student, ...]

    # Segmentation method
    segmentation_method: !!null                         # e.g. [RBM, attention layers, ...]

    # Post-processing, followed by the time span (in ms) in case of smoothing
    post-processing: median filtering (93ms)            # [median filtering, time aggregation, ...]

  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:

    # Total amount of parameters used in the acoustic model. For neural networks, this
    # information is usually given before training process in the network summary.
    # For other than neural networks, if parameter count information is not directly available,
    # try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    total_parameters: 1112420
    # Approximate training time followed by the hardware used
    training_time: 3h (1 GTX 1080 Ti)
    # Model size in MB
    model_size: 4.5

  # The training subsets used to train the model, followed by the amount of data (number of clips) used per subset.
  subsets:                                # [weak (xx), unlabel_in_domain (xx), synthetic (xx), FUSS (xx), ...]

  # URL to the source code of the system [optional, highly recommended]
  source_code: https://github.com/turpaultn/dcase20_task4/tree/public_branch/baseline

# System results
results:
  # Full results are not mandatory, but they are recommended for a thorough analysis of the challenge submissions.
  # If you cannot provide all results, also incomplete results can be reported.

  development_dataset:
    # System result for development dataset with the provided cross-validation setup.
    overall:
      PSDS1: 0.420
      PSDS2: 0.610
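
The `total_parameters` and `model_size` fields above are related, and a back-of-the-envelope check can catch obvious mistakes. The Python sketch below estimates weight storage from the parameter count, assuming every parameter is stored as a 32-bit float and that 1 MB is 10^6 bytes. Both are assumptions of this sketch; the size of a saved model file can differ because it may also contain buffers, optimizer state, or metadata.

```python
BYTES_PER_FLOAT32 = 4  # assumption: all parameters are stored as 32-bit floats


def estimated_model_size_mb(total_parameters, bytes_per_parameter=BYTES_PER_FLOAT32):
    """Rough weight-storage estimate in megabytes (1 MB = 10**6 bytes)."""
    return total_parameters * bytes_per_parameter / 1e6


# Parameter count taken from the Task 4 example above.
size_mb = estimated_model_size_mb(1112420)
print(f"{size_mb:.2f} MB")  # prints "4.45 MB"
```

The estimate is of the same order as the `model_size` listed in the example; the two values need not match exactly.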

Task 5 - Few-shot Bioacoustic Event Detection

Example meta information file for Task 5 baseline system task5/Nolasco_QMUL_task5_1/Nolasco_QMUL_task5_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions, to avoid overlapping codes among submissions
  # use the following way to form your label:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Nolasco_QMUL_task5_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: Cross-correlation baseline

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight, maximum 10 characters
  abbreviation: xcorr_base

  # Submission authors in order, mark one of the authors as corresponding author.
  authors:
    # First author
    - lastname: Nolasco
      firstname: Ines
      email: i.dealmeidanolasco@qmul.ac.uk                    # Contact email address
      corresponding: true                             # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: QMUL
        institute: Queen Mary University of London
        department: Centre for Digital Music
        location: London, UK

    # Second author
    - lastname: Stowell
      firstname: Dan
      email: dan.stowell@qmul.ac.uk                  # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: QMUL
        institute: Queen Mary University of London
        department: Centre for Digital Music
        location: London, UK

        #...


# System information
system:
  # SED system description, meta data provided here will be used to do
  # meta analysis of the submitted system. Use general level tags, if possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input
    input_sampling_rate: any               # In kHz

    # Acoustic representation
    acoustic_features: spectrogram   # e.g. one or multiple [MFCC, log-mel energies, spectrogram, CQT, PCEN, ...]

    # Data augmentation methods
    data_augmentation: !!null             # [time stretching, block mixing, pitch shifting, ...]

    # Embeddings
    # e.g. VGGish, OpenL3, ...
    embeddings: !!null

    # Machine learning
    # In case using ensemble methods, please specify all methods used (comma separated list).
    machine_learning_method: template matching         # e.g. one or multiple [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, transformer, ...]
    # The system adaptation for the "few shot" scenario.
    # For example, if machine_learning_method is "CNN", the few_shot_method might use one of [fine tuning, prototypical, MAML] in addition to the standard CNN architecture.
    few_shot_method: template matching         # e.g. [fine tuning, prototypical, MAML, nearest neighbours, ...]

    # External data usage method
    # e.g. directly, embeddings, pre-trained model, ...
    external_data_usage: !!null

    # Ensemble method subsystem count
    # In case ensemble method is not used, mark !!null.
    ensemble_method_subsystem_count: !!null # [2, 3, 4, 5, ... ]

    # Decision making methods (for ensemble)
    decision_making: !!null                 # [majority vote, ...]

    # Post-processing, followed by the time span (in ms) in case of smoothing
    post-processing: peak picking, threshold            # [median filtering, time aggregation, ...]

  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:

    # Total amount of parameters used in the acoustic model. For neural networks, this
    # information is usually given before training process in the network summary.
    # For other than neural networks, if parameter count information is not directly available,
    # try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    total_parameters: !!null    # Note: for simple template matching, the parameter count is the pixel count of the templates, plus 1 for each parameter such as a threshold.
    # Approximate training time followed by the hardware used
    training_time: !!null
    # Model size in MB
    model_size: !!null


  # URL to the source code of the system [optional, highly recommended]
  source_code:   

  # List of external datasets used in the submission.
  # A previous DCASE development dataset is used here only as example! List only external datasets
  external_datasets:
    # Dataset name
    - name: !!null
      # Dataset access url
      url: !!null
      # Total audio length in minutes
      total_audio_length: !!null            # minutes

# System results 
results:
  # Full results are not mandatory, but they are recommended for a thorough analysis of the challenge submissions.
  # If you cannot provide all result details, also incomplete results can be reported.
  validation_set:
    overall:
      F-score: 2.01 # percentage

    # Per-dataset
    dataset_wise:
      HV:
        F-score: 1.22 # percentage
      PB:
        F-score: 5.84 # percentage
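
The comment on `total_parameters` in the example above describes how to count parameters for simple template matching: the pixel count of the templates, plus 1 for each scalar parameter such as a threshold. The Python sketch below applies that rule. The template shapes and the names of the scalar parameters are hypothetical and only serve to show the arithmetic.

```python
# Hypothetical templates: (frequency bins, time frames) of each spectrogram template.
template_shapes = [(128, 40), (128, 55), (128, 32)]

# Hypothetical scalar settings, each counted as one parameter.
scalar_parameters = ["threshold", "min_peak_distance"]


def count_template_parameters(shapes, scalars):
    """Pixel count of all templates plus one per scalar parameter."""
    pixels = sum(n_freq * n_time for n_freq, n_time in shapes)
    return pixels + len(scalars)


total_parameters = count_template_parameters(template_shapes, scalar_parameters)
print(f"total_parameters: {total_parameters}")  # prints "total_parameters: 16258"
```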

Task 6 - Automated Audio Captioning

Example meta information file for Task 6 baseline system task6/Drossos_TAU_task6_1/Drossos_TAU_task6_1.meta.yaml:

# Submission information for task 6
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid
  # overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Drossos_TAU_task6_1
  #
  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2021 baseline system
  #
  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use maximum 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system. Mark authors in
  # the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author,
  # this will be listed next to the submission in the results tables.
  authors:
    # First author
    - lastname: Drossos
      firstname: Konstantinos
      email: konstantinos.drossos@tuni.fi         # Contact email address
      corresponding: true                         # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Second author
    - lastname: Lipping
      firstname: Samuel
      email: samuel.lipping@tuni.fi                # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences            # Optional
        location: Tampere, Finland

    # Third author
    - lastname: Virtanen
      firstname: Tuomas
      email: tuomas.virtanen@tuni.fi

      # Affiliation information for the author
      affiliation:
        abbreviation: TAU
        institute: Tampere University
        department: Computing Sciences
        location: Tampere, Finland

# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:

    # Audio input / sampling rate
    # e.g. 16kHz, 22.05kHz, 44.1kHz, 48.0kHz
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    # Here you should indicate what kind of audio representation
    # you used. If your system used hand-crafted features (e.g.
    # mel band energies), then you can do:
    #
    # `acoustic_features: mel energies`
    #
    # Else, if you used some pre-trained audio feature extractor, 
    # you can indicate the name of the system, for example:
    #
    # `acoustic_features: audioset`
    acoustic_features: log-mel energies

    # Word embeddings
    # Here you can indicate how you treated word embeddings.
    # If your method learned its own word embeddings (i.e. you
    # did not use any pre-trained word embeddings) then you can
    # do:
    #
    # `word_embeddings: learned`
    #  
    # Else, specify the pre-trained word embeddings that you used
    # (e.g. Word2Vec, BERT, etc).
    word_embeddings: one-hot

    # Data augmentation methods
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null

    # Method scheme
    # Here you should indicate the scheme of the method that you
    # used. For example:
    machine_learning_method: encoder-decoder

    # Learning scheme
    # Here you should indicate the learning scheme. 
    # For example, you could specify either
    # supervised, self-supervised, or even 
    # reinforcement learning. 
    learning_scheme: supervised

    # Ensemble
    # Here you should indicate if you used ensemble
    # of systems or not.
    ensemble: No

    # Audio modelling
    # Here you should indicate the type of system used for
    # audio modelling. For example, if you used some stacked CNNs, then
    # you could do:
    #
    # audio_modelling: cnn
    #
    # If you used some pre-trained system for audio modelling,
    # then you should indicate the system used (e.g. COALA, COLA,
    # transformer).
    audio_modelling: cnn

    # Word modelling
    # Similarly, here you should indicate the type of system used
    # for word modelling. For example, if you used some RNNs,
    # then you could do: 
    #
    # word_modelling: rnn
    #
    # If you used some pre-trained system for word modelling,
    # then you should indicate the system used (e.g. transformer).
    word_modelling: rnn

    # Loss function
    # Here you should indicate the loss function that you employed.
    loss_function: crossentropy

    # Optimizer
    # Here you should indicate the name of the optimizer that you
    # used. 
    optimizer: adam

    # Learning rate
    # Here you should indicate the learning rate of the optimizer
    # that you used.
    learning_rate: 1e-3

    # Gradient clipping
    # Here you should indicate if you used any gradient clipping. 
    # You do this by indicating the value used for clipping. Use
    # 0 for no clipping.
    gradient_clipping: 0

    # Gradient norm
    # Here you should indicate the norm of the gradient that you
    # used for gradient clipping. This field is used only when 
    # gradient clipping has been employed.
    gradient_norm: !!null

    # Metric monitored
    # Here you should report the metric monitored
    # for optimizing your method. For example, did you
    # monitor the loss on the validation data (i.e. validation
    # loss)? Or did you monitor the SPIDEr metric? Maybe the training
    # loss?
    metric_monitored: validation_loss

  # System complexity, meta data provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:
    # Total amount of parameters used in the acoustic model.
    # For neural networks, this information is usually given before training process
    # in the network summary.
    # For other than neural networks, if parameter count information is not directly
    # available, try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding
    # extraction networks and classification network
    # Use numerical value (do not use comma for thousands-separator).
    total_parameters: 46246

  # List of external datasets used in the submission.
  # Development dataset is used here only as example, list only external datasets
  external_datasets:
    # Dataset name
    - name: Clotho

      # Dataset access url
      url: https://doi.org/10.5281/zenodo.3490683

      # Has audio:
      has_audio: Yes

      # Has images
      has_images: No

      # Has video
      has_video: No

      # Has captions
      has_captions: Yes

      # Number of captions per audio
      nb_captions_per_audio: 5

      # Total amount of examples used
      total_audio_length: 24430

      # Used for (e.g. audio_modelling, word_modelling, audio_and_word_modelling)
      used_for: audio_and_word_modelling

  # URL to the source code of the system [optional]
  source_code: https://github.com/audio-captioning/dcase-2021-baseline

# System results
results:
  development_evaluation:
    # System results for development evaluation split.
    # Full results are not mandatory, however, they are highly recommended
    # as they are needed for a thorough analysis of the challenge submissions.
    # If you are unable to provide all results, also incomplete
    # results can be reported.
    bleu1: 0.378
    bleu2: 0.119
    bleu3: 0.050
    bleu4: 0.017
    rougel: 0.263
    meteor: 0.078
    cider: 0.075
    spice: 0.028
    spider: 0.051
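
A quick consistency check on the reported captioning metrics is possible because SPIDEr is defined as the average of CIDEr and SPICE. The Python sketch below recomputes it from the `cider` and `spice` values in the example above; the function name `spider_score` is only illustrative. Small differences in the last digit relative to the reported `spider` value are expected, because the inputs are already rounded to three decimals.

```python
def spider_score(cider, spice):
    """SPIDEr is the arithmetic mean of the CIDEr and SPICE scores."""
    return (cider + spice) / 2


# Values taken from the Task 6 example above.
cider = 0.075
spice = 0.028

spider = spider_score(cider, spice)
print(f"spider: {spider:.4f}")
```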

Technical report

All participants are expected to submit a technical report about the submitted system, to help the DCASE community better understand how the algorithm works.

Technical reports are not peer-reviewed. The technical reports will be published on the challenge website together with all other information about the submitted system. For the technical report, it is not necessary to follow closely the scientific publication structure (for example there is no need for extensive literature review). The report should however contain a sufficient description of the system.

Please report the system performance using the provided cross-validation setup or development set, according to the task. For participants taking part in multiple tasks, one technical report covering all tasks is sufficient, if the systems have only small differences. Describe the task-specific parameters in the report.

Participants can also submit the same report as a scientific paper to the DCASE 2021 Workshop. In this case, the paper must respect the structure of a scientific publication and be prepared according to the provided Workshop paper instructions and template. Please note that the template is slightly different, and you will have to create a separate submission to the DCASE2021 Workshop track in the submission system. Please refer to the workshop webpage for more details. DCASE2021 Workshop papers will be peer-reviewed.

Template

Reports follow a 4+1 page format: a maximum of 5 pages, including all text, figures, and references, with the 5th page containing only references. The templates for the technical report are available here:

Latex template (138 KB)
version 1.0 (.zip)

Word template (36 KB)
version 1.0 (.docx)

Sample PDF produced with Latex template (149 KB)
version 1.0 (.pdf)
