The submission deadline is June 15th, 2026, 23:59 Anywhere on Earth (AoE).

Introduction

Challenge submission consists of a submission package (one zip package) containing system outputs, system meta information, and a technical report (PDF file).

Submission process in brief:

Participants run their system on the evaluation dataset and produce the system output in the specified format. Participants are allowed to submit up to 4 different system outputs per task or subtask.
Participants create a meta information file to accompany each system output, describing the system used to produce that particular output. The meta information file has a predefined format to help the automatic handling of the challenge submissions. Information provided in the meta file will later be used to produce the challenge results. Participants should fill in all meta information and make sure the meta information file follows the defined format.
Participants describe their system in sufficient detail in a technical report. A template will be provided for the document.
Participants prepare the submission package (zip-file). The submission package contains the system outputs (a maximum of 4 per task), the system meta information, and the technical report.
Participants submit the submission package and the technical report to the DCASE2026 Challenge.

Please read the requirements for the files included in the submission package carefully!

Submission system

The submission system is now available:

Submission system

The technical report in the submission package must contain at least the title, authors, and abstract. An updated camera-ready version of the technical report can be submitted separately until June 22nd, 2026 (AoE).

By submitting to the challenge, participants agree to have their system output evaluated and published, together with the results and the technical report, on the DCASE Challenge website under the CC-BY license.

Submission package

Participants are instructed to pack their system output(s), system meta information, and technical report into one zip-package. Example package:

DCASE2026 challenge submission example package (.zip)

Please prepare your submission zip-file following the provided example. Follow the same file structure and fill in the meta information using the same structure as in the *.meta.yaml files. The zip-file should contain system outputs for all tasks/subtasks (a maximum of 4 submissions per task/subtask), separate meta information for each system, and technical report(s) covering all submitted systems.

If you submit similar systems for multiple tasks, you can describe everything in one technical report. If your approaches for different tasks differ significantly, prepare a separate report for each and include it in the corresponding task folder.
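As a minimal sketch of the packing step, assuming you have laid out the task folders exactly as in the example package inside one local folder, Python's standard zipfile module can write them so that the task folders sit at the root of the archive. The function name build_submission_zip and the folder names used with it are hypothetical; this is not the organizers' tooling, and the official checking scripts remain the reference.

```python
import zipfile
from pathlib import Path


def build_submission_zip(package_root: str, zip_path: str) -> list:
    """Zip all files under package_root with paths relative to it.

    The task folders (task1, task2, ...) land at the archive root, as in the
    example package. Returns the member names written to the archive.
    """
    root = Path(package_root)
    target = Path(zip_path).resolve()
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip folders (implied by the file paths) and the archive itself,
            # in case the zip is being written inside package_root.
            if not path.is_file() or path.resolve() == target:
                continue
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname=arcname)
            written.append(arcname)
    return written
```

For example, calling build_submission_zip("my_submission", "my_submission.zip") on a folder laid out like the example package would store members such as task1/Font_UPF_task1_1/Font_UPF_task1_1.meta.yaml. Zipping the folder's contents rather than the folder itself avoids an extra top-level directory above the task folders.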

More detailed instructions for constructing the package can be found in the following sections. The technical report template is available here.

Scripts for checking the content of the submission package are provided for selected tasks; please validate your submission package accordingly.

Submission label

A submission label is used to index all your submissions (systems per task). To avoid overlapping labels among all submitted systems, form your label as follows:

[Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number][subtask letter (optional)]_[index number of your submission (1-4)]

For example, the baseline systems would have the following labels:

Font_UPF_task1_1
Nishida_HIT_task2_1
Roman_QMUL_task3a_1
Roman_QMUL_task3b_1
Nguyen_NTT_task4_1
He_CUHK_task5_1
Munakata_LY_task6_1
Mulimani_AAU_task7_1
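The labeling rule can be expressed as a regular expression. The sketch below uses hypothetical helpers, make_label and parse_label, to build and validate labels. It assumes last names and institute abbreviations consist of ASCII letters (plus digits for the institute), so adapt the pattern if yours contain hyphens or other characters.

```python
import re

# [Last name]_[Institute abbreviation]_task[N][optional subtask letter]_[1-4]
# Assumption: ASCII letters only in the last name, letters/digits in the
# institute abbreviation. Adjust the character classes if needed.
LABEL_RE = re.compile(
    r"^(?P<lastname>[A-Za-z]+)_(?P<institute>[A-Za-z0-9]+)"
    r"_task(?P<task>\d+)(?P<subtask>[a-z]?)_(?P<index>[1-4])$"
)


def make_label(lastname: str, institute: str, task: int, index: int,
               subtask: str = "") -> str:
    """Build a submission label and check it against the naming rule."""
    label = f"{lastname}_{institute}_task{task}{subtask}_{index}"
    if not LABEL_RE.match(label):
        raise ValueError(f"invalid submission label: {label}")
    return label


def parse_label(label: str) -> dict:
    """Split a label into its parts; raise ValueError if it does not match."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"invalid submission label: {label}")
    return match.groupdict()
```

For instance, make_label("Roman", "QMUL", 3, 1, subtask="a") returns Roman_QMUL_task3a_1, while an index of 5 raises ValueError because at most 4 submissions are allowed per task or subtask.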


Package structure

Make sure your zip-package follows the provided file naming convention and directory structure:

Zip-package root
│  
└───task1                                                  Task 1 submissions
│   │   Font_UPF_task1_1.technical_report.pdf              Technical report covering all subtasks
│   │
│   └───Font_UPF_task1_1                                   System 1 submission files
│   │       Font_UPF_task1_1.meta.yaml                     System 1 meta information
│   │       Font_UPF_task1_1.output.csv                    System 1 output
│   :
│   └───Font_UPF_task1_4                                   System 4 submission files
│           Font_UPF_task1_4.meta.yaml                     System 4 meta information
│           Font_UPF_task1_4.output.csv                    System 4 output
│
└───task2                                                  Task 2 submissions
│   │   Nishida_HIT_task2_1.technical_report.pdf           Technical report                       
│   │
│   └───Nishida_HIT_task2_1                                         System 1 submission files
│   │       Nishida_HIT_task2_1.meta.yaml                           System 1 meta information
│   │       anomaly_score_BlowerDustCollector_section_00_test.csv   System 1 output for each section and domain in the evaluation dataset   
│   │       anomaly_score_Sander_section_00_test.csv
│   :       :
│   │       anomaly_score_ToyDrone_section_00_test.csv
│   │       decision_result_BlowerDustCollector_section_00_test.csv
│   :       :
│   │       decision_result_ToyDrone_section_00_test.csv
│   │
│   └───Nishida_HIT_task2_4                                         System 4 submission files
│           Nishida_HIT_task2_4.meta.yaml                           System 4 meta information
│           anomaly_score_BlowerDustCollector_section_00_test.csv   System 4 output for each section and domain in the evaluation dataset   
│           anomaly_score_AirCompressor_section_00_test.csv        
│           :
│           anomaly_score_ToyDrone_section_00_test.csv
│           decision_result_BlowerDustCollector_section_00_test.csv
│           :
│           decision_result_ToyDrone_section_00_test.csv
│
└───task3                                                  Task 3 submissions
│   │   Roman_QMUL_task3.technical_report.pdf              Technical report for combined AO and AV submissions
│   │   Roman_QMUL_task3a.technical_report.pdf             (Optional) Technical report only for audio-only system (Track A)
│   │   Roman_QMUL_task3b.technical_report.pdf             (Optional) Technical report only for audiovisual system (Track B)
│   │
│   └───Roman_QMUL_task3a_1                                Track A (audio-only) System 1 submission files
│   │     Roman_QMUL_task3a_1.meta.yaml                    Track A (audio-only) System 1 meta information
│   └─────Roman_QMUL_task3a_1                              Track A (audio-only) System 1 output files in a folder
│   │       mix001.json
│   │       ...
│   :
│   │
│   └───Roman_QMUL_task3a_4                                Track A (audio-only) System 4 submission files
│   │     Roman_QMUL_task3a_4.meta.yaml                    Track A (audio-only) System 4 meta information
│   └─────Roman_QMUL_task3a_4                              Track A (audio-only) System 4 output files in a folder
│   │       mix001.json
│   │       ...
│   │
│   └───Roman_QMUL_task3b_1                                Track B (audiovisual) System 1 submission files
│   │     Roman_QMUL_task3b_1.meta.yaml                    Track B (audiovisual) System 1 meta information
│   └─────Roman_QMUL_task3b_1                              Track B (audiovisual) System 1 output files in a folder
│   │       mix001.json
│   │       ...
│   :
│   │
│   └───Roman_QMUL_task3b_4                                Track B (audiovisual) System 4 submission files
│   │     Roman_QMUL_task3b_4.meta.yaml                    Track B (audiovisual) System 4 meta information
│   └─────Roman_QMUL_task3b_4                              Track B (audiovisual) System 4 output files in a folder
│           mix001.json
│           ...
│
└───task4                                                  Task 4 submissions
│   │   Nguyen_NTT_task4.technical_report.pdf              Technical report
│   │   Nguyen_NTT_task4.audio_url.txt                     URLs to zip packages with audio files
│   │   Naming_rule.md                                     File naming instructions for audio files in the zip files
│   │
│   └───Nguyen_NTT_task4_1                                 System 1 submission files
│   │     Nguyen_NTT_task4_1.meta.yaml                     System 1 meta information
│   :
│   └───Nguyen_NTT_task4_4                                 System 4 submission files
│         Nguyen_NTT_task4_4.meta.yaml                     System 4 meta information
│    
└───task5                                                  Task 5 submissions
│   │   He_CUHK_task5.technical_report.pdf                 Technical report
│   │
│   └───He_CUHK_task5_1                                    System 1 submission files
│   │     He_CUHK_task5_1.meta.yaml                        System 1 meta information
│   │     He_CUHK_task5_1.output.csv                       System 1 output
│   :
│   │
│   └───He_CUHK_task5_4                                    System 4 submission files
│         He_CUHK_task5_4.meta.yaml                        System 4 meta information
│         He_CUHK_task5_4.output.csv                       System 4 output
│         He_CUHK_task5_4.post_process.py                  (Optional) System 4 post-process code
│  
└───task6                                                  Task 6 submissions
│   │   Munakata_LY_task6.technical_report.pdf             Technical report
│   │
│   └───Munakata_LY_task6_1                                System 1 submission files
│   │     Munakata_LY_task6_1.meta.yaml                    System 1 meta information
│   │     Munakata_LY_task6_1.output.csv                   System 1 output
│   :
│   │
│   └───Munakata_LY_task6_4                                System 4 submission files
│         Munakata_LY_task6_4.meta.yaml                    System 4 meta information
│         Munakata_LY_task6_4.output.csv                   System 4 output
│         Munakata_LY_task6_4.post_process.py              (Optional) System 4 post-process code
│
└───task7                                                  Task 7 submissions
    │   Mulimani_AAU_task7.technical_report.pdf              Technical report 
    │
    └───Mulimani_AAU_task7_1                                 System 1 submission files
    │     Mulimani_AAU_task7_1.meta.yaml                     System 1 meta information
    │     Mulimani_AAU_task7_1.output.csv                    System 1 output
    │     Mulimani_AAU_task7_1_model.py                      System 1 model definition
    │     Mulimani_AAU_task7_1_D2_dictionary.pth             System 1 model weights after domain 2
    │     Mulimani_AAU_task7_1_D3_dictionary.pth             System 1 model weights after domain 3
    :
    │
    └───Mulimani_AAU_task7_4                                 System 4 submission files
          Mulimani_AAU_task7_4.meta.yaml                     System 4 meta information
          Mulimani_AAU_task7_4.output.csv                    System 4 output
          Mulimani_AAU_task7_4_model.py                      System 4 model definition
          Mulimani_AAU_task7_4_D2_dictionary.pth             System 4 model weights after domain 2
          Mulimani_AAU_task7_4_D3_dictionary.pth             System 4 model weights after domain 3
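Before uploading, a few of the structural rules above that are shared by all tasks can be checked automatically: every system folder contains a <label>.meta.yaml file, system indices run from 1 to 4 per task or subtask, and every task folder contains a technical report. The sketch below is a hypothetical helper (check_package), not the official checking script; it does not verify task-specific output files, whose formats are defined in each task description.

```python
import re
import zipfile
from collections import defaultdict

# Matches "<taskN>/<label>/..." where the label ends in _task<N>[subtask]_<index>.
SYSTEM_DIR_RE = re.compile(r"^(task\d+)/([^/]+_task\d+[a-z]?_\d+)/")
TASK_DIR_RE = re.compile(r"^task\d+$")


def check_package(zip_file) -> list:
    """Return problems found in a submission zip (path or binary file object).

    Only checks rules common to all tasks: one meta file per system folder,
    system index 1-4, and at least one technical report per task folder.
    """
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()
    problems = []
    systems = defaultdict(set)  # (task folder, label prefix) -> system labels
    for name in names:
        match = SYSTEM_DIR_RE.match(name)
        if match:
            task_dir, label = match.groups()
            prefix = label.rsplit("_", 1)[0]  # e.g. "Roman_QMUL_task3a"
            systems[(task_dir, prefix)].add(label)
    for (task_dir, prefix), labels in sorted(systems.items()):
        if len(labels) > 4:
            problems.append(f"{prefix}: {len(labels)} systems, maximum is 4")
        for label in sorted(labels):
            if int(label.rsplit("_", 1)[1]) not in range(1, 5):
                problems.append(f"{label}: index must be between 1 and 4")
            meta = f"{task_dir}/{label}/{label}.meta.yaml"
            if meta not in names:
                problems.append(f"missing {meta}")
    task_dirs = {n.split("/", 1)[0] for n in names if "/" in n}
    for task_dir in sorted(d for d in task_dirs if TASK_DIR_RE.match(d)):
        has_report = any(
            n.startswith(task_dir + "/") and n.endswith(".technical_report.pdf")
            for n in names
        )
        if not has_report:
            problems.append(f"{task_dir}: no technical report found")
    return problems
```

An empty list means only that these shared rules pass; the task-specific checks still apply.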

System outputs

Participants must submit the results for the provided evaluation datasets.

Follow the system output format specified in the task description.
Tasks are independent. You can participate in a single task or multiple tasks.
Multiple submissions for the same task are allowed (maximum 4 per task). Use a running index in the submission label, and give more descriptive names to the submitted systems in the system meta information files. Please clearly indicate the connection between the submitted systems and the system parameter descriptions in the technical report (for example, by referring to the systems by their submission label or by the system name given in the system meta information file).
Submitted system outputs will later be published on the DCASE2026 website to allow future evaluations.

Meta information

To enable fast processing of submissions and meta-analysis of the submitted systems, participants should provide meta information in a structured, correctly formatted YAML file. Participants are advised to fill in the meta information carefully and to make sure all requested information is provided correctly.

A complete meta file will help us identify possible errors before officially publishing the results (for example, an unexpectedly large difference in performance between the development and evaluation sets) and allow us to contact the authors in case we consider it necessary. Please note that task organizers may ask you to update the meta file after the challenge submission deadline.

See the example meta files below for each baseline system. These examples are also available in the example submission package. The meta file structure is mostly the same for all tasks; only the development-set metrics collected in the results section differ between tasks.
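Some meta file requirements can be checked on the parsed YAML before submission. The sketch below assumes the file has already been loaded into a Python dict (for example with PyYAML's yaml.safe_load, which turns !!null into None) and checks only rules stated in this document: the label matches the system folder name, the abbreviation has at most 10 characters, exactly one author is marked as corresponding, and total_parameters is numeric. The function name check_meta is hypothetical.

```python
def check_meta(meta: dict, expected_label: str) -> list:
    """Check a parsed *.meta.yaml dict against a few documented rules."""
    problems = []
    submission = meta.get("submission") or {}
    label = submission.get("label")
    if label != expected_label:
        problems.append(f"label {label!r} does not match folder {expected_label!r}")
    abbreviation = str(submission.get("abbreviation") or "")
    if not abbreviation or len(abbreviation) > 10:
        problems.append("abbreviation must be 1-10 characters")
    authors = submission.get("authors") or []
    n_corresponding = sum(1 for a in authors if a.get("corresponding") is True)
    if n_corresponding != 1:
        problems.append(f"{n_corresponding} corresponding authors, expected exactly 1")
    complexity = (meta.get("system") or {}).get("complexity") or {}
    params = complexity.get("total_parameters")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(params, bool) or not isinstance(params, (int, float)):
        problems.append("system.complexity.total_parameters must be numeric")
    return problems
```

The expected label would typically come from the name of the folder containing the meta file, for example Font_UPF_task1_1.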

Task 1 - Heterogeneous Audio Classification

Example meta information file for Task 1 baseline system task1/Font_UPF_task1_1/Font_UPF_task1_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid
  # overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the
  # corresponding author]_task[task number]_[index number of your submission
  # (1-4)]
  label: Font_UPF_task1_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: Example submission system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use a maximum of 10 characters.
  abbreviation: ExampleSys

  # Authors of the submitted system. Mark authors in
  # the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author,
  # this will be listed next to the submission in the results tables.
  authors:
    # First author
    - lastname: Font
      firstname: Frederic
      email: frederic.font@upf.edu           # Contact email address
      corresponding: true                    # Mark true for one of the authors
      # Affiliation information for the author
      affiliation:
        abbreviation: UPF
        institute: Universitat Pompeu Fabra (UPF)
        department: Music Technology Group (MTG)   # Optional
        location: Barcelona, Spain

    # Second author
    - lastname: Anastasopoulou
      firstname: Panagiota
      email: panagiota.anastasopoulou@upf.edu   
      affiliation:
        abbreviation: UPF
        institute: Universitat Pompeu Fabra (UPF)
        department: Music Technology Group (MTG)   # Optional
        location: Barcelona, Spain  

# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If an information field is not applicable to the system, use "!!null".
  # Use commas to separate tags when multiple tags are applicable.
   
  # URL to the full source code of the system [optional]
  source_code: https://github.com/MTG/dcase2026_task1_baseline
  
  description:

    # Audio input / sampling rate
    # e.g. 16kHz, 22.05kHz, 32kHz, 44.1kHz, 48.0kHz
    input_sampling_rate: 48kHz

    # Audio representation (audio features or embedding spaces)
    # e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, PANNs,
    # CLAP, EnCodec ...
    audio_representation: CLAP

    # Representation for text input (textual features, metadata items and/or
    # embedding spaces, only relevant in multimodal systems that use metadata
    # as input)
    # e.g. title, tags, description, Word2Vec, CLAP ...
    # Use !!null if the system does not use metadata as input (if the system is
    # audio-only).
    text_representation: title, tags, description, CLAP

    # Data augmentation methods
    # e.g. mixup, freq-mixstyle, dir augmentation, pitch shifting, time
    # rolling, frequency masking, time masking, frequency warping,
    # noise addition, ...
    # Use !!null if the system does not use data augmentation
    data_augmentation: noise addition, time masking

    # Machine learning
    # e.g., MLP, (RF-regularized) CNN, RNN, CRNN, Transformer, ...
    machine_learning_method: MLP

    # External data usage method
    # e.g. dataset, embeddings, ...
    # Use !!null if the system does not use external data
    external_data_usage: embeddings

    # Method for considering taxonomy hierarchy in the system design
    # e.g. loss function, multiple classifiers, ...
    # Use !!null if the system does not use any method to leverage hierarchy
    hierarchical_setting: !!null 


  # System complexity
  complexity:
    # Total amount of parameters used in the acoustic model.
    # For neural networks, this information is usually given before the training
    # process in the network summary.
    # For models other than neural networks, if parameter count information is not
    # directly available, try estimating the count as
    # accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding
    # extraction networks and classification network.
    # Use numerical value.
    total_parameters: 269992
    MACS: 1.036 G

  # List of external datasets used in the submission. If using embeddings from
  # pre-trained models, there's NO need to include the datasets used to train
  # the embedding models here.
  external_datasets:
    # Below are two examples (NOT used in the baseline system)
    #- name: EfficientAT
    #  url: https://github.com/fschmid56/EfficientAT
    #  total_audio_length: !!null
    #- name: MicIRP
    #  url: http://micirp.blogspot.com/?m=1
    #  total_audio_length: 2   # specify in minutes

# System results [OPTIONAL]
results:
  development_datasets:
    
    # System results for both development datasets 
    # If possible, please provide overall and class-wise results when
    # training/evaluating your systems on each of the development datasets
    # separately. You can follow the baseline code example to do that. Note that
    # the development datasets do not include data splits; our baseline example uses
    # 5-fold cross-validation and reports the average performance across the 5
    # folds. You can reproduce the same experiment setup using the code
    # provided with the baseline system and the same random seed to
    # generate the same folds.

    # Please refer to the baseline code for the calculation of the overall and
    # class-wise metrics (hP, hR, hF).
    # Set parameter lambda=0.75 (the default in the baseline code).
    # Note that the numbers below are just examples, they do not correspond to
    # the actual performance of any system.
    
    bsd10k-v1.2:
      overall:
        hP: 0.584
        hR: 0.688
        hF: 0.632

      class_wise:
        m-sp:
          hP: 0.799
          hR: 0.875
          hF: 0.835
        m-si:
          hP: 0.794
          hR: 0.844
          hF: 0.818
        m-m:
          hP: 0.834
          hR: 0.944
          hF: 0.885
        is-p:
          hP: 0.799
          hR: 0.887
          hF: 0.841
        is-s:
          hP: 0.821
          hR: 0.887
          hF: 0.853
        is-w:
          hP: 0.790
          hR: 0.881
          hF: 0.833
        is-k:
          hP: 0.836
          hR: 0.900
          hF: 0.867
        is-e:
          hP: 0.753
          hR: 0.844
          hF: 0.796
        sp-s:
          hP: 0.790
          hR: 0.881
          hF: 0.833
        sp-c:
          hP: 0.821
          hR: 0.906
          hF: 0.862
        sp-p:
          hP: 0.764
          hR: 0.838
          hF: 0.799
        fx-o:
          hP: 0.788
          hR: 0.900
          hF: 0.841
        fx-v:
          hP: 0.788
          hR: 0.887
          hF: 0.835
        fx-m:
          hP: 0.802
          hR: 0.875
          hF: 0.837
        fx-h:
          hP: 0.778
          hR: 0.850
          hF: 0.812
        fx-a:
          hP: 0.795
          hR: 0.863
          hF: 0.828
        fx-n:
          hP: 0.805
          hR: 0.900
          hF: 0.850
        fx-ex:
          hP: 0.797
          hR: 0.881
          hF: 0.837
        fx-el:
          hP: 0.753
          hR: 0.825
          hF: 0.787
        ss-n:
          hP: 0.806
          hR: 0.887
          hF: 0.845
        ss-i:
          hP: 0.840
          hR: 0.900
          hF: 0.869
        ss-u:
          hP: 0.817
          hR: 0.887
          hF: 0.851
        ss-s:
          hP: 0.849
          hR: 0.925
          hF: 0.885
      
    bsd35k-cs:
      overall:
        hP: 0.339
        hR: 0.506
        hF: 0.406

      class_wise:
        m-sp:
          hP: 0.594
          hR: 0.713
          hF: 0.648
        m-si:
          hP: 0.618
          hR: 0.725
          hF: 0.667
        m-m:
          hP: 0.566
          hR: 0.656
          hF: 0.608
        is-p:
          hP: 0.544
          hR: 0.650
          hF: 0.592
        is-s:
          hP: 0.579
          hR: 0.675
          hF: 0.623
        is-w:
          hP: 0.589
          hR: 0.700
          hF: 0.640
        is-k:
          hP: 0.576
          hR: 0.681
          hF: 0.625
        is-e:
          hP: 0.605
          hR: 0.700
          hF: 0.649
        sp-s:
          hP: 0.601
          hR: 0.688
          hF: 0.642
        sp-c:
          hP: 0.587
          hR: 0.706
          hF: 0.641
        sp-p:
          hP: 0.607
          hR: 0.719
          hF: 0.658
        fx-o:
          hP: 0.590
          hR: 0.694
          hF: 0.638
        fx-v:
          hP: 0.598
          hR: 0.719
          hF: 0.653
        fx-m:
          hP: 0.563
          hR: 0.644
          hF: 0.601
        fx-h:
          hP: 0.565
          hR: 0.669
          hF: 0.612
        fx-a:
          hP: 0.597
          hR: 0.688
          hF: 0.639
        fx-n:
          hP: 0.541
          hR: 0.631
          hF: 0.583
        fx-ex:
          hP: 0.604
          hR: 0.719
          hF: 0.656
        fx-el:
          hP: 0.587
          hR: 0.713
          hF: 0.644
        ss-n:
          hP: 0.561
          hR: 0.656
          hF: 0.605
        ss-i:
          hP: 0.598
          hR: 0.719
          hF: 0.653
        ss-u:
          hP: 0.573
          hR: 0.669
          hF: 0.617
        ss-s:
          hP: 0.583
          hR: 0.688
          hF: 0.631
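The example numbers above are consistent with hF being the harmonic mean of hP and hR, as in the usual hierarchical F-measure. The sketch below assumes that relation; it does not model the lambda parameter mentioned in the comments, whose role is defined in the baseline code, which remains the reference for computing the reported metrics. The function name h_f1 is hypothetical.

```python
def h_f1(h_precision: float, h_recall: float) -> float:
    """Harmonic mean of hierarchical precision (hP) and recall (hR)."""
    if h_precision + h_recall == 0:
        return 0.0
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Consistency check against the example values in the meta file above.
print(round(h_f1(0.584, 0.688), 3))  # bsd10k-v1.2 overall: 0.632
print(round(h_f1(0.339, 0.506), 3))  # bsd35k-cs overall: 0.406
```

If your reported hF differs noticeably from the harmonic mean of your hP and hR, that is worth double-checking before submission.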

Task 2 - Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Example meta information file for Task 2 baseline system task2/Nishida_HIT_task2_1/Nishida_HIT_task2_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Nishida_HIT_task2_1

  # Submission name
  # This name will be used in the results tables when space permits.
  name: DCASE2026 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use a maximum of 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system.
  # Mark authors in the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author, this will be listed next to the submission in the results tables.
  authors:
    # First author
    - firstname: Tomoya
      lastname: Nishida
      email: tomoya.nishida.ax@hitachi.com # Contact email address
      corresponding: true # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        institution: Hitachi, Ltd.
        department: Research and Development Group # Optional
        location: Tokyo, Japan

    # Second author
    - firstname: Noboru
      lastname: Harada
      email: noboru@ieee.org

      # Affiliation information for the author
      affiliation:
        institution: NTT Corporation
        location: Kanagawa, Japan

    # Third author
    - firstname: Daisuke
      lastname: Niizumi
      email: daisuke.niizumi.dt@hco.ntt.co.jp

      # Affiliation information for the author
      affiliation:
        institution: NTT Corporation
        location: Kanagawa, Japan


# System information
system:
  # System description, metadata provided here will be used to do a meta-analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If an information field is not applicable to the system, use "!!null".
  description:
    # Audio input
    # Please specify all sampling rates (comma-separated list).
    # e.g. 16kHz, 22.05kHz, 44.1kHz
    input_sampling_rate: 16kHz

    # Data augmentation methods
    # Please specify all methods used (comma-separated list).
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null

    # Front-end (preprocessing) methods
    # Please specify all methods used (comma-separated list).
    # e.g. HPSS, WPE, NMF, NN filter, RPCA, ...
    front_end: !!null

    # Acoustic representation
    # one or multiple labels, e.g. MFCC, log-mel energies, spectrogram, CQT, raw waveform, ...
    acoustic_features: log-mel energies

    # Embeddings
    # Please specify all pre-trained embeddings used (comma-separated list).
    # one or multiple, e.g. VGGish, OpenL3, ...
    embeddings: !!null

    # Machine learning
    # When using ensemble methods, please specify all methods used (comma-separated list).
    # e.g. AE, VAE, GAN, GMM, k-means, OCSVM, normalizing flow, CNN, LSTM, random forest, ensemble, ...
    machine_learning_method: AE

    # Method for aggregating predictions over time
    # Please specify all methods used (comma-separated list).
    # e.g. average, median, maximum, minimum, ...
    aggregation_method: average

    # Methods for domain generalization and domain adaptation
    # Please specify all methods used (comma-separated list).
    # e.g. fine-tuning, invariant feature extraction, ...
    domain_adaptation_method: !!null
    domain_generalization_method: !!null

    # Ensemble method subsystem count
    # If no ensemble method is used, use !!null.
    # e.g. 2, 3, 4, 5, ...
    ensemble_method_subsystem_count: !!null

    # Decision making in ensemble
    # e.g. average, median, maximum, minimum, ...
    decision_making: !!null

    # Usage of the attribute information in the file names and attribute csv files
    # Please specify all usages (comma-separated list).
    # e.g. interpolation, extrapolation, condition ...
    attribute_usage: !!null

    # External data usage method
    # Please specify all usages (comma-separated list).
    # e.g. simulation of anomalous samples, embeddings, pre-trained model, ...
    external_data_usage: !!null

    # Usage of the development dataset
    # Please specify all usages (comma-separated list).
    # e.g. development, pre-training, fine-tuning
    development_data_usage: development

  # System complexity, metadata provided here may be used to evaluate submitted systems from the computational load perspective.
  complexity:
    # Total amount of parameters used in the acoustic model.
    # For neural networks, this information is usually given before the training process in the network summary.
    # For models other than neural networks, if parameter count information is not directly available, try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    # In case embeddings are used, add up parameter count of the embedding extraction networks and classification network.
    # Use numerical value.
    total_parameters: 269992
  
  # List of external datasets used in the submission.
  # List only external datasets here; do not list the development dataset.
  external_datasets:
    # Dataset name
    - name: 

      # Dataset access URL
      url: 

  # URL to the source code of the system [optional, highly recommended]
  # Reproducibility will be used to evaluate submitted systems.
  source_code: https://github.com/nttcslab/dcase2023_task2_baseline_ae

# System results
results:
  development_dataset:
    # System results for development dataset.
    # Full results are not mandatory; however, they are highly recommended, as they are needed for a thorough analysis of the challenge submissions.
    # If you are unable to provide all results, incomplete results can also be reported.

    # AUC for all domains [%]
    # No need to round numbers
    ToyCarEmu:
      auc_source: 69.62
      auc_target: 61.2
      pauc: 55.89

    ToyCar:
      auc_source: 75.62
      auc_target: 37.87
      pauc: 54.03

    bearingEmu:
      auc_source: 62.34
      auc_target: 59.56
      pauc: 59.85

    fan:
      auc_source: 61.45
      auc_target: 46.94
      pauc: 53.33

    gearboxEmu:
      auc_source: 68.23
      auc_target: 49.78
      pauc: 52.94

    sliderEmu:
      auc_source: 67.25
      auc_target: 45.05
      pauc: 50.38

    valveEmu:
      auc_source: 67.74
      auc_target: 68.78
      pauc: 55.08
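The auc_source and auc_target values are ROC AUC scores expressed in percent. As a minimal sketch, assuming higher anomaly scores mean more anomalous clips, AUC equals the fraction of (anomalous, normal) clip pairs in which the anomalous clip scores higher, with ties counted as half. Which clips enter each domain's score, and how pAUC (a partial AUC over a restricted false-positive-rate range) is computed, are specified in the Task 2 description and are not implemented here; auc_percent is a hypothetical helper.

```python
def auc_percent(normal_scores, anomalous_scores) -> float:
    """ROC AUC in percent, computed as the pairwise Mann-Whitney statistic."""
    if not normal_scores or not anomalous_scores:
        raise ValueError("need at least one normal and one anomalous score")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0  # anomalous clip ranked above the normal clip
            elif a == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return 100.0 * wins / (len(normal_scores) * len(anomalous_scores))


# Toy scores (not challenge data): 5 of the 6 pairs are ordered correctly,
# so the result is about 83.33.
print(auc_percent([0.1, 0.2, 0.3], [0.25, 0.4]))
```

The pairwise loop is quadratic in the number of clips, which is fine for small sections; a rank-based computation gives the same value and scales better.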

Task 3 - Semantic Acoustic Imaging for Sound Event Localization and Detection from Spatial Audio and Audiovisual Scenes

Example meta information file for Task 3 baseline system task3/Roman_QMUL_task3a_1/Roman_QMUL_task3a_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions. To avoid overlapping codes among submissions,
  # use the following way to form your label:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number][subtask letter]_[index number of your submission (1-4)]
  label: Roman_QMUL_task3a_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2026 Audio-only baseline

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight, maximum 10 characters
  abbreviation: AO_base

  # Submission authors in order, mark one of the authors as corresponding author.
  authors:
    # First author
    - lastname: Roman
      firstname: Iran
      email: i.roman@qmul.ac.uk                         # Contact email address
      corresponding: true                               # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: QMUL
        institute: Queen Mary University of London
        department: Audio Research Group
        location: London, UK

    # Second author
    - lastname: Shimada
      firstname: Kazuki
      email: kazuki.shimada@sony.com                   # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: SONY
        institute: Sony AI
        department:
        location: Tokyo, Japan


# System information
system:
  # System description, meta data provided here will be used to do
  # meta analysis of the submitted system. Use general level tags; if possible, use the tags provided in comments.
  # If an information field is not applicable to the system, use "!!null".
  description:
  
    # Model type (audio-only or audiovisual track)
    model_type: Audio                       # Audio or Audiovisual

    # Audio input
    input_format: MIC                       # Tetrahedral microphone array
    input_sampling_rate: 24kHz

    # Acoustic representation
    acoustic_features: log mel spectra      # e.g., one or more of [phase and magnitude spectra, log mel spectra, GCC-PHAT, TDOA, ...]
    # Video representation
    visual_features: !!null

    # Data augmentation methods
    data_augmentation: !!null             	# [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    # In case of using ensemble methods, please specify all methods used (comma separated list).
    machine_learning_method: CRNN, MHSA     # e.g., one or more of [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, MHSA, random forest, ensemble, ...]
    
    # List external datasets in case of use for training
    external_datasets: FSD50K, TAU-SRIR DB  # AudioSet, ImageNet, ...

    # List here pre-trained models in case of use
    pre_trained_models: !!null              # AST, PANNs...

  # System complexity. The metadata provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:

    # Total number of parameters used in the acoustic model. For neural networks, this
    # information is usually given before training process in the network summary.
    # For other than neural networks, if parameter count information is not directly available,
    # try estimating the count as accurately as possible.
    # In case of ensemble approaches, add up parameters for all subsystems.
    total_parameters: 500000

  # URL to the source code of the system [optional]
  source_code: https://github.com/iranroman/DCASE2026_Task3_SAISELD_baseline


# System results
results:

  development_dataset:
    # System result for development dataset on the provided testing split.

    # Overall score 
    overall:
      mAP: !!null
      EFRQ: !!null
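The rules spelled out in the comments above (label format, an abbreviation of at most 10 characters, exactly one corresponding author) can be checked before packaging. The sketch below is a hypothetical helper, not an official DCASE tool: it assumes the meta file has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`), assumes the label must also match the folder and file name as in the example path above, and uses an illustrative label pattern that also accepts a subtask letter such as `task3a`.

```python
import re
from pathlib import PurePosixPath

# Assumed label pattern: LastName_INSTITUTE_task<number>[optional subtask letter]_<index 1-4>.
# Names containing spaces or underscores are not covered by this simple pattern.
LABEL_PATTERN = re.compile(r"^[^_\s]+_[^_\s]+_task\d+[a-z]?_[1-4]$")


def check_submission(meta: dict, meta_path: str) -> list[str]:
    """Return a list of problems found in a parsed meta file (hypothetical helper)."""
    problems = []
    submission = meta.get("submission", {})
    label = str(submission.get("label", ""))

    if not LABEL_PATTERN.match(label):
        problems.append(f"label {label!r} does not follow the expected pattern")

    # The example paths have the form task<N>/<label>/<label>.meta.yaml.
    path = PurePosixPath(meta_path)
    if path.name != f"{label}.meta.yaml" or path.parent.name != label:
        problems.append(f"path {meta_path!r} does not match label {label!r}")

    abbreviation = str(submission.get("abbreviation", ""))
    if not 1 <= len(abbreviation) <= 10:
        problems.append(f"abbreviation {abbreviation!r} must have 1-10 characters")

    # Tasks 3, 6 and 7 nest 'authors' under 'submission'; Tasks 4 and 5 put it at the top level.
    authors = submission.get("authors") or meta.get("authors") or []
    corresponding = [a for a in authors if a.get("corresponding") is True]
    if len(corresponding) != 1:
        problems.append(f"expected exactly one corresponding author, found {len(corresponding)}")

    return problems


if __name__ == "__main__":
    task3_meta = {
        "submission": {
            "label": "Roman_QMUL_task3a_1",
            "abbreviation": "AO_base",
            "authors": [
                {"lastname": "Roman", "corresponding": True},
                {"lastname": "Shimada"},
            ],
        }
    }
    print(check_submission(task3_meta, "task3/Roman_QMUL_task3a_1/Roman_QMUL_task3a_1.meta.yaml"))
```

For the Task 3 example this prints an empty list; each string in a non-empty list names one field to fix.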

Task 4 - Spatial Semantic Segmentation of Sound Scenes

Example meta information file for Task 4 baseline system task4/Nguyen_NTT_task4_1/Nguyen_NTT_task4_1.meta.yaml:

# Submission information for task 4
submission:
  # Submission label
  # The label is used to index submissions.
  # Generate your label in the following way to avoid overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Nguyen_NTT_task4_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2026 baseline system 1-channel M2D ResUnetK

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use maximum 10 characters.
  abbreviation: M2D1cRUnet

# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
  # First author
  - lastname: Nguyen
    firstname: Binh Thien
    email: binhthien.nguyen@ntt.com             # Contact email address
    corresponding: true                         # Mark true for one of the authors

    # Affiliation information for the author
    affiliation:
      abbreviation: NTT
      institute: NTT Corporation
      department: Communication Science Laboratories   # Optional
      location: Atsugi, Kanagawa, Japan

  # Second author
  # ...

# System information
system:
  # System description. The metadata provided here will be used to do a
  # meta-analysis of the submitted system.
  # Use general-level tags; when possible, use the tags provided in the comments.
  # If an information field is not applicable to the system, use "!!null".
  description:
    # Audio input sampling rate
    # e.g., 16kHz, 32kHz
    input_sampling_rate: 32kHz

    # Input Acoustic representation
    # Here you should indicate which audio representation you used as system input.
    input_acoustic_features: waveform, spectrogram

    # Data augmentation methods
    # e.g., volume augmentation
    data_augmentation: !!null

    # Method scheme
    # Here you should indicate the scheme of the method that you used.
    machine_learning_method: ResUNet-based separation model, M2D-based audio tagging model

    # Ensemble
    # - Here you should indicate the number of systems involved if you used ensembling.
    # - If you did not use ensembling, just write 1.
    ensemble_num_systems: 1

    # Loss function
    # - Here you should indicate the loss function that you employed.
    loss_function: BCE, CAPI-SDR

    # List of ALL pre-trained models used in the submission.
    # If multiple pre-trained models are used, please repeat the list entry below (name, url, usage) for each pre-trained model.
    pretrained_models:
    -
      name: M2D
      # Access URL for pre-trained model
      url: https://github.com/nttcslab/m2d

      # How to use pre-trained model
      # e.g. text encoder, separation model
      usage: backbone for audio tagging model

  # System complexity. The metadata provided here will be used to evaluate
  # submitted systems from the computational load perspective.
  complexity:
    # Total number of parameters involved at inference time
    total_parameters: 119356966
    # Number of GPUs used for training
    gpu_count: 4
    # GPU model name
    gpu_model: NVIDIA RTX 3090

  # List of datasets used for training your system.
  # You do not need to include the datasets used to train your pre-trained modules (e.g., the datasets used to train M2D models), unless you also used them to train your system.
  train_datasets:
    - # Dataset name
      name: DCASE2026Task4Dataset
      # Audio source (use !!null if not applicable)
      source: DCASE2026Task4Dataset
      # Dataset access url
      url: https://zenodo.org/records/19328046
      # Is private
      is_private: No
      # Total duration of audio clips (hours)
      total_duration: !!null
      # Used for (e.g., s5_modelling)
      used_for: s5_modelling

  # URL to the source code of the system (optional, write !!null if you do not want to share code)
  source_code: https://github.com/nttcslab/dcase2026_task4_baseline

# System results
results:
  dev_set_test_result:
    # System results on the dev_set/test data.
    # - Each score should contain at least 3 decimals.
    CAPI-SDRi: 8.171
    accuracy_mix: 57.143
    accuracy_src: 67.147

# Questionnaire
questionnaire:
  # Do you give permission for the task organizer to conduct a meta-analysis on your submitted audio samples and to publish a technical report and paper using the results? [mandatory]
  # This does not mean that the copyright of audio samples is transferred to the DCASE community or task 4 organizers.
  publish_audio_samples: Yes

  # Do you agree to allow DCASE to use your submitted separated audio samples in a future edition of this DCASE challenge? (Not required for challenge entry; optional.)
  # This may be used in future baseline comparisons or separation challenges.
  # This does not mean that the copyright of audio samples is transferred to the DCASE community or task 4 organizers.
  use_audio_samples: Yes
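The comment in the results block asks for scores with at least 3 decimals. Note that a YAML parser reads `8.170` back as the float `8.17`, so the required precision can only be guaranteed by formatting the numbers as text when writing the file, and can only be checked on the raw text. A minimal sketch with two hypothetical helpers, `format_results` and `has_min_decimals`; the 4-space indent is chosen to match the `dev_set_test_result` entries in the example above.

```python
import re

# A "key: number" line, capturing the digits after the decimal point (if any).
SCORE_LINE = re.compile(r"^\s*([\w@.\-]+):\s*(-?\d+)(?:\.(\d+))?\s*$")


def format_results(scores: dict[str, float], decimals: int = 3, indent: int = 4) -> str:
    """Render scores as YAML 'key: value' lines with a fixed number of decimals."""
    pad = " " * indent
    return "\n".join(f"{pad}{name}: {value:.{decimals}f}" for name, value in scores.items())


def has_min_decimals(results_text: str, decimals: int = 3) -> bool:
    """Check the raw text of a results block (not parsed floats) for enough decimals.

    Only pass the results block: integer fields elsewhere in the meta file,
    such as total_parameters, would otherwise be reported as too short.
    """
    for line in results_text.splitlines():
        match = SCORE_LINE.match(line)
        if match and len(match.group(3) or "") < decimals:
            return False
    return True


if __name__ == "__main__":
    block = format_results({"CAPI-SDRi": 8.17, "accuracy_mix": 57.143, "accuracy_src": 67.147})
    print(block)
    print(has_min_decimals(block))
```

Here `8.17` is written out as `8.170`, which keeps the third decimal visible in the submitted file.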

Task 5 - Audio-Dependent Question Answering

Example meta information file for Task 5 baseline system task5/He_CUHK_task5_1/He_CUHK_task5_1.meta.yaml:

# Submission information for task 5
submission:
  # Submission label
  # The label is used to index submissions.
  # Generate your label in the following way to avoid
  # overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: He_CUHK_task5_1
  #
  # Submission name
  # This name will be used in the results tables when space permits
  name: Qwen3-Omni Baseline
  #
  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use maximum 10 characters.
  abbreviation: Qwen3base

# Authors of the submitted system. Mark authors in
# the order you want them to appear in submission lists.
# One of the authors has to be marked as corresponding author,
# this will be listed next to the submission in the results tables.
authors:
  # First author
  - lastname: He
    firstname: Haolin
    email: haolin.he@example.com                   # Contact email address
    corresponding: true                            # Mark true for one of the authors

    # Affiliation information for the author
    affiliation:
      abbreviation: CUHK
      institute: The Chinese University of Hong Kong
      department: Department of Computer Science      # Optional
      location: Hong Kong, China

  # Second author
  # ...

# System information
system:
  end_to_end: true # True if the system is a single end-to-end system, false if it is a cascaded (chained) system
  pretrained: true # True if the system is pretrained, false if not
  pre_loaded: qwen3-omni  # Name of the pre-trained model used in the system; use !!null if not pretrained
  autoregressive_model: true # True if the system is based on an autoregressive language model, false if not
  model_size: 30B # Total number of parameters of the system, in billions
  light_weighted: false # True if the system is a lightweight submission (i.e., fewer than 30B parameters)

  # Post processing: Details about the post processing method used in the system
  post_processing: Direct string matching to extract the option letter from the model response

  # Optional. External data resources to train the system
  external_data_resources: [
    "AudioSet"
  ]

# System results on the development set.
# - Full results are not mandatory; however, they are highly recommended, as they are needed for a thorough analysis of the challenge submissions.
# - If you are unable to provide all the results, incomplete results can also be reported.
# - Each score should contain at least 3 decimals.
results:
  development:
    accuracy: 64.450%
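The `post_processing` entry above states that the option letter is extracted from the model response by direct string matching, but the exact matching rules are not given here. The following is a hypothetical sketch only, assuming answer options labelled with the capital letters A to D: it first looks for an explicit "answer" statement and otherwise falls back to the first capital option letter that stands on its own. `extract_option_letter` is an illustrative name, not part of the baseline code.

```python
import re
from typing import Optional

# Assumption: the answer options are labelled with the capital letters A-D.
OPTION_LETTERS = "ABCD"

# First choice: an explicit statement such as "Answer: B" or "The answer is (C)".
# Only the word "answer" is case-insensitive; the option letter must be a capital.
EXPLICIT = re.compile(rf"(?i:answer)\s*(?:is)?\s*[:\-]?\s*\(?([{OPTION_LETTERS}])\)?(?![A-Za-z])")
# Fallback: the first capital option letter that is not part of a longer word.
STANDALONE = re.compile(rf"(?<![A-Za-z])\(?([{OPTION_LETTERS}])\)?(?![A-Za-z])")


def extract_option_letter(response: str) -> Optional[str]:
    """Return the option letter found in a model response, or None if nothing matched."""
    match = EXPLICIT.search(response) or STANDALONE.search(response)
    return match.group(1) if match else None


if __name__ == "__main__":
    for text in ["Answer: B", "The answer is (C).", "D. Two people talking", "I am not sure."]:
        print(repr(text), "->", extract_option_letter(text))
```

The fallback is deliberately simple: a response that starts with the article "A" (as in "A dog is barking") would be read as option A, so a stricter rule, or a prompt that asks for a single letter, may be preferable.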

Task 6 - Audio Moment Retrieval from Long Audio

Example meta information file for Task 6 baseline system task6/Munakata_LY_task6_1/Munakata_LY_task6_1.meta.yaml:

# Submission information for task 6
submission:
    # Submission label
    # The label is used to index submissions.
    # Generate your label in the following way to avoid overlapping codes among submissions:
    # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
    label: Munakata_LY_task6_1
    #
    # Submission name
    # This name will be used in the results tables when space permits
    name: DCASE2026 baseline system
    #
    # Submission name abbreviated
    # This abbreviated name will be used in the result table when space is tight.
    # Use maximum 10 characters.
    abbreviation: Baseline

    # Authors of the submitted system.
    # Mark authors in the order you want them to appear in submission lists.
    # One of the authors has to be marked as corresponding author,
    # this will be listed next to the submission in the results tables.
    authors:
        # First author
        -   lastname: Munakata
            firstname: Hokuto
            email: hokuto.munakata@lycorp.co.jp                    # Contact email address
            corresponding: true                         # Mark true for one of the authors

            # Affiliation information for the author
            affiliation:
                abbreviation: LY
                institute: LY Corporation
                department: Multimodal AI Unit
                location: Osaka, Japan

        # Second author
        -   lastname: Author
            firstname: Second
            email: first.last@some.org

            affiliation:
                abbreviation: ORG
                institute: Some Organization
                department: Department of Something
                location: City, Country

# System information
system:
    model:
        # Describe the model architecture of your system.
        # If your system is an ensemble of multiple models, please describe all models used in the system.
        audio_models: [
          MS-CLAP,
        ]
        text_models: [
          MS-CLAP
        ]
        # If you use audio LLMs, such as Qwen2-Audio, please specify the number of trainable and frozen parameters.
        LLMs: []
        # Describe the number of trainable parameters in your system.
        trainable_parameters: 7.1 M
        # Describe the number of frozen parameters in your system, if any.
        freezed_parameters: 158.4 M
        loss_function: [
          "L1",
          "gIoU",
          "cross_entropy"
        ]

    dataset:
        # If you use data augmentation, please specify the data augmentation methods used in your system.
        data_augmentation: !!null
        # If you use external data resources other than the provided datasets (i.e., CASTELLA and Clotho-Moment), please specify the names of the data resources used in your system.
        external_data_resources: [ 
            "audiocaps"
        ]
        # Describe the number of audio-caption pairs used for training your system.
        audio_captions: 48k

    ensemble: false

# System results
results:
    development_testing:
        # System results for the development-testing split.
        # Report Recall1@0.5 and Recall1@0.7 for the CASTELLA test set.
        # Full results are not mandatory; however, they are highly recommended, as they are needed for a thorough analysis of the challenge submissions.
        # If you are unable to provide all results, incomplete results can also be reported.
        Recall1@0.7: 0.0
        Recall1@0.5: 0.0
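The results above are reported as Recall1@0.5 and Recall1@0.7. Under the usual definition of this metric in moment retrieval, a query counts as a hit when its top-ranked predicted moment overlaps a ground-truth moment with a temporal IoU at or above the threshold. A minimal sketch, assuming each query has a ranked list of (start, end) predictions and one or more ground-truth moments, and that a match with any of them counts as a hit; the official evaluation script may differ in details, and `temporal_iou` and `recall_at_1` are illustrative names.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(predictions: Sequence[Sequence[Segment]],
                ground_truths: Sequence[Sequence[Segment]],
                threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold with any ground-truth moment."""
    hits = 0
    for ranked, targets in zip(predictions, ground_truths):
        if ranked and any(temporal_iou(ranked[0], gt) >= threshold for gt in targets):
            hits += 1
    return hits / len(predictions) if predictions else 0.0


if __name__ == "__main__":
    preds = [[(10.0, 20.0), (30.0, 35.0)], [(0.0, 5.0)]]
    gts = [[(14.0, 20.0)], [(30.0, 40.0)]]
    print(recall_at_1(preds, gts, 0.5), recall_at_1(preds, gts, 0.7))
```

In this toy example the first query's top-1 prediction has an IoU of 0.6, so it counts at the 0.5 threshold but not at 0.7; only the top-ranked prediction of each query is considered.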

Task 7 - Domain-Agnostic Incremental Learning for Audio Classification

Example meta information file for Task 7 baseline system task7/Mulimani_TUNI_task7_1/Mulimani_TUNI_task7_1.meta.yaml:

# Submission information
submission:
  # Submission label
  # Label is used to index submissions.
  # Generate your label in the following way to avoid overlapping codes among submissions:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Mulimani_TUNI_task7_1

  # Submission name
  # This name will be used in the results tables when space permits.
  name: DCASE2026 Challenge Task 7 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results table when space is tight.
  # Use a maximum of 10 characters.
  abbreviation: Baseline

  # Authors of the submitted system.
  # Mark authors in the order you want them to appear in submission lists.
  # One of the authors has to be marked as corresponding author, this will be listed next to the submission in the results tables.
  authors:
    # First author
    - firstname: Manjunath
      lastname: Mulimani
      email: manjunathm@es.aau.dk # Contact email address
      corresponding: true # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        institution: Tampere University
        department: Signal Processing Research Centre # optional
        location: Tampere, Finland

    # Second author
    - firstname: Riccardo
      lastname: Casciotti
      email: riccardo.casciotti@tuni.fi # Contact email address

      # Affiliation information for the author
      affiliation:
        institution: Tampere University
        department: Signal Processing Research Centre # optional
        location: Tampere, Finland

    # Third author
    - firstname: Manu
      lastname: Harju
      email: manu.harju@tuni.fi # Contact email address

      # Affiliation information for the author
      affiliation:
        institution: Tampere University
        department: Signal Processing Research Centre # optional
        location: Tampere, Finland

    # Fourth author
    - firstname: Annamaria
      lastname: Mesaros
      email: annamaria.mesaros@tuni.fi # Contact email address

      # Affiliation information for the author
      affiliation:
        institution: Tampere University
        department: Signal Processing Research Centre # optional
        location: Tampere, Finland

# System information
system:
  # System description, metadata provided here will be used to do a meta-analysis of the submitted system.
  # Use general level tags, when possible use the tags provided in comments.
  # If information field is not applicable to the system, use "!!null".
  description:
    # Data augmentation methods
    # Please specify all methods used (comma-separated list).
    # e.g. mixup, time stretching, block mixing, pitch shifting, ...
    data_augmentation: !!null
  
    # How the system handles catastrophic forgetting
    # Please specify all methods used (comma-separated list).
    # e.g. regularization, distillation, domain_specific_components, ...
    catastrophic_forgetting_handling: domain_specific_components 
    
    # Does the system explicitly predict the domain? (true/false)
    domain_specific_components: true
  
  # URL to the source code of the system [optional, highly recommended]
  # Reproducibility will be used to evaluate submitted systems.
  source_code: https://github.com/mulimani/dcase2026_task7_baseline

# System results
results:
  development_dataset:
    # System results for development dataset.
    # Full results are not mandatory; however, they are highly recommended, as they are needed for a thorough analysis of the challenge submissions.

    # Accuracy after D2
    Step2:
      Domain2:
        average: 58.6

        # classwise
        alarm: 33.08
        dog: 78.26
        engine: 79.71
        fire: 22.22
        footsteps: 61.22
        knock: 75.0
        piano: 67.01
        speech: 52.26
    
    # Accuracy after D3
    Step3:
      Domain2:
        average: 59.0

        # classwise
        alarm: 32.31
        dog: 82.61
        engine: 79.71
        fire: 22.22
        footsteps: 63.27
        knock: 72.22
        piano: 67.01
        speech: 52.26
        
      Domain3:
        average: 46.1
        
        # classwise
        alarm: 74.19
        baby: 54.17
        dog: 20.78
        engine: 57.65
        fire: 37.84
        footsteps: 19.26
        phone: 45.16
        piano: 93.15
        speech: 12.81
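In the example above, each `average` equals the unweighted (macro) mean of that domain's classwise accuracies, rounded to one decimal: the eight Step2/Domain2 values sum to 468.76, and 468.76 / 8 = 58.595, which rounds to 58.6. If you report averages the same way, they can be derived from the classwise numbers rather than typed in by hand. A minimal sketch, assuming the nested `StepN -> DomainN -> class` layout of the `results` block above; `fill_averages` is a hypothetical helper name, and the assumption that this is the intended averaging is inferred from the example numbers only.

```python
from statistics import fmean


def fill_averages(results: dict, decimals: int = 1) -> dict:
    """Set 'average' in every step/domain entry to the unweighted mean of its classwise accuracies."""
    for step in results.values():
        for domain in step.values():
            # Collect the classwise values first, then (re)write the average.
            classwise = [value for name, value in domain.items() if name != "average"]
            domain["average"] = round(fmean(classwise), decimals)
    return results


if __name__ == "__main__":
    step2 = {
        "Step2": {
            "Domain2": {
                "alarm": 33.08, "dog": 78.26, "engine": 79.71, "fire": 22.22,
                "footsteps": 61.22, "knock": 75.0, "piano": 67.01, "speech": 52.26,
            }
        }
    }
    print(fill_averages(step2)["Step2"]["Domain2"]["average"])
```

Any existing `average` entry is ignored when computing the mean, so the function can be rerun on an already filled results block.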

Technical report

All participants are expected to submit a technical report about the submitted system, to help the DCASE community better understand how the algorithm works.

Technical reports are not peer-reviewed. The technical reports will be published on the challenge website together with all other information about the submitted system. For the technical report, it is not necessary to closely follow the structure of a scientific publication (for example, there is no need for an extensive literature review). The report should, however, contain a sufficient description of the system.

Please report the system performance using the provided cross-validation setup or development set, according to the task. Participants taking part in multiple tasks may submit one technical report covering all tasks if the systems differ only slightly. Describe the task-specific parameters in the report.

Participants can also submit the same report as a scientific paper to DCASE2026 Workshop. In this case, the paper must respect the structure of a scientific publication, and be prepared according to the provided Workshop paper instructions and template. Please note that the template is slightly different, and you will have to create a separate submission to the DCASE2026 Workshop track in the submission system. Please refer to the workshop webpage for more details. DCASE2026 Workshop papers will be peer-reviewed.

Template

Reports follow the 4+1 page format: a maximum of 5 pages, including all text, figures, and references, with the 5th page containing only references. Templates for the technical report are available here:

LaTeX template (133 KB)
version 1.0 (.zip)

Word template (37 KB)
version 1.0 (.docx)

Sample PDF produced with LaTeX template (158 KB)
version 1.0 (.pdf)

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.
