Introduction
A challenge submission consists of a submission package (one zip package) containing system outputs and system meta information, and a technical report (PDF file). The technical report can also be submitted as a scientific paper to the DCASE2017 Workshop.
- Please prepare the submission package (zip file) as instructed here. The submission package contains system outputs for all tasks (maximum 4 per task) and system meta information.
- Please use the provided paper template for your technical report.
- Follow the submission process to submit your system to the DCASE Challenge.
Submission package
Participants are instructed to pack their system output(s) and system meta information into one zip-package. Example package:
Please prepare your submission zip file following the provided example. Use the same file structure, and fill in the meta information following the structure of the *.meta.yaml files. The zip file should contain system outputs for all tasks, maximum 4 submissions per task, and separate meta information for each submission.
More detailed instructions can be found in the following subsections.
Submission label
The submission label is used to index all your submissions (systems per task). To avoid overlapping labels among submitted systems, form your label as follows:
[Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
As an example, for the baseline systems this translates to the following labels:
Heittola_TUT_task1_1
Heittola_TUT_task2_1
Heittola_TUT_task3_1
Elizalde_CMU_task4_1
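The label format above can be checked mechanically. The following is a minimal sketch (a hypothetical helper, not part of the official tooling) that validates a label against the required pattern; it assumes names and institute abbreviations contain only letters, which may be too strict for some author names.

```python
import re

# Pattern: [Last name]_[Institute abbreviation]_task[1-4]_[index 1-4]
# (simplifying assumption: letters only in the name and abbreviation parts)
LABEL_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+_task[1-4]_[1-4]$")

def is_valid_label(label: str) -> bool:
    """Return True if the submission label matches the required format."""
    return LABEL_PATTERN.match(label) is not None

print(is_valid_label("Heittola_TUT_task1_1"))  # True
print(is_valid_label("Heittola_TUT_task1_5"))  # False: submission index out of range 1-4
```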
System meta information
To allow meta-analysis of the submitted systems, participants should provide basic meta information in a structured and correctly formatted YAML file.
See the example meta files below for each baseline system. These examples are also available in the example submission package. The meta file structure is mostly the same for all tasks; only the metrics collected in the results->development_dataset section differ per challenge task.
Example meta information file for the Task 1 baseline system, task1/Heittola_TUT_task1_1/Heittola_TUT_task1_1.meta.yaml:
# Submission information
submission:
  # Submission label
  # The label is used to index submissions. To avoid overlapping codes among
  # submissions, form your label as follows:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Heittola_TUT_task1_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2017 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results tables when space is tight; maximum 10 characters
  abbreviation: Baseline

  # Submission authors in order; mark one of the authors as corresponding author
  authors:
    - lastname: Heittola
      firstname: Toni
      email: toni.heittola@tut.fi        # Contact email address
      corresponding: true                # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

    - lastname: Mesaros
      firstname: Annamaria
      email: annamaria.mesaros@tut.fi    # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

# System information
system:
  # System description; the metadata provided here will be used for
  # meta-analysis of the submitted systems. Use general-level tags and,
  # if possible, the tags provided in the comments.
  description:
    # Audio input
    input_channels: mono                 # e.g. one or combination of [mono, binaural, left, right, mixed, ...]
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    acoustic_features: log-mel energies  # e.g. one or combination of [MFCC, log-mel energies, spectrogram, CQT, ...]

    # Data augmentation methods
    data_augmentation: null              # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    machine_learning_method: MLP         # e.g. one or combination of [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]

    # Decision making methods
    decision_making: majority vote       # [majority vote, ...]

  # URL to the source code of the system [optional]
  source_code: https://github.com/TUT-ARG/DCASE2017-baseline-system

# System results
results:
  # System evaluation results for the provided cross-validation setup
  development_dataset:
    # Overall accuracy (mean of class-wise accuracies)
    overall:
      accuracy: 74.8

    # Class-wise accuracies
    class_wise:
      beach:
        accuracy: 75.3
      bus:
        accuracy: 71.8
      cafe/restaurant:
        accuracy: 57.7
      car:
        accuracy: 97.1
      city_center:
        accuracy: 90.7
      forest_path:
        accuracy: 79.5
      grocery_store:
        accuracy: 58.7
      home:
        accuracy: 68.6
      library:
        accuracy: 57.1
      metro_station:
        accuracy: 91.7
      office:
        accuracy: 99.7
      park:
        accuracy: 70.2
      residential_area:
        accuracy: 64.1
      train:
        accuracy: 58.0
      tram:
        accuracy: 74.8
Example meta information file for the Task 2 baseline system, task2/Heittola_TUT_task2_1/Heittola_TUT_task2_1.meta.yaml:
# Submission information
submission:
  # Submission label
  # The label is used to index submissions. To avoid overlapping codes among
  # submissions, form your label as follows:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Heittola_TUT_task2_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2017 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results tables when space is tight; maximum 10 characters
  abbreviation: Baseline

  # Submission authors in order; mark one of the authors as corresponding author
  authors:
    - lastname: Heittola
      firstname: Toni
      email: toni.heittola@tut.fi        # Contact email address
      corresponding: true                # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

    - lastname: Mesaros
      firstname: Annamaria
      email: annamaria.mesaros@tut.fi    # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

# System information
system:
  # System description; the metadata provided here will be used for
  # meta-analysis of the submitted systems. Use general-level tags and,
  # if possible, the tags provided in the comments.
  description:
    # Audio input
    input_channels: mono                 # e.g. one or combination of [mono, binaural, left, right, mixed, ...]
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    acoustic_features: log-mel energies  # e.g. one or combination of [MFCC, log-mel energies, spectrogram, CQT, ...]

    # Data augmentation methods
    data_augmentation: null              # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    machine_learning_method: MLP         # e.g. one or combination of [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]

    # Decision making methods
    decision_making: median filtering    # [sliding median filter, ...]

  # URL to the source code of the system [optional]
  source_code: https://github.com/TUT-ARG/DCASE2017-baseline-system

# System results
results:
  # System evaluation results for the provided cross-validation setup
  development_dataset:
    event_based:
      # Overall metrics
      overall:
        er: 0.56
        f1: 71.7

      # Class-wise metrics
      class_wise:
        babycry:
          er: 0.77
          f1: 69.2
        glassbreak:
          er: 0.22
          f1: 88.5
        gunshot:
          er: 0.56
          f1: 71.7
Example meta information file for the Task 3 baseline system, task3/Heittola_TUT_task3_1/Heittola_TUT_task3_1.meta.yaml:
# Submission information
submission:
  # Submission label
  # The label is used to index submissions. To avoid overlapping codes among
  # submissions, form your label as follows:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Heittola_TUT_task3_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2017 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results tables when space is tight; maximum 10 characters
  abbreviation: Baseline

  # Submission authors in order; mark one of the authors as corresponding author
  authors:
    - lastname: Heittola
      firstname: Toni
      email: toni.heittola@tut.fi        # Contact email address
      corresponding: true                # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

    - lastname: Mesaros
      firstname: Annamaria
      email: annamaria.mesaros@tut.fi    # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: TUT
        institute: Tampere University of Technology
        department: Laboratory of Signal Processing
        location: Tampere, Finland

# System information
system:
  # System description; the metadata provided here will be used for
  # meta-analysis of the submitted systems. Use general-level tags and,
  # if possible, the tags provided in the comments.
  description:
    # Audio input
    input_channels: mono                 # e.g. one or combination of [mono, binaural, left, right, mixed, ...]
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    acoustic_features: log-mel energies  # e.g. one or combination of [MFCC, log-mel energies, spectrogram, CQT, ...]

    # Data augmentation methods
    data_augmentation: null              # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    machine_learning_method: MLP         # e.g. one or combination of [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]

    # Decision making methods
    decision_making: median filtering    # [sliding median filter, ...]

  # URL to the source code of the system [optional]
  source_code: https://github.com/TUT-ARG/DCASE2017-baseline-system

# System results
results:
  # System evaluation results for the provided cross-validation setup
  development_dataset:
    segment_based:
      # Overall metrics
      overall:
        er: 0.69
        f1: 56.7

      # Class-wise metrics
      class_wise:
        brakes_squeking:
          er: 0.98
          f1: 4.1
        car:
          er: 0.57
          f1: 74.1
        children:
          er: 1.35
          f1: 0.0
        large_vehicle:
          er: 0.90
          f1: 50.8
        people_speaking:
          er: 1.25
          f1: 18.5
        people_walking:
          er: 0.84
          f1: 55.6
Example meta information file for the Task 4 baseline system, task4/Elizalde_CMU_task4_1/Elizalde_CMU_task4_1.meta.yaml:
# Submission information
submission:
  # Submission label
  # The label is used to index submissions. To avoid overlapping codes among
  # submissions, form your label as follows:
  # [Last name of corresponding author]_[Abbreviation of institute of the corresponding author]_task[task number]_[index number of your submission (1-4)]
  label: Elizalde_CMU_task4_1

  # Submission name
  # This name will be used in the results tables when space permits
  name: DCASE2017 baseline system

  # Submission name abbreviated
  # This abbreviated name will be used in the results tables when space is tight; maximum 10 characters
  abbreviation: Baseline

  # Submission authors in order; mark one of the authors as corresponding author
  authors:
    - lastname: Elizalde
      firstname: Benjamin
      email: bmartin1@andrew.cmu.edu     # Contact email address
      corresponding: true                # Mark true for one of the authors

      # Affiliation information for the author
      affiliation:
        abbreviation: CMU
        institute: Carnegie Mellon University
        location: Pittsburgh, USA

    - lastname: Badlani
      firstname: Rohan
      email: rohan.badlani@gmail.com     # Contact email address

      # Affiliation information for the author
      affiliation:
        institute: Birla Institute of Technology & Science
        location: Rajasthan, India

    - lastname: Shah
      firstname: Ankit
      email: ankit.tronix@gmail.com      # Contact email address

      # Affiliation information for the author
      affiliation:
        abbreviation: CMU
        institute: Carnegie Mellon University
        location: Pittsburgh, USA

# System information
system:
  # System description; the metadata provided here will be used for
  # meta-analysis of the submitted systems. Use general-level tags and,
  # if possible, the tags provided in the comments.
  description:
    # Audio input
    input_channels: mono                 # e.g. one or combination of [mono, binaural, left, right, mixed, ...]
    input_sampling_rate: 44.1kHz

    # Acoustic representation
    acoustic_features: log-mel energies  # e.g. one or combination of [MFCC, log-mel energies, spectrogram, CQT, ...]

    # Data augmentation methods
    data_augmentation: null              # [time stretching, block mixing, pitch shifting, ...]

    # Machine learning
    machine_learning_method: MLP         # e.g. one or combination of [GMM, HMM, SVM, kNN, MLP, CNN, RNN, CRNN, NMF, random forest, ensemble, ...]

    # Decision making methods
    decision_making: median filtering    # [sliding median filter, ...]

  # URL to the source code of the system [optional]
  source_code: https://github.com/ankitshah009/Task-4-Large-scale-weakly-supervised-sound-event-detection-for-smart-cars

# System results
results:
  # System evaluation results for the provided cross-validation setup
  development_dataset:
    subtask_a:
      overall:
        f1: 19.8
        precision: 16.2
        recall: 25.6
    subtask_b:
      segment_based:
        # Overall metrics
        overall:
          er: 1.00
          f1: 11.4
System output
- Participants must submit results for the provided evaluation dataset (see the download page).
- Tasks are independent; you can participate in a single task or in multiple tasks.
- Multiple submissions for the same task are allowed (maximum 4 per task). Use a running index in the submission label, and give more descriptive names to the submitted systems in the system meta information files. Please mark clearly the connection between the submitted systems and the system descriptions in the technical report, for example by referring to the systems by their submission label or system name (as given in the system meta information file).
- Submitted system outputs will be published later on the DCASE2017 website to allow future evaluations.
Examples for formatting the output for the different tasks are given below.
Task 1 - Acoustic scene classification
A single text file (in CSV format) containing the classification result for each audio file in the evaluation set. The result items can be in any order. Format:
[filename (string)][tab][scene label (string)]
Example task1_results.txt file
audio/178.wav residential_street
audio/62.wav office
audio/261.wav home
...
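A file in the format above can be produced with a few lines of Python. The following is a minimal sketch (not official tooling); the `predictions` dictionary stands in for whatever your system outputs.

```python
import csv

# Hypothetical system output: evaluation filename -> predicted scene label
predictions = {
    "audio/178.wav": "residential_street",
    "audio/62.wav": "office",
    "audio/261.wav": "home",
}

# Write tab-separated rows of [filename][tab][scene label]
with open("task1_results.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for filename, scene_label in predictions.items():
        writer.writerow([filename, scene_label])
```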
Task 2 - Detection of rare sound events
A single text file (in CSV format) containing the detected sound events for each audio file. Events can be in any order. Format:
[filename (string)][tab][event onset time in seconds (float)][tab][event offset time in seconds (float)][tab][event label (string)]
Example task2_results.txt file
audio/mixture_evaltest_babycry_001_0e22e5d08617707ea812e0268d628031.wav 1.44 3.8 babycry
audio/mixture_evaltest_babycry_000_35c7bc20a21ec8fbb7097c6fb71487b5.wav
audio/mixture_evaltest_glassbreak_000_c711628a46aab5b1032e19b003bf78d7.wav 2.44 5.8 glassbreak
audio/mixture_evaltest_glassbreak_001_1b1c3c26ee642fafed65ed873910adad.wav
audio/mixture_evaltest_gunshot_001_9a2eb11a7c6edea6c75ff30dc3f5de12.wav 3.44 6.8 gunshot
audio/mixture_evaltest_gunshot_002_54d69de42bbd0acc7b4a89b0207eacf1.wav
...
If no event is detected for the particular audio signal, the system should still output a row containing only the file name, to indicate that the file was processed. This is used to verify that participants processed all evaluation files.
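A writer that respects the filename-only rule above can be sketched as follows (an assumed example, not official tooling; the filenames and detections are hypothetical).

```python
import csv

# Hypothetical output: filename -> list of (onset, offset, label) detections
detections = {
    "audio/mixture_a.wav": [(1.44, 3.8, "babycry")],
    "audio/mixture_b.wav": [],  # no event detected in this file
}

with open("task2_results.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for filename, events in detections.items():
        if not events:
            # Filename-only row marks the file as processed
            writer.writerow([filename])
        for onset, offset, label in events:
            writer.writerow([filename, onset, offset, label])
```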
Task 3 - Sound event detection in real life audio
A single text file (in CSV format) containing the detected sound events for each audio file. Events can be in any order. Format:
[filename (string)][tab][event onset time in seconds (float)][tab][event offset time in seconds (float)][tab][event label (string)]
Example task3_results.txt file
audio/a029.wav 1.0000 2.8000 car
audio/a029.wav 0.0000 0.2000 people walking
audio/a033.wav
audio/a034.wav 4.0000 4.1000 brakes squeaking
...
If no event is detected for the particular audio signal, the system should still output a row containing only the file name, to indicate that the file was processed. This is used to verify that participants processed all evaluation files.
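Reading such a file back is equally simple. The sketch below (an assumed helper, not part of the evaluation scripts) parses a results file into a dictionary, treating filename-only rows as "processed, no events".

```python
def read_results(path: str) -> dict:
    """Parse a tab-separated results file into filename -> [(onset, offset, label)]."""
    results = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            filename = parts[0]
            # Register the file even when the row carries no event
            results.setdefault(filename, [])
            if len(parts) == 4:
                onset, offset, label = float(parts[1]), float(parts[2]), parts[3]
                results[filename].append((onset, offset, label))
    return results
```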
Task 4 - Large-scale weakly supervised sound event detection for smart cars
For Subtasks A and B, provide a text file (in CSV format) containing the detected sound events for each audio file. Events can be in any order. Both subtasks (with and without timestamps) will be evaluated using the same output format; the latter will simply ignore the timestamps. The system output file can be the same for both subtasks, or two different versions can be provided. We require one folder per system output, which includes one YAML file and one or two prediction files (one per subtask, or one for both). Note that you are not obligated to submit a prediction file for both subtasks if you want to participate in only one subtask.
Format:
[filename (string)][tab][event onset time in seconds (float)][tab][event offset time in seconds (float)][tab][event label (string)]
Note that "audio/" and/or "Y" can be inserted in front of the filename and will be properly parsed by the metric scripts, assuming the chosen convention is consistent throughout the file.
Example team_task4_results.txt file
Y--0w1YA1Hm4_30.000_40.000.wav 1.0000 2.8000 Car alarm
Y-fCSO8SVWZU_6.000_16.000.wav 0.0000 0.2000 Police car (siren)
Y0Hz4R_m0hmI_80.000_90.000.wav 7.3000 9.4000 Fire engine, fire truck (siren)
Y0Hz4R_m0hmI_80.000_90.000.wav
...
If no event is detected for the particular audio signal, the system should still output a row containing only the file name, to indicate that the file was processed. This is used to verify that participants processed all evaluation files.
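One way to handle the optional prefixes mentioned above is to normalize filenames before comparing them. This is a sketch of an assumed approach, not the actual logic of the metric scripts; it assumes the canonical filename carries neither the "audio/" directory nor the leading "Y".

```python
def normalize_filename(name: str) -> str:
    """Strip the optional 'audio/' and leading 'Y' prefixes (assumed convention)."""
    if name.startswith("audio/"):
        name = name[len("audio/"):]
    if name.startswith("Y"):
        name = name[len("Y"):]
    return name

print(normalize_filename("audio/Y-fCSO8SVWZU_6.000_16.000.wav"))
# -fCSO8SVWZU_6.000_16.000.wav
```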
Package structure
Make sure your zip-package has the following structure (example zip-file for baseline systems):
Zip-package root
│
└───task1 Task1 submissions
│ │
│ └───Heittola_TUT_task1_1 System 1 submission files
│ │ │ Heittola_TUT_task1_1.meta.yaml System 1 meta information
│ │ │ Heittola_TUT_task1_1.output.txt System 1 output
│ │
│ └───Heittola_TUT_task1_2 System 2 submission files
│ │ │ Heittola_TUT_task1_2.meta.yaml System 2 meta information
│ │ │ Heittola_TUT_task1_2.output.txt System 2 output
│ │
│ │...
│
└───task2 Task2 submissions
│ │
│ └───Heittola_TUT_task2_1 System 1 submission files
│ │ │ Heittola_TUT_task2_1.meta.yaml System 1 meta information
│ │ │ Heittola_TUT_task2_1.output.txt System 1 output
│ │
│ │...
│
│
└───task3 Task3 submissions
│ │
│ └───Heittola_TUT_task3_1 System 1 submission files
│ │ │ Heittola_TUT_task3_1.meta.yaml System 1 meta information
│ │ │ Heittola_TUT_task3_1.output.txt System 1 output
│ │
│ │...
│
│
└───task4 Task4 submissions
│
└───Elizalde_CMU_task4_1
│ │ Elizalde_CMU_task4_1.meta.yaml System 1 meta information
│ │ Elizalde_CMU_task4_1_A.output.txt System 1 output subtask A
│ │ Elizalde_CMU_task4_1_B.output.txt System 1 output subtask B
│ │ Elizalde_CMU_task4_1_AB.output.txt In case the System 1 output is used for both subtasks
│
│...
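Before uploading, it can be worth checking your zip package against this structure. The following is a minimal sketch (a hypothetical helper, not an official validator) that verifies every submission folder contains a matching .meta.yaml file; it assumes the folder layout shown above.

```python
import zipfile

def check_submission(zip_path: str) -> list:
    """Return a list of problems found in the submission package (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as z:
        names = z.namelist()
        # Collect submission labels from task*/[label]/... paths
        labels = {parts[1] for parts in (n.split("/") for n in names)
                  if len(parts) >= 3 and parts[0].startswith("task")}
        for label in labels:
            task = label.split("_")[2]  # e.g. "task1" from Heittola_TUT_task1_1
            meta = f"{task}/{label}/{label}.meta.yaml"
            if meta not in names:
                problems.append(f"missing {meta}")
    return problems
```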
Technical report
All participants are expected to submit a technical report about the submitted system, to help the DCASE community better understand how the algorithm works.
Technical reports are not peer-reviewed. They will be published on the challenge website together with all other information about the submitted systems. The technical report need not follow the structure of a scientific publication closely (for example, there is no need for an extensive literature review). The report should, however, contain a sufficient description of the system.
Please report the system performance using the provided cross-validation setup or development set, according to the task. For participants taking part in multiple tasks, one technical report covering all tasks is sufficient.
Authors will have the opportunity to update their technical report to include the challenge evaluation results. The deadline for the camera-ready report is 20th October 2017.
When referring to the DCASE2017 Challenge use the following:
A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017.
Participants can also submit the same report as a scientific paper to the DCASE 2017 Workshop. In this case, the paper must follow the structure of a scientific publication and be prepared according to the provided Workshop paper instructions. Please report the system performance using the provided cross-validation setup or development set, according to the task. Scientific papers will be peer-reviewed.
Template
Reports use a 4+1 page format: papers are a maximum of 5 pages, including all text, figures, and references, with the 5th page containing only references. The template is the same for technical reports and DCASE2017 Workshop papers.
Submission process
1. Create a user account in the submission system. The submitting author is considered the corresponding author for the submission.
2. Select Your Submissions in the menu. Using this system you can submit a contribution to the DCASE 2017 Challenge and also to the DCASE 2017 Workshop.
3. Select Challenge Submission.
   - Fill in the names and affiliations of all authors.
   - Fill in the submission details (title and abstract).
   - Indicate the task you are submitting for by selecting the corresponding topic. If your zip file contains system outputs for multiple tasks, tick multiple topics accordingly.
   - Mark whether the main author is a student.
   - Check the copyright form box.
   - At the next step you will be able to upload the files.
4. Upload the files:
   - Technical report in PDF format
   - System outputs as a zip file
5. The system will send a confirmation email.
6. If you intend to submit your technical report as a scientific paper to the DCASE2017 Workshop, return to step 2 (Your Submissions) in the submission system and select Workshop Submission. Follow the steps to upload the PDF. The system will also send a confirmation email about the workshop submission.