Challenge has ended. Full results for this task can be found in the Results page.

If you are interested in the task, you can join us on the dedicated Slack channel.

We have released the ground truth labels and evaluator for the evaluation dataset.

Description

Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines. Figure 1 shows an overview of the detection system.

This task follows on from DCASE 2020 Task 2 through DCASE 2023 Task 2. This year's task is to develop an ASD system that meets the following five requirements.

1. Train a model using only normal sound (unsupervised learning scenario)
Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task)
In real-world cases, the operational states of a machine or the environmental noise can change to cause domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques for handling these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type
For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type
While sounds from multiple machines of the same machine type can be used to enhance the detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information
While additional attribute information can help enhance the detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.

The last requirement is newly introduced in DCASE 2024 Task 2.

First-shot problem under attribute-available and unavailable conditions: Focus of task

The focus of this year's task is the same as last year's, the first-shot problem, with one modification. We explain the importance of the first-shot problem and how it is reflected in the requirements.

First, in real-world applications, tuning hyperparameters using test data is often infeasible because we may face completely new machine types or the amount of test data may be insufficient for tuning. This problem motivated the organizers to set the third requirement described above and to prepare completely different sets of machine types for the development dataset and the evaluation dataset.

Second, there can be a limited number of machines for a machine type. Before DCASE 2023 Task 2, multiple sections from multiple different machines were provided for each machine type. Although this feature led to the development of outlier exposure approaches that use sound clips from different machines as anomalies, in many practical cases the number of machines for a machine type is limited, because customers may not have multiple machines or may first plan to install the system for only a few machines. Considering these cases, the organizers set the fourth requirement and have prepared only one section for each machine type since DCASE 2023 Task 2.

In addition to the above two features of the first-shot problem, we introduced another requirement following real-world applications. When recording machine sounds, we cannot always obtain information on the machine conditions or types of noise along with them. In such cases, some effective methods such as attribute classification methods cannot be used. To reflect such situations, we introduced the fifth requirement and hid the additional attribute information for some machine types.

In summary, the main features of this year's task are that (1) the sets of machine types in the development dataset and the evaluation dataset are completely different, (2) each machine type contains only one section, and (3) additional attribute information is not given for some machine types.

Schedule

Based on the DCASE Challenge 2024 schedule, the important dates for this task are as follows.

Task open: 1st of April 2024
Additional training dataset release: 15th of May 2024
Evaluation dataset release: 1st of June 2024
External resource list lock: 1st of June 2024
Challenge deadline: 15th of June 2024
Challenge results: 30th of June 2024

External resources on the "List of external datasets and models allowed" can be used (cf. external data resource section). The list will be updated upon request. Any external resource that was freely accessible before 15th of May 2024 can be added. To request an addition, please send an email to the task organizers. The list will be locked after the release date of the evaluation dataset (1st of June 2024). To prevent new external resources from being developed using machine information in the evaluation dataset, we will release the additional training dataset after 15th of May 2024. Note that the additional training dataset contains matching training data of machines used in the evaluation dataset (cf. dataset section).

Audio datasets

Dataset overview

Three datasets (development dataset, additional training dataset, and evaluation dataset) are provided for this task.

The development dataset consists of normal/anomalous operating sounds of seven types of machines. Each recording is a single-channel, 10-second audio clip that includes both the sounds of the target machine and environmental sounds. The following seven types of machines are used:

Fan
Gearbox
Bearing
Slide rail
Toy car
Toy train
Valve

The additional training and evaluation datasets also consist of normal/anomalous operating sounds of machines, but the sets of machine types are completely different from the development dataset.

The datasets consist of sounds from nine types of real/toy machines. Each recording is single-channel audio, including a machine's operating sound and environmental noise. The duration of recordings varies from 6 to 18 sec, depending on the machine type. The detailed information on the additional machine types will be provided after 15th of May.

Figure 2 shows an overview of the datasets for development, additional training, and evaluation. Each dataset consists of several types of machines, and each type of machine consists of one "section".

Definition

We first define the key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

"Machine type" indicates the type of machine, which in the development dataset is one of seven: fan, gearbox, bearing, slide rail, valve, ToyCar, and ToyTrain.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise.

Development, additional training, and evaluation datasets

Our entire dataset consists of three datasets:

1. Development dataset: This dataset consists of seven machine types. For each machine type, one section is provided, and the section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.

2. Additional training dataset: This dataset also consists of several machine types, but the set of machine types is completely different from that of the development dataset. This dataset also provides one section for each machine type. Each section consists of (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in a target domain for training. The domain and attributes of each sample are provided. Participants may also use this dataset for training. The additional training dataset will be open on May 15th.

3. Evaluation dataset: This dataset provides test clips for the sections in the additional training dataset. Each section has 200 test clips, none of which have a condition label (i.e., normal or anomaly) or information about the domain to which it belongs (i.e., source or target). Attributes are not provided. The evaluation dataset will be open on June 1st.

File names and attribute csv files

File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by the file name. For the datasets other than the evaluation dataset, the normal/anomaly information and the attribute information are also given by the file name. Note that for machine types that have their attribute information hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files are for easy access to attributes that cause domain shifts. These files list the file names, the names of the parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv). Each row takes the following format:

[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...

For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
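The row format above can be read with a few lines of Python. The following is a minimal sketch, not the organizers' tooling: the function name `parse_attribute_rows`, the example file names, and the parameter names `spd` and `mic` are assumptions made for illustration, values are kept as strings, and no header row is assumed.

```python
import csv
import io


def parse_attribute_rows(csv_text):
    """Map each file name to a {parameter: value} dict of domain-shift attributes."""
    attributes = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        filename = row[0].strip()
        rest = [cell.strip() for cell in row[1:]]
        pairs = {}
        # Columns after the file name alternate: d1p, d1v, d2p, d2v, ...
        for name, value in zip(rest[0::2], rest[1::2]):
            if name:  # blank columns mean the attributes are hidden
                pairs[name] = value
        attributes[filename] = pairs
    return attributes


# Hypothetical rows, for illustration only (not taken from the dataset).
example = (
    "clip_a.wav,spd,28,mic,1\n"
    "clip_b.wav,,,,\n"
)
parsed = parse_attribute_rows(example)
print(parsed)
```

A machine type with hidden attributes simply yields an empty dictionary for every clip, so downstream code can branch on whether the dictionary is empty.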

Recording procedure

Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.

Short description of each section in the development dataset

Short descriptions of each section in the development dataset and the attribute format in the file names of their training data are as follows:

Machine type	Section	Description	Attribute format in file names of training data
ToyCar	00	Car model, speed, and mic variations between domains.	car_<car_model>_spd(speed)_<speed_level_times_10>_mic_<microphone_number>
ToyTrain	00	Train model, speed, and mic variations between domains.	No attributes
Fan	00	Mixing of different machine sounds between domains.	n(noise)_<noise_index>
Gearbox	00	Different Gearbox model and weight attached to the box between domains.	No attributes
Bearing	00	Different Bearing model, rotation velocity and location of the microphone between domains.	pro(bearing_product_model)_<bearing_product_model>_vel(velocity)_<velocity>_loc(location_of_the_microphone)_<location>
Slide rail	00	Different operation velocity and acceleration between domains.	No attributes
Valve	00	Open/close operation patterns vary between domains.	v1pat(pattern)_<pattern_index_of_value_1>_v2pat(pattern)_<pattern_index_of_value_2>

Short description of each section in the additional training dataset

Short descriptions of each section in the additional training dataset and the attribute format in the file names of their training data are as follows:

Machine type	Section	Description	Attribute format in file names of training data
3DPrinter	00	3D-Printer	mdl(model)_<3d_model_id>_spd(speed)_<speed_level>
AirCompressor	00	Air compressor with an air blow gun	No attributes
BrushlessMotor	00	Rotating brushless motor	No attributes
HairDryer	00	Hairdryer blowing continuously	id_<model_id>_spd(speed)_<speed_level>_mic_<microphone_number>
HoveringDrone	00	Drone hovering stationary	No attributes
RoboticArm	00	Robotic arm simulating an industrial pick and place task	weight<load_weight_level>_Bckg<background_noise_type>
Scanner	00	Document scanner scanning paper sheets	res(resolution)_<scanning_resolution_level>_len(length)_<paper_length_level>
ToothBrush	00	Electric toothbrush brushing a fixed point	No attributes
ToyCircuit	00	Toy car racing on a circuit track	id_<model_id>_spd(speed)_<speed_level>_mic_<microphone_number>

External data resources

Based on the past DCASE's external data resource policy, we allow the use of external datasets and trained models under the following conditions:

Any test data in both development and evaluation datasets shall not be used for training.
Any data in ToyADMOS, ToyADMOS2, MIMII Dataset, MIMII DUE Dataset, MIMII DG Dataset, the dataset of DCASE 2020 Challenge Task 2, the dataset of DCASE 2021 Challenge Task 2, the dataset of DCASE 2022 Challenge Task 2, and the dataset of DCASE 2023 Challenge Task 2 shall not be used.
Datasets, pre-trained models, and pre-trained parameters on the "List of external data resources allowed" can be used. The list will be updated upon request. Datasets, pre-trained models, and pre-trained parameters, which are freely accessible by any other research group before 15th of May 2024, can be added to the list.
To add sources of external datasets, pre-trained models, or pre-trained parameters to the list, send a request to the organizers by the evaluation set publishing date. To give an equal opportunity to use them for all competitors, we will update the "list of external data resources allowed" on the web page accordingly.
Once the evaluation set is published, no further external sources will be added. The list will be locked after 1st of June 2024.

List of external data resources allowed:

Dataset name	Type	Added	Link
IDMT-ISA-ELECTRIC-ENGINE	audio	15.03.2022	https://www.idmt.fraunhofer.de/en/publications/isa-electric-engine.html
AudioSet	audio	15.03.2022	https://research.google.com/audioset/
VGGish	model	15.03.2022	https://github.com/tensorflow/models/tree/master/research/audioset/vggish
OpenL3	model	15.03.2022	https://openl3.readthedocs.io/en/latest/
PANNs	model	15.03.2022	https://zenodo.org/record/3576403/
PyTorch Image Models (including tens of pre-trained models)	model	15.03.2022	https://github.com/rwightman/pytorch-image-models
torchvision.models (including tens of pre-trained models)	model	15.03.2022	https://pytorch.org/vision/stable/models.html
Meta AI pre-trained models on Hugging Face (including hundreds of pre-trained models)	model	10.04.2023	https://huggingface.co/facebook
Meta AI datasets on Hugging Face (including 8 datasets)	audio	10.04.2023	https://huggingface.co/facebook
Fairseq pre-trained models (including tens of pre-trained models)	model	10.04.2023	https://github.com/facebookresearch/fairseq/tree/main/examples
UniSpeech	model	10.04.2023	https://github.com/microsoft/UniSpeech
AudioLDM	model	20.04.2023	https://zenodo.org/record/7813012
CED	model	03.04.2024	https://github.com/RicherMans/CED
BEATs	model	24.05.2024	https://github.com/microsoft/unilm/tree/master/beats
EAT	model	24.05.2024	https://github.com/cwx-worst-one/EAT
Tango	model	24.05.2024	https://github.com/declare-lab/tango
Audiobox	model	24.05.2024	https://audiobox.metademolab.com/
EnCodec	model	24.05.2024	https://github.com/facebookresearch/encodec
Llama 3	model	24.05.2024	https://github.com/meta-llama/llama3

Download

Task 2 Development dataset (2.2 GB)

Additional training dataset (2.0 GB)

version 2.0

Evaluation dataset (0.4 GB)

version 1.0

Task setup and rules

Participants are required to submit both an anomaly score and a normal/anomaly decision result for each test clip. The anomaly score for each test clip will be used to calculate the area under the receiver operating characteristic (ROC) curve (AUC) and partial-AUC (pAUC) scores, which are used to calculate an official score and a final ranking. The normal/anomaly decision result for each test clip is used to calculate the precision, recall, and F1 scores, which will also be published when the challenge results are open. The method of evaluation is described in the Evaluation section.

The anomaly score takes a large value when the input signal seems to be anomalous, and vice versa. To calculate the anomaly score, participants need to train an anomaly score calculator $\mathcal{A}$ with parameter $\theta$. The input of $\mathcal{A}$ is a machine's operating sound $x \in \mathbb{R}^L$ and its machine information including machine type, section index, and other attribute information, and $\mathcal{A}$ outputs one anomaly score for the whole audio clip $x$ as $\mathcal{A}_\theta (x) \in \mathbb{R}$. Then, $x$ is determined to be anomalous when the anomaly score exceeds a pre-defined threshold value. Thus, $\mathcal{A}$ needs to be trained so that $\mathcal{A}_\theta(x)$ will be a large value both when the whole audio clip $x$ is anomalous and when a part of $x$ is anomalous, such as with collision anomalous sounds.
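To illustrate why the clip-level score should respond to a short anomalous event as well as to a fully anomalous clip, the sketch below compares two ways of pooling hypothetical frame-level scores into one clip score. The frame scores, the threshold, and the pooling functions are all assumptions chosen for illustration; the task does not prescribe any particular way of computing $\mathcal{A}_\theta(x)$.

```python
def clip_score_mean(frame_scores):
    """Average the frame-level scores into one clip-level score."""
    return sum(frame_scores) / len(frame_scores)


def clip_score_max(frame_scores):
    """Use the largest frame-level score as the clip-level score."""
    return max(frame_scores)


def is_anomalous(score, threshold):
    """A clip is flagged when its score exceeds the pre-defined threshold."""
    return score > threshold


# Hypothetical frame-level scores: a normal clip and a clip with one short burst.
normal_clip = [0.1] * 10
burst_clip = [0.1] * 9 + [2.0]
threshold = 0.5  # illustrative value only

print(is_anomalous(clip_score_mean(burst_clip), threshold))  # mean pooling
print(is_anomalous(clip_score_max(burst_clip), threshold))   # max pooling
```

In this toy case, mean pooling dilutes the single burst below the threshold while max pooling keeps it visible; which pooling works better in practice depends on the model and the machine type.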

Figure 3 shows the overview of this task, where the example is a procedure for calculating the anomaly scores of the test clips of (fan, section 00, target domain). First, the participants train an anomaly score calculator $\mathcal{A}$ using training data both in the source and target domains and optional external data resources. Then, by using $\mathcal{A}$, participants calculate anomaly scores of all the test clips of (fan, section 00, target domain). By repeating this procedure, participants calculate the anomaly score of all the test clips of all the machine types, sections, and domains.

An arbitrary number of anomaly score calculators $\mathcal{A}$ can be used to calculate the anomaly scores of test clips. The simplest strategy is to use a single $\mathcal{A}$ to calculate the anomaly scores for a single section (e.g., section 00). In this case, $\mathcal{A}$ is specialized to a single section, so users of such a system are required to train $\mathcal{A}$ for each machine type, each product, and each condition. A more challenging strategy is to use a single $\mathcal{A}$ to calculate the anomaly scores of all the test clips of all the machine types and sections. The advantage of this strategy is that participants can use all the training clips provided; however, they need to consider the generalization of the model. Another typical scenario that can be inspired by real-world applications is where you train a general model only with the source-domain data. The task organizers do not impose this constraint but would appreciate participants’ efforts to impose constraints on themselves based on various real-world applications.

All training data with arbitrary splitting can be used to train an anomaly score calculator. For example, to train $\mathcal{A}$ to calculate the anomaly score of (valve, section 00, source domain), participants can opt to use training data only in (valve, section 00, source domain), training data in both the source domain and target domains, training data of all sections of valves, all provided training data, and/or other strategies. Of course, normal/anomalous clips in test data cannot be used for training; however, simulating anomalous samples using the listed external data resources is allowed.

Changing the model (model/architecture/hyperparameters) between machine types within a single submission is allowed. However, we expect participants to develop a simple ASD system (i.e., keep the model and hyperparameters fixed and only change the training data to adapt to each machine type).

Submission

The official challenge submission consists of:

System output for the evaluation data
Meta information files

System output should be presented as a text file for each machine type and section index. The file names should be:

Anomaly score file: anomaly_score_<machine_type>_section_<section_index>.csv
Detection result file: decision_result_<machine_type>_section_<section_index>.csv

The anomaly score file (in CSV format, without header row) contains the anomaly score for each audio file in the test data of the evaluation dataset. Result items can be in any order. All rows must be in the following format:

[filename (string)],[anomaly score (real value)]

Anomaly scores in the second column can take a negative value. For example, typical auto-encoder-based anomaly score calculators use the squared reconstruction error, which takes a non-negative value, while statistical model-based methods (such as GMM) use the negative log-likelihood as the anomaly score, which can take both positive and negative values.
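The sign behaviour of the two score types can be checked with a small sketch. A single one-dimensional Gaussian is used here as a simplified stand-in for a GMM, and all input values are hypothetical.

```python
import math


def squared_error(x, x_hat):
    """Squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def gaussian_nll(x, mean, std):
    """Negative log-likelihood of a scalar x under a Gaussian N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical values, for illustration only.
recon_score = squared_error([0.2, 0.4], [0.1, 0.5])
nll_near_mean = gaussian_nll(0.0, mean=0.0, std=0.1)  # narrow density, x at the mean
nll_far_away = gaussian_nll(1.0, mean=0.0, std=0.1)   # x far from the mean

print(recon_score, nll_near_mean, nll_far_away)
```

The negative log-likelihood becomes negative whenever the density at the input exceeds 1, which happens for narrow distributions, so negative entries in the score file are not a sign of an error.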

The decision result file (in CSV format, without header row) contains the normal/anomaly decision result for each audio file in the test data of the evaluation dataset. Result items can be in any order. All rows must be in the following format:

[filename (string)],[decision result (0: normal, 1: anomaly)]
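The two output files for one section could be produced as in the following sketch. The helper name `write_submission`, the clip file names, the scores, and the threshold are hypothetical; only the output file names and the row formats follow the description above.

```python
import csv
import os
import tempfile


def write_submission(out_dir, machine_type, section_index, scores, threshold):
    """Write the anomaly score and decision result files for one section."""
    suffix = f"{machine_type}_section_{section_index}.csv"
    score_path = os.path.join(out_dir, f"anomaly_score_{suffix}")
    decision_path = os.path.join(out_dir, f"decision_result_{suffix}")
    # Anomaly score file: one "filename,score" row per clip, no header row.
    with open(score_path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, score in scores.items():
            writer.writerow([filename, score])
    # Decision result file: 0 for normal, 1 for anomaly, no header row.
    with open(decision_path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, score in scores.items():
            writer.writerow([filename, int(score > threshold)])
    return score_path, decision_path


# Hypothetical clip names and scores, for illustration only.
scores = {"clip_0001.wav": -1.25, "clip_0002.wav": 3.5}
out_dir = tempfile.mkdtemp()
score_path, decision_path = write_submission(
    out_dir, "ToyCircuit", "00", scores, threshold=0.0
)
with open(score_path) as f:
    score_lines = f.read().splitlines()
with open(decision_path) as f:
    decision_lines = f.read().splitlines()
print(score_lines, decision_lines)
```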

We allow up to four system output submissions per participant/team. For each system, meta information should be provided in a separate file that contains the task-specific information. All files should be packaged into a zip file for submission. Detailed information on the submission process can be found on the Submission page.

Evaluation

Metrics

This task is evaluated with the AUC and the pAUC. The pAUC is an AUC calculated from a portion of the ROC curve over the pre-specified range of interest.

Because the anomaly detector is expected to work with the same threshold regardless of the domain, data from both domains in a section are used to calculate the AUC and pAUC. Also, in order to evaluate the detection performance for each domain, the AUC is calculated for each domain. The AUC for each machine type, section, and domain (source/target) and the pAUC for each machine type and section are defined as

$$ {\rm AUC}_{m, n, d} = \frac{1}{N^{-}_{d}N^{+}_{n}} \sum_{i=1}^{N^{-}_{d}} \sum_{j=1}^{N^{+}_{n}} \mathcal{H} (\mathcal{A}_{\theta} (x_{j}^{+}) - \mathcal{A}_{\theta} (x_{i}^{-})), $$

$$ {\rm pAUC}_{m, n} = \frac{1}{\lfloor p N^{-}_{n} \rfloor N^{+}_{n}} \sum_{i=1}^{\lfloor p N^{-}_{n} \rfloor} \sum_{j=1}^{N^{+}_{n}} \mathcal{H} (\mathcal{A}_{\theta} (x_{j}^{+}) - \mathcal{A}_{\theta} (x_{i}^{-})) $$

where $m$ represents the index of a machine type, $n$ represents the index of a section, $d \in \{ {\rm source}, {\rm target} \}$ represents a domain, $\lfloor \cdot \rfloor$ is the flooring function, and $\mathcal{H} (x)$ returns 1 when $x > 0$ and 0 otherwise. Here, $\{x_{i}^{-}\}_{i=1}^{N^{-}_{d}}$ are the normal test clips in the domain $d$ in the section $n$ in the machine type $m$, and $\{x_{j}^{+}\}_{j=1}^{N^{+}_{n}}$ are the anomalous test clips in the section $n$ in the machine type $m$; both sets have been sorted so that their anomaly scores are in descending order. Furthermore, $N^{-}_{d}$ is the number of normal test clips in the domain $d$ in the section $n$ in the machine type $m$, $N^{-}_{n}$ is the number of normal test clips in the section $n$ in the machine type $m$, and $N^{+}_{n}$ is the number of anomalous test clips in the section $n$ in the machine type $m$.

In our metric, the pAUC is calculated as the AUC over a low false-positive-rate (FPR) range $[0, p]$. The reason for the additional use of the pAUC is based on practical requirements. If an ASD system frequently gives false alarms, we cannot trust it. Therefore, it is especially important to increase the true-positive-rate under low FPR conditions. In this task, we will use $p=0.1$.
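The two definitions above translate directly into pairwise comparisons. The sketch below follows the equations as written, using hypothetical scores; it is not the official evaluator, and a library routine for partial AUC may apply a different normalization and therefore give different numbers.

```python
import math


def auc(normal_scores, anomalous_scores):
    """Fraction of (normal, anomalous) pairs in which the anomalous clip scores higher."""
    hits = sum(
        1
        for neg in normal_scores
        for pos in anomalous_scores
        if pos - neg > 0  # H(x) is 1 only when x > 0, so ties do not count
    )
    return hits / (len(normal_scores) * len(anomalous_scores))


def pauc(normal_scores, anomalous_scores, p=0.1):
    """AUC restricted to the floor(p * N-) highest-scoring normal clips."""
    k = math.floor(p * len(normal_scores))
    if k == 0:
        raise ValueError("too few normal clips for the chosen p")
    top_normals = sorted(normal_scores, reverse=True)[:k]
    return auc(top_normals, anomalous_scores)


# Hypothetical anomaly scores: ten normal clips and two anomalous clips.
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.95, 1.5]

auc_value = auc(normal, anomalous)    # 19 of the 20 pairs are ordered correctly
pauc_value = pauc(normal, anomalous)  # only the top normal clip (1.0) is kept
print(auc_value, pauc_value)
```

In this toy case the AUC is 0.95 while the pAUC is only 0.5, because the single highest-scoring normal clip outranks one of the two anomalous clips. This is the kind of false alarm that the pAUC is meant to penalize.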

The official score $\Omega$ for each submitted system is given by the harmonic mean of the AUC and pAUC scores over all the machine types, sections, and domains as follows:

$$ \Omega = h \left\{ {\rm AUC}_{m, n, d}, \ {\rm pAUC}_{m, n} \quad | \quad m \in \mathcal{M}, \ n \in \mathcal{S}(m), \ d \in \{ {\rm source}, {\rm target} \} \right\}, $$

where $h\left\{\cdot\right\}$ represents the harmonic mean (over all machine types, sections, and domains), $\mathcal{M}$ represents the set of the machine types, and $\mathcal{S}(m)$ represents the set of the sections for the machine type $m$.
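As a minimal sketch of this aggregation, assuming one section per machine type as in this year's datasets, the score can be computed with the standard-library harmonic mean. The machine names, the dictionary keys, and all values below are hypothetical.

```python
from statistics import harmonic_mean


def official_score(per_machine):
    """Harmonic mean over the source AUC, target AUC, and pAUC of every machine type."""
    values = []
    for scores in per_machine.values():
        values.extend([scores["auc_source"], scores["auc_target"], scores["pauc"]])
    return harmonic_mean(values)


# Hypothetical per-machine results, for illustration only.
example = {
    "MachineA": {"auc_source": 0.8, "auc_target": 0.4, "pauc": 0.5},
    "MachineB": {"auc_source": 0.8, "auc_target": 0.4, "pauc": 0.5},
}
omega = official_score(example)
arithmetic = sum(
    v for scores in example.values() for v in scores.values()
) / 6
print(omega, arithmetic)
```

The harmonic mean is pulled toward the smallest values, so a system that performs poorly in one domain or on one machine type is penalized more than it would be under an arithmetic mean.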

As the equations above show, a threshold value does not need to be determined to calculate AUC, pAUC, or the official score because the anomaly scores of the normal test clips themselves act as the threshold values. However, in real applications, the threshold value must be determined, and a decision must be made as to whether each clip is normal or anomalous. Therefore, participants are also required to submit the normal/anomaly decision results. The organizers will publish the AUC, pAUC, and official scores as well as the precision, recall, and F1-scores calculated for the normal/anomaly decision results.
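For reference, precision, recall, and F1 can be computed from the decision results as in the following sketch, which treats "anomaly" (1) as the positive class. The labels and decisions are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here rather than a rule stated by the organizers.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 with anomaly (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: return 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical ground truth and decisions (0: normal, 1: anomaly).
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```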

Note: The submitted normal/anomaly decision results will not be used for the final ranking because the task organizers do not want to encourage participants to use a forbidden approach (i.e., threshold tuning based on the distribution in the evaluation dataset). Do not use other test clips to determine anomalies for each test clip.
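One way to respect this rule is to fix the threshold from the anomaly scores of the normal training clips alone and then judge each test clip independently. The sketch below uses a nearest-rank percentile; the 90th percentile, the function names, and the scores are illustrative assumptions, not values prescribed by the organizers.

```python
import math


def threshold_from_training(train_scores, percentile=90):
    """Nearest-rank percentile of the anomaly scores of normal training clips."""
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def decide(test_score, threshold):
    """Decide for one test clip on its own: 1 for anomaly, 0 for normal."""
    return int(test_score > threshold)


# Hypothetical anomaly scores of ten normal training clips.
train_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
threshold = threshold_from_training(train_scores)

decisions = [decide(score, threshold) for score in (3.0, 9.5)]
print(threshold, decisions)
```

Because the threshold depends only on training data, the decision for any one test clip is unaffected by the other test clips.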

Ranking

The final ranking will be decided by sorting based on the official score $\Omega$.

Ground truth labels and evaluator for evaluation dataset

The dcase2024_task2_evaluator includes the ground truth labels and it calculates the AUC, pAUC, precision, recall, and F1 scores from the anomaly score list for the evaluation dataset.

DCASE2024 Task 2 evaluator, repository

Results

Rank	Submission Information		Evaluation Dataset																				Development Dataset
Rank	Submission Code	Technical Report	Official Rank	Official Score	3DPrinter (AUC)	3DPrinter (pAUC)	AirCompressor (AUC)	AirCompressor (pAUC)	BrushlessMotor (AUC)	BrushlessMotor (pAUC)	HairDryer (AUC)	HairDryer (pAUC)	HoveringDrone (AUC)	HoveringDrone (pAUC)	RoboticArm (AUC)	RoboticArm (pAUC)	Scanner (AUC)	Scanner (pAUC)	ToothBrush (AUC)	ToothBrush (pAUC)	ToyCircuit (AUC)	ToyCircuit (pAUC)	ToyCar (AUC)	ToyCar (pAUC)	ToyTrain (AUC)	ToyTrain (pAUC)	Bearing (AUC)	Bearing (pAUC)	Fan (AUC)	Fan (pAUC)	Gearbox (AUC)	Gearbox (pAUC)	Slider (AUC)	Slider (pAUC)	Valve (AUC)	Valve (pAUC)
	DCASE2024_baseline_task2_MSE	DCASE2024baseline2024	39	56.50830191796572 ± 0.001050516938082758	59.72	49.42	55.38	55.47	66.92	55.58	52.89	51.63	58.11	50.21	50.96	51.16	60.48	50.11	72.15	52.74	62.41	50.00	50.37	48.77	61.77	47.95	61.70	57.58	61.47	57.53	69.87	55.65	61.26	51.77	48.66	52.42
	Wilkinghoff_FKIE_task2_1	WilkinghoffFKIE2024	62	52.46265926819839 ± 0.0010707667655151189	67.11	58.74	53.56	48.63	47.76	50.68	57.02	56.32	43.14	53.26	67.81	58.11	58.67	49.16	41.68	52.00	48.59	48.00	50.45	49.40	60.65	53.50	68.60	58.80	75.10	59.50	72.55	54.80	86.25	65.60	77.30	68.30
	Fujimura_NU_task2_1	FujimuraNU2024	26	58.89866752033418 ± 0.001041273079131894	73.17	59.79	60.48	49.05	56.59	50.53	65.50	52.68	66.39	53.16	70.58	57.68	60.86	50.37	71.18	56.11	47.10	48.68	54.41	49.84	75.94	59.74	75.75	61.16	63.17	56.68	72.28	54.53	93.24	80.37	78.61	69.16
	Zhao_CUMT_task2_1	ZhaoCUMT2024	9	61.96515488201288 ± 0.0011914958305639057	54.91	51.37	63.75	60.16	78.63	61.05	64.81	54.53	74.05	53.21	71.06	53.26	78.57	63.24	55.76	50.16	63.81	50.79	54.88	48.32	56.47	48.58	53.41	58.79	65.01	56.79	77.39	58.21	77.03	54.42	52.78	50.05
	Wang_USTC_task2_1	WangUSTC2024	19	59.683659221054455 ± 0.0011407300964302418	54.82	51.37	63.64	60.16	61.90	50.24	64.81	54.53	74.06	53.21	71.07	53.26	71.53	51.21	55.88	50.16	63.80	50.79	60.14	49.91	70.44	55.14	70.78	59.75	59.67	57.30	67.09	52.13	72.33	54.91	80.04	67.37
	Lee_KNU_task2_3	LeeKNU2024	73	50.31619240957195 ± 0.0009955894014041765	54.53	49.68	54.53	48.11	63.57	50.95	40.12	49.74	43.15	49.00	43.09	51.05	64.12	49.32	53.45	55.16	47.89	50.63	45.40	49.26	69.42	53.75	66.54	56.16	59.48	52.16	63.74	51.63	63.06	51.21	45.16	48.26
	Qian_NIVIC_task2_1	QianNIVIC2024	14	60.49793602398476 ± 0.0011415310840067227	54.91	51.37	63.64	60.16	73.40	53.00	64.80	54.53	74.16	53.21	71.08	53.26	71.53	51.21	55.78	50.16	63.70	50.79	59.21	49.42	68.92	54.84	69.78	58.47	58.80	56.37	68.15	52.84	71.22	54.89	78.94	66.74
	Jiang_CUP_task2_2	JiangCUP2024	80	49.54166350875297 ± 0.0009707659088424591	55.62	53.32	49.99	48.68	51.88	49.53	54.41	49.32	42.47	48.21	50.59	51.00	51.49	48.32	46.74	49.37	45.96	47.89	47.96	48.63	46.36	51.26	62.98	53.73	62.88	55.73	70.42	54.15	90.34	69.52	82.16	67.10
	Jiang_THUEE_task2_1	JiangTHUEE2024	4	65.36869889765188 ± 0.001185491785330456	64.77	53.53	68.31	53.37	67.82	53.79	69.81	54.63	73.21	57.74	72.70	57.16	93.07	76.89	67.46	54.89	67.72	52.95	62.68	51.05	71.04	58.53	71.66	57.21	63.46	58.26	76.87	61.79	89.51	65.68	81.54	68.00
	Lv_AITHU_task2_4	LvAITHU2024	1	66.24102452310362 ± 0.0011854991900309608	68.07	56.11	64.88	50.84	69.26	54.05	73.14	54.79	74.73	56.84	72.89	54.58	94.89	79.26	72.35	59.74	67.69	52.11	63.45	48.63	70.37	57.68	73.39	62.11	65.41	60.05	76.93	62.63	88.50	61.53	81.93	64.68
	Yin_Midea_task2_2	YinMidea2024	63	52.23563242136593 ± 0.0010200734060762124	62.26	50.68	53.81	48.32	51.02	51.21	61.81	51.95	46.85	54.42	67.66	60.21	51.86	48.79	45.61	48.79	42.77	51.47	50.78	50.53	66.56	58.16	70.36	54.42	63.22	56.16	75.26	53.63	91.94	73.68	76.76	63.84
	Perez_UPV_task2_1	PerezUPV2024	85	48.98393318219933 ± 0.0009685037559411823	53.60	51.13	47.61	49.16	50.71	49.74	40.66	48.23	41.77	49.83	49.17	49.66	49.54	50.01	44.25	51.63	65.35	54.98	56.24		56.06		69.02		66.19		60.71		59.84		59.00
	Wu_IACAS_task2_3	WuIACAS2024	47	54.16899844515759 ± 0.001057108336360687	61.69	53.47	55.90	48.47	52.08	51.21	53.71	48.63	52.53	51.95	66.58	55.00	66.90	49.68	43.65	50.89	60.10	47.89	49.46	48.63	64.08	53.84	69.84	59.47	62.61	55.42	77.56	60.00	93.80	73.21	75.92	64.36
	Li_SMALLRICE_task2_3	LiSMALLRICE2024	53	53.81856456562342 ± 0.0011243926792786517	51.46	50.00	49.04	48.37	54.65	53.74	55.68	49.11	45.79	53.63	65.34	50.16	93.89	74.63	40.74	51.16	56.53	54.68	55.64	49.21	66.26	53.53	66.68	55.42	65.74	57.84	77.52	59.16	69.82	51.47	77.02	65.21
	Huang_Xju_task2_1	HuangXju2024	46	54.38612240957236 ± 0.001005392973156531	54.05	49.84	58.72	58.63	67.92	52.21	53.68	48.53	57.09	55.00	45.11	50.16	54.60	49.79	62.31	52.89	55.06	49.32	49.89	50.57	49.98	47.68	54.82	59.31	61.59	52.26	73.29	51.94	70.10	48.21	52.55	51.68
	Guo_BIT_task2_3	GuoBIT2024	48	54.084345600639494 ± 0.0010037944335590143	65.66	51.26	59.86	53.84	57.17	50.47	59.52	55.32	52.75	56.79	52.10	52.32	54.94	48.42	53.79	50.16	46.16	49.32	48.16	51.58	59.22	49.16	55.92	59.11	55.92	50.84	81.31	57.79	78.01	58.32	49.28	51.11
	Wan_HFUU_task2_1	WanHFUU2024	37	56.95741891894512 ± 0.0010154269725155024	62.30	51.63	63.32	51.53	55.72	53.58	55.58	51.26	50.91	55.16	60.41	55.63	64.45	51.89	61.04	53.84	65.54	47.89	51.10	49.20	70.44	52.57	64.40	57.57	62.80	55.42	75.68	54.31	91.74	70.84	76.82	63.21
	Kong_IMECAS_task2_2	KongIMECAS2024	40	56.50382577391263 ± 0.0009983711243153732	55.91	51.16	64.26	60.68	77.64	60.16	59.36	52.05	53.65	53.95	51.42	51.05	52.16	48.47	65.65	53.84	54.09	49.16	49.86	51.21	50.65	47.89	54.30	58.95	60.62	52.05	77.56	54.79	77.83	48.84	52.46	50.74
	Hai_SCU_task2_1	HaiSCU2024	72	50.34153796416583 ± 0.0010155482504932121	57.31	51.00	41.57	47.84	65.33	57.16	54.01	54.53	41.13	47.84	57.91	55.74	49.20	48.58	46.77	49.53	46.16	50.53	49.64	51.21	54.78	48.78	66.57	60.36	51.51	48.95	62.52	50.47	63.18	48.89	51.29	50.94
	Kim_DAU_task2_1	KimDAU2024	71	50.433236408626016 ± 0.0010116627475769257	69.05	52.05	46.70	48.32	50.78	49.68	49.59	49.63	57.53	54.21	51.58	50.11	45.57	49.05	46.63	49.42	45.13	50.00	66.83	66.65	62.39	53.43	80.15	70.95	75.95	75.95	73.51	72.32	79.72	71.36	78.03	72.48
	Bian_NR_task2_2	BianNR2024	74	50.30296617352682 ± 0.0009936159397799474	59.33	62.84	52.88	49.74	58.74	51.74	43.64	48.89	51.73	51.68	51.21	50.53	52.51	49.00	38.20	49.95	47.03	51.84	42.45	51.05	61.04	52.63	61.91	56.95	54.52	50.00	54.82	51.63	49.69	50.58	48.06	49.26
	Gleichmann_TNT_task2_1	GleichmannTNT2024	93	45.314372753290186 ± 0.0009577819435369494	49.30	49.37	49.06	49.26	48.12	48.63	33.24	48.32	29.82	50.42	48.21	53.37	48.73	50.37	40.46	50.95	55.87	55.37	57.34	50.05	59.56	52.68	62.16	52.32	54.56	51.74	59.38	53.16	73.42	57.74	58.48	49.95
	Kim_CAU_task2_1	KimCAU2024	89	46.45632996840251 ± 0.0010088781424119019	45.63	50.37	42.38	47.89	42.28	49.16	38.85	51.58	41.32	51.95	56.60	50.53	50.01	50.42	44.30	49.79	46.82	49.11	47.70	49.11	56.48	51.58	60.34	53.26	58.14	57.37	62.30	52.42	77.70	49.53	65.50	56.79
	Zhang_HEU_task2_1	ZhangHEU2024	54	53.74623090811457 ± 0.001070067273571812	64.76	53.95	56.05	51.32	49.93	49.79	61.59	56.63	50.50	54.42	47.10	50.42	59.50	48.00	58.03	51.79	50.86	50.79	45.17	49.37	61.37	50.05	56.23	58.00	58.81	51.74	81.43	61.05	85.73	78.84	73.25	56.63
	Liu_CXL_task2_1	LiuCXL2024	13	60.52026046959108 ± 0.0011605372202239733	54.82	51.37	63.76	60.16	61.91	50.24	64.81	54.53	74.04	53.21	71.15	53.26	78.56	63.24	55.87	50.16	63.70	50.79	58.38	49.43	68.99	54.26	69.94	57.44	57.45	55.74	67.03	52.54	70.32	55.24	79.75	65.69
	Guan_HEU_task2_4	GuanHEU2024	43	55.56904711277052 ± 0.0010596716564482372	60.80	51.05	61.57	50.42	62.33	53.42	58.52	49.32	60.31	55.63	58.50	56.11	59.44	51.95	54.08	53.32	46.05	48.53	52.52	49.58	70.52	52.32	63.66	53.84	60.77	55.79	70.29	51.42	89.24	76.00	81.29	65.63
	Wang_iflytek_task2_1	Wangiflytek2024	11	61.08842633062041 ± 0.0011747355735174372	54.87	51.37	63.72	60.16	78.59	61.05	64.76	54.53	74.08	53.21	71.11	53.26	71.52	51.21	55.83	50.16	63.75	50.79	60.37	51.13	69.31	55.32	71.55	58.90	60.14	57.35	68.38	53.86	72.44	55.67	80.31	67.34
	Yang_IND_task2_1	YangIND2024	10	61.35410342629337 ± 0.001148839259798822	54.91	51.37	63.77	60.16	73.30	53.00	64.81	54.53	74.03	53.21	71.16	53.26	78.56	63.24	55.78	50.16	63.70	50.79	55.63	47.28	55.48	47.79	52.98	57.09	64.15	54.98	76.33	60.04	75.31	55.65	53.61	49.98

Complete results and technical reports can be found on the results page.

Baseline system

The task organizers provide a baseline system that gives reasonable performance on the Task 2 dataset. It is the same system as in DCASE 2023 Challenge Task 2 and has two operating modes: simple autoencoder and selective Mahalanobis. Both are good starting points, especially for entry-level researchers who want to get familiar with the ASD task.

Simple Autoencoder mode

The anomaly score is calculated as the reconstruction error of the observed sound. To obtain small anomaly scores for normal sounds, the autoencoder (AE) is trained to minimize the reconstruction error of the normal training data. This method is based on the assumption that the AE cannot reconstruct sounds that were not used in training, that is, unknown anomalous sounds.

In the baseline system, we first calculate the log-mel-spectrogram of the input $X = \{X_t\}_{t = 1}^T$ where $X_t \in \mathbb{R}^F$, and $F$ and $T$ are the number of mel-filters and time-frames, respectively. Then, the acoustic feature at $t$ is obtained by concatenating consecutive frames of the log-mel-spectrogram as $\psi_t = (X_t, \cdots, X_{t + P - 1}) \in \mathbb{R}^D$, where $D = P \times F$, and $P$ is the number of frames of the context window. The anomaly score is calculated as:

$$ A_{\theta}(X) = \frac{1}{DT} \sum_{t = 1}^T \| \psi_t - r_{\theta}(\psi_t) \|_{2}^{2}, $$

where $r_{\theta}(\psi_t)$ is the vector reconstructed by the autoencoder, and $\| \cdot \|_2$ is the $\ell_2$ norm.
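The feature construction and the score above can be sketched in plain Python as follows. This is a minimal illustration, not the baseline code: the function names are hypothetical, `reconstruct` stands in for the trained autoencoder $r_{\theta}$, and the average is taken over the $T - P + 1$ context windows that fit inside the clip, which is an assumption about how clip edges are handled.

```python
def context_features(log_mel, n_context):
    """Concatenate n_context consecutive frames into one vector per start frame."""
    n_vectors = len(log_mel) - n_context + 1
    return [
        [value for frame in log_mel[t:t + n_context] for value in frame]
        for t in range(n_vectors)
    ]


def anomaly_score(log_mel, reconstruct, n_context=5):
    """Mean squared reconstruction error over all context-window features.

    log_mel: list of frames, each a list of F log-mel energies.
    reconstruct: callable mapping a D-dimensional vector to its reconstruction
                 (a stand-in for the trained autoencoder; hypothetical here).
    """
    features = context_features(log_mel, n_context)
    dim = len(features[0])  # D = P * F
    total = 0.0
    for psi in features:
        rec = reconstruct(psi)
        total += sum((a - b) ** 2 for a, b in zip(psi, rec))
    # Normalize by D and by the number of windows (assumed edge handling).
    return total / (dim * len(features))
```

A perfect reconstruction gives a score of zero, and the score grows with the squared distance between each feature vector and its reconstruction.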

To determine the anomaly detection threshold, we assume that $A_{\theta}$ follows a gamma distribution. The parameters of the gamma distribution are estimated from the histogram of $A_{\theta}$, and the anomaly detection threshold is determined as the 90th percentile of the gamma distribution. If $A_{\theta}$ for a test clip is greater than this threshold, the clip is judged to be anomalous; otherwise, it is judged to be normal.
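The thresholding step can be illustrated with the standard library alone. The sketch below makes two assumptions that are not taken from the baseline: the gamma parameters are estimated by the method of moments (a simple substitute for whatever fitting procedure the baseline uses), and the scores used for fitting are supplied by the caller. All function names are hypothetical.

```python
import math


def fit_gamma_moments(scores):
    """Estimate gamma (shape, scale) by the method of moments (assumed fitting method)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma function via its power series."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 10000):
        term *= z / (shape + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def gamma_ppf(q, shape, scale):
    """Invert the CDF by bisection to obtain the q-quantile."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decision_threshold(scores, percentile=0.9):
    """Threshold at the given percentile of the fitted gamma distribution."""
    shape, scale = fit_gamma_moments(scores)
    return gamma_ppf(percentile, shape, scale)


def is_anomalous(score, threshold):
    """A clip is flagged when its score exceeds the threshold."""
    return score > threshold
```

In practice a library routine for fitting and inverting the gamma distribution would replace the hand-written series and bisection; they are spelled out here only to keep the example self-contained.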

Selective Mahalanobis mode

The anomaly score is calculated as the reconstruction error of the observed sound in the Mahalanobis metric, using covariance matrices calculated after the last epoch of the training phase.
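As an illustration of a reconstruction error measured in the Mahalanobis metric, the sketch below computes $e^\top \Sigma^{-1} e$ for a single residual $e = \psi_t - r_{\theta}(\psi_t)$. It assumes the covariance matrix $\Sigma$ is already available and is symmetric positive definite. How the baseline estimates its covariance matrices and selects among them is not described in this section and is not covered here. The function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def mahalanobis_error(feature, reconstruction, covariance):
    """Squared Mahalanobis length of the reconstruction residual."""
    residual = [a - b for a, b in zip(feature, reconstruction)]
    # Solving covariance @ w = residual avoids forming the inverse explicitly.
    weighted = solve_linear_system(covariance, residual)
    return sum(r * w for r, w in zip(residual, weighted))
```

With an identity covariance this reduces to the squared Euclidean error used in the simple autoencoder mode; a non-trivial covariance down-weights residual directions that vary strongly.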

Parameters

The basic architecture and parameters are the same for both modes, simple autoencoder mode and selective Mahalanobis mode.

Acoustic features

The frame size for STFT is 64 ms (50 % hop size)
Log-mel energies for 128 ($= F$) bands
5 ($= P$) consecutive frames are concatenated.
640 ($= D = P \times F$) dimensions are input to the autoencoder.
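The arithmetic behind these settings can be checked with a few lines of Python. The 16 kHz sample rate is an assumption made for this example (it is not stated in this section), and the helper name is hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: not stated in this section
FRAME_MS = 64            # STFT frame size
HOP_RATIO = 0.5          # 50 % hop size
N_MELS = 128             # F
N_CONTEXT = 5            # P

frame_samples = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per STFT frame
hop_samples = int(frame_samples * HOP_RATIO)       # samples between frame starts
input_dim = N_CONTEXT * N_MELS                     # D = P * F


def n_feature_vectors(n_frames, n_context=N_CONTEXT):
    """Number of context windows that fit in a spectrogram of n_frames frames."""
    return max(n_frames - n_context + 1, 0)
```

Under the assumed sample rate, a 64 ms frame spans 1024 samples with a 512-sample hop, and each autoencoder input has 640 dimensions.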

Network Architecture

Input shape: 640
Architecture:
- Dense layer #1
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #2
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #3
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #4
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Bottleneck layer
  - Dense layer (units: 8)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #5
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #6
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #7
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Dense layer #8
  - Dense layer (units: 128)
  - Batch Normalization
  - Activation (ReLU)
- Output layer
  - Dense layer (units: 640)
Learning (epochs: 100, batch size: 256, data shuffling between epochs)
- Optimizer: Adam (learning rate: 0.001)
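The layer list above can be written down as a small specification and used to count trainable parameters. This is a sketch under two assumptions not stated in the list: every dense layer has a bias term, and every batch-normalization layer has a learnable scale and shift. The names are hypothetical.

```python
INPUT_DIM = 640
# Four encoder layers, the bottleneck, and four decoder layers.
HIDDEN_UNITS = [128, 128, 128, 128, 8, 128, 128, 128, 128]


def layer_spec(input_dim=INPUT_DIM, hidden_units=HIDDEN_UNITS):
    """Return (n_in, n_out, has_batch_norm) for each dense layer."""
    specs = []
    n_in = input_dim
    for units in hidden_units:
        specs.append((n_in, units, True))  # Dense + BatchNorm + ReLU
        n_in = units
    specs.append((n_in, input_dim, False))  # output layer: Dense only
    return specs


def count_trainable_parameters(specs):
    """Count weights, biases, and batch-norm scale/shift parameters."""
    total = 0
    for n_in, n_out, has_batch_norm in specs:
        total += n_in * n_out + n_out  # weights + biases (bias assumed)
        if has_batch_norm:
            total += 2 * n_out         # scale + shift (assumed learnable)
    return total


print(count_trainable_parameters(layer_spec()))  # 267928
```

Most of the parameters sit in the first and last layers, which map between the 640-dimensional input and the 128-unit hidden layers.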

Repository

DCASE2024 Task 2 baseline, repository (same as DCASE2023 Task 2)
version 2.0.1

Results with the development dataset

We evaluated the AUC and pAUC on the development dataset with three random seeds on a V100. The following table shows the averages and standard deviations for the simple autoencoder mode (MSE) and the selective Mahalanobis mode (MAHALA).

Machine type	System	AUC_source (Ave.)	AUC_source (Std.)	AUC_target (Ave.)	AUC_target (Std.)	pAUC (Ave.)	pAUC (Std.)
ToyCar	MSE	66.98 %	0.89 %	33.75 %	0.81 %	48.77 %	0.13 %
ToyCar	MAHALA	63.01 %	2.12 %	37.35 %	0.83 %	51.04 %	0.16 %
ToyTrain	MSE	76.63 %	0.22 %	46.92 %	0.8 %	47.95 %	0.09 %
ToyTrain	MAHALA	61.99 %	1.79 %	39.99 %	1.37 %	48.21 %	0.05 %
bearing	MSE	62.01 %	0.64 %	61.4 %	0.26 %	57.58 %	0.32 %
bearing	MAHALA	54.43 %	0.27 %	51.58 %	1.73 %	58.82 %	0.13 %
fan	MSE	67.71 %	0.7 %	55.24 %	0.91 %	57.53 %	0.19 %
fan	MAHALA	79.37 %	0.44 %	42.7 %	0.26 %	53.44 %	1.03 %
gearbox	MSE	70.4 %	0.58 %	69.34 %	0.82 %	55.65 %	0.44 %
gearbox	MAHALA	81.82 %	0.33 %	74.35 %	1.21 %	55.74 %	0.35 %
slider	MSE	66.51 %	1.66 %	56.01 %	0.29 %	51.77 %	0.35 %
slider	MAHALA	75.35 %	3.02 %	68.11 %	0.63 %	49.05 %	1.0 %
valve	MSE	51.07 %	0.88 %	46.25 %	1.3 %	52.42 %	0.5 %
valve	MAHALA	55.69 %	1.44 %	53.61 %	0.19 %	51.26 %	0.47 %

Citation

If you are participating in this task, using the challenge dataset, or the baseline code, please cite the following papers.

Task description paper

Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi. Description and discussion on DCASE 2024 challenge task 2: first-shot unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2406.07250, 2024.

Dataset papers

Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), 1–5. Barcelona, Spain, November 2021.

ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions

Abstract

This paper proposes a new large-scale dataset called “ToyADMOS2” for anomaly detection in machine operating sounds (ADMOS). As with our previous ToyADMOS dataset, we collected a large number of operating sounds of miniature machines (toys) under normal and anomaly conditions by deliberately damaging them, but extended them in this case by providing a controlled depth of damages in the anomaly samples. Since typical application scenarios of ADMOS require robust performance under domain-shift conditions, the ToyADMOS2 dataset is designed for evaluating systems under such conditions. The released dataset consists of two sub-datasets for machine-condition inspection: fault diagnosis of machines with geometrically fixed tasks and fault diagnosis of machines with moving tasks. Domain shifts are represented by introducing several differences in operating conditions, such as the use of the same machine type but with different models and parts configurations, operating speeds, microphone arrangements, etc. Each subdataset contains over 27 k samples of normal machine-operating sounds and over 8 k samples of anomalous sounds recorded with five to eight microphones. The dataset is freely available for download at https://github.com/nttcslab/ToyADMOS2-dataset and https://doi.org/10.5281/zenodo.4580270.

Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022). Nancy, France, November 2022.

MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task

Abstract

We present a machine sound dataset to benchmark domain generalization techniques for anomalous sound detection (ASD). Domain shifts are differences in data distributions that can degrade the detection performance, and handling them is a major issue for the application of ASD systems. While currently available datasets for ASD tasks assume that occurrences of domain shifts are known, in practice, they can be difficult to detect. To handle such domain shifts, domain generalization techniques that perform well regardless of the domains should be investigated. In this paper, we present the first ASD dataset for the domain generalization techniques, called MIMII DG. The dataset consists of five machine types and three domain shift scenarios for each machine type. The dataset is dedicated to the domain generalization task with features such as multiple different values for parameters that cause domain shifts and introduction of domain shifts that can be difficult to detect, such as shifts in the background noise. Experimental results using two baseline systems indicate that the dataset reproduces domain shift scenarios and is useful for benchmarking domain generalization techniques.

Baseline system paper

Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: a domain generalization baseline. In Proceedings of the 31st European Signal Processing Conference (EUSIPCO), 191–195. 2023.

Coordinators

Tomoya Nishida (Hitachi, Ltd.)
Noboru Harada (NTT Corporation)
Daisuke Niizumi (NTT Corporation)
Davide Albertini (STMicroelectronics)
Roberto Sannino (STMicroelectronics)
Simone Pradolini (STMicroelectronics)
Filippo Augusti (STMicroelectronics)
Keisuke Imoto (Doshisha University)
Kota Dohi (Hitachi, Ltd.)
Harsh Purohit (Hitachi, Ltd.)
Takashi Endo (Hitachi, Ltd.)
Yohei Kawaguchi (Hitachi, Ltd.)
