# Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring

### Coordinators

• Yuma Koizumi
• Yohei Kawaguchi (Hitachi, Ltd.)
• Keisuke Imoto (Doshisha University)
• Toshiki Nakamura (Hitachi, Ltd.)
• Yuki Nikaido (Hitachi, Ltd.)
• Ryo Tanabe (Hitachi, Ltd.)
• Harsh Purohit (Hitachi, Ltd.)
• Kaori Suefusa (Hitachi, Ltd.)
• Takashi Endo (Hitachi, Ltd.)
• Masahiro Yasuda
• Noboru Harada

The challenge has ended. Full results for this task can be found on the results page.

# Description

Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatically detecting mechanical failure is an essential technology in the fourth industrial revolution, including artificial intelligence (AI)-based factory automation. Prompt detection of machine anomalies by observing machine sounds can be useful for machine condition monitoring.

The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. In real-world factories, actual anomalous sounds rarely occur and are highly diverse. Therefore, exhaustive patterns of anomalous sounds are impossible to deliberately make and/or collect. This means we have to detect unknown anomalous sounds that were not observed in the given training data. This point is one of the major differences in premise between ASD for industrial equipment and the past supervised DCASE tasks for detecting defined anomalous sounds such as gunshots or a baby crying.

This task cannot be solved as a simple classification problem, even though the simplified task description shown in Fig. 1 seems to be a two-class classification problem. Please refer to the “Task setup and rules” section for the details of the task.

## Schedule

Based on the DCASE 2020 Challenge schedule, the important dates for this task are as follows.

• Task open: 2nd of March 2020
• Additional training dataset release: 1st of April 2020
• Evaluation dataset release: 1st of June 2020
• External resource list lock: 1st of June 2020
• Challenge deadline: 15th of June 2020
• Challenge results: 1st of July 2020

External resources on the "List of external data resources allowed" can be used (cf. the external data resources section). The list will be updated upon request: any external resource that was freely accessible before the 1st of April 2020 can be added. Please send a request email to the task organizers. The list will be locked on the release date of the evaluation dataset (1st of June 2020). To prevent new external resources from being developed using machine information in the evaluation dataset, we will release the additional training dataset on the 1st of April 2020. Note that the additional training dataset contains matching training data for the machines used in the evaluation dataset (cf. the dataset section).

# Audio dataset

The data used for this task comprises parts of ToyADMOS and the MIMII Dataset, consisting of the normal/anomalous operating sounds of six types of toy/real machines. Each recording is single-channel, approximately 10 seconds long, and includes both a target machine's operating sound and environmental noise. The following six types of toy/real machines are used in this task:

• Toy-car (ToyADMOS)
• Toy-conveyor (ToyADMOS)
• Valve (MIMII Dataset)
• Pump (MIMII Dataset)
• Fan (MIMII Dataset)
• Slide rail (MIMII Dataset)

Publication

Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, and Keisuke Imoto. ToyADMOS: a dataset of miniature-machine operating sounds for anomalous sound detection. In Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 308–312. November 2019. URL: https://ieeexplore.ieee.org/document/8937164.

#### ToyADMOS: A Dataset of Miniature-machine Operating Sounds for Anomalous Sound Detection

##### Abstract

This paper introduces a new dataset called "ToyADMOS" designed for anomaly detection in machine operating sounds (ADMOS). To the best of our knowledge, no large-scale datasets are available for ADMOS, although large-scale datasets have contributed to recent advancements in acoustic signal processing. This is because anomalous sound data are difficult to collect. To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them. The released dataset consists of three sub-datasets for machine-condition inspection, fault diagnosis of machines with geometrically fixed tasks, and fault diagnosis of machines with moving tasks. Each sub-dataset includes over 180 hours of normal machine-operating sounds and over 4,000 samples of anomalous sounds collected with four microphones at a 48-kHz sampling rate. The dataset is freely available for download at https://github.com/YumaKoizumi/ToyADMOS-dataset.

##### Keywords

Anomaly detection in sounds, machine operating sounds, product inspection, dataset

Publication

Harsh Purohit, Ryo Tanabe, Takeshi Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi. MIMII Dataset: sound dataset for malfunctioning industrial machine investigation and inspection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), 209–213. November 2019. URL: http://dcase.community/documents/workshop2019/proceedings/DCASE2019Workshop_Purohit_21.pdf.

#### MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection

##### Abstract

Factory machinery is prone to failure or breakdown, resulting in significant expenses for companies. Hence, there is a rising interest in machine monitoring using different sensors including microphones. In the scientific community, the emergence of public datasets has been promoting the advancement in acoustic detection and classification of scenes and events, but there are no public datasets that focus on the sound of industrial machines under normal and anomalous operating conditions in real factory environments. In this paper, we present a new dataset of industrial machine sounds which we call a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). Normal and anomalous sounds were recorded for different types of industrial machines, i.e. valves, pumps, fans and slide rails. To resemble the real-life scenario, various anomalous sounds have been recorded, for instance, contamination, leakage, rotating unbalance, rail damage, etc. The purpose of releasing the MIMII dataset is to help the machine-learning and signal-processing community to advance the development of automated facility maintenance.

##### Keywords

Machine sound dataset, acoustic scene classification, anomaly detection, unsupervised anomalous sound detection

## Recording procedure

The ToyADMOS consists of normal/anomalous operating sounds of miniature machines (toys) collected with four microphones, and the MIMII dataset consists of those of real machines collected with eight microphones. Anomalous sounds in these datasets were collected by deliberately damaging the target machines. To simplify the task, we used only the first channel of each multi-channel recording; all recordings are regarded as single-channel recordings from a fixed microphone. All signals have been downsampled to 16 kHz. From ToyADMOS, we used only the IND-type data, which contain the sound of an entire operation (i.e., from start to stop) in each recording. Each target machine sound was mixed with environmental noise, and only the noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. For details of the recording procedure, please refer to the ToyADMOS and MIMII Dataset papers.

## Development and evaluation datasets

We first define two important terms in this task: Machine Type and Machine ID. Machine Type means the kind of machine, which in this task can be one of six: toy-car, toy-conveyor, valve, pump, fan, and slide rail. Machine ID is the identifier for each individual machine of the same Machine Type; in the training dataset, each Machine Type has three or four Machine IDs.

Figure 2 shows an overview of our dataset, which consists of a development dataset, an additional training dataset, and an evaluation dataset.

Development dataset: Each Machine Type has three or four Machine IDs. Each Machine ID's dataset consists of (i) around 1,000 samples of normal sound for training and (ii) 100–200 samples each of normal and anomalous sounds for the test. The normal and anomalous sound samples in (ii) are only for checking performance; therefore, the sound samples in (ii) shall not be used for training.
Evaluation dataset: This dataset consists of the same Machine Types' test samples as the development dataset. The number of test samples for each Machine ID is around 400, none of which have a condition label (i.e., normal or anomaly). Note that the Machine IDs of the evaluation dataset are different from those of the development dataset.
Additional training dataset: This dataset includes around 1,000 normal samples for each Machine Type and Machine ID used in the evaluation dataset. The participants can also use this dataset for training. The additional training dataset will be open on April 1st.

## Reference labels

The given labels for each training/test sample are Machine Type, Machine ID, and condition (normal/anomaly). Machine Type information is given by the directory name, and Machine ID and condition information are given by the file name. Note that the condition information of the test samples in the evaluation dataset is not given; their condition labels will be released after the challenge results are published. Detailed information is in the Repository section.
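Since the labels are encoded entirely in directory and file names, a small parser can recover them. The sketch below is an illustrative assumption based on the naming pattern described above and in the Repository section (e.g. `dev_data/ToyCar/train/normal_id_01_00000000.wav`); it is not part of the official baseline code.

```python
import re
from pathlib import Path

def parse_sample_path(path):
    """Extract (machine_type, machine_id, condition) from a dataset path.

    Machine Type is the grandparent directory name; Machine ID and the
    condition are encoded in the file name. Evaluation test files omit
    the condition prefix (e.g. id_05_00000000.wav), so condition is None.
    """
    p = Path(path)
    machine_type = p.parent.parent.name  # e.g. "ToyCar" or "fan"
    m = re.match(r"(?:(normal|anomaly)_)?id_(\d+)_\d+\.wav$", p.name)
    if m is None:
        raise ValueError(f"unexpected file name: {p.name}")
    condition, machine_id = m.group(1), m.group(2)
    return machine_type, machine_id, condition
```
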

## External data resources

Based on the past DCASE's external data resource policy, we allow the use of external datasets and trained models under the following conditions:

1. Test data in both development and evaluation datasets shall not be used for training.
2. The ToyADMOS and MIMII Dataset, except for the development/additional training/evaluation datasets provided for this challenge, shall not be used. Because this task's datasets are subsets of those datasets, information about the anomalous sounds in the test data could otherwise leak into the model.
3. Datasets/trained models on the "List of external data resources allowed" can be used. The list will be updated upon request. Datasets/trained models, which are freely accessed by any other research group before 1st of April 2020, can be added to the list.
4. To add sources of external datasets/trained models to the list, send a request to the organizers by the evaluation set publishing date. To give an equal opportunity to use them for all competitors, we will update the "list of external data resources allowed" on the web page accordingly.
5. Once the evaluation set is published, no further external sources will be added. The list will be locked after 1st of June 2020 which is the same date as Task 1.

### List of external data resources allowed:

| Resource | Added | URL |
| --- | --- | --- |
| IDMT-ISA-ELECTRIC-ENGINE audio | 01.03.2020 | https://www.idmt.fraunhofer.de/en/publications/isa-electric-engine.html |
| VGGish model | 01.03.2020 | https://github.com/tensorflow/models/tree/master/research/audioset/vggish |
| PANNs model | 26.05.2020 | https://zenodo.org/record/3576403/ |

• Development dataset (8.1 GB), version 1.0
• Evaluation dataset (1.9 GB), version 1.0

# Task setup and rules

In order to calculate several metrics used in the anomaly detection research area, participants will calculate and submit anomaly scores for each test sample instead of a decision result. Here, the anomaly score takes a large value when the input signal seems to be anomalous, and vice versa. To calculate the anomaly score, participants need to train an anomaly score calculator $$\mathcal{A}$$ with parameter $$\theta$$. The input of $$\mathcal{A}$$ is a target machine's operating sound $$x \in \mathbb{R}^{L}$$ and its machine information including Machine Type and Machine ID, and $$\mathcal{A}$$ outputs one anomaly score for the whole audio-clip $$x$$ as $$\mathcal{A}_{\theta} (x) \in \mathbb{R}$$. Then, $$x$$ is determined to be anomalous when the anomaly score exceeds the pre-defined threshold value. Thus, $$\mathcal{A}$$ needs to be trained so that $$\mathcal{A}_{\theta} (x)$$ takes a large value not only when the whole audio-clip $$x$$ is anomalous but also when a part of $$x$$ is anomalous, such as with collision anomalous sounds. The decision procedure of the threshold is described in the Evaluation section.
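As a minimal sketch of this interface: many systems first compute a per-frame score and then aggregate it into one clip-level value. The aggregation choices below (`mean` vs. `max`) are illustrative assumptions, not part of the task definition; `max`-style pooling is one way to make the clip score react to short, localized anomalies such as collisions.

```python
import numpy as np

def clip_anomaly_score(frame_scores, mode="mean"):
    """Aggregate per-frame anomaly scores into a single clip-level score.

    The task requires one score per ~10-second clip that should also take
    a large value when only a part of the clip is anomalous; using `max`
    instead of `mean` is one (assumed) way to achieve that.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    if mode == "mean":
        return float(frame_scores.mean())
    if mode == "max":
        return float(frame_scores.max())
    raise ValueError(f"unknown mode: {mode}")

def is_anomalous(score, threshold):
    # Decision rule from the task description: the clip is judged
    # anomalous when its score exceeds the pre-defined threshold.
    return score > threshold
```
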

Figure 3 shows the overview of this task, where the example is a procedure for calculating anomaly scores of the test samples of fan-id01. First, the participants train an anomaly score calculator $$\mathcal{A}$$ using training data and optional external data resources. Then, by using $$\mathcal{A}$$, participants calculate anomaly scores of all test samples of fan-id01. By repeating this procedure, participants calculate the anomaly score of all test samples of all Machine Types and Machine IDs.

An arbitrary number of anomaly score calculators $$\mathcal{A}$$ can be used for calculating the anomaly scores of the test samples. The simplest strategy is to use a single $$\mathcal{A}$$ for calculating the anomaly scores of all test samples of a single target Machine ID. In this case, $$\mathcal{A}$$ is specialized to a single target machine; thus, participants need to train an $$\mathcal{A}$$ for each Machine Type and each Machine ID. A more challenging strategy is to use a single $$\mathcal{A}$$ for calculating the anomaly scores of all test samples of all Machine Types and all Machine IDs. The advantage of this strategy is that participants can use all provided training samples for training; however, they need to consider the generalization of the model.

All training data with arbitrary splittings can be used to train an anomaly score calculator. For example, to train $$\mathcal{A}$$ for calculating the anomaly score of "toy-car ID 5", participants can opt to use only toy-car ID 5's training data, the training data of all toy-car's IDs, all provided training data, and/or other strategies. Of course, normal/anomalous samples in test data cannot be used for training; however, simulating anomalous samples using the listed external data resources is allowed.

Changing the model (model/architecture/hyperparameters) between machine types within a single submission is allowed. However, we expect participants to develop a simple ASD system, i.e. keep the model and hyperparameters fixed and just change the training data to adapt to each machine type.

# Submission

The official challenge submission consists of

• System output for the evaluation data
• Meta information files

System output should be presented as one text file per Machine Type and Machine ID, named anomaly_score_<type>_id_<ID>.csv. The file (in CSV format, without a header row) contains the anomaly score for each audio file in the test data of the evaluation dataset. Rows can be in any order, and each row must be in the following format:

[filename (string)],[anomaly score (real value)]


Anomaly scores in the second column can take a negative value. For example, typical auto-encoder-based anomaly score calculators use the squared reconstruction error, which takes a non-negative value, while statistical model-based methods (such as GMM) use the negative log-likelihood as the anomaly score, which can take both positive and negative values.
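A minimal writer for this output format might look like the following sketch (the function name and the dict-based input are illustrative assumptions):

```python
import csv
from pathlib import Path

def write_anomaly_scores(machine_type, machine_id, scores, out_dir="."):
    """Write one submission file in the required format:
    anomaly_score_<type>_id_<ID>.csv, CSV without a header row, one
    "[filename],[anomaly score]" row per test clip.

    `scores` maps test file names to real-valued anomaly scores
    (negative values are allowed, e.g. negative log-likelihoods).
    """
    out_path = Path(out_dir) / f"anomaly_score_{machine_type}_id_{machine_id}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for filename, score in scores.items():
            writer.writerow([filename, score])
    return out_path
```
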

We allow up to four system output submissions per participant/team. For each system, meta information should be provided in a separate file containing the task-specific information. All files should be packaged into a zip file for submission. Detailed information on the submission process can be found on the Submission page.

# Evaluation

This task is evaluated with the area under the receiver operating characteristic (ROC) curve (AUC) and the partial-AUC (pAUC). The pAUC is an AUC calculated from a portion of the ROC curve over the pre-specified range of interest. In our metric, the pAUC is calculated as the AUC over a low false-positive-rate (FPR) range $$[0, p]$$. The AUC and pAUC are defined as

$${\rm AUC} = \frac{1}{N_{-}N_{+}} \sum_{i=1}^{N_{-}} \sum_{j=1}^{N_{+}} \mathcal{H} (\mathcal{A}_{\theta} (x_{j}^{+}) - \mathcal{A}_{\theta} (x_{i}^{-})),$$
$${\rm pAUC} = \frac{1}{\lfloor p N_{-} \rfloor N_{+}} \sum_{i=1}^{\lfloor p N_{-} \rfloor} \sum_{j=1}^{N_{+}} \mathcal{H} (\mathcal{A}_{\theta} (x_{j}^{+}) - \mathcal{A}_{\theta} (x_{i}^{-}))$$

where $$\lfloor \cdot \rfloor$$ is the flooring function and $$\mathcal{H} (x)$$ returns 1 when $$x$$ > 0 and 0 otherwise. Here, $$\{x_{i}^{−}\}_{i=1}^{N_{−}}$$ and $$\{x_{j}^{+}\}_{j=1}^{N_{+}}$$ are normal and anomalous test samples, respectively, and have been sorted so that their anomaly scores are in descending order. Here, $$N_{−}$$ and $$N_{+}$$ are the number of normal and anomalous test samples, respectively. According to the above formulas, the anomaly scores of normal test samples are used as the threshold. This is why participants need to submit anomaly scores of all test samples instead of the decision results.

The reason for the additional use of the pAUC is based on practical requirements. If an ASD system gives false alerts frequently, we cannot trust it, just as "the boy who cried wolf" could not be trusted. Therefore, it is especially important to increase the true-positive-rate under low FPR conditions. In this task, we will use $$p=0.1$$.
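The two formulas can be implemented directly as pairwise comparisons. The sketch below follows the definitions above, with $$\mathcal{H}(a-b)$$ realized as the comparison `a > b` and the pAUC restricted to the $$\lfloor p N_{-} \rfloor$$ highest-scoring normal samples, i.e. the FPR range $$[0, p]$$:

```python
import numpy as np

def auc_and_pauc(normal_scores, anomaly_scores, p=0.1):
    """Direct implementation of the AUC/pAUC definitions above."""
    neg = np.sort(np.asarray(normal_scores, dtype=float))[::-1]  # x_i^-, descending
    pos = np.asarray(anomaly_scores, dtype=float)                # x_j^+
    n_neg, n_pos = len(neg), len(pos)
    # AUC: fraction of (normal, anomalous) pairs with H(A(x^+) - A(x^-)) = 1
    auc = float((pos[None, :] > neg[:, None]).sum() / (n_neg * n_pos))
    # pAUC: keep only the floor(p * N_-) normal samples with the highest
    # anomaly scores, which correspond to thresholds in the low-FPR range
    k = int(np.floor(p * n_neg))
    pauc = float((pos[None, :] > neg[:k, None]).sum() / (k * n_pos))
    return auc, pauc
```
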

## Ranking

In order to compare the submitted systems on all kinds of machines, the final rankings of the systems will be decided by using all Machine Types and Machine IDs. However, the AUC averaged over Machine Types should not be used because the difficulty varies greatly depending on the Machine Type. Therefore, we rank systems by the following procedure:

Step 1: Calculate AUC and pAUC. AUC and pAUC for all Machine Types and Machine IDs are individually calculated using the above formula.
Step 2: Average AUC and pAUC. The average of AUC and pAUC are calculated for each Machine Type.
Step 3: Rank systems for each Machine Type. The AUC and pAUC rankings for each Machine Type are each determined from the averages obtained in the second step. Then, the rank of a system for each Machine Type is determined by the average of its AUC and pAUC ranks; if the averaged ranks are tied, the system with the better pAUC rank wins.
Step 4: Decide the final position. The final position is determined by the per-Machine-Type rank averaged over all Machine Types.
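Steps 2–4 above can be sketched as follows, assuming Step 1 is already done. The dict layout `metrics[system][machine_type] = (mean_auc, mean_pauc)` is an illustrative assumption, not the official scoring implementation:

```python
def final_ranking(metrics):
    """Rank systems from best to worst by the procedure above.

    metrics[system][machine_type] = (mean_auc, mean_pauc), where the
    means are already averaged over that Machine Type's Machine IDs.
    """
    systems = sorted(metrics)
    machine_types = sorted(next(iter(metrics.values())))
    per_type_rank = {s: [] for s in systems}
    for mt in machine_types:
        # Step 3: rank by AUC and by pAUC separately (index 0 = best) ...
        auc_order = sorted(systems, key=lambda s: -metrics[s][mt][0])
        pauc_order = sorted(systems, key=lambda s: -metrics[s][mt][1])
        avg = {s: (auc_order.index(s) + pauc_order.index(s)) / 2 for s in systems}
        # ... then determine the per-type rank from the averaged rank,
        # breaking ties by the better pAUC rank.
        order = sorted(systems, key=lambda s: (avg[s], pauc_order.index(s)))
        for pos, s in enumerate(order, start=1):
            per_type_rank[s].append(pos)
    # Step 4: final position = per-type rank averaged over Machine Types
    return sorted(systems, key=lambda s: sum(per_type_rank[s]) / len(per_type_rank[s]))
```
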

# Results

Complete results and technical reports can be found on the task results page.

# Baseline system

The baseline system provides a simple, entry-level approach that gives reasonable performance on the Task 2 dataset. It is a good starting point, especially for entry-level researchers who want to get familiar with the ASD task.

## System description

The baseline system is a simple autoencoder (AE)-based anomaly score calculator. The anomaly score is calculated as the reconstruction error of the observed sound. To obtain small anomaly scores for normal sounds, the AE is trained to minimize the reconstruction error of the normal training data. This method is based on the assumption that the AE cannot reconstruct sounds that are not used in training, that is, unknown anomalous sounds.

In the baseline system, we first calculate a log-mel-spectrogram of the input $$X = \{X_t\}_{t=1}^{T}$$, where $$X_t \in \mathbb{R}^{F}$$, and $$F$$ and $$T$$ are the number of mel filters and time frames, respectively. Then, the acoustic feature at time $$t$$ is obtained by concatenating the log-mel-filterbank outputs of several frames before and after frame $$t$$ as $$\psi_{t} = ( X_{t-P}, ..., X_{t+P} ) \in \mathbb{R}^{D}$$, where $$D = F \times (2P+1)$$ and $$P$$ is the context window size. Then, the anomaly score is calculated as

$$A_{\theta} (x) = \frac{1}{DT} \sum_{t=1}^{T} \| \psi_{t} - {\rm AE}_{\theta} (\psi_{t}) \|_{2}^{2},$$

where $${\rm AE}$$ is an autoencoder and $$\| \cdot \|_{2}$$ is the $$\ell_{2}$$ norm.
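The feature construction and score computation can be sketched with NumPy, given a precomputed log-mel spectrogram (typically extracted with an audio library, not shown here). Dropping the boundary frames without full context is one common convention and an assumption of this sketch:

```python
import numpy as np

def context_features(log_mel, P=2):
    """Build the frame-wise features psi_t by concatenating 2P+1 adjacent
    log-mel frames. `log_mel` has shape (T, F); the result has shape
    (T - 2P, F * (2P + 1)). Boundary frames lacking full context are dropped.
    """
    T, F = log_mel.shape
    return np.stack([log_mel[t - P:t + P + 1].reshape(-1)
                     for t in range(P, T - P)])

def anomaly_score(log_mel, ae, P=2):
    """Mean squared reconstruction error over all frames, matching
    A_theta(x) = (1/DT) * sum_t ||psi_t - AE(psi_t)||^2 (normalized here
    by the number of frames actually kept after cropping the context).
    `ae` maps an (N, D) array of features to an (N, D) reconstruction.
    """
    psi = context_features(log_mel, P)
    return float(np.mean((psi - ae(psi)) ** 2))
```
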

### Parameters

#### Acoustic features

• Analysis frame 64 ms (50 % hop size)
• Log mel-band energies (128 bands)
• 5 ($$= 2 P + 1$$) frames are concatenated (i.e., $$P = 2$$).
• 640 ($$= D = F \times (2 P + 1)$$) dimensions are input to the autoencoder.

#### Network Architecture

• Input shape: 640
• Architecture:
  • Dense layers #1–#4: Dense (units: 128) → Batch Normalization → Activation (ReLU)
  • Bottleneck layer: Dense (units: 8) → Batch Normalization → Activation (ReLU)
  • Dense layers #5–#8: Dense (units: 128) → Batch Normalization → Activation (ReLU)
  • Output layer: Dense (units: 640)
• Learning: 100 epochs, batch size 512, data shuffling between epochs
• Optimizer: Adam (learning rate: 0.001)
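The layer widths can be sanity-checked with a plain NumPy forward pass. This is an untrained, shape-only sketch with random weights (batch normalization is folded into the affine layers, as it can be at inference time), not the Keras implementation in the repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=True):
    """One fully connected layer; BN is absorbed into (w, b) for brevity."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

def make_baseline_ae(dims=(640, 128, 128, 128, 128, 8, 128, 128, 128, 128, 640)):
    """Random-weight forward pass matching the widths listed above:
    640 -> 4 x 128 -> bottleneck 8 -> 4 x 128 -> 640."""
    params = [(rng.standard_normal((i, o)) * 0.01, np.zeros(o))
              for i, o in zip(dims[:-1], dims[1:])]
    def forward(x):
        for k, (w, b) in enumerate(params):
            x = dense(x, w, b, relu=(k < len(params) - 1))  # linear output layer
        return x
    return forward
```
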

## Repository

Detailed information can be found in the GitHub repository. As a reference for the label information, the directory structure is briefly described here. When you unzip the files downloaded from the GitHub repository and Zenodo, you will see the following directory structure. As described in the Audio dataset section, Machine Type information is given by the directory name, and Machine ID and condition information are given by the file name, as follows:

• /00_train.py
• /01_test.py
• /common.py
• /keras_model.py
• /baseline.yaml
• /dev_data
• /ToyCar
• /train (Only normal data for all Machine IDs are included.)
• /normal_id_01_00000000.wav
• ...
• /normal_id_01_00000999.wav
• /normal_id_02_00000000.wav
• ...
• /normal_id_04_00000999.wav
• /test (Normal and anomaly data for all Machine IDs are included.)
• /normal_id_01_00000000.wav
• ...
• /normal_id_01_00000349.wav
• /anomaly_id_01_00000000.wav
• ...
• /anomaly_id_01_00000263.wav
• /normal_id_02_00000000.wav
• ...
• /anomaly_id_04_00000264.wav
• /ToyConveyor (The other Machine Types have the same directory structure as ToyCar.)
• /fan
• /pump
• /slider
• /valve
• /eval_data (after launch of the evaluation dataset)
• /ToyCar
• /train (Only normal data for all Machine IDs are included.)
• /normal_id_05_00000000.wav
• ...
• /normal_id_05_00000999.wav
• /normal_id_06_00000000.wav
• ...
• /normal_id_07_00000999.wav
• /test (Normal and anomaly data for all Machine IDs are included, but there is no information about normal or anomaly.)
• /id_05_00000000.wav
• ...
• /id_05_00000411.wav
• /id_06_00000000.wav
• ...
• /id_07_00000411.wav
• /ToyConveyor (The other machine types have the same directory structure as ToyCar.)
• /fan
• /pump
• /slider
• /valve

After you run the training script 00_train.py and the test script 01_test.py, the CSV files containing the anomaly scores for each Machine ID will be saved in the directory result/. More detailed information can be found in the GitHub repository.

## Results for the development dataset

The AUC and pAUC on the development dataset were evaluated using several types of GPUs (RTX 2080, etc.). Because results produced with a GPU are generally non-deterministic, the average and standard deviation over 10 independent trials (training and testing) are shown in the following tables.

**ToyCar**

| Machine ID | 1 | 2 | 3 | 4 | Average |
| --- | --- | --- | --- | --- | --- |
| AUC (Ave.) | 81.36 % | 85.97 % | 63.30 % | 84.45 % | 78.77 % |
| AUC (Std.) | 1.15 % | 0.58 % | 1.03 % | 1.87 % | 1.03 % |
| pAUC (Ave.) | 68.40 % | 77.72 % | 55.21 % | 68.97 % | 67.58 % |
| pAUC (Std.) | 0.92 % | 0.90 % | 0.37 % | 2.37 % | 1.04 % |

**ToyConveyor**

| Machine ID | 1 | 2 | 3 | Average |
| --- | --- | --- | --- | --- |
| AUC (Ave.) | 78.07 % | 64.16 % | 75.35 % | 72.53 % |
| AUC (Std.) | 0.79 % | 0.53 % | 1.39 % | 0.67 % |
| pAUC (Ave.) | 64.25 % | 56.01 % | 61.03 % | 60.43 % |
| pAUC (Std.) | 0.99 % | 0.71 % | 1.00 % | 0.74 % |

**fan**

| Machine ID | 0 | 2 | 4 | 6 | Average |
| --- | --- | --- | --- | --- | --- |
| AUC (Ave.) | 54.41 % | 73.40 % | 61.61 % | 73.92 % | 65.83 % |
| AUC (Std.) | 0.47 % | 0.58 % | 1.08 % | 0.54 % | 0.53 % |
| pAUC (Ave.) | 49.37 % | 54.81 % | 53.26 % | 52.35 % | 52.45 % |
| pAUC (Std.) | 0.10 % | 0.34 % | 0.40 % | 0.51 % | 0.21 % |

**pump**

| Machine ID | 0 | 2 | 4 | 6 | Average |
| --- | --- | --- | --- | --- | --- |
| AUC (Ave.) | 67.15 % | 61.53 % | 88.33 % | 74.55 % | 72.89 % |
| AUC (Std.) | 0.87 % | 0.97 % | 0.66 % | 1.24 % | 0.70 % |
| pAUC (Ave.) | 56.74 % | 58.10 % | 67.10 % | 58.02 % | 59.99 % |
| pAUC (Std.) | 0.82 % | 0.93 % | 1.09 % | 1.21 % | 0.77 % |

**slider**

| Machine ID | 0 | 2 | 4 | 6 | Average |
| --- | --- | --- | --- | --- | --- |
| AUC (Ave.) | 96.19 % | 78.97 % | 94.30 % | 69.59 % | 84.76 % |
| AUC (Std.) | 0.43 % | 0.28 % | 0.64 % | 1.45 % | 0.29 % |
| pAUC (Ave.) | 81.44 % | 63.68 % | 71.98 % | 49.02 % | 66.53 % |
| pAUC (Std.) | 1.89 % | 0.72 % | 2.20 % | 0.41 % | 0.62 % |

**valve**

| Machine ID | 0 | 2 | 4 | 6 | Average |
| --- | --- | --- | --- | --- | --- |
| AUC (Ave.) | 68.76 % | 68.18 % | 74.30 % | 53.90 % | 66.28 % |
| AUC (Std.) | 0.65 % | 0.86 % | 0.71 % | 0.38 % | 0.49 % |
| pAUC (Ave.) | 51.70 % | 51.83 % | 51.97 % | 48.43 % | 50.98 % |
| pAUC (Std.) | 0.19 % | 0.31 % | 0.20 % | 0.20 % | 0.15 % |

# Citation

If you are participating in this task or using the ToyADMOS dataset, the MIMII Dataset, and/or the baseline code, please cite the following papers:

Publication

Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, and Keisuke Imoto. ToyADMOS: a dataset of miniature-machine operating sounds for anomalous sound detection. In Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 308–312. November 2019. URL: https://ieeexplore.ieee.org/document/8937164.


Publication

Harsh Purohit, Ryo Tanabe, Takeshi Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi. MIMII Dataset: sound dataset for malfunctioning industrial machine investigation and inspection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), 209–213. November 2019. URL: http://dcase.community/documents/workshop2019/proceedings/DCASE2019Workshop_Purohit_21.pdf.


Publication

Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, and Noboru Harada. Description and discussion on DCASE2020 challenge task2: unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2006.05822, 1–4. June 2020. URL: https://arxiv.org/abs/2006.05822.

#### Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

##### Abstract

This paper presents the details of the DCASE 2020 Challenge Task 2; Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. We have designed a DCASE challenge task which contributes as a starting point and a benchmark of ASD research; the dataset, evaluation metrics, a simple baseline system, and other detailed rules. After the challenge submission deadline, challenge results and analysis of the submissions will be added.

##### Keywords

Anomaly detection, dataset, acoustic condition monitoring, DCASE Challenge