The goal of this task is to classify multi-channel audio segments (i.e. segmented data is given), acquired by a microphone array, into one of the provided predefined classes. These classes are daily activities performed in a home environment (e.g. "Cooking").
The challenge has finished. Results are available at the task results page. Ground truth annotations for the evaluation dataset have also been released.
Introduction
There is a rising interest in smart environments that enhance the quality of life for humans in terms of e.g. safety, security, comfort, and home care. In order to have smart functionality, situational awareness is required, which might be obtained by interpreting a multitude of sensing modalities including acoustics. The latter is already used in vocal assistants such as Google Home, Apple HomePod, and Amazon Echo. While these devices focus on speech, they could be extended to identify domestic activities carried out by humans. In the literature, this recognition of activities based on acoustics has already been touched upon. Yet, the acoustic models are typically based on single-channel and single-location recordings. In this task, it is investigated to what extent multi-channel acoustic recordings are beneficial for the purpose of detecting domestic activities.
Description
The goal of this task is to classify multi-channel audio segments (i.e. segmented data is given), acquired by a microphone array, into one of the provided predefined classes as illustrated by Figure 1. These classes are daily activities performed in a home environment, for example “Cooking”, “Watching TV” and “Working”. As such activities can be composed of different sound events, they are considered acoustic scenes. The difference with Task 1: Acoustic scene classification is the type of scenes and the possibility to use multi-channel audio.
In this challenge a person living alone at home is considered. This reduces the complexity of the problem since the number of overlapping activities is expected to be small. In fact, in the considered dataset no overlapping activities are present. These conditions were chosen to focus on the main goal of this task, which is to investigate to what extent multi-channel acoustic recordings are beneficial for the purpose of detecting domestic activities. This means that spatial properties can be exploited to serve as input features to the classification problem. However, using the absolute localization of sound sources as input for the detection model is unlikely to generalize well to cases where the position of the microphone array is altered. Therefore, in this task the focus is on systems that can exploit spatial cues independent of sensor location using multi-channel audio.
Dataset
Content
The dataset used in this task is a derivative of the SINS dataset. It contains a continuous recording of one person living in a vacation home over a period of one week. It was collected using a network of 13 microphone arrays distributed over the entire home. Each microphone array consists of 4 linearly arranged microphones. For this task 7 microphone arrays in the combined living room and kitchen area are used. Figure 2 shows the floorplan of the recorded environment along with the positions of the used sensor nodes.
The continuous recordings were split into audio segments of 10 s. Segments containing more than one active class (e.g. a transition between two activities) were left out. This means that each segment represents one activity. Subsampling was then performed, starting from the largest classes, to make the dataset easier to use for a challenge. These audio segments are provided as individual files along with the ground truth. Each audio segment contains 4 channels (i.e. the 4 microphone channels from a particular node). The nine daily activities for this task are shown in Table 1 along with the number of available 10 s multi-channel segments in the development set and the number of full sessions of a certain activity (e.g. a cooking session).
Activity | # 10s segments | # sessions |
---|---|---|
Absence (nobody present in the room) | 18860 | 42 |
Cooking | 5124 | 13 |
Dishwashing | 1424 | 10 |
Eating | 2308 | 13 |
Other (present but not doing any relevant activity) | 2060 | 118 |
Social activity (visit, phone call) | 4944 | 21 |
Vacuum cleaning | 972 | 9 |
Watching TV | 18648 | 9 |
Working (typing, mouse click, ...) | 18644 | 33 |
Total | 72984 | 268 |
As development set, approximately 200 hours of data from 4 sensor nodes is given along with the ground truth. As evaluation set, data is provided from all the sensor nodes (i.e. also from sensor nodes not present in the development set). The evaluation will be based on the sensor nodes not present in the development set. The data from the same nodes as in training is provided to give insight into overfitting on those positions. The partitioning of the data was done randomly. The segments belonging to one particular consecutive activity (e.g. a full session of cooking) were kept together. The data provided for each sensor node contains recordings of the same time period. This means that the performed activities are observed by multiple microphone arrays at the same time instant. Due to the subsampling of the segments of the largest classes, the segments of a particular consecutive activity of those classes do not fully overlap in time across all sensor nodes.
Recording and annotation procedure
The sensor node configuration used in this setup is a control board together with a linear microphone array. The control board contains an EFM32 ARM cortex M4 microcontroller from Silicon Labs (EFM32WG980) used for sampling the analog audio. The microphone array contains four Sonion N8AC03 MEMS low-power (±17µW) microphones with an inter-microphone distance of 5 cm. The sampling for each audio channel is done sequentially at a rate of 16 kHz with a bit depth of 12. The annotation was performed in two phases. First, during the data collection a smartphone application was used to let the monitored person(s) annotate the activities while being recorded. The person could only select a fixed set of activities. The application was easy to use and did not significantly influence the transition between activities. Secondly, the start and stop timestamps of each activity were refined by using our own annotation software. Postprocessing and sharing the database involves privacy-related aspects. Besides the person(s) living there, multiple people visited the home. Moreover, during a phone call, one can partially hear the person on the other end. A written informed consent was obtained from all participants.
More information about the full dataset can be found in:
Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.
The SINS Database for Detection of Daily Activities in a Home Environment Using an Acoustic Sensor Network
Abstract
There is a rising interest in monitoring and improving human wellbeing at home using different types of sensors including microphones. In the context of Ambient Assisted Living (AAL) persons are monitored, e.g. to support patients with a chronic illness and older persons, by tracking their activities being performed at home. When considering an acoustic sensing modality, a performed activity can be seen as an acoustic scene. Recently, acoustic detection and classification of scenes and events has gained interest in the scientific community and led to numerous public databases for a wide range of applications. However, no public databases exist which a) focus on daily activities in a home environment, b) contain activities being performed in a spontaneous manner, c) make use of an acoustic sensor network, and d) are recorded as a continuous stream. In this paper we introduce a database recorded in one living home, over a period of one week. The recording setup is an acoustic sensor network containing thirteen sensor nodes, with four low-cost microphones each, distributed over five rooms. Annotation is available on an activity level. In this paper we present the recording and annotation procedure, the database content and a discussion on a baseline detection benchmark. The baseline consists of Mel-Frequency Cepstral Coefficients, Support Vector Machine and a majority vote late-fusion scheme. The database is publicly released to provide a common ground for future research.
Keywords
Database, Acoustic Scene Classification, Acoustic Event Detection, Acoustic Sensor Networks
Download: Development dataset
In case you are using the provided baseline system, there is no need to download the dataset as the system will automatically download the needed dataset for you.
An inconsistency in the dataset was reported here. The issue is fixed in the current release of the dataset (v1.0.3, 15/05/2018). If you have an earlier version, you can either download the entire dataset again or overwrite a subset of the files using this archive.
The content of the development set is structured in the following manner:
dataset root
│ EULA.pdf End user license agreement
│ meta.txt meta data, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
│ README.md Dataset description (markdown)
│ README.html Dataset description (HTML)
│
└───audio 72984 audio segments, 16-bit 16kHz
│ │ DevNode1_ex1_1.wav name format DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
│ │ DevNode2_ex1_2.wav
│ │ ...
│
└───evaluation_setup cross-validation setup, 4 folds
│ fold1_train.txt training file list, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
│ fold1_test.txt test file list, tsv-format, [audio file (str)]
│ fold1_evaluate.txt evaluation file list, tsv-format, [audio file (str)][tab][label (str)]
│ ...
The multi-channel audio files can be found under the directory audio and are formatted in the following manner:
DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
- {NodeID} (1-4) is an identifier indicating to which node a segment belongs. In total 4 nodes are given. The location of each node is not disclosed to the participants.
- {sessionID} indicates a full session of a certain activity.
- {segmentID} indicates a segment belonging to a certain {sessionID}. A session of a certain activity (e.g. cooking) can have multiple 10 s segments. Keep in mind that segmentIDs are not shared between nodes (e.g. DevNode1_ex1_1 is not necessarily recorded at the same time range as DevNode2_ex1_1, but it surely belongs to the same session). A small parsing sketch is given below.
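A minimal parsing sketch for this naming convention (the helper name parse_dev_filename is hypothetical and not part of any provided tooling):

```python
import re

# Parse the DevNode{NodeID}_ex{sessionID}_{segmentID}.wav naming convention.
FNAME_RE = re.compile(r"DevNode(?P<node>\d+)_ex(?P<session>\d+)_(?P<segment>\d+)\.wav$")

def parse_dev_filename(path):
    """Return (node_id, session_id, segment_id) as ints, or None if the name does not match."""
    m = FNAME_RE.search(path)
    if m is None:
        return None
    return int(m.group("node")), int(m.group("session")), int(m.group("segment"))

print(parse_dev_filename("audio/DevNode1_ex1_1.wav"))  # -> (1, 1, 1)
```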
The file meta.txt and the content of the folder evaluation_setup contain filenames, optionally with a ground truth label and an identifier of the session to which the segment belongs. These are arranged in the following manner:
[filename (str)][tab][activity label (str)][tab][session (str)]
The directory evaluation_setup provides cross-validation folds for the development dataset. More information on their usage can be found here.
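As a simple illustration of how these tab-separated lists can be read (a sketch only, assuming the dataset root as working directory; this is not the official dcase_util loader):

```python
import csv

def read_tsv_list(path):
    """Read a fold or meta file: each row is [audio file] optionally followed by [label] and [session]."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f, delimiter="\t")]

train_rows = read_tsv_list("evaluation_setup/fold1_train.txt")  # [file, label, session]
test_rows = read_tsv_list("evaluation_setup/fold1_test.txt")    # [file]
print(train_rows[0], test_rows[0])
```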
Download: Evaluation dataset
In case you are using the provided baseline system, there is no need to download the dataset as the system will automatically download the needed dataset for you.
The content of the dataset is structured in the following manner:
dataset root
│ EULA.pdf End user license agreement
│ meta.txt meta data, tsv-format, [audio file (str)]\n
│ readme.md Dataset description (markdown)
│ readme.html Dataset description (HTML)
│
└───audio 72972 audio segments, 16-bit 16kHz
│ │ 1.wav name format {segmentID}.wav
│ │ 100.wav
│ │ ...
│
└───evaluation_setup evaluation files
│ test.txt test file list, tsv-format, [audio file (str)]\n
The multi-channel audio files can be found under the directory audio and are formatted in the following manner:
{segmentID}.wav
The file meta.txt and the content of the folder evaluation_setup contain filenames only. Ground truth will be made available after the challenge results have been made public. Additionally, a filename mapping will be made available that maps these filenames to filenames similar to those in the development dataset.
Task setup
The task is split up into two phases. First the development dataset is provided. A month before the challenge deadline the evaluation set is provided. At the challenge deadline, submissions include the system output on the evaluation set, system meta information and one technical report. The goal of the task is to obtain the highest score on the evaluation set.
Development set
The development set includes multi-channel audio segments, recorded by four different sensor nodes, along with the ground truth and cross-validation folds.
Cross-validation folds are provided for the development dataset in order to make results reported with this dataset uniform.
Results on these folds are used for comparison and also need to be reported in the submitted system meta information.
The setup consists of four folds distributing the available files.
Segments belonging to a particular session of an activity (e.g. a session of cooking collected by multiple sensor nodes) are kept together to minimize leakage between folds.
The folds are provided with the dataset in the directory evaluation_setup. For each fold a training, testing and evaluation subset is provided.
Important: If you are not using the provided cross-validation setup, pay attention to segments extracted from the same session. For each fold, make sure that ALL segments from the same session are either in the training subset OR in the test subset. A session-aware splitting sketch is shown below.
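This sketch is illustrative only (the provided folds already satisfy the constraint); it assumes meta.txt from the development set is available in the working directory and uses scikit-learn's GroupKFold with the session as grouping variable:

```python
import csv
from sklearn.model_selection import GroupKFold

# Read [file, label, session] rows from the development meta data.
with open("meta.txt", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
files = [r[0] for r in rows]
labels = [r[1] for r in rows]
sessions = [r[2] for r in rows]

# Group by session so that all segments of one session land on the same side.
gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(files, labels, groups=sessions):
    train_files = [files[i] for i in train_idx]
    test_files = [files[i] for i in test_idx]
    # ... train on train_files, evaluate on test_files
```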
External data sources/pre-trained models
List of external datasets/pre-trained models allowed:
Dataset name | Type | Added | Link |
---|---|---|---|
AudioSet | audio | 29.6.2018 | https://research.google.com/audioset/ |
VGGish | model | 5.7.2018 | https://research.google.com/audioset/ |
Xception | model | 5.7.2018 | https://keras.io/applications/ |
VGG16 | model | 5.7.2018 | http://www.robots.ox.ac.uk/~vgg/research/very_deep/ |
VGG19 | model | 5.7.2018 | http://www.robots.ox.ac.uk/~vgg/research/very_deep/ |
ResNet50 | model | 5.7.2018 | https://github.com/KaimingHe/deep-residual-networks |
InceptionV3 | model | 5.7.2018 | https://keras.io/applications/ |
InceptionResNetV2 | model | 5.7.2018 | https://keras.io/applications/ |
MobileNet | model | 5.7.2018 | https://keras.io/applications/ |
DenseNet | model | 5.7.2018 | https://keras.io/applications/ |
MobileNetV2 | model | 5.7.2018 | https://keras.io/applications/ |
Evaluation set
The evaluation set includes multi-channel audio segments recorded by seven different sensor nodes. Three sensor nodes are not in the development set and will be used for the final evaluation score. The other segments, obtained by the same nodes as in the development set, are used to check overfitting. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2018 Challenge. The evaluation dataset is provided without ground truth.
Submission
Challenge submissions consist of one zip-package containing the system outputs and system meta information, and one technical report (PDF file). Detailed information for the challenge submission can be found on the submission page. System output should be presented as a single text file (in tab-separated format) containing the classification result for each audio file in the evaluation set. Result items can be in any order. The format is as follows:
[filename (string)][tab][activity label (string)]
The filename should be formatted such that it includes the audio folder. An example of the output file:
audio/1.wav other
audio/2.wav social_activity
audio/3.wav eating
audio/4.wav working
audio/5.wav absence
audio/6.wav vacuum_cleaner
audio/7.wav dishwashing
audio/8.wav watching_tv
audio/9.wav cooking
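A minimal sketch for producing such a file (not part of the official tooling; the predictions dict and the output file name are illustrative assumptions):

```python
# Map evaluation file names to predicted activity labels (toy values).
predictions = {"1.wav": "other", "2.wav": "social_activity", "3.wav": "eating"}

with open("task5_system_output.txt", "w") as f:
    for filename in sorted(predictions):
        # Each line: [filename][tab][activity label], with the filename prefixed by the audio folder.
        f.write("audio/{}\t{}\n".format(filename, predictions[filename]))
```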
A template for the system meta information (.yaml file) is available on the submissions page.
Multiple system outputs can be submitted (maximum 4 per participant). If submitting multiple systems, the individual text-files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).
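For example, the individual output files could be packaged as follows (file names are hypothetical placeholders):

```python
import zipfile

# Package multiple system output files into one zip archive for submission.
outputs = ["Lastname_INST_task5_1.txt", "Lastname_INST_task5_2.txt"]
with zipfile.ZipFile("Lastname_INST_task5_submission.zip", "w") as z:
    for path in outputs:
        z.write(path)
```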
Task rules
These are the general rules valid for all tasks. The same rules and additional information on technical report and submission requirements can be found here. Task specific rules are highlighted in bold.
- Participants are allowed to use external data for system development taking into account the following principles:
- The used data must be publicly available without cost before 29th of March 2018
- External data includes pre-trained models
- Participants should inform/suggest such data to be listed on the task webpage, so all competitors know about them and have equal opportunity to use them
- Once the evaluation set is published, the list of external datasets allowed is locked (no further external sources allowed)
- Manipulation of provided training data is allowed (e.g. data augmentation).
- Participants are not allowed to make subjective judgements on the evaluation data, nor to annotate it.
- The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden.
- The system outputs that do not respect the challenge rules will be evaluated on request, but they will not be officially included in the challenge rankings.
Evaluation
The scoring of this task is based on the macro-averaged F1-score. The F1-score is calculated for each class separately and averaged over all classes. A full 10 s multi-channel audio segment is considered to be one sample. The winner of the task is the submission with the highest macro-averaged F1-score on the evaluation set. The system output is submitted and the scores are calculated by the task coordinators.
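The metric corresponds to the standard macro-averaged F1-score, for instance as computed by scikit-learn (toy labels shown purely for illustration):

```python
from sklearn.metrics import f1_score

# One (true, predicted) label pair per 10 s segment; F1 is computed per class
# and then averaged with equal weight over the classes.
y_true = ["cooking", "working", "absence", "cooking"]
y_pred = ["cooking", "absence", "absence", "cooking"]
print(f1_score(y_true, y_pred, average="macro"))
```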
Results
Code | Author | Affiliation | Technical Report | F1-score on Eval. set (Unknown mic.) |
---|---|---|---|---|
DCASE2018 baseline | Gert Dekkers | Computer Science, KU Leuven - ADVISE, Geel, Belgium | task-monitoring-domestic-activities-results#Dekkers2018 | 83.1 | |
Delphin_OL_task5_1 | Lionel Delphin-Poulat | HOME/CONTENT, Orange Labs, Lannion, France | task-monitoring-domestic-activities-results#Delphin-Poulat2018 | 80.7 | |
Delphin_OL_task5_2 | Lionel Delphin-Poulat | HOME/CONTENT, Orange Labs, Lannion, France | task-monitoring-domestic-activities-results#Delphin-Poulat2018 | 80.8 | |
Delphin_OL_task5_3 | Lionel Delphin-Poulat | HOME/CONTENT, Orange Labs, Lannion, France | task-monitoring-domestic-activities-results#Delphin-Poulat2018 | 81.6 | |
Delphin_OL_task5_4 | Lionel Delphin-Poulat | HOME/CONTENT, Orange Labs, Lannion, France | task-monitoring-domestic-activities-results#Delphin-Poulat2018 | 82.5 | |
Inoue_IBM_task5_1 | Tadanobu Inoue | AI, IBM Research, Tokyo, Japan | task-monitoring-domestic-activities-results#Inoue2018 | 88.4 | |
Inoue_IBM_task5_2 | Tadanobu Inoue | AI, IBM Research, Tokyo, Japan | task-monitoring-domestic-activities-results#Inoue2018 | 88.3 |
Kong_Surrey_task5_1 | Qiuqiang Kong | Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK | task-monitoring-domestic-activities-results#Kong2018 | 83.2 |
Kong_Surrey_task5_2 | Qiuqiang Kong | Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK | task-monitoring-domestic-activities-results#Kong2018 | 82.4 |
Li_NPU_task5_1 | Dexin Li | Speech Signal Processing, CIAIC, Xi'an, China | task-monitoring-domestic-activities-results#Li2018 | 79.0 | |
Li_NPU_task5_2 | Dexin Li | Speech Signal Processing, CIAIC, Xi'an, China | task-monitoring-domestic-activities-results#Li2018 | 78.6 | |
Li_NPU_task5_3 | Dexin Li | Speech Signal Processing, CIAIC, Xi'an, China | task-monitoring-domestic-activities-results#Li2018 | 84.8 | |
Li_NPU_task5_4 | Dexin Li | Speech Signal Processing, CIAIC, Xi'an, China | task-monitoring-domestic-activities-results#Li2018 | 85.1 | |
Liao_NTHU_task5_1 | Hsueh-Wei Liao | Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan | task-monitoring-domestic-activities-results#Liao2018 | 86.7 | |
Liao_NTHU_task5_2 | Hsueh-Wei Liao | Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan | task-monitoring-domestic-activities-results#Liao2018 | 72.1 | |
Liao_NTHU_task5_3 | Hsueh-Wei Liao | Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan | task-monitoring-domestic-activities-results#Liao2018 | 76.7 | |
Liu_THU_task5_1 | Huaping Liu | Computer Science and Technology, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Liu2018 | 87.5 |
Liu_THU_task5_2 | Huaping Liu | Computer Science and Technology, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Liu2018 | 87.4 |
Liu_THU_task5_3 | Huaping Liu | Computer Science and Technology, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Liu2018 | 86.8 |
Nakadai_HRI-JP_task5_1 | Kazuhiro Nakadai | Research Div., Honda Research Institute Japan, Wako, Japan | task-monitoring-domestic-activities-results#Nakadai2018 | 85.4 | |
Raveh_INRC_task5_1 | Alon Raveh | Signal Processing Department, National Research Center, Haifa, Israel | task-monitoring-domestic-activities-results#Raveh2018 | 80.4 | |
Raveh_INRC_task5_2 | Alon Raveh | Signal Processing Department, National Research Center, Haifa, Israel | task-monitoring-domestic-activities-results#Raveh2018 | 80.2 | |
Raveh_INRC_task5_3 | Alon Raveh | Signal Processing Department, National Research Center, Haifa, Israel | task-monitoring-domestic-activities-results#Raveh2018 | 81.7 | |
Raveh_INRC_task5_4 | Alon Raveh | Signal Processing Department, National Research Center, Haifa, Israel | task-monitoring-domestic-activities-results#Raveh2018 | 81.2 | |
Sun_SUTD_task5_1 | Yingxiang Sun | Engineering Product Development, Singapore University of Technology and Design, Singapore | task-monitoring-domestic-activities-results#Chew2018 | 76.8 | |
Tanabe_HIT_task5_1 | Ryo Tanabe | R&D Group, Hitachi, Ltd., Tokyo, Japan | task-monitoring-domestic-activities-results#Tanabe2018 | 88.4 | |
Tanabe_HIT_task5_2 | Ryo Tanabe | R&D Group, Hitachi, Ltd., Tokyo, Japan | task-monitoring-domestic-activities-results#Tanabe2018 | 82.2 | |
Tanabe_HIT_task5_3 | Ryo Tanabe | R&D Group, Hitachi, Ltd., Tokyo, Japan | task-monitoring-domestic-activities-results#Tanabe2018 | 86.3 | |
Tanabe_HIT_task5_4 | Ryo Tanabe | R&D Group, Hitachi, Ltd., Tokyo, Japan | task-monitoring-domestic-activities-results#Tanabe2018 | 88.4 | |
Tiraboschi_UNIMI_task5_1 | Marco Tiraboschi | Computer Science, Università degli Studi di Milano, Milan, Italy | task-monitoring-domestic-activities-results#Tiraboschi2018 | 76.9 | |
Zhang_THU_task5_1 | Weiqiang Zhang | Electronic Engineering, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Shen2018 | 85.9 | |
Zhang_THU_task5_2 | Weiqiang Zhang | Electronic Engineering, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Shen2018 | 84.3 | |
Zhang_THU_task5_3 | Weiqiang Zhang | Electronic Engineering, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Shen2018 | 86.0 | |
Zhang_THU_task5_4 | Weiqiang Zhang | Electronic Engineering, Tsinghua University, Beijing, China | task-monitoring-domestic-activities-results#Shen2018 | 85.9 |
Complete results and technical reports can be found on the results page.
Baseline system
Setup
The baseline system is intended to lower the hurdle of participating in the challenge. It provides an entry-level approach which is simple but relatively close to state-of-the-art systems. High-end performance is left for the challenge participants to achieve. Participants are allowed to build their system on top of the given baseline system. The system has all needed functionality for dataset handling, storing/accessing features and models, and evaluating the results, making adaptation to one's needs rather easy. The baseline system is also a good starting point for entry-level researchers.
If participants plan to publish their code to the DCASE community after the challenge, building their approach on top of the baseline system will make their code more accessible to the community. The DCASE organizers strongly encourage participants to share their code in any form after the challenge to push the research further.
During the recording campaign, data was measured simultaneously using multiple microphone arrays (nodes) each containing 4 microphones. Hence, each domestic activity is recorded as many times as there were microphones. The baseline system trains a single classifier model that takes a single channel as input. Each parallel recording of a single activity is considered as a different example during training. The learner in the baseline system is based on a Neural Network architecture using convolutional and dense layers. As input, log mel-band energies are provided to the network for each microphone channel separately. In the prediction stage a single outcome is computed for each node by averaging the 4 model outcomes (posteriors) that were computed by evaluating the trained classifier model on all 4 microphones.
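A minimal sketch of this prediction-stage fusion (illustrative only; random posteriors are used here as a stand-in for real model outputs):

```python
import numpy as np

# Class posteriors of the single-channel model for the 4 microphones of one
# node: shape (4 channels, 9 classes). Random values stand in for real outputs.
posteriors = np.random.rand(4, 9)
posteriors /= posteriors.sum(axis=1, keepdims=True)  # normalise per channel

node_posterior = posteriors.mean(axis=0)              # average over the 4 channels
predicted_class = int(np.argmax(node_posterior))      # index of the predicted activity
```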
The baseline system parameters are as follows (an illustrative Keras sketch of the resulting network is given after the list):
- Frame size: 40 ms (50% hop size)
- Feature matrix:
- 40 log mel-band energies in 501 successive frames (10 s)
- Neural Network:
- Input data: 40x501 (each microphone channel is considered to be a separate example for the learner)
- Architecture:
- 1D Convolutional layer (filters: 32, kernel size: 5, stride: 1, axis: time) + Batch Normalization + ReLU activation
- 1D Max Pooling (pool size: 5, stride: 5) + Dropout (rate: 20%)
- 1D Convolutional layer (filters: 64, kernel size: 3, stride: 1, axis: time) + Batch Normalization + ReLU activation
- 1D Global Max Pooling + Dropout (rate: 20%)
- Dense layer (neurons: 64) + ReLU activation + Dropout (rate: 20%)
- Softmax output layer (classes: 9)
- Learning:
- Optimizer: Adam (learning rate: 0.0001)
- Epochs: 500
- On each epoch, the training dataset is randomly subsampled so that the number of examples for each class match the size of the smallest class
- Batch size: 256 * 4 channels (each channel is considered as a different example for the learner)
- Fusion: Output probabilities from the four microphones in a particular node under test are averaged to obtain the final posterior probability.
- Model selection: The performance of the model is evaluated every 10 epochs on a validation subset (30% subsampled from the training set). The model with the highest Macro-averaged F1-score is picked.
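The illustrative Keras sketch mentioned above is given here; it is a rough re-implementation of the listed parameters, not the official baseline code, and assumes the input is one 10 s log mel-band energy matrix arranged as 501 frames × 40 bands:

```python
from keras.models import Sequential
from keras.layers import (Conv1D, BatchNormalization, Activation, MaxPooling1D,
                          Dropout, GlobalMaxPooling1D, Dense)
from keras.optimizers import Adam

model = Sequential([
    Conv1D(32, 5, strides=1, input_shape=(501, 40)),  # convolution along the time axis
    BatchNormalization(),
    Activation('relu'),
    MaxPooling1D(pool_size=5, strides=5),
    Dropout(0.2),
    Conv1D(64, 3, strides=1),
    BatchNormalization(),
    Activation('relu'),
    GlobalMaxPooling1D(),
    Dropout(0.2),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(9, activation='softmax'),                    # one output per activity class
])
model.compile(optimizer=Adam(lr=0.0001),               # lr argument as in Keras 2.1.x
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```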
The baseline system is built on the dcase_util toolbox. The machine learning part of the code is built on Keras (v2.1.5), using TensorFlow (v1.4.0) as backend.
Repository
An inconsistency in the dataset was reported here. The issue is fixed in the current release of the dataset (v1.0.3, 15/05/2018). The repository has been updated to the latest release of the [dcase_util library (v0.2.3)](https://github.com/DCASE-REPO/dcase_util). Using an older version will download an older version of the dataset. If you prefer not to download all files again, you can overwrite a subset of the files using this archive.
Results for the development dataset
When running the code in development mode, the baseline system provides results for the 4-fold cross-validation setup. The table below shows the macro-averaged F1-score averaged over these 4 folds.
Activity | F1-score |
---|---|
Absence | 85.41 % |
Cooking | 95.14 % |
Dishwashing | 76.73 % |
Eating | 83.64 % |
Other | 44.76 % |
Social activity | 93.92 % |
Vacuum cleaning | 99.31 % |
Watching TV | 99.59 % |
Working | 82.03 % |
Macro-averaged F1-score | 84.50 % |
Note: The performance might not be exactly reproducible but similar results should be obtainable.
Citation
If you are using the dataset please cite the following paper:
Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.
If you are using the baseline code, or want to refer to the challenge task, please cite the following paper:
Gert Dekkers, Lode Vuegen, Toon van Waterschoot, Bart Vanrumste, and Peter Karsmakers. DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics. Technical Report, KU Leuven, 2018. URL: https://arxiv.org/abs/1807.11246, arXiv:1807.11246.
DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics
Abstract
The DCASE 2018 Challenge consists of five tasks related to automatic classification and detection of sound events and scenes. This paper presents the setup of Task 5 which includes the description of the task, dataset and the baseline system. In this task, it is investigated to which extend multi-channel acoustic recordings are beneficial for the purpose of classifying domestic activities. The goal is to exploit spectral and spatial cues independent of sensor location using multi-channel audio. For this purpose we provided a development and evaluation dataset which are derivatives of the SINS database and contain domestic activities recorded by multiple microphone arrays. The baseline system, based on a Neural Network architecture using convolutional and dense layer(s), is intended to lower the hurdle to participate the challenge and to provide a reference performance.
Keywords
Acoustic scene classification, Multi-channel, Activities of the Daily Living