The goal of this task is to classify multi-channel audio segments (i.e. segmented data is given), acquired by a microphone array, into one of the provided predefined classes. These classes are daily activities performed in a home environment (e.g. "Cooking").

updated 14/09/2018

The challenge has finished. Results are available at the task results page. Ground truth annotations for the evaluation dataset have also been released.

Introduction

There is a rising interest in smart environments that enhance the quality of life for humans in terms of, e.g., safety, security, comfort, and home care. Smart functionality requires situational awareness, which can be obtained by interpreting a multitude of sensing modalities, including acoustics. Acoustics is already used in voice assistants such as Google Home, Apple HomePod, and Amazon Echo. While these devices focus on speech, they could be extended to identify domestic activities carried out by humans. Recognizing activities from acoustics has already been addressed in the literature, yet the acoustic models are typically based on single-channel, single-location recordings. This task investigates to what extent multi-channel acoustic recordings are beneficial for detecting domestic activities.

Description

The goal of this task is to classify multi-channel audio segments (i.e. segmented data is given), acquired by a microphone array, into one of the provided predefined classes, as illustrated in Figure 1. These classes are daily activities performed in a home environment, for example “Cooking”, “Watching TV” and “Working”. Since such activities can be composed of different sound events, they are considered acoustic scenes. The difference with Task 1 (Acoustic scene classification) lies in the type of scenes and the possibility to use multi-channel audio.

Figure 1: Conceptual overview of the task.

In this challenge, a person living alone at home is considered. This reduces the complexity of the problem, since the number of overlapping activities is expected to be small; in fact, the considered dataset contains no overlapping activities. These conditions were chosen to focus on the main goal of this task, which is to investigate to what extent multi-channel acoustic recordings are beneficial for detecting domestic activities. This means that spatial properties can be exploited as input features for the classification problem. However, a detection model that takes the absolute location of sound sources as input is unlikely to generalize to cases where the position of the microphone array is altered. Therefore, this task focuses on systems that can exploit spatial cues independent of sensor location using multi-channel audio.

Dataset

Content

The dataset used in this task is a derivative of the SINS dataset. It contains a continuous recording of one person living in a vacation home over a period of one week. It was collected using a network of 13 microphone arrays distributed over the entire home. Each microphone array consists of 4 linearly arranged microphones. For this task, the 7 microphone arrays in the combined living room and kitchen area are used. Figure 2 shows the floorplan of the recorded environment along with the positions of the sensor nodes used.

Figure 2: 2D floorplan of the combined kitchen and living room with the used sensor nodes.

The continuous recordings were split into 10 s audio segments. Segments containing more than one active class (e.g. a transition between two activities) were left out, so each segment represents exactly one activity. Subsampling was then performed, starting from the largest classes, to make the dataset easier to use in a challenge. The audio segments are provided as individual files along with the ground truth. Each audio segment contains 4 channels (i.e. the 4 microphone channels of a particular node). The 9 daily activities of this task are shown in Table 1, along with the number of available 10 s multi-channel segments in the development set and the number of full sessions of each activity (e.g. a cooking session).

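Table 1: Activities in the development set, with the number of 10 s segments and the number of full sessions per activity.

Activity	# 10 s segments	# sessions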
Absence (nobody present in the room)	18860	42
Cooking	5124	13
Dishwashing	1424	10
Eating	2308	13
Other (present but not doing any relevant activity)	2060	118
Social activity (visit, phone call)	4944	21
Vacuum cleaning	972	9
Watching TV	18648	9
Working (typing, mouse click, ...)	18644	33
Total	72984	268
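The segmentation rule described above (fixed 10 s windows, discarding any window that spans more than one activity) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the annotation of a continuous recording is assumed to be a list of (start, end, label) intervals in seconds, `segment_annotations` is a hypothetical helper, and the subsequent subsampling of the largest classes is not shown.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start_s, end_s, label), an assumed annotation format


def segment_annotations(intervals: List[Interval], total_s: float,
                        seg_len: float = 10.0) -> List[Tuple[float, str]]:
    """Cut a continuous recording into fixed-length windows.

    Returns (window_start, label) for every window that is fully covered by a
    single annotated interval. Windows that overlap a transition between two
    activities, or that are not fully annotated, are dropped.
    """
    kept = []
    n_windows = int(total_s // seg_len)  # a trailing partial window is discarded
    for i in range(n_windows):
        w_start, w_end = i * seg_len, (i + 1) * seg_len
        # All annotated intervals that overlap this window.
        overlapping = [(s, e, lab) for s, e, lab in intervals
                       if s < w_end and e > w_start]
        if len(overlapping) == 1:
            s, e, lab = overlapping[0]
            if s <= w_start and e >= w_end:  # the single activity covers the whole window
                kept.append((w_start, lab))
    return kept


# Toy annotation: cooking until 25 s, then eating until 60 s.
annotation = [(0.0, 25.0, "cooking"), (25.0, 60.0, "eating")]
windows = segment_annotations(annotation, total_s=60.0)
print(windows)  # the 20-30 s window contains the transition and is dropped
```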

The development set consists of approximately 200 hours of data from 4 sensor nodes, along with the ground truth. The evaluation set contains data from all the sensor nodes, i.e. including sensor nodes that are not present in the development set. The evaluation is based on the sensor nodes not present in the development set; data from the nodes used for training is also provided to give insight into overfitting to those positions. The data was partitioned randomly, while the segments belonging to one consecutive activity (e.g. a full cooking session) were kept together. The data provided for each sensor node covers the same time period, which means that the performed activities are observed by multiple microphone arrays at the same time instant. Due to the subsampling of the largest classes, however, the sensor nodes do not fully overlap in time for a particular consecutive activity of those classes.

Recording and annotation procedure

The sensor node configuration used in this setup is a control board together with a linear microphone array. The control board contains an EFM32 ARM Cortex-M4 microcontroller from Silicon Labs (EFM32WG980), which samples the analog audio. The microphone array contains four Sonion N8AC03 MEMS low-power (±17 µW) microphones with an inter-microphone distance of 5 cm. The audio channels are sampled sequentially at a rate of 16 kHz with a bit depth of 12 bits.

The annotation was performed in two phases. First, during data collection, the monitored person(s) used a smartphone application to annotate the activities while being recorded. The person could only select from a fixed set of activities. The application was easy to use and did not significantly influence the transitions between activities. Second, the start and stop timestamps of each activity were refined using our own annotation software.

Postprocessing and sharing the database involves privacy-related aspects: besides the person(s) living there, multiple people visited the home, and during a phone call the person on the other end can partially be heard. Written informed consent was obtained from all participants.
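The 5 cm spacing bounds the spatial cues a single node can provide. A sketch of the arithmetic, assuming a speed of sound of 343 m/s (air at about 20 °C; this value is not given in the task description): the largest possible time difference of arrival between two microphones is their distance divided by the speed of sound, converted here to samples at the 16 kHz channel rate. `max_tdoa_samples` is a hypothetical helper.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 degrees C)
FS = 16_000              # sampling rate of each channel, Hz
MIC_SPACING = 0.05       # inter-microphone distance, m


def max_tdoa_samples(distance_m: float, fs: int = FS,
                     c: float = SPEED_OF_SOUND) -> float:
    """Largest delay between two mics (source on the array axis), in samples."""
    return distance_m / c * fs


# Adjacent microphones (5 cm) and the outermost pair of the 4-mic linear array (3 x 5 cm).
adjacent = max_tdoa_samples(MIC_SPACING)
outer = max_tdoa_samples(3 * MIC_SPACING)
print(f"adjacent: {adjacent:.2f} samples, outer pair: {outer:.2f} samples")
# adjacent: 2.33 samples, outer pair: 7.00 samples
```

Delays of at most about 2.3 samples between neighbouring microphones suggest that inter-channel time or phase features need sub-sample resolution to be informative. The sequential sampling of the channels may add a small fixed inter-channel offset, which this sketch ignores.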

More information about the full dataset can be found in:

Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.

The SINS Database for Detection of Daily Activities in a Home Environment Using an Acoustic Sensor Network

Abstract

There is a rising interest in monitoring and improving human wellbeing at home using different types of sensors including microphones. In the context of Ambient Assisted Living (AAL) persons are monitored, e.g. to support patients with a chronic illness and older persons, by tracking their activities being performed at home. When considering an acoustic sensing modality, a performed activity can be seen as an acoustic scene. Recently, acoustic detection and classification of scenes and events has gained interest in the scientific community and led to numerous public databases for a wide range of applications. However, no public databases exist which a) focus on daily activities in a home environment, b) contain activities being performed in a spontaneous manner, c) make use of an acoustic sensor network, and d) are recorded as a continuous stream. In this paper we introduce a database recorded in one living home, over a period of one week. The recording setup is an acoustic sensor network containing thirteen sensor nodes, with four low-cost microphones each, distributed over five rooms. Annotation is available on an activity level. In this paper we present the recording and annotation procedure, the database content and a discussion on a baseline detection benchmark. The baseline consists of Mel-Frequency Cepstral Coefficients, Support Vector Machine and a majority vote late-fusion scheme. The database is publicly released to provide a common ground for future research.

Keywords

Database, Acoustic Scene Classification, Acoustic Event Detection, Acoustic Sensor Networks

Download: Development dataset

If you are using the provided baseline system, there is no need to download the dataset, as the system will automatically download the needed dataset for you.

Task 5, development dataset (42.6 GB download - 87.0 GB when extracted)

An inconsistency in the dataset was reported here. The issue is fixed in the current release of the dataset (v1.0.3, 15/05/2018). If you have an earlier version, you can either download the entire dataset again or overwrite a subset of the files using this archive.

The content of the development set is structured in the following manner:

dataset root
│   EULA.pdf                End user license agreement
│   meta.txt                meta data, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
│   README.md               Dataset description (markdown)
│   README.html             Dataset description (HTML)
│
└───audio                   72984 audio segments, 16-bit 16kHz
│   │   DevNode1_ex1_1.wav  name format DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
│   │   DevNode2_ex1_2.wav
│   │   ...
│
└───evaluation_setup        cross-validation setup, 4 folds
    │   fold1_train.txt     training file list, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
    │   fold1_test.txt      test file list, tsv-format, [audio file (str)]
    │   fold1_evaluate.txt  evaluation file list, tsv-format, [audio file (str)][tab][label (str)]  
    │   ...

The multi-channel audio files can be found under directory audio and are formatted in the following manner:

DevNode{NodeID}_ex{sessionID}_{segmentID}.wav

{NodeID} (1-4) identifies the node that recorded the segment; in total, 4 nodes are given. The location of each node is not disclosed to participants.
{sessionID} indicates a full session of a certain activity.
{segmentID} indicates a segment belonging to a certain {sessionID}. A session of a certain activity (e.g. cooking) can have multiple 10 s segments. Keep in mind that segmentIDs are not shared between nodes: DevNode1_ex1_1 is not necessarily recorded in the same time range as DevNode2_ex1_1, but it does belong to the same session.
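A small parser for this naming scheme can help group segments by node or session. This is a sketch that assumes all three IDs are integers, as in the examples above; `parse_dev_filename` and `DevSegment` are hypothetical names, not part of the dataset tools.

```python
import re
from typing import NamedTuple

# Development-set naming scheme: DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
DEV_NAME = re.compile(r"^DevNode(?P<node>\d+)_ex(?P<session>\d+)_(?P<segment>\d+)\.wav$")


class DevSegment(NamedTuple):
    node: int
    session: int
    segment: int


def parse_dev_filename(name: str) -> DevSegment:
    """Split a development-set filename into its node, session and segment IDs."""
    base = name.rsplit("/", 1)[-1]  # accept paths such as "audio/DevNode1_ex1_1.wav"
    m = DEV_NAME.match(base)
    if m is None:
        raise ValueError(f"not a development-set filename: {name!r}")
    return DevSegment(int(m["node"]), int(m["session"]), int(m["segment"]))


a = parse_dev_filename("audio/DevNode1_ex1_1.wav")
b = parse_dev_filename("audio/DevNode2_ex1_1.wav")
# Same session on two nodes; equal segment IDs do NOT imply the same time range.
print(a, b, a.session == b.session)
```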

The file meta.txt and the files in the folder evaluation_setup contain filenames, optionally followed by the ground truth label and an identifier of the session to which the segment belongs. The columns are arranged in the following manner:

[filename (str)][tab][activity label (str)][tab][session (str)]
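These files can be read with Python's standard `csv` module. The sketch below assumes one row per line and treats the label and session columns as optional, so the same function also reads the test lists that contain only filenames. The example rows and session values are made up for illustration, `read_meta` is a hypothetical helper, and whether filenames carry the `audio/` prefix in these files is an assumption.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def read_meta(text: str) -> List[Dict[str, str]]:
    """Parse [filename][tab][activity label][tab][session] rows.

    Missing optional columns are returned as empty strings.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        fields += [""] * (3 - len(fields))  # pad the optional label/session columns
        rows.append({"filename": fields[0], "label": fields[1], "session": fields[2]})
    return rows


# Illustrative rows only (not copied from the real meta.txt; session values are made up).
example = ("audio/DevNode1_ex1_1.wav\tabsence\tsession_a\n"
           "audio/DevNode2_ex1_2.wav\tabsence\tsession_a\n")
meta = read_meta(example)
print(Counter(r["label"] for r in meta))
```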

The directory evaluation_setup provides cross-validation folds for the development dataset. More information on the usage can be read here

Download: Evaluation dataset

If you are using the provided baseline system, there is no need to download the dataset, as the system will automatically download the needed dataset for you.

Task 5, evaluation dataset (42.2 GB download - 87.0 GB when extracted)

version 2.0

The content of the dataset is structured in the following manner:

dataset root
│   EULA.pdf                End user license agreement
│   meta.txt                meta data, tsv-format, [audio file (str)]\n
│   readme.md               Dataset description (markdown)
│   readme.html             Dataset description (HTML)
│
└───audio                   72972 audio segments, 16-bit 16kHz
│   │   1.wav               name format {segmentID}.wav
│   │   100.wav
│   │   ...
│
└───evaluation_setup        evaluation files
    │   test.txt            test file list, tsv-format, [audio file (str)]\n

The multi-channel audio files can be found under directory audio and are formatted in the following manner:

{segmentID}.wav

The file meta.txt and the files in the folder evaluation_setup contain filenames only. The ground truth will be made available after the challenge results have been made public, together with a filename mapping that maps each filename to a name similar to those in the development dataset.

Task setup

The task is split into two phases. First, the development dataset is provided. A month before the challenge deadline, the evaluation set is provided. At the challenge deadline, submissions include the system output on the evaluation set, system meta information and one technical report. The goal of the task is to obtain the highest score on the evaluation set.

Development set

The development set includes multi-channel audio segments recorded by four different sensor nodes, along with the ground truth and cross-validation folds. The cross-validation folds are provided so that results reported on this dataset are uniform. Results on these folds are used for comparison in the initial internal report and also need to be reported in the submitted meta information. The setup consists of four folds distributing the available files. Segments belonging to a particular session of an activity (e.g. a cooking session collected by multiple sensor nodes) are kept together to minimize leakage between folds. The folds are provided with the dataset in the directory evaluation_setup. For each fold, a training, a testing and an evaluation subset are provided.

Important: If you are not using the provided cross-validation setup, pay attention to segments extracted from the same session. Make sure that, for each fold, ALL segments from the same session are either in the training subset OR in the test subset.
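If you build your own folds, one simple way to honour this rule is to assign whole sessions, rather than individual segments, to folds. The sketch below does this greedily, largest sessions first, to keep fold sizes roughly balanced. It is an illustration only: the official folds were partitioned randomly, the (filename, session) pairs are assumed to come from meta.txt, and `session_folds` is a hypothetical helper. Assuming the session identifier is shared across nodes (as the filename scheme suggests), grouping by session also keeps the simultaneous recordings of different nodes in the same fold.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def session_folds(items: List[Tuple[str, str]], n_folds: int = 4) -> Dict[str, int]:
    """Map each filename to a test fold so that no session is split across folds.

    items: (filename, session) pairs. Sessions are placed largest first into the
    fold that currently holds the fewest segments.
    """
    by_session: Dict[str, List[str]] = defaultdict(list)
    for filename, session in items:
        by_session[session].append(filename)

    fold_sizes = [0] * n_folds
    assignment: Dict[str, int] = {}
    for session in sorted(by_session, key=lambda s: len(by_session[s]), reverse=True):
        fold = min(range(n_folds), key=fold_sizes.__getitem__)  # emptiest fold
        for filename in by_session[session]:
            assignment[filename] = fold
        fold_sizes[fold] += len(by_session[session])
    return assignment


# Toy example: 5 sessions with 5, 3, 3, 2 and 1 segments (hypothetical names).
items = [(f"s{s}_seg{i}.wav", f"s{s}") for s, n in [(1, 5), (2, 3), (3, 3), (4, 2), (5, 1)]
         for i in range(n)]
folds = session_folds(items, n_folds=4)
```

Fold k can then serve as the test subset while the remaining folds form the training subset.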

External data sources/pre-trained models

List of external datasets/pre-trained models allowed:

Dataset name	Type	Added	Link
AudioSet	audio	29.6.2018	https://research.google.com/audioset/
VGGish	model	5.7.2018	https://research.google.com/audioset/
Xception	model	5.7.2018	https://keras.io/applications/
VGG16	model	5.7.2018	http://www.robots.ox.ac.uk/~vgg/research/very_deep/
VGG19	model	5.7.2018	http://www.robots.ox.ac.uk/~vgg/research/very_deep/
ResNet50	model	5.7.2018	https://github.com/KaimingHe/deep-residual-networks
InceptionV3	model	5.7.2018	https://keras.io/applications/
InceptionResNetV2	model	5.7.2018	https://keras.io/applications/
MobileNet	model	5.7.2018	https://keras.io/applications/
DenseNet	model	5.7.2018	https://keras.io/applications/
MobileNetV2	model	5.7.2018	https://keras.io/applications/

Evaluation set

The evaluation set includes multi-channel audio segments recorded by seven different sensor nodes. Three of these sensor nodes are not in the development set and are used for the final evaluation score. The segments recorded by the same nodes as in the development set are used to check for overfitting. Participants should run their system on this dataset and submit the classification results (system output) to the DCASE2018 Challenge. The evaluation dataset is provided without ground truth.

Submission

Challenge submissions consist of one zip package containing the system outputs and system meta information, and one technical report (PDF file). Detailed information on the challenge submission can be found on the submission page. The system output should be a single text file (in tab-separated format) containing the classification result for each audio file in the evaluation set. Result items can be in any order. The format is as follows:

[filename (string)][tab][activity label (string)]

The filename should be formatted such that it includes the audio folder. An example of the output file:

audio/1.wav	other
audio/2.wav	social_activity
audio/3.wav	eating
audio/4.wav	working
audio/5.wav	absence
audio/6.wav	vacuum_cleaner
audio/7.wav	dishwashing
audio/8.wav	watching_tv
audio/9.wav	cooking
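A system output in this format can be written with the standard `csv` module. This is a sketch: the label set is taken from the example above, `write_submission` is a hypothetical helper, and the predictions dictionary stands in for whatever your system produces.

```python
import csv
import os
import tempfile
from typing import Dict

# Activity labels as they appear in the example output above.
LABELS = {"absence", "cooking", "dishwashing", "eating", "other",
          "social_activity", "vacuum_cleaner", "watching_tv", "working"}


def write_submission(predictions: Dict[str, str], path: str) -> None:
    """Write one '[filename][tab][label]' line per evaluation segment."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for filename, label in predictions.items():
            if label not in LABELS:
                raise ValueError(f"unknown label {label!r} for {filename}")
            # Filenames must include the audio folder, e.g. "audio/1.wav".
            name = filename if filename.startswith("audio/") else f"audio/{filename}"
            writer.writerow([name, label])


preds = {"1.wav": "other", "audio/2.wav": "social_activity"}
out_path = os.path.join(tempfile.mkdtemp(), "task5_output.txt")
write_submission(preds, out_path)
with open(out_path) as f:
    content = f.read()
print(content)
```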

A template for the system meta information (.yaml file) is available on the submissions page.

Multiple system outputs can be submitted (maximum 4 per participant). If submitting multiple systems, the individual text-files should be packaged into a zip file for submission. Please carefully mark the connection between the submitted files and the corresponding system or system parameters (for example by naming the text file appropriately).

Task rules

These are the general rules valid for all tasks. The same rules and additional information on technical report and submission requirements can be found here. Task specific rules are highlighted in bold.

Participants are allowed to use external data for system development taking into account the following principles:
- The used data must be publicly available without cost before 29th of March 2018
- External data includes pre-trained models
- Participants should inform/suggest such data to be listed on the task webpage, so all competitors know about them and have equal opportunity to use them
- Once the evaluation set is published, the list of external datasets allowed is locked (no further external sources allowed)
Manipulation of provided training data is allowed (e.g. data augmentation).
Participants are not allowed to make subjective judgements on the evaluation data, nor to annotate it.
The evaluation dataset cannot be used to train the submitted system; the use of statistics about the evaluation data in the decision making is also forbidden.
The system outputs that do not respect the challenge rules will be evaluated on request, but they will not be officially included in the challenge rankings.

Evaluation

Scoring in this task is based on the macro-averaged F1-score: the F1-score is calculated for each class separately and then averaged over all classes. A full 10 s multi-channel audio segment is considered to be one sample. The winner of the task is the submission with the highest macro-averaged F1-score on the evaluation set. The system outputs are sent to the task coordinators, who calculate the scores.
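For reference, the macro-averaged F1-score can be computed as below. This is the standard definition written out in plain Python, not the coordinators' evaluation script; `macro_f1` is a hypothetical helper, and giving 0.0 to a class that is neither present nor predicted is an assumed convention.

```python
from typing import Optional, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             classes: Optional[Sequence[str]] = None) -> float:
    """Macro-averaged F1: per-class F1-scores averaged with equal weight.

    Each element of y_true / y_pred is the label of one full 10 s segment.
    If `classes` is not given, the union of observed labels is used.
    """
    if classes is None:
        classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


y_true = ["cooking", "cooking", "eating", "eating"]
y_pred = ["cooking", "eating", "eating", "eating"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.7333
```

Because every class weighs equally, a rare class that is poorly recognized lowers the score as much as a frequent one, which matters given the class imbalance in Table 1.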

Results

Rank	Code	Author	Affiliation	Technical Report	F1-score on Eval. set (Unknown mic.) (%)
	DCASE2018 baseline	Gert Dekkers	Computer Science, KU Leuven - ADVISE, Geel, Belgium	task-monitoring-domestic-activities-results#Dekkers2018	83.1
	Delphin_OL_task5_1	Lionel Delphin-Poulat	HOME/CONTENT, Orange Labs, Lannion, France	task-monitoring-domestic-activities-results#Delphin-Poulat2018	80.7
	Delphin_OL_task5_2	Lionel Delphin-Poulat	HOME/CONTENT, Orange Labs, Lannion, France	task-monitoring-domestic-activities-results#Delphin-Poulat2018	80.8
	Delphin_OL_task5_3	Lionel Delphin-Poulat	HOME/CONTENT, Orange Labs, Lannion, France	task-monitoring-domestic-activities-results#Delphin-Poulat2018	81.6
	Delphin_OL_task5_4	Lionel Delphin-Poulat	HOME/CONTENT, Orange Labs, Lannion, France	task-monitoring-domestic-activities-results#Delphin-Poulat2018	82.5
	Inoue_IBM_task5_1	Tadanobu Inoue	AI, IBM Research, Tokyo, Japan	task-monitoring-domestic-activities-results#Inoue2018	88.4
	Inoue_IBM_task5_2	Tadanobu Inoue	AI, IBM Research, Tokyo, Japan	task-monitoring-domestic-activities-results#Inoue2018	88.3
	Kong_Surrey_task5_1	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK	task-monitoring-domestic-activities-results#Kong2018	83.2
	Kong_Surrey_task5_2	Qiuqiang Kong	Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK	task-monitoring-domestic-activities-results#Kong2018	82.4
	Li_NPU_task5_1	Dexin Li	Speech Signal Processing, CIAIC, Xi'an, China	task-monitoring-domestic-activities-results#Li2018	79.0
	Li_NPU_task5_2	Dexin Li	Speech Signal Processing, CIAIC, Xi'an, China	task-monitoring-domestic-activities-results#Li2018	78.6
	Li_NPU_task5_3	Dexin Li	Speech Signal Processing, CIAIC, Xi'an, China	task-monitoring-domestic-activities-results#Li2018	84.8
	Li_NPU_task5_4	Dexin Li	Speech Signal Processing, CIAIC, Xi'an, China	task-monitoring-domestic-activities-results#Li2018	85.1
	Liao_NTHU_task5_1	Hsueh-Wei Liao	Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan	task-monitoring-domestic-activities-results#Liao2018	86.7
	Liao_NTHU_task5_2	Hsueh-Wei Liao	Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan	task-monitoring-domestic-activities-results#Liao2018	72.1
	Liao_NTHU_task5_3	Hsueh-Wei Liao	Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan	task-monitoring-domestic-activities-results#Liao2018	76.7
	Liu_THU_task5_1	Huaping Liu	Computer Science and Technology, Computer Science and Technology, Beijing, China	task-monitoring-domestic-activities-results#Liu2018	87.5
	Liu_THU_task5_2	Huaping Liu	Computer Science and Technology, Computer Science and Technology, Beijing, China	task-monitoring-domestic-activities-results#Liu2018	87.4
	Liu_THU_task5_3	Huaping Liu	Computer Science and Technology, Computer Science and Technology, Beijing, China	task-monitoring-domestic-activities-results#Liu2018	86.8
	Nakadai_HRI-JP_task5_1	Kazuhiro Nakadai	Research Div., Honda Research Institute Japan, Wako, Japan	task-monitoring-domestic-activities-results#Nakadai2018	85.4
	Raveh_INRC_task5_1	Alon Raveh	Signal Processing Department, National Research Center, Haifa, Israel	task-monitoring-domestic-activities-results#Raveh2018	80.4
	Raveh_INRC_task5_2	Alon Raveh	Signal Processing Department, National Research Center, Haifa, Israel	task-monitoring-domestic-activities-results#Raveh2018	80.2
	Raveh_INRC_task5_3	Alon Raveh	Signal Processing Department, National Research Center, Haifa, Israel	task-monitoring-domestic-activities-results#Raveh2018	81.7
	Raveh_INRC_task5_4	Alon Raveh	Signal Processing Department, National Research Center, Haifa, Israel	task-monitoring-domestic-activities-results#Raveh2018	81.2
	Sun_SUTD_task5_1	Yingxiang Sun	Engineering Product Development, Singapore University of Technology and Design, Singapore	task-monitoring-domestic-activities-results#Chew2018	76.8
	Tanabe_HIT_task5_1	Ryo Tanabe	R&D Group, Hitachi, Ltd., Tokyo, Japan	task-monitoring-domestic-activities-results#Tanabe2018	88.4
	Tanabe_HIT_task5_2	Ryo Tanabe	R&D Group, Hitachi, Ltd., Tokyo, Japan	task-monitoring-domestic-activities-results#Tanabe2018	82.2
	Tanabe_HIT_task5_3	Ryo Tanabe	R&D Group, Hitachi, Ltd., Tokyo, Japan	task-monitoring-domestic-activities-results#Tanabe2018	86.3
	Tanabe_HIT_task5_4	Ryo Tanabe	R&D Group, Hitachi, Ltd., Tokyo, Japan	task-monitoring-domestic-activities-results#Tanabe2018	88.4
	Tiraboschi_UNIMI_task5_1	Marco Tiraboschi	Computer Science, Università degli Studi di Milano, Milan, Italy	task-monitoring-domestic-activities-results#Tiraboschi2018	76.9
	Zhang_THU_task5_1	Weiqiang Zhang	Electronic Engineering, Tsinghua University, Beijing, China	task-monitoring-domestic-activities-results#Shen2018	85.9
	Zhang_THU_task5_2	Weiqiang Zhang	Electronic Engineering, Tsinghua University, Beijing, China	task-monitoring-domestic-activities-results#Shen2018	84.3
	Zhang_THU_task5_3	Weiqiang Zhang	Electronic Engineering, Tsinghua University, Beijing, China	task-monitoring-domestic-activities-results#Shen2018	86.0
	Zhang_THU_task5_4	Weiqiang Zhang	Electronic Engineering, Tsinghua University, Beijing, China	task-monitoring-domestic-activities-results#Shen2018	85.9

Complete results and technical reports can be found on the results page.

Baseline system

Setup

The baseline system is intended to lower the hurdle to participate in the challenge. It provides an entry-level approach that is simple but relatively close to state-of-the-art systems. High-end performance is left for the challenge participants to find. Participants are allowed to build their system on top of the given baseline system. The system has all the functionality needed for dataset handling, storing and accessing features and models, and evaluating the results, which makes it rather easy to adapt to one's needs. The baseline system is also a good starting point for entry-level researchers.

If participants plan to publish their code to the DCASE community after the challenge, building their approach on the baseline system will make their code more accessible to the community. The DCASE organizers strongly encourage participants to share their code in any form after the challenge to push the research further.

During the recording campaign, data was measured simultaneously using multiple microphone arrays (nodes) each containing 4 microphones. Hence, each domestic activity is recorded as many times as there were microphones. The baseline system trains a single classifier model that takes a single channel as input. Each parallel recording of a single activity is considered as a different example during training. The learner in the baseline system is based on a Neural Network architecture using convolutional and dense layers. As input, log mel-band energies are provided to the network for each microphone channel separately. In the prediction stage a single outcome is computed for each node by averaging the 4 model outcomes (posteriors) that were computed by evaluating the trained classifier model on all 4 microphones.
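The prediction-stage fusion described above amounts to averaging the class posteriors over the channels of a node and taking the class with the highest average. A minimal sketch: the trained classifier is replaced by made-up posterior values, only 3 classes are used instead of 9 for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence


def fuse_node_posteriors(channel_posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the class posteriors of all channels of one node (late fusion)."""
    n_channels = len(channel_posteriors)
    n_classes = len(channel_posteriors[0])
    return [sum(p[k] for p in channel_posteriors) / n_channels for k in range(n_classes)]


def predict_node(channel_posteriors: Sequence[Sequence[float]],
                 labels: Sequence[str]) -> str:
    """Return the label with the highest fused posterior."""
    fused = fuse_node_posteriors(channel_posteriors)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Made-up softmax outputs of a single-channel model for the 4 microphones of one node.
labels = ["absence", "cooking", "eating"]
posteriors = [
    [0.6, 0.3, 0.1],   # mic 1
    [0.2, 0.5, 0.3],   # mic 2
    [0.5, 0.4, 0.1],   # mic 3
    [0.1, 0.6, 0.3],   # mic 4
]
fused = fuse_node_posteriors(posteriors)
prediction = predict_node(posteriors, labels)
print(prediction)  # cooking
```

In this toy case the per-channel decisions are split 2 to 2 between absence and cooking, so a majority vote (the late-fusion scheme of the SINS paper baseline) would need a tie-break, whereas the averaged posteriors still single out one class.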

The baseline system parameters are as follows:

Frame size: 40 ms (50% hop size)
Feature matrix:
- 40 log mel-band energies in 501 successive frames (10 s)
Neural Network:
- Input data: 40x501 (each microphone channel is considered to be a separate example for the learner)
- Architecture:
  - 1D Convolutional layer (filters: 32, kernel size: 5, stride: 1, axis: time) + Batch Normalization + ReLU activation
  - 1D Max Pooling (pool size: 5, stride: 5) + Dropout (rate: 20%)
  - 1D Convolutional layer (filters: 64, kernel size: 3, stride: 1, axis: time) + Batch Normalization + ReLU activation
  - 1D Global Max Pooling + Dropout (rate: 20%)
  - Dense layer (neurons: 64) + ReLU activation + Dropout (rate: 20%)
  - Softmax output layer (classes: 9)
- Learning:
  - Optimizer: Adam (learning rate: 0.0001)
  - Epochs: 500
  - On each epoch, the training dataset is randomly subsampled so that the number of examples for each class match the size of the smallest class
  - Batch size: 256 * 4 channels (each channel is considered as a different example for the learner)
Fusion: Output probabilities from the four microphones in a particular node under test are averaged to obtain the final posterior probability.
Model selection: The performance of the model is evaluated every 10 epochs on a validation subset (30% subsampled from the training set). The model with the highest Macro-averaged F1-score is picked.
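The 40x501 feature matrix can be checked with a little arithmetic: at 16 kHz a 40 ms frame is 640 samples, a 50% hop is 320 samples, and a 10 s segment has 160 000 samples. 501 frames is what one gets when the signal is padded by half a frame on each side so that frames are centered on multiples of the hop (the default behaviour of, e.g., librosa's STFT); without padding one would get 499. This is an inference from the numbers, not a description of how the baseline actually computes its features, and `n_frames` is a hypothetical helper.

```python
FS = 16_000                   # Hz
FRAME_LEN = FS * 40 // 1000   # 40 ms -> 640 samples
HOP = FRAME_LEN // 2          # 50% hop -> 320 samples
SEGMENT = 10 * FS             # 10 s -> 160000 samples


def n_frames(n_samples: int, frame_len: int, hop: int, centered: bool) -> int:
    """Number of analysis frames for a signal of n_samples."""
    if centered:
        # Padding of frame_len // 2 on both sides; frames centered at t = k * hop.
        return 1 + n_samples // hop
    # No padding: only frames lying completely inside the signal.
    return 1 + (n_samples - frame_len) // hop


print(n_frames(SEGMENT, FRAME_LEN, HOP, centered=True))   # 501
print(n_frames(SEGMENT, FRAME_LEN, HOP, centered=False))  # 499
```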

The baseline system is built on the dcase_util toolbox. The machine learning part of the code is built on Keras (v2.1.5), using TensorFlow (v1.4.0) as the backend.
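To get a sense of the model size, the layer list above can be turned into shape and parameter bookkeeping. This sketch is plain Python rather than Keras code, so it runs without TensorFlow. It assumes the input is laid out as 501 time steps with the 40 mel bands as channels (so the 1D convolutions slide along time) and that 'valid' padding is used; the padding is not stated in the task description and only affects the time-axis lengths, not the parameter count. The total should correspond to what a Keras model summary reports for these layers (trainable plus non-trainable); the helper names are hypothetical.

```python
def conv1d_params(in_ch: int, filters: int, kernel: int) -> int:
    return in_ch * kernel * filters + filters   # weights + biases


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


def batchnorm_params(ch: int) -> int:
    return 4 * ch   # gamma, beta (trainable) + moving mean, moving variance


T, MELS, CLASSES = 501, 40, 9

# Time-axis lengths, assuming 'valid' padding.
t1 = T - 5 + 1            # Conv1D, kernel 5, stride 1 -> 497
t2 = (t1 - 5) // 5 + 1    # MaxPooling1D, pool 5, stride 5 -> 99
t3 = t2 - 3 + 1           # Conv1D, kernel 3, stride 1 -> 97
# Global max pooling then collapses the time axis to a 64-dimensional vector.

params = (conv1d_params(MELS, 32, 5) + batchnorm_params(32)
          + conv1d_params(32, 64, 3) + batchnorm_params(64)
          + dense_params(64, 64) + dense_params(64, CLASSES))
print(t1, t2, t3, params)  # 497 99 97 17769
```

Dropout and pooling layers have no parameters, so a network of roughly 18 thousand parameters is small compared with the image models listed under external data sources.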

Repository

DCASE2018 Task 5 Baseline

An inconsistency in the dataset was reported here. The issue is fixed in the current release of the dataset (v1.0.3, 15/05/2018). The repository has been updated to use the latest release of the [dcase_util library (v0.2.3)](https://github.com/DCASE-REPO/dcase_util); using an older version will download an older version of the dataset. If you prefer not to download all files again, you can overwrite a subset of the files using this archive.

Results for the development dataset

When run in development mode, the baseline system provides results for the 4-fold cross-validation setup. The table below shows the per-class F1-scores and the macro-averaged F1-score, each averaged over the 4 folds.

Activity	F1-score
Absence	85.41 %
Cooking	95.14 %
Dishwashing	76.73 %
Eating	83.64 %
Other	44.76 %
Social activity	93.92 %
Vacuum cleaning	99.31 %
Watching TV	99.59 %
Working	82.03 %
Macro-averaged F1-score	84.50 %
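Because both the class average and the fold average are plain unweighted means, the order of averaging does not matter, so the last row should equal the mean of the nine per-class values above. A quick check using the numbers from the table:

```python
# Per-class F1-scores (%) copied from the development-set table above.
class_f1 = {
    "Absence": 85.41, "Cooking": 95.14, "Dishwashing": 76.73,
    "Eating": 83.64, "Other": 44.76, "Social activity": 93.92,
    "Vacuum cleaning": 99.31, "Watching TV": 99.59, "Working": 82.03,
}
macro = sum(class_f1.values()) / len(class_f1)
print(f"{macro:.2f} %")  # 84.50 %
```

The weak "Other" class (44.76 %) counts as much in this average as any other class, regardless of how many segments it has.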

Note: The performance might not be exactly reproducible but similar results should be obtainable.

Citation

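If you are using the dataset, please cite the following paper:

Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.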

If you are using the baseline code, or want to refer to the challenge task, please cite the following paper:

Gert Dekkers, Lode Vuegen, Toon van Waterschoot, Bart Vanrumste, and Peter Karsmakers. DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics. Technical Report, KU Leuven, 2018. URL: https://arxiv.org/abs/1807.11246, arXiv:1807.11246.

DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics

Abstract

The DCASE 2018 Challenge consists of five tasks related to automatic classification and detection of sound events and scenes. This paper presents the setup of Task 5 which includes the description of the task, dataset and the baseline system. In this task, it is investigated to which extend multi-channel acoustic recordings are beneficial for the purpose of classifying domestic activities. The goal is to exploit spectral and spatial cues independent of sensor location using multi-channel audio. For this purpose we provided a development and evaluation dataset which are derivatives of the SINS database and contain domestic activities recorded by multiple microphone arrays. The baseline system, based on a Neural Network architecture using convolutional and dense layer(s), is intended to lower the hurdle to participate the challenge and to provide a reference performance.

Keywords

Acoustic scene classification, Multi-channel, Activities of the Daily Living

Coordinators

Gert Dekkers (KU Leuven)
Peter Karsmakers (KU Leuven)
Lode Vuegen (KU Leuven)
