Task description
This challenge focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance, and noisy/busy environments.
A more detailed task description can be found on the task description page.
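For reference, the event-based F-score reported below is the harmonic mean of precision and recall over matched events, and 95% confidence intervals of this kind are typically obtained by bootstrap resampling. The following is an illustrative sketch only; the challenge's exact event-matching criteria and resampling scheme may differ:

```python
import numpy as np

def event_f_score(tp, fp, fn):
    """Event-based F-score from matched-event counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(per_file_counts, n_boot=1000, seed=0):
    """95% CI: resample files with replacement, re-pool counts, re-score."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(per_file_counts)  # rows: (tp, fp, fn) per recording
    scores = []
    for _ in range(n_boot):
        sample = counts[rng.integers(0, len(counts), len(counts))]
        tp, fp, fn = sample.sum(axis=0)
        scores.append(event_f_score(tp, fp, fn))
    return np.percentile(scores, [2.5, 97.5])

# Toy per-recording (tp, fp, fn) counts.
counts = [(8, 2, 3), (5, 1, 4), (12, 6, 2)]
tp, fp, fn = np.sum(counts, axis=0)
score = event_f_score(tp, fp, fn)
lo, hi = bootstrap_ci(counts)
```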
Systems ranking
Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Validation dataset) |
---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | Baseline Template Matching | 34.8 (32.6 - 37.1) | 2.0 | ||
Baseline_PROTO_task5_1 | Baseline Prototypical Network | 20.1 (18.2 - 21.9) | 41.5 | ||
Anderson_TCD_task5_1 | Prototypical Network with SpecAugment | Anderson2021 | 35.0 (33.1 - 37.0) | 26.2 | |
Bielecki_SMSNG_task5_1 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 8.4 (7.1 - 9.6) | 52.5 | |
Bielecki_SMSNG_task5_2 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 5.8 (4.9 - 6.7) | 51.8 | |
Bielecki_SMSNG_task5_3 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 8.4 (7.1 - 9.7) | 51.8 | |
Bielecki_SMSNG_task5_4 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 5.3 (4.4 - 6.2) | 51.1 | |
Cheng_BIT_task5_1 | ivector baseline | Cheng2021 | 23.8 (21.9 - 25.7) | 46.3 | |
Cheng_BIT_task5_2 | baseline_5w3s | Cheng2021 | 12.5 (11.0 - 14.1) | 47.8 | |
Cheng_BIT_task5_3 | baseline_5w5s | Cheng2021 | 11.0 (9.4 - 12.6) | 45.0 | |
Cheng_BIT_task5_4 | ivector-tripleloss baseline | Cheng2021 | 8.0 (6.7 - 9.3) | 44.9 | |
Johannsmeier_OVGU_task5_1 | Prototype Segmentation | Johannsmeier2021 | 5.5 (4.7 - 6.4) | 59.8 | |
Johannsmeier_OVGU_task5_2 | Prototype Segmentation | Johannsmeier2021 | 4.5 (3.7 - 5.4) | 56.0 | |
Johannsmeier_OVGU_task5_3 | Prototype Segmentation | Johannsmeier2021 | 15.2 (13.7 - 16.7) | 58.6 | |
Johannsmeier_OVGU_task5_4 | Prototype Segmentation | Johannsmeier2021 | 7.1 (5.9 - 8.3) | 58.8 | |
zhang_uestc_task5_1 | dcase2021-t5 prototypical network | Zhang2021 | 9.0 (7.8 - 10.2) | 52.9 | |
zhang_uestc_task5_2 | dcase2021-t5 prototypical network | Zhang2021 | 8.3 (7.1 - 9.4) | 53.8 | |
zhang_uestc_task5_3 | dcase2021-t5 prototypical network | Zhang2021 | 16.8 (15.5 - 18.2) | 54.4 | |
zhang_uestc_task5_4 | dcase2021-t5 prototypical network | Zhang2021 | 7.2 (6.0 - 8.4) | 57.1 | |
Zou_PKU_task5_1 | TIM | Zou2021 | 33.2 (31.0 - 35.3) | 55.3 | |
Yang_PKU_task5_2 | Contrast learning for few shot learning | Zou2021 | 22.4 (20.7 - 24.1) | 55.3 | |
Zou_PKU_task5_3 | TIM-ML | Zou2021 | 38.4 (36.2 - 40.6) | 55.3 | |
Zou_PKU_task5_4 | TIM-ML2 | Zou2021 | 33.7 (31.7 - 35.8) | 55.3 | |
Tang_SHNU_task5_1 | SHNU1 | Tang2021 | 36.5 (34.0 - 38.9) | 54.7 | |
Tang_SHNU_task5_2 | SHNU2 | Tang2021 | 35.1 (31.7 - 38.4) | 51.7 | |
Tang_SHNU_task5_3 | SHNU3 | Tang2021 | 38.3 (36.1 - 40.5) | 51.4 |
Dataset wise metrics
Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (DC dataset) | Event-based F-score (ME dataset) | Event-based F-score (ML dataset) |
---|---|---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | Baseline Template Matching | 34.8 (32.6 - 37.1) | 32.2 | 47.0 | 29.5 | ||
Baseline_PROTO_task5_1 | Baseline Prototypical Network | 20.1 (18.2 - 21.9) | 8.5 | 72.7 | 55.7 | ||
Anderson_TCD_task5_1 | Prototypical Network with SpecAugment | Anderson2021 | 35.0 (33.1 - 37.0) | 19.9 | 56.6 | 56.8 | |
Bielecki_SMSNG_task5_1 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 8.4 (7.1 - 9.6) | 3.1 | 57.3 | 43.7 | |
Bielecki_SMSNG_task5_2 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 5.8 (4.9 - 6.7) | 2.1 | 74.4 | 32.9 | |
Bielecki_SMSNG_task5_3 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 8.4 (7.1 - 9.7) | 3.1 | 56.3 | 51.4 | |
Bielecki_SMSNG_task5_4 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 5.3 (4.4 - 6.2) | 1.9 | 44.3 | 45.0 | |
Cheng_BIT_task5_1 | ivector baseline | Cheng2021 | 23.8 (21.9 - 25.7) | 10.6 | 53.5 | 78.8 | |
Cheng_BIT_task5_2 | baseline_5w3s | Cheng2021 | 12.5 (11.0 - 14.1) | 4.8 | 80.8 | 57.8 | |
Cheng_BIT_task5_3 | baseline_5w5s | Cheng2021 | 11.0 (9.4 - 12.6) | 4.1 | 75.5 | 56.4 | |
Cheng_BIT_task5_4 | ivector-tripleloss baseline | Cheng2021 | 8.0 (6.7 - 9.3) | 2.9 | 70.5 | 53.1 | |
Johannsmeier_OVGU_task5_1 | Prototype Segmentation | Johannsmeier2021 | 5.5 (4.7 - 6.4) | 2.0 | 51.4 | 37.3 | |
Johannsmeier_OVGU_task5_2 | Prototype Segmentation | Johannsmeier2021 | 4.5 (3.7 - 5.4) | 1.7 | 60.8 | 17.9 | |
Johannsmeier_OVGU_task5_3 | Prototype Segmentation | Johannsmeier2021 | 15.2 (13.7 - 16.7) | 6.5 | 64.3 | 35.8 | |
Johannsmeier_OVGU_task5_4 | Prototype Segmentation | Johannsmeier2021 | 7.1 (5.9 - 8.3) | 2.7 | 61.5 | 29.4 | |
zhang_uestc_task5_1 | dcase2021-t5 prototypical network | Zhang2021 | 9.0 (7.8 - 10.2) | 3.5 | 49.3 | 32.4 | |
zhang_uestc_task5_2 | dcase2021-t5 prototypical network | Zhang2021 | 8.3 (7.1 - 9.4) | 3.4 | 41.8 | 23.9 | |
zhang_uestc_task5_3 | dcase2021-t5 prototypical network | Zhang2021 | 16.8 (15.5 - 18.2) | 8.1 | 45.1 | 29.9 | |
zhang_uestc_task5_4 | dcase2021-t5 prototypical network | Zhang2021 | 7.2 (6.0 - 8.4) | 2.8 | 45.1 | 24.7 | |
Zou_PKU_task5_1 | TIM | Zou2021 | 33.2 (31.0 - 35.3) | 16.1 | 72.7 | 67.9 | |
Yang_PKU_task5_2 | Contrast learning for few shot learning | Zou2021 | 22.4 (20.7 - 24.1) | 10.3 | 61.0 | 49.9 | |
Zou_PKU_task5_3 | TIM-ML | Zou2021 | 38.4 (36.2 - 40.6) | 20.6 | 68.0 | 67.3 | |
Zou_PKU_task5_4 | TIM-ML2 | Zou2021 | 33.7 (31.7 - 35.8) | 17.3 | 62.8 | 66.4 | |
Tang_SHNU_task5_1 | SHNU1 | Tang2021 | 36.5 (34.0 - 38.9) | 22.3 | 48.6 | 59.3 | |
Tang_SHNU_task5_2 | SHNU2 | Tang2021 | 35.1 (31.7 - 38.4) | 25.5 | 31.7 | 67.2 | |
Tang_SHNU_task5_3 | SHNU3 | Tang2021 | 38.3 (36.1 - 40.5) | 25.6 | 61.5 | 43.3 |
Teams ranking
Table including only the best-performing system per submitting team.
Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Development dataset) |
---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | Baseline Template Matching | 34.8 (32.6 - 37.1) | 2.0 | ||
Baseline_PROTO_task5_1 | Baseline Prototypical Network | 20.1 (18.2 - 21.9) | 41.5 | ||
Anderson_TCD_task5_1 | Prototypical Network with SpecAugment | Anderson2021 | 35.0 (33.1 - 37.0) | 26.2 | |
Bielecki_SMSNG_task5_3 | Prototypical network with knowledge distillation and attention loss | Bielecki2021 | 8.4 (7.1 - 9.7) | 51.8 | |
Cheng_BIT_task5_1 | ivector baseline | Cheng2021 | 23.8 (21.9 - 25.7) | 46.3 | |
Johannsmeier_OVGU_task5_3 | Prototype Segmentation | Johannsmeier2021 | 15.2 (13.7 - 16.7) | 58.6 | |
zhang_uestc_task5_3 | dcase2021-t5 prototypical network | Zhang2021 | 16.8 (15.5 - 18.2) | 54.4 | |
Zou_PKU_task5_3 | TIM-ML | Zou2021 | 38.4 (36.2 - 40.6) | 55.3 | |
Tang_SHNU_task5_3 | SHNU3 | Tang2021 | 38.3 (36.1 - 40.5) | 51.4 |
System characteristics
General characteristics
Rank | Code | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Sampling rate | Data augmentation | Features |
---|---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | 34.8 (32.6 - 37.1) | any | spectrogram | |||
Baseline_PROTO_task5_1 | 20.1 (18.2 - 21.9) | 22.05 KHz | PCEN | |||
Anderson_TCD_task5_1 | Anderson2021 | 35.0 (33.1 - 37.0) | 22.05 KHz | time warping, time masking, frequency masking | PCEN, Mel Spectrogram | |
Bielecki_SMSNG_task5_1 | Bielecki2021 | 8.4 (7.1 - 9.6) | 22.05 KHz | melspectrogram time, frequency masking | melspectrogram | |
Bielecki_SMSNG_task5_2 | Bielecki2021 | 5.8 (4.9 - 6.7) | 22.05 KHz | melspectrogram time, frequency masking | melspectrogram | |
Bielecki_SMSNG_task5_3 | Bielecki2021 | 8.4 (7.1 - 9.7) | 22.05 KHz | melspectrogram time, frequency masking | melspectrogram | |
Bielecki_SMSNG_task5_4 | Bielecki2021 | 5.3 (4.4 - 6.2) | 22.05 KHz | melspectrogram time, frequency masking | melspectrogram | |
Cheng_BIT_task5_1 | Cheng2021 | 23.8 (21.9 - 25.7) | 22.05 KHz | Specaugment | PCEN,i-vector | |
Cheng_BIT_task5_2 | Cheng2021 | 12.5 (11.0 - 14.1) | 22.05 KHz | Specaugment | PCEN | |
Cheng_BIT_task5_3 | Cheng2021 | 11.0 (9.4 - 12.6) | 22.05 KHz | Specaugment | PCEN | |
Cheng_BIT_task5_4 | Cheng2021 | 8.0 (6.7 - 9.3) | 22.05 KHz | Specaugment | PCEN, i-vector | |
Johannsmeier_OVGU_task5_1 | Johannsmeier2021 | 5.5 (4.7 - 6.4) | 22.05 KHz | time stretching, pitch shifting, time shifting | mel energies, PCEN | |
Johannsmeier_OVGU_task5_2 | Johannsmeier2021 | 4.5 (3.7 - 5.4) | 22.05 KHz | time stretching, pitch shifting, time shifting | mel energies, PCEN | |
Johannsmeier_OVGU_task5_3 | Johannsmeier2021 | 15.2 (13.7 - 16.7) | 22.05 KHz | time stretching, pitch shifting, time shifting | mel energies, PCEN | |
Johannsmeier_OVGU_task5_4 | Johannsmeier2021 | 7.1 (5.9 - 8.3) | 22.05 KHz | time stretching, pitch shifting, time shifting | mel energies, PCEN | |
zhang_uestc_task5_1 | Zhang2021 | 9.0 (7.8 - 10.2) | 25.6 KHz | Specaugment | PCEN | |
zhang_uestc_task5_2 | Zhang2021 | 8.3 (7.1 - 9.4) | 25.6 KHz | Specaugment | PCEN | |
zhang_uestc_task5_3 | Zhang2021 | 16.8 (15.5 - 18.2) | 25.6 KHz | Specaugment | PCEN | |
zhang_uestc_task5_4 | Zhang2021 | 7.2 (6.0 - 8.4) | 25.6 KHz | Specaugment | PCEN | |
Zou_PKU_task5_1 | Zou2021 | 33.2 (31.0 - 35.3) | 22.05 KHz | spectrogram | ||
Yang_PKU_task5_2 | Zou2021 | 22.4 (20.7 - 24.1) | 22.05 KHz | spectrogram | ||
Zou_PKU_task5_3 | Zou2021 | 38.4 (36.2 - 40.6) | 22.05 KHz | spectrogram | ||
Zou_PKU_task5_4 | Zou2021 | 33.7 (31.7 - 35.8) | 22.05 KHz | spectrogram | ||
Tang_SHNU_task5_1 | Tang2021 | 36.5 (34.0 - 38.9) | any | Specaugment, inference-time augmentation | PCEN | |
Tang_SHNU_task5_2 | Tang2021 | 35.1 (31.7 - 38.4) | any | Specaugment, inference-time augmentation | PCEN | |
Tang_SHNU_task5_3 | Tang2021 | 38.3 (36.1 - 40.5) | any | Specaugment, inference-time augmentation | PCEN |
Machine learning characteristics
Rank | Code | Technical Report | Event-based F-score (Eval) | Classifier | Few-shot approach | Post-processing |
---|---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | 34.8 (32.6 - 37.1) | template matching | template matching | peak picking, threshold | ||
Baseline_PROTO_task5_1 | 20.1 (18.2 - 21.9) | CNN | prototypical | threshold | ||
Anderson_TCD_task5_1 | Anderson2021 | 35.0 (33.1 - 37.0) | CNN | prototypical | probability averaging, median filtering, minimum event length | |
Bielecki_SMSNG_task5_1 | Bielecki2021 | 8.4 (7.1 - 9.6) | CNN | prototypical | minimum time length threshold, prediction frames elongation | |
Bielecki_SMSNG_task5_2 | Bielecki2021 | 5.8 (4.9 - 6.7) | CNN | prototypical | min time length threshold, prediction frames elongation | |
Bielecki_SMSNG_task5_3 | Bielecki2021 | 8.4 (7.1 - 9.7) | CNN | prototypical | min time length threshold, prediction frames elongation | |
Bielecki_SMSNG_task5_4 | Bielecki2021 | 5.3 (4.4 - 6.2) | CNN | prototypical | min time length threshold, prediction frames elongation | |
Cheng_BIT_task5_1 | Cheng2021 | 23.8 (21.9 - 25.7) | CNN | prototypical | threshold | |
Cheng_BIT_task5_2 | Cheng2021 | 12.5 (11.0 - 14.1) | CNN | prototypical | threshold | |
Cheng_BIT_task5_3 | Cheng2021 | 11.0 (9.4 - 12.6) | CNN | prototypical | threshold | |
Cheng_BIT_task5_4 | Cheng2021 | 8.0 (6.7 - 9.3) | CNN | prototypical | threshold | |
Johannsmeier_OVGU_task5_1 | Johannsmeier2021 | 5.5 (4.7 - 6.4) | CNN | prototypical | threshold, gaussian smoothing (adaptive) | |
Johannsmeier_OVGU_task5_2 | Johannsmeier2021 | 4.5 (3.7 - 5.4) | CNN | prototypical | threshold, gaussian smoothing (adaptive) | |
Johannsmeier_OVGU_task5_3 | Johannsmeier2021 | 15.2 (13.7 - 16.7) | CNN | prototypical | threshold, gaussian smoothing (adaptive) | |
Johannsmeier_OVGU_task5_4 | Johannsmeier2021 | 7.1 (5.9 - 8.3) | CNN | prototypical | threshold, gaussian smoothing (adaptive) | |
zhang_uestc_task5_1 | Zhang2021 | 9.0 (7.8 - 10.2) | ResNet | prototypical | threshold | |
zhang_uestc_task5_2 | Zhang2021 | 8.3 (7.1 - 9.4) | ResNet | prototypical | threshold | |
zhang_uestc_task5_3 | Zhang2021 | 16.8 (15.5 - 18.2) | ResNet | prototypical | threshold | |
zhang_uestc_task5_4 | Zhang2021 | 7.2 (6.0 - 8.4) | ResNet | prototypical | threshold | |
Zou_PKU_task5_1 | Zou2021 | 33.2 (31.0 - 35.3) | CNN | Transductive inference | peak picking, threshold | |
Yang_PKU_task5_2 | Zou2021 | 22.4 (20.7 - 24.1) | CNN | Prototypical network | peak picking, threshold | |
Zou_PKU_task5_3 | Zou2021 | 38.4 (36.2 - 40.6) | CNN | Transductive inference | peak picking, threshold | |
Zou_PKU_task5_4 | Zou2021 | 33.7 (31.7 - 35.8) | CNN | Transductive inference | peak picking, threshold | |
Tang_SHNU_task5_1 | Tang2021 | 36.5 (34.0 - 38.9) | CNN | prototypical | peak picking, median filtering | |
Tang_SHNU_task5_2 | Tang2021 | 35.1 (31.7 - 38.4) | CNN | prototypical | peak picking, median filtering | |
Tang_SHNU_task5_3 | Tang2021 | 38.3 (36.1 - 40.5) | ResNet | fine tuning, prototypical | peak picking, median filtering |
Complexity
Rank | Code | Technical Report | Event-based F-score (Eval) | Model complexity | Training time |
---|---|---|---|---|---|
Baseline_TempMatch_task5_1 | 34.8 (32.6 - 37.1) | ||||
Baseline_PROTO_task5_1 | 20.1 (18.2 - 21.9) | ||||
Anderson_TCD_task5_1 | Anderson2021 | 35.0 (33.1 - 37.0) | 132000 | 30m34s (Nvidia V100 (1) Intel Xeon Gold 5122 @ 3.60GHz 32GB RAM) | |
Bielecki_SMSNG_task5_1 | Bielecki2021 | 8.4 (7.1 - 9.6) | 813600 | 3h (Generation) | |
Bielecki_SMSNG_task5_2 | Bielecki2021 | 5.8 (4.9 - 6.7) | 1084200 | 3h (Generation) | |
Bielecki_SMSNG_task5_3 | Bielecki2021 | 8.4 (7.1 - 9.7) | 813600 | 3h (Generation) | |
Bielecki_SMSNG_task5_4 | Bielecki2021 | 5.3 (4.4 - 6.2) | 813600 | 3h (Generation) | |
Cheng_BIT_task5_1 | Cheng2021 | 23.8 (21.9 - 25.7) | 6762757 | 1h | |
Cheng_BIT_task5_2 | Cheng2021 | 12.5 (11.0 - 14.1) | 6762757 | 1h | |
Cheng_BIT_task5_3 | Cheng2021 | 11.0 (9.4 - 12.6) | 6762757 | 1h | |
Cheng_BIT_task5_4 | Cheng2021 | 8.0 (6.7 - 9.3) | 6762757 | 1h | |
Johannsmeier_OVGU_task5_1 | Johannsmeier2021 | 5.5 (4.7 - 6.4) | 389804 | 300 seconds (single NVIDIA Geforce 1080Ti) | |
Johannsmeier_OVGU_task5_2 | Johannsmeier2021 | 4.5 (3.7 - 5.4) | 389804 | 300 seconds (single NVIDIA Geforce 1080Ti) | |
Johannsmeier_OVGU_task5_3 | Johannsmeier2021 | 15.2 (13.7 - 16.7) | 389804 | 300 seconds (single NVIDIA Geforce 1080Ti) | |
Johannsmeier_OVGU_task5_4 | Johannsmeier2021 | 7.1 (5.9 - 8.3) | 1169412 | 900 seconds (single NVIDIA Geforce 1080Ti), 300 seconds (3GPUs parallel training) | |
zhang_uestc_task5_1 | Zhang2021 | 9.0 (7.8 - 10.2) | 2889984 | ||
zhang_uestc_task5_2 | Zhang2021 | 8.3 (7.1 - 9.4) | 2889984 | ||
zhang_uestc_task5_3 | Zhang2021 | 16.8 (15.5 - 18.2) | 2889984 | ||
zhang_uestc_task5_4 | Zhang2021 | 7.2 (6.0 - 8.4) | 2889984 | ||
Zou_PKU_task5_1 | Zou2021 | 33.2 (31.0 - 35.3) | 468627 | 403.5 seconds | |
Yang_PKU_task5_2 | Zou2021 | 22.4 (20.7 - 24.1) | 464531 | 403.5 seconds | |
Zou_PKU_task5_3 | Zou2021 | 38.4 (36.2 - 40.6) | 468627 | 403.5 seconds | |
Zou_PKU_task5_4 | Zou2021 | 33.7 (31.7 - 35.8) | 468627 | 403.5 seconds | |
Tang_SHNU_task5_1 | Tang2021 | 36.5 (34.0 - 38.9) | 2950000 | 1h (GeForce RTX 2080 Ti) | |
Tang_SHNU_task5_2 | Tang2021 | 35.1 (31.7 - 38.4) | 2950000 | 45 min (GeForce RTX 2080 Ti) | |
Tang_SHNU_task5_3 | Tang2021 | 38.3 (36.1 - 40.5) | 4750000 | 45 min (GeForce RTX 2080 Ti) |
Technical reports
Bioacoustic Event Detection with Prototypical Networks and Data Augmentation
Mark Anderson and Naomi Harte
Trinity College Dublin, SIGMEDIA, Dublin, Ireland
Abstract
This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection task of the DCASE2021 Challenge. The remit was to develop a few-shot learning system for animal (mammal and bird) vocalisations: participants were tasked with developing a method that can extract information from five exemplar vocalisations, or shots, of mammals or birds and detect and classify sounds in field recordings. In the system described in this report, prototypical networks are used to learn a metric space, in which classification is performed by computing the distance of a query point to each class prototype and assigning the class of the nearest one. We describe the architecture of this network, feature extraction methods, and data augmentation performed on the given dataset, and compare our work to the challenge's baseline networks.
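The nearest-prototype classification rule described in the abstract can be sketched in a few lines. This is a generic illustration of prototypical classification on toy 2-D embeddings, not the authors' implementation:

```python
import numpy as np

def prototypes(support, labels):
    """Class prototypes = mean embedding of each class's support examples."""
    classes = np.unique(labels)
    return classes, np.stack([support[labels == c].mean(axis=0) for c in classes])

def classify(queries, protos, classes):
    """Assign each query embedding to the nearest prototype (Euclidean)."""
    d = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-D embeddings: class 0 clustered near (0, 0), class 1 near (5, 5).
support = np.array([[0.1, 0.0], [0.0, 0.2], [5.1, 4.9], [4.8, 5.2]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
preds = classify(np.array([[0.3, 0.1], [5.0, 5.0]]), protos, classes)  # → [0 1]
```

In the full system, the embeddings come from a trained CNN rather than raw features, but the distance-based decision rule is the same.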
System characteristics
Data augmentation | time warping, time masking, frequency masking |
System embeddings | False |
Subsystem count | False |
External data usage | False |
FEW-SHOT BIOACOUSTIC EVENT DETECTION WITH PROTOTYPICAL NETWORKS, KNOWLEDGE DISTILLATION AND ATTENTION TRANSFER LOSS
Radoslaw Bielecki
Audio Intelligence, Samsung R&D Institute, Warsaw, Poland
Bielecki_SMSNG_task5_1 Bielecki_SMSNG_task5_2 Bielecki_SMSNG_task5_3 Bielecki_SMSNG_task5_4
Abstract
The report presents the results of our submission to Task 5 (Few-shot Bioacoustic Event Detection) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge. This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalizations; its main difficulty is the very limited number of training instances. The presented approach is based on prototypical networks built from convolutional layers. The main techniques used during model development are knowledge distillation, attention transfer loss, and spectrogram augmentation. The best of the presented models achieved a 55.5% F-measure on the challenge validation set, an improvement of more than 10 percentage points over the baseline model.
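Of the techniques named above, knowledge distillation can be illustrated generically as training a student against temperature-softened teacher outputs. This sketch is not the authors' exact formulation (their attention transfer loss is omitted), just the standard distillation term:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return T * T * np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1))
```

The loss is zero when the student matches the teacher exactly and positive otherwise; in practice it is combined with the usual hard-label loss.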
System characteristics
Data augmentation | melspectrogram time masking, frequency masking |
System embeddings | False |
Subsystem count | False |
External data usage | directly as additional training data |
PROTOTYPICAL NETWORK FOR BIOACOUSTIC EVENT DETECTION VIA I-VECTORS
Hao Cheng
Beijing Institute of Technology, School Of Information And Electronics, Beijing, China
Cheng_BIT_task5_1 Cheng_BIT_task5_2 Cheng_BIT_task5_3 Cheng_BIT_task5_4
Abstract
In this technical report, we present our system for Task 5 of the Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021) challenge, i.e. few-shot bioacoustic event detection. First, per-channel energy normalization (PCEN) features and i-vectors are extracted. To improve the diversity of the original audio, data augmentation methods such as SpecAugment are adopted. A prototypical network with convolutional neural networks (CNN) is then used for few-shot detection, taking the aforementioned features as inputs to train the CNN model. We evaluate the proposed systems with the overall F-measure over the whole evaluation set; our best F-measure score on the validation set is 46.28.
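PCEN, used by this and several other entries, replaces log compression with per-channel adaptive gain control followed by root compression. A simplified sketch of the standard PCEN recurrence follows; the parameter values are illustrative defaults, not necessarily those used by the authors (libraries such as librosa provide an equivalent `librosa.pcen`):

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalisation of a (freq, time) energy spectrogram.

    M[:, t] is an IIR-smoothed version of each frequency channel's energy;
    dividing by (eps + M)**alpha acts as automatic gain control that
    suppresses slowly varying background, and the final
    (x + delta)**r - delta**r stage applies root compression.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

Because the smoother tracks each channel's own background level, stationary noise is flattened while onsets of vocalisations stand out, which is why PCEN is popular for bioacoustic recordings.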
System characteristics
Data augmentation | Specaugment |
System embeddings | False |
Subsystem count | False |
External data usage | False |
FEW-SHOT BIOACOUSTIC EVENT DETECTION VIA SEGMENTATION USING PROTOTYPICAL NETWORKS
Jens Johannsmeier and Sebastian Stober
Otto-von-Guericke-Universität Magdeburg, Faculty of Computer Science, Magdeburg, Germany
Johannsmeier_OVGU_task5_1 Johannsmeier_OVGU_task5_2 Johannsmeier_OVGU_task5_3 Johannsmeier_OVGU_task5_4
Abstract
This report describes our submission to Task 5 of the 2021 DCASE challenge. We detail how we processed the data, the model structure, and the training procedure. We may submit an extended version to the DCASE 2021 workshop.
System characteristics
Data augmentation | time stretching, pitch shifting, time shifting |
System embeddings | False |
Subsystem count | False |
External data usage | False |
TWO IMPROVED ARCHITECTURES BASED ON PROTOTYPE NETWORK FOR FEW-SHOT BIOACOUSTIC EVENT DETECTION
Tiantian Tang and Yunhao Liang and Yanhua Long
Shanghai Normal University, The College of Information, Mechanical and Electrical Engineering, Shanghai, China
Tang_SHNU_task5_1 Tang_SHNU_task5_2 Tang_SHNU_task5_3
Abstract
In this technical report, we describe our submission system for DCASE2021 Task 5: few-shot bioacoustic event detection. Several improvements over the deep-learning prototypical network baseline are investigated, including an N-way 5-shot classification prototypical network training strategy, data augmentation techniques, and the proposed embedding propagation and attention similarity approaches. We demonstrate that the proposed method achieves an overall F-measure score of 54.7% on the official validation set.
System characteristics
Data augmentation | Specaugment, inference-time augmentation |
System embeddings | False |
Subsystem count | 5 |
External data usage | AudioSet |
FEW-SHOT BIOACOUSTIC EVENT DETECTION USING PROTOTYPICAL NETWORK WITH BACKGROUND CLASS
Yue Zhang and Jun Wang and Dawei Zhang and Feng Deng
University of Electronic Science and Technology of China, Chengdu, China
zhang_uestc_task5_1 zhang_uestc_task5_2 zhang_uestc_task5_3 zhang_uestc_task5_4
Abstract
Few-shot bioacoustic event detection is the task of detecting and classifying bioacoustic events given only a few instances. The task was first introduced in DCASE2021 Task 5, which requires participants to create a method that can extract information from five sample sounds (shots) of mammals or birds and detect sounds in field recordings. In this paper, a prototypical network-based method is proposed for the few-shot bioacoustic event detection challenge. To detect the target event in the query sequence, we need to distinguish the target event, other events, and background noise with only a small support set. To solve this problem, we propose to sample background noise from the training dataset as a "NEG" class for few-shot learning. To better distinguish between events and background noise, the "NEG" class is used as a "way" in each training episode. Experimental results show that the proposed method can effectively distinguish target events from background noise. The F-measure of sound event detection (SED) on the DCASE2021 Task 5 dataset reaches 57.10%, which is higher than the baseline method (41.48%).
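The "NEG as an extra way" idea can be illustrated with a hypothetical episode sampler. All names and the data layout here are assumptions for illustration, not the authors' code:

```python
import random

def make_episode(event_segments, neg_segments, n_way=2, k_shot=5, q=2, seed=0):
    """Build an (n_way + 1)-way episode where background 'NEG' is its own way.

    event_segments: {class_name: [segment, ...]} of labelled event clips;
    neg_segments: background clips sampled from the training recordings.
    Returns (support, query) lists of (segment, label) pairs.
    """
    rng = random.Random(seed)
    ways = rng.sample(sorted(event_segments), n_way) + ["NEG"]
    pools = dict(event_segments, NEG=neg_segments)
    support, query = [], []
    for label in ways:
        picks = rng.sample(pools[label], k_shot + q)
        support += [(seg, label) for seg in picks[:k_shot]]
        query += [(seg, label) for seg in picks[k_shot:]]
    return support, query

# Hypothetical usage with integer stand-ins for audio segments.
events = {"meerkat_call": list(range(10)), "song_a": list(range(10, 20))}
support, query = make_episode(events, list(range(100, 110)), n_way=2, k_shot=3, q=2)
```

Treating background as a regular way means the network learns a prototype for noise as well, so query frames far from every event prototype are not forced into an event class.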
System characteristics
Data augmentation | Specaugment |
System embeddings | False |
Subsystem count | False |
External data usage | False |
FEW-SHOT BIOACOUSTIC EVENT DETECTION = A GOOD TRANSDUCTIVE INFERENCE IS ALL YOU NEED
Dongchao Yang and Helin Wang and Zhongjie Ye and Yuexian Zou
Peking University, School of ECE, Shenzhen, China
Abstract
In this technical report, we describe our few-shot bioacoustic event detection methods submitted to the Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 5. We analyze why prototypical networks do not perform well and propose to use transductive inference for few-shot learning. Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task, in conjunction with a supervision loss based on the support set. Furthermore, we propose a mutual learning framework, which makes the feature extractor and classifier help each other. Experimental results indicate that our transductive inference method achieves better performance than the baseline, with an F1 score of about 50.8% on the evaluation set. Furthermore, our mutual learning framework brings about a 5% improvement over the transductive inference method. We will release our code at https://github.com/yangdongchao/DCASE2021Task5.
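The mutual-information term described above decomposes as the entropy of the marginal label distribution over queries minus the mean per-query prediction entropy: it is high when individual predictions are confident but the class assignments are balanced overall. A minimal sketch of this quantity (the full transductive method also includes a support-set cross-entropy term and gradient-based optimisation, omitted here):

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

def tim_objective(query_probs):
    """Mutual information between queries and predicted labels.

    query_probs: (n_queries, n_classes) softmax predictions.
    I = H(marginal over queries) - mean per-query entropy.
    Transductive inference maximises this over the query set.
    """
    marginal = query_probs.mean(axis=0)
    return entropy(marginal) - entropy(query_probs, axis=1).mean()

# Confident, balanced predictions give high mutual information ...
confident = np.array([[0.99, 0.01], [0.01, 0.99]])
# ... while uniform predictions give mutual information near zero.
uniform = np.full((2, 2), 0.5)
```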
System characteristics
Data augmentation | False |
System embeddings | False |
Subsystem count | False |
External data usage | False |