Few-shot Bioacoustic Event Detection


Challenge results

Task description

This challenge focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance, and noisy/busy environments.

A more detailed task description can be found on the task description page.

Systems ranking

Rank | Submission code | Submission name | Technical report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Validation dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 34.8 (32.6 - 37.1) 2.0
Baseline_PROTO_task5_1 Baseline Prototypical Network 20.1 (18.2 - 21.9) 41.5
Anderson_TCD_task5_1 Prototypical Network with SpecAugment Anderson2021 35.0 (33.1 - 37.0) 26.2
Bielecki_SMSNG_task5_1 Prototypical network with knowledge distillation and attention loss Bielecki2021 8.4 (7.1 - 9.6) 52.5
Bielecki_SMSNG_task5_2 Prototypical network with knowledge distillation and attention loss Bielecki2021 5.8 (4.9 - 6.7) 51.8
Bielecki_SMSNG_task5_3 Prototypical network with knowledge distillation and attention loss Bielecki2021 8.4 (7.1 - 9.7) 51.8
Bielecki_SMSNG_task5_4 Prototypical network with knowledge distillation and attention loss Bielecki2021 5.3 (4.4 - 6.2) 51.1
Cheng_BIT_task5_1 ivector baseline Cheng2021 23.8 (21.9 - 25.7) 46.3
Cheng_BIT_task5_2 baseline_5w3s Cheng2021 12.5 (11.0 - 14.1) 47.8
Cheng_BIT_task5_3 baseline_5w5s Cheng2021 11.0 (9.4 - 12.6) 45.0
Cheng_BIT_task5_4 ivector-tripleloss baseline Cheng2021 8.0 (6.7 - 9.3) 44.9
Johannsmeier_OVGU_task5_1 Prototype Segmentation Johannsmeier2021 5.5 (4.7 - 6.4) 59.8
Johannsmeier_OVGU_task5_2 Prototype Segmentation Johannsmeier2021 4.5 (3.7 - 5.4) 56.0
Johannsmeier_OVGU_task5_3 Prototype Segmentation Johannsmeier2021 15.2 (13.7 - 16.7) 58.6
Johannsmeier_OVGU_task5_4 Prototype Segmentation Johannsmeier2021 7.1 (5.9 - 8.3) 58.8
zhang_uestc_task5_1 dcase2021-t5 prototypical network Zhang2021 9.0 (7.8 - 10.2) 52.9
zhang_uestc_task5_2 dcase2021-t5 prototypical network Zhang2021 8.3 (7.1 - 9.4) 53.8
zhang_uestc_task5_3 dcase2021-t5 prototypical network Zhang2021 16.8 (15.5 - 18.2) 54.4
zhang_uestc_task5_4 dcase2021-t5 prototypical network Zhang2021 7.2 (6.0 - 8.4) 57.1
Zou_PKU_task5_1 TIM Zou2021 33.2 (31.0 - 35.3) 55.3
Yang_PKU_task5_2 Contrast learning for few shot learning Zou2021 22.4 (20.7 - 24.1) 55.3
Zou_PKU_task5_3 TIM-ML Zou2021 38.4 (36.2 - 40.6) 55.3
Zou_PKU_task5_4 TIM-ML2 Zou2021 33.7 (31.7 - 35.8) 55.3
Tang_SHNU_task5_1 SHNU1 Tang2021 36.5 (34.0 - 38.9) 54.7
Tang_SHNU_task5_2 SHNU2 Tang2021 35.1 (31.7 - 38.4) 51.7
Tang_SHNU_task5_3 SHNU3 Tang2021 38.3 (36.1 - 40.5) 51.4

Dataset wise metrics

Rank | Submission code | Submission name | Technical report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (DC dataset) | Event-based F-score (ME dataset) | Event-based F-score (ML dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 34.8 (32.6 - 37.1) 32.2 47.0 29.5
Baseline_PROTO_task5_1 Baseline Prototypical Network 20.1 (18.2 - 21.9) 8.5 72.7 55.7
Anderson_TCD_task5_1 Prototypical Network with SpecAugment Anderson2021 35.0 (33.1 - 37.0) 19.9 56.6 56.8
Bielecki_SMSNG_task5_1 Prototypical network with knowledge distillation and attention loss Bielecki2021 8.4 (7.1 - 9.6) 3.1 57.3 43.7
Bielecki_SMSNG_task5_2 Prototypical network with knowledge distillation and attention loss Bielecki2021 5.8 (4.9 - 6.7) 2.1 74.4 32.9
Bielecki_SMSNG_task5_3 Prototypical network with knowledge distillation and attention loss Bielecki2021 8.4 (7.1 - 9.7) 3.1 56.3 51.4
Bielecki_SMSNG_task5_4 Prototypical network with knowledge distillation and attention loss Bielecki2021 5.3 (4.4 - 6.2) 1.9 44.3 45.0
Cheng_BIT_task5_1 ivector baseline Cheng2021 23.8 (21.9 - 25.7) 10.6 53.5 78.8
Cheng_BIT_task5_2 baseline_5w3s Cheng2021 12.5 (11.0 - 14.1) 4.8 80.8 57.8
Cheng_BIT_task5_3 baseline_5w5s Cheng2021 11.0 (9.4 - 12.6) 4.1 75.5 56.4
Cheng_BIT_task5_4 ivector-tripleloss baseline Cheng2021 8.0 (6.7 - 9.3) 2.9 70.5 53.1
Johannsmeier_OVGU_task5_1 Prototype Segmentation Johannsmeier2021 5.5 (4.7 - 6.4) 2.0 51.4 37.3
Johannsmeier_OVGU_task5_2 Prototype Segmentation Johannsmeier2021 4.5 (3.7 - 5.4) 1.7 60.8 17.9
Johannsmeier_OVGU_task5_3 Prototype Segmentation Johannsmeier2021 15.2 (13.7 - 16.7) 6.5 64.3 35.8
Johannsmeier_OVGU_task5_4 Prototype Segmentation Johannsmeier2021 7.1 (5.9 - 8.3) 2.7 61.5 29.4
zhang_uestc_task5_1 dcase2021-t5 prototypical network Zhang2021 9.0 (7.8 - 10.2) 3.5 49.3 32.4
zhang_uestc_task5_2 dcase2021-t5 prototypical network Zhang2021 8.3 (7.1 - 9.4) 3.4 41.8 23.9
zhang_uestc_task5_3 dcase2021-t5 prototypical network Zhang2021 16.8 (15.5 - 18.2) 8.1 45.1 29.9
zhang_uestc_task5_4 dcase2021-t5 prototypical network Zhang2021 7.2 (6.0 - 8.4) 2.8 45.1 24.7
Zou_PKU_task5_1 TIM Zou2021 33.2 (31.0 - 35.3) 16.1 72.7 67.9
Yang_PKU_task5_2 Contrast learning for few shot learning Zou2021 22.4 (20.7 - 24.1) 10.3 61.0 49.9
Zou_PKU_task5_3 TIM-ML Zou2021 38.4 (36.2 - 40.6) 20.6 68.0 67.3
Zou_PKU_task5_4 TIM-ML2 Zou2021 33.7 (31.7 - 35.8) 17.3 62.8 66.4
Tang_SHNU_task5_1 SHNU1 Tang2021 36.5 (34.0 - 38.9) 22.3 48.6 59.3
Tang_SHNU_task5_2 SHNU2 Tang2021 35.1 (31.7 - 38.4) 25.5 31.7 67.2
Tang_SHNU_task5_3 SHNU3 Tang2021 38.3 (36.1 - 40.5) 25.6 61.5 43.3
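
The overall evaluation F-scores in the table above are numerically consistent with the harmonic mean of the three per-dataset scores (DC, ME, ML). The sketch below checks this for a few rows; the aggregation rule is inferred from the numbers in the table rather than stated on this page, and `overall_f_score` is a hypothetical helper name.

```python
from statistics import harmonic_mean

# Per-dataset event-based F-scores (DC, ME, ML) copied from the table above.
dataset_scores = {
    "Baseline_TempMatch_task5_1": (32.2, 47.0, 29.5),
    "Baseline_PROTO_task5_1": (8.5, 72.7, 55.7),
    "Zou_PKU_task5_3": (20.6, 68.0, 67.3),
}


def overall_f_score(per_dataset):
    """Harmonic mean of the per-dataset F-scores (assumed aggregation rule)."""
    return harmonic_mean(per_dataset)


for code, scores in dataset_scores.items():
    print(code, round(overall_f_score(scores), 1))
```

For these three rows the harmonic mean reproduces the reported overall scores of 34.8, 20.1 and 38.4. Because a harmonic mean is dominated by its smallest term, a low score on one dataset (here usually DC) pulls the overall score down sharply.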

Teams ranking

Table including only the best performing system per submitting team.

Rank | Submission code | Submission name | Technical report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Development dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 34.8 (32.6 - 37.1) 2.0
Baseline_PROTO_task5_1 Baseline Prototypical Network 20.1 (18.2 - 21.9) 41.5
Anderson_TCD_task5_1 Prototypical Network with SpecAugment Anderson2021 35.0 (33.1 - 37.0) 26.2
Bielecki_SMSNG_task5_3 Prototypical network with knowledge distillation and attention loss Bielecki2021 8.4 (7.1 - 9.7) 51.8
Cheng_BIT_task5_1 ivector baseline Cheng2021 23.8 (21.9 - 25.7) 46.3
Johannsmeier_OVGU_task5_3 Prototype Segmentation Johannsmeier2021 15.2 (13.7 - 16.7) 58.6
zhang_uestc_task5_3 dcase2021-t5 prototypical network Zhang2021 16.8 (15.5 - 18.2) 54.4
Zou_PKU_task5_3 TIM-ML Zou2021 38.4 (36.2 - 40.6) 55.3
Tang_SHNU_task5_3 SHNU3 Tang2021 38.3 (36.1 - 40.5) 51.4
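
The per-team selection behind the table above can be sketched as follows. This is a minimal illustration, assuming that systems sharing a technical report belong to the same team (which is how Yang_PKU_task5_2 ends up grouped with the Zou_PKU systems); `best_per_team` is a hypothetical helper, not the organisers' code.

```python
# (submission code, technical report, evaluation F-score), copied from the
# systems ranking table for two of the teams.
rows = [
    ("Zou_PKU_task5_1", "Zou2021", 33.2),
    ("Yang_PKU_task5_2", "Zou2021", 22.4),
    ("Zou_PKU_task5_3", "Zou2021", 38.4),
    ("Zou_PKU_task5_4", "Zou2021", 33.7),
    ("Tang_SHNU_task5_1", "Tang2021", 36.5),
    ("Tang_SHNU_task5_2", "Tang2021", 35.1),
    ("Tang_SHNU_task5_3", "Tang2021", 38.3),
]


def best_per_team(rows):
    """Keep the highest-scoring system for each technical report."""
    best = {}
    for code, report, f_score in rows:
        if report not in best or f_score > best[report][1]:
            best[report] = (code, f_score)
    return best


for report, (code, f_score) in best_per_team(rows).items():
    print(report, code, f_score)
```

Exact ties, such as the two Bielecki systems that both score 8.4, would need an additional tie-breaking rule that this sketch does not attempt to guess.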

System characteristics

General characteristics

Rank | Code | Technical report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Sampling rate | Data augmentation | Features
Baseline_TempMatch_task5_1 34.8 (32.6 - 37.1) any spectrogram
Baseline_PROTO_task5_1 20.1 (18.2 - 21.9) 22.05 kHz PCEN
Anderson_TCD_task5_1 Anderson2021 35.0 (33.1 - 37.0) 22.05 kHz time warping, time masking, frequency masking PCEN, Mel Spectrogram
Bielecki_SMSNG_task5_1 Bielecki2021 8.4 (7.1 - 9.6) 22.05 kHz melspectrogram time, frequency masking melspectrogram
Bielecki_SMSNG_task5_2 Bielecki2021 5.8 (4.9 - 6.7) 22.05 kHz melspectrogram time, frequency masking melspectrogram
Bielecki_SMSNG_task5_3 Bielecki2021 8.4 (7.1 - 9.7) 22.05 kHz melspectrogram time, frequency masking melspectrogram
Bielecki_SMSNG_task5_4 Bielecki2021 5.3 (4.4 - 6.2) 22.05 kHz melspectrogram time, frequency masking melspectrogram
Cheng_BIT_task5_1 Cheng2021 23.8 (21.9 - 25.7) 22.05 kHz Specaugment PCEN, i-vector
Cheng_BIT_task5_2 Cheng2021 12.5 (11.0 - 14.1) 22.05 kHz Specaugment PCEN
Cheng_BIT_task5_3 Cheng2021 11.0 (9.4 - 12.6) 22.05 kHz Specaugment PCEN
Cheng_BIT_task5_4 Cheng2021 8.0 (6.7 - 9.3) 22.05 kHz Specaugment PCEN, i-vector
Johannsmeier_OVGU_task5_1 Johannsmeier2021 5.5 (4.7 - 6.4) 22.05 kHz time stretching, pitch shifting, time shifting mel energies, PCEN
Johannsmeier_OVGU_task5_2 Johannsmeier2021 4.5 (3.7 - 5.4) 22.05 kHz time stretching, pitch shifting, time shifting mel energies, PCEN
Johannsmeier_OVGU_task5_3 Johannsmeier2021 15.2 (13.7 - 16.7) 22.05 kHz time stretching, pitch shifting, time shifting mel energies, PCEN
Johannsmeier_OVGU_task5_4 Johannsmeier2021 7.1 (5.9 - 8.3) 22.05 kHz time stretching, pitch shifting, time shifting mel energies, PCEN
zhang_uestc_task5_1 Zhang2021 9.0 (7.8 - 10.2) 25.6 kHz Specaugment PCEN
zhang_uestc_task5_2 Zhang2021 8.3 (7.1 - 9.4) 25.6 kHz Specaugment PCEN
zhang_uestc_task5_3 Zhang2021 16.8 (15.5 - 18.2) 25.6 kHz Specaugment PCEN
zhang_uestc_task5_4 Zhang2021 7.2 (6.0 - 8.4) 25.6 kHz Specaugment PCEN
Zou_PKU_task5_1 Zou2021 33.2 (31.0 - 35.3) 22.05 kHz spectrogram
Yang_PKU_task5_2 Zou2021 22.4 (20.7 - 24.1) 22.05 kHz spectrogram
Zou_PKU_task5_3 Zou2021 38.4 (36.2 - 40.6) 22.05 kHz spectrogram
Zou_PKU_task5_4 Zou2021 33.7 (31.7 - 35.8) 22.05 kHz spectrogram
Tang_SHNU_task5_1 Tang2021 36.5 (34.0 - 38.9) any Specaugment, inference-time augmentation PCEN
Tang_SHNU_task5_2 Tang2021 35.1 (31.7 - 38.4) any Specaugment, inference-time augmentation PCEN
Tang_SHNU_task5_3 Tang2021 38.3 (36.1 - 40.5) any Specaugment, inference-time augmentation PCEN
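
Time masking and frequency masking appear in the data augmentation column for several systems above. The sketch below shows the general idea on a spectrogram stored as a list of frequency rows. It is a generic illustration, not any team's implementation; `mask_spectrogram` and its default widths are hypothetical, and time warping is left out.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=2, rng=random):
    """Zero one random band of frequency bins and one random band of frames.

    spec is a list of rows, one row per frequency bin, one value per frame.
    The input is left untouched and a masked copy is returned.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    f_width = rng.randint(0, max_freq_width)   # band height, may be 0
    f_start = rng.randint(0, n_freq - f_width)
    t_width = rng.randint(0, max_time_width)   # band width, may be 0
    t_start = rng.randint(0, n_time - t_width)
    for f in range(n_freq):
        for t in range(n_time):
            in_freq_band = f_start <= f < f_start + f_width
            in_time_band = t_start <= t < t_start + t_width
            if in_freq_band or in_time_band:
                out[f][t] = 0.0
    return out


spectrogram = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins, 6 frames
for row in mask_spectrogram(spectrogram, rng=random.Random(0)):
    print(row)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs.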



Machine learning characteristics

Rank | Code | Technical report | Event-based F-score (Eval) | Classifier | Few-shot approach | Post-processing
Baseline_TempMatch_task5_1 34.8 (32.6 - 37.1) template matching template matching peak picking, threshold
Baseline_PROTO_task5_1 20.1 (18.2 - 21.9) CNN prototypical threshold
Anderson_TCD_task5_1 Anderson2021 35.0 (33.1 - 37.0) CNN prototypical probability averaging, median filtering, minimum event length
Bielecki_SMSNG_task5_1 Bielecki2021 8.4 (7.1 - 9.6) CNN prototypical minimum time length threshold, prediction frames elongation
Bielecki_SMSNG_task5_2 Bielecki2021 5.8 (4.9 - 6.7) CNN prototypical min time length threshold, prediction frames elongation
Bielecki_SMSNG_task5_3 Bielecki2021 8.4 (7.1 - 9.7) CNN prototypical min time length threshold, prediction frames elongation
Bielecki_SMSNG_task5_4 Bielecki2021 5.3 (4.4 - 6.2) CNN prototypical min time length threshold, prediction frames elongation
Cheng_BIT_task5_1 Cheng2021 23.8 (21.9 - 25.7) CNN prototypical threshold
Cheng_BIT_task5_2 Cheng2021 12.5 (11.0 - 14.1) CNN prototypical threshold
Cheng_BIT_task5_3 Cheng2021 11.0 (9.4 - 12.6) CNN prototypical threshold
Cheng_BIT_task5_4 Cheng2021 8.0 (6.7 - 9.3) CNN prototypical threshold
Johannsmeier_OVGU_task5_1 Johannsmeier2021 5.5 (4.7 - 6.4) CNN prototypical threshold, gaussian smoothing (adaptive)
Johannsmeier_OVGU_task5_2 Johannsmeier2021 4.5 (3.7 - 5.4) CNN prototypical threshold, gaussian smoothing (adaptive)
Johannsmeier_OVGU_task5_3 Johannsmeier2021 15.2 (13.7 - 16.7) CNN prototypical threshold, gaussian smoothing (adaptive)
Johannsmeier_OVGU_task5_4 Johannsmeier2021 7.1 (5.9 - 8.3) CNN prototypical threshold, gaussian smoothing (adaptive)
zhang_uestc_task5_1 Zhang2021 9.0 (7.8 - 10.2) ResNet prototypical threshold
zhang_uestc_task5_2 Zhang2021 8.3 (7.1 - 9.4) ResNet prototypical threshold
zhang_uestc_task5_3 Zhang2021 16.8 (15.5 - 18.2) ResNet prototypical threshold
zhang_uestc_task5_4 Zhang2021 7.2 (6.0 - 8.4) ResNet prototypical threshold
Zou_PKU_task5_1 Zou2021 33.2 (31.0 - 35.3) CNN Transductive inference peak picking, threshold
Yang_PKU_task5_2 Zou2021 22.4 (20.7 - 24.1) CNN Prototypical network peak picking, threshold
Zou_PKU_task5_3 Zou2021 38.4 (36.2 - 40.6) CNN Transductive inference peak picking, threshold
Zou_PKU_task5_4 Zou2021 33.7 (31.7 - 35.8) CNN Transductive inference peak picking, threshold
Tang_SHNU_task5_1 Tang2021 36.5 (34.0 - 38.9) CNN prototypical peak picking, median filtering
Tang_SHNU_task5_2 Tang2021 35.1 (31.7 - 38.4) CNN prototypical peak picking, median filtering
Tang_SHNU_task5_3 Tang2021 38.3 (36.1 - 40.5) ResNet fine tuning, prototypical peak picking, median filtering
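
Several post-processing entries in the table above (threshold, median filtering, minimum event length) are steps for turning frame-level probabilities into events. The sketch below chains them in one plausible order. The function names, the edge padding, and all parameter values are assumptions for illustration and are not taken from any of the reports.

```python
from statistics import median


def median_filter(values, width=3):
    """Sliding median with edge padding; width is expected to be odd."""
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


def frames_to_events(probs, hop_s, threshold=0.5, min_len_s=0.0):
    """Convert frame probabilities into (onset, offset) pairs in seconds."""
    events, start = [], None
    for i in range(len(probs) + 1):  # one extra step closes a trailing event
        active = i < len(probs) and probs[i] > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            if (i - start) * hop_s >= min_len_s:  # drop events that are too short
                events.append((start * hop_s, i * hop_s))
            start = None
    return events


probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.9, 0.1, 0.7, 0.1]
print(frames_to_events(probs, hop_s=0.5, min_len_s=1.0))
print(frames_to_events(median_filter(probs), hop_s=0.5, min_len_s=1.0))
```

In this toy input, smoothing before thresholding suppresses the isolated spike at the second frame and bridges the short dip inside the longer event, so the detected event becomes longer than with thresholding alone.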

Complexity

Rank | Code | Technical report | Event-based F-score (Eval) | Model complexity | Training time
Baseline_TempMatch_task5_1 34.8 (32.6 - 37.1)
Baseline_PROTO_task5_1 20.1 (18.2 - 21.9)
Anderson_TCD_task5_1 Anderson2021 35.0 (33.1 - 37.0) 132000 30m34s (Nvidia V100 (1) Intel Xeon Gold 5122 @ 3.60GHz 32GB RAM)
Bielecki_SMSNG_task5_1 Bielecki2021 8.4 (7.1 - 9.6) 813600 3h (Generation)
Bielecki_SMSNG_task5_2 Bielecki2021 5.8 (4.9 - 6.7) 1084200 3h (Generation)
Bielecki_SMSNG_task5_3 Bielecki2021 8.4 (7.1 - 9.7) 813600 3h (Generation)
Bielecki_SMSNG_task5_4 Bielecki2021 5.3 (4.4 - 6.2) 813600 3h (Generation)
Cheng_BIT_task5_1 Cheng2021 23.8 (21.9 - 25.7) 6762757 1h
Cheng_BIT_task5_2 Cheng2021 12.5 (11.0 - 14.1) 6762757 1h
Cheng_BIT_task5_3 Cheng2021 11.0 (9.4 - 12.6) 6762757 1h
Cheng_BIT_task5_4 Cheng2021 8.0 (6.7 - 9.3) 6762757 1h
Johannsmeier_OVGU_task5_1 Johannsmeier2021 5.5 (4.7 - 6.4) 389804 300 seconds (single NVIDIA Geforce 1080Ti)
Johannsmeier_OVGU_task5_2 Johannsmeier2021 4.5 (3.7 - 5.4) 389804 300 seconds (single NVIDIA Geforce 1080Ti)
Johannsmeier_OVGU_task5_3 Johannsmeier2021 15.2 (13.7 - 16.7) 389804 300 seconds (single NVIDIA Geforce 1080Ti)
Johannsmeier_OVGU_task5_4 Johannsmeier2021 7.1 (5.9 - 8.3) 1169412 900 seconds (single NVIDIA Geforce 1080Ti), 300 seconds (3GPUs parallel training)
zhang_uestc_task5_1 Zhang2021 9.0 (7.8 - 10.2) 2889984
zhang_uestc_task5_2 Zhang2021 8.3 (7.1 - 9.4) 2889984
zhang_uestc_task5_3 Zhang2021 16.8 (15.5 - 18.2) 2889984
zhang_uestc_task5_4 Zhang2021 7.2 (6.0 - 8.4) 2889984
Zou_PKU_task5_1 Zou2021 33.2 (31.0 - 35.3) 468627 403.5 seconds
Yang_PKU_task5_2 Zou2021 22.4 (20.7 - 24.1) 464531 403.5 seconds
Zou_PKU_task5_3 Zou2021 38.4 (36.2 - 40.6) 468627 403.5 seconds
Zou_PKU_task5_4 Zou2021 33.7 (31.7 - 35.8) 468627 403.5 seconds
Tang_SHNU_task5_1 Tang2021 36.5 (34.0 - 38.9) 2950000 1h (GeForce RTX 2080 Ti)
Tang_SHNU_task5_2 Tang2021 35.1 (31.7 - 38.4) 2950000 45 min (GeForce RTX 2080 Ti)
Tang_SHNU_task5_3 Tang2021 38.3 (36.1 - 40.5) 4750000 45 min (GeForce RTX 2080 Ti)

Technical reports

Bioacoustic Event Detection with Prototypical Networks and Data Augmentation

Mark Anderson and Naomi Harte
Trinity College Dublin, SIGMEDIA, Dublin, Ireland

Abstract

This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection task of the DCASE2021 Challenge. The remit was to develop a few-shot learning system for animal (mammal and bird) vocalisations. Participants were tasked with developing a method that can extract information from five exemplar vocalisations, or shots, of mammals or birds and detect and classify sounds in field recordings. In the system described in this report, prototypical networks are used to learn a metric space, from which classification is performed by computing the distance of a query point to class prototypes, classifying based on shortest distance. We describe the architecture of this network, feature extraction methods, and data augmentation performed on the given dataset and compare our work to the challenge's baseline networks.
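
The classification rule described in this abstract (distance from a query point to class prototypes, choosing the shortest) can be sketched as follows. This is a minimal illustration on hand-written 2-D embeddings. In a real system the embeddings would come from the trained network, and the use of the mean as prototype and of Euclidean distance are the usual choices for prototypical networks, assumed here rather than taken from the report.

```python
import math


def prototype(embeddings):
    """Class prototype: the element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set: five shots of the target call plus five background frames.
support = {
    "call": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.0], [0.8, 1.1]],
    "background": [[-1.0, -1.0], [-0.9, -1.2], [-1.1, -0.8], [-1.0, -1.1], [-1.0, -0.9]],
}
prototypes = {label: prototype(shots) for label, shots in support.items()}

print(classify([0.7, 0.9], prototypes))
print(classify([-0.5, -1.3], prototypes))
```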

System characteristics
Data augmentation time warping, time masking, frequency masking
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION WITH PROTOTYPICAL NETWORKS, KNOWLEDGE DISTILLATION AND ATTENTION TRANSFER LOSS

Radoslaw Bielecki
Audio Intelligence, Samsung R&D Institute, Warsaw, Poland

Abstract

The report presents the results of a submission to Task 5 (Few-shot Bioacoustic Event Detection) of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2021. This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalizations. The main issue of this task is the very limited number of training instances. The presented approach is based on prototypical networks built from convolutional layers. The main techniques used during model development are knowledge distillation, attention transfer loss and spectrogram augmentation. The best of the presented models achieved 55.5% F-measure on the challenge validation set. This is an improvement of over 10% compared to the baseline model.

System characteristics
Data augmentation melspectrogram time masking, frequency masking
System embeddings False
Subsystem count False
External data usage directly as additional training data

PROTOTYPICAL NETWORK FOR BIOACOUSTIC EVENT DETECTION VIA I-VECTORS

Hao Cheng
Beijing Institute of Technology, School of Information and Electronics, Beijing, China

Abstract

In this technical report, we present our system for the task 5 of Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021) challenge, i.e. few-shot bioacoustic event detection. First, per-channel energy normalization (PCEN) and i-vectors are extracted as features. In order to improve the diversity of original audio, some data augmentation methods are adopted, for example, specaugment. Then, the prototypical network with convolutional neural networks (CNN) is used for few-shot detection. Finally, we use aforementioned features as inputs to train our CNN model. We evaluate the proposed systems with overall F-measure for the whole of the evaluation set, and our best F-measure score on the validation set is 46.28.

System characteristics
Data augmentation Specaugment
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION VIA SEGMENTATION USING PROTOTYPICAL NETWORKS

Jens Johannsmeier and Sebastian Stober
Otto-von-Guericke-Universität Magdeburg, Faculty of Computer Science, Magdeburg, Germany

Abstract

This report describes our submission to task 5 of the 2021 DCASE challenge. We detail how we processed the data, the model structure as well as the training procedure. We may submit an extended version to the DCASE 2021 workshop.

System characteristics
Data augmentation time stretching, pitch shifting, time shifting
System embeddings False
Subsystem count False
External data usage False

TWO IMPROVED ARCHITECTURES BASED ON PROTOTYPE NETWORK FOR FEW-SHOT BIOACOUSTIC EVENT DETECTION

Tiantian Tang and Yunhao Liang and Yanhua Long
Shanghai Normal University, The College of Information, Mechanical and Electrical Engineering, Shanghai, China

Abstract

In this technical report, we describe our submission system for DCASE2021 Task 5: few-shot bioacoustic event detection. A few improvements are investigated to better the baseline deep learning prototypical network, including the N-way 5-shot classification prototypical network training strategy, data augmentation techniques, and the proposed embedding propagation and attention similarity approaches. On the official validation set, we demonstrate that the proposed method achieves an overall F-measure score of 54.7%.

System characteristics
Data augmentation Specaugment, inference-time augmentation
System embeddings False
Subsystem count 5
External data usage AudioSet

FEW-SHOT BIOACOUSTIC EVENT DETECTION USING PROTOTYPICAL NETWORK WITH BACKGROUND CLASS

Yue Zhang and Jun Wang and Dawei Zhang and Feng Deng
University of Electronic Science and Technology of China, Chengdu, China

Abstract

Few-shot bioacoustic event detection is a task to detect and classify bioacoustic events with only a few instances. This task was first introduced in DCASE2021 Task 5, which requires participants to create a method that can extract information from five sample sounds (shots) of mammals or birds, and detect sounds in field recordings. In this paper, a prototypical network-based method is proposed for the few-shot bioacoustic event detection challenge. In order to detect the target event from the query sequence, we need to distinguish the target event, other events, and background noise with only a small support set. To solve this problem, we propose to sample background noise from the training dataset as the "NEG" class for small sample learning. To better distinguish between events and background noise, the "NEG" class is used as a "way" in each episode of training. Experimental results show that the proposed method can effectively distinguish target events and background noise. The F-measure of sound event detection (SED) in the DCASE2021 Task 5 dataset can reach 57.10%, which is higher than the baseline method (41.48%).
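
The idea of using "NEG" as a way in each training episode can be sketched as an episode sampler. The abstract does not say whether "NEG" replaces one of the N ways or is added on top, so the sketch below assumes it occupies one of them; `sample_episode`, the pool layout, and the shot and query counts are hypothetical.

```python
import random


def sample_episode(class_pool, neg_pool, n_way=5, k_shot=5, n_query=5, rng=random):
    """Sample one training episode in which "NEG" always occupies one way.

    class_pool maps an event label to its list of segments, and neg_pool is a
    list of background-noise segments drawn from the training data.
    Returns {label: (support_segments, query_segments)}.
    """
    labels = rng.sample(sorted(class_pool), n_way - 1)  # event classes
    episode = {}
    for label in labels + ["NEG"]:
        pool = neg_pool if label == "NEG" else class_pool[label]
        picked = rng.sample(pool, k_shot + n_query)  # disjoint support and query
        episode[label] = (picked[:k_shot], picked[k_shot:])
    return episode


# Toy pools: segment identifiers stand in for audio features.
class_pool = {f"class_{c}": [f"class_{c}_seg_{i}" for i in range(12)] for c in range(6)}
neg_pool = [f"noise_seg_{i}" for i in range(20)]

episode = sample_episode(class_pool, neg_pool, rng=random.Random(0))
for label, (support, query) in episode.items():
    print(label, len(support), len(query))
```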

System characteristics
Data augmentation Specaugment
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION: A GOOD TRANSDUCTIVE INFERENCE IS ALL YOU NEED

Dongchao Yang and Helin Wang and Zhongjie Ye and Yuexian Zou
Peking University, School of ECE, Shenzhen, China

Abstract

In this technical report, we describe our few-shot bioacoustic event detection methods submitted to the Detection and Classification of Acoustic Scenes and Events Challenge 2021 Task 5. We analyze the reason why prototypical networks cannot perform well, and propose to use transductive inference for few-shot learning. Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task, in conjunction with a supervision loss based on the support set. Furthermore, we propose a mutual learning framework, which lets the feature extractor and classifier help each other. Experimental results indicate that our transductive inference method achieves better performance than the baseline, with an F1 score of about 50.8% on the evaluation set. Furthermore, our mutual learning framework brings about 5% improvement over the transductive inference method. We will release our code on https://github.com/yangdongchao/DCASE2021Task5.
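
The mutual information term mentioned in this abstract can be illustrated with the usual empirical estimate: the entropy of the average prediction over the query set minus the average entropy of the individual predictions. The abstract does not spell out the estimator or any weighting, so this decomposition and the function names are assumptions for illustration only.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(query_probs):
    """Empirical estimate: H(mean prediction) - mean H(prediction).

    query_probs holds one class-probability vector per query sample.
    """
    n = len(query_probs)
    marginal = [sum(col) / n for col in zip(*query_probs)]
    conditional = sum(entropy(p) for p in query_probs) / n
    return entropy(marginal) - conditional


confident = [[1.0, 0.0], [0.0, 1.0]]  # sure of each query, classes balanced
uncertain = [[0.5, 0.5], [0.5, 0.5]]  # no information about any query

print(mutual_information(confident))
print(mutual_information(uncertain))
```

Maximizing this quantity rewards predictions that are individually confident while still spread across classes over the whole query set. In a training loop the value would be combined with the support-set loss, which is not shown here.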

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False