Few-shot Bioacoustic Event Detection


Challenge results

Task description

This challenge focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance, and noisy/busy environments.
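The few-shot setting described above can be illustrated with a minimal prototype-based sketch, the approach most submissions list in the tables below. It is not any team's actual system: the embeddings here are random stand-ins for the output of a trained network, and the function name `prototype_probabilities` is hypothetical.

```python
import numpy as np

def prototype_probabilities(pos_emb, neg_emb, query_emb):
    """Return P(positive) for each query embedding using two class prototypes."""
    pos_proto = pos_emb.mean(axis=0)  # prototype of the five annotated shots
    neg_proto = neg_emb.mean(axis=0)  # prototype of background segments
    # Squared Euclidean distance from each query to each prototype.
    d_pos = ((query_emb - pos_proto) ** 2).sum(axis=1)
    d_neg = ((query_emb - neg_proto) ** 2).sum(axis=1)
    # Softmax over negative distances; for two classes this is a sigmoid.
    return 1.0 / (1.0 + np.exp(d_pos - d_neg))

# Assumed toy data: 5 positive shots, 20 background segments, 8-dim embeddings.
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.1, size=(5, 8))
neg = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
queries = np.vstack([np.full((1, 8), 2.0), np.zeros((1, 8))])
probs = prototype_probabilities(pos, neg, queries)
```

The first query sits near the positive prototype and the second near the background prototype, so their probabilities fall close to 1 and 0 respectively.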

A more detailed task description can be found on the task description page.

Systems ranking

Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Validation dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 12.3 (11.5 - 12.8) 3.4
Baseline_PROTO_task5_1 Baseline Prototypical Network 5.3 ( - )
Wu_SHNU_task5_1 Continual_learning Wu2022 40.9 (40.5 - 41.3) 53.9
Zhang_CQU_task5_1 Zhang_CQU_task5_1 Zhang2022 1.2 (0.9 - 1.3) 46.5
Zhang_CQU_task5_2 Zhang_CQU_task5_2 Zhang2022 0.9 (0.0 - 1.0) 45.5
Zhang_CQU_task5_3 Zhang_CQU_task5_3 Zhang2022 1.9 (1.0 - 2.0) 44.2
Zhang_CQU_task5_4 Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) 44.2
Kang_ET_task5_1 FewShot_using_good_embedding_model Kang2022 2.4 (2.4 - 2.4)
Kang_ET_task5_2 FewShot_using_good_embedding_model Kang2022 2.8 (2.8 - 2.9)
Hertkorn_ZF_task5_1 ZF_CNN1 Hertkorn2022 43.4 (42.9 - 43.8) 60.6
Hertkorn_ZF_task5_2 ZF_CNN2 Hertkorn2022 44.4 (45.0 - 45.4) 61.8
Hertkorn_ZF_task5_3 ZF_CNN3 Hertkorn2022 41.4 (41.9 - 42.3) 67.9
Hertkorn_ZF_task5_4 ZF_CNN4 Hertkorn2022 33.8 (32.4 - 34.6) 60.5
Zou_PKU_task5_1 TI_1 Yang2022 19.2 (18.9 - 19.5) 52.0
Zou_PKU_task5_2 TI_2 Yang2022 18.7 (18.4 - 19.0) 52.0
Zou_PKU_task5_3 TI_3 Yang2022 18.9 (18.6 - 19.2) 52.0
Zou_PKU_task5_4 TI_4 Yang2022 15.8 (15.4 - 16.1) 52.0
Tan_WHU_task5_1 Knowledge trasnfer 75% training 10 iteration adaptive (8) Tan2022 8.1 (7.3 - 8.5) 52.4
Tan_WHU_task5_2 Knowledge transfer 90% training 15 iteration Tan2022 16.9 (16.4 - 17.2) 53.9
Tan_WHU_task5_3 Knowledge Transfer 90 training (4) Tan2022 17.1 (16.7 - 17.4) 54.9
Tan_WHU_task5_4 Knowledge Transfer 90 training adaptive (4) Tan2022 17.2 (16.8 - 17.6) 54.5
Liu_BIT-SRCB_task5_1 TI-PN ensemble Liu2022 44.1 (43.6 - 44.5) 61.2
Liu_BIT-SRCB_task5_2 TI-PN ensemble_2 Liu2022 41.9 (41.6 - 42.2) 63.3
Liu_BIT-SRCB_task5_3 TI_scalable Liu2022 36.8 (36.5 - 37.2) 43.5
Liu_BIT-SRCB_task5_4 pretrained TI-PN ensemble Liu2022 44.3 (43.9 - 44.6) 64.8
Willbo_RISE_task5_1 willbo_supervised_1 Willbo2022 17.9 (17.6 - 18.2) 51.4
Willbo_RISE_task5_2 willbo_supervised_2 Willbo2022 20.4 (20.1 - 20.7) 57.5
Willbo_RISE_task5_3 willbo_semi_1 Willbo2022 20.2 (19.9 - 20.5) 50.8
Willbo_RISE_task5_4 willbo_semi_2 Willbo2022 21.7 (21.3 - 22.0) 47.9
ZGORZYNSKI_SRPOL_task5_1 Siamese Network with fully connected head Zgorzynski2022 28.1 (27.6 - 28.5) 67.3
ZGORZYNSKI_SRPOL_task5_2 Siamese Network with fully connected head Zgorzynski2022 16.3 (15.1 - 16.9) 59.4
ZGORZYNSKI_SRPOL_task5_3 Siamese Network with fully connected head Zgorzynski2022 29.9 (29.3 - 30.3) 60.0
ZGORZYNSKI_SRPOL_task5_4 Siamese Network with fully connected head Zgorzynski2022 33.2 (32.7 - 33.7) 57.2
Huang_SCUT_task5_1 Transductive learning and modified central difference convolution Huang2022 18.3 (18.0 - 18.6) 54.6
Martinsson_RISE_task5_1 Adaptive prototypical ensemble Martinsson2022 48.0 (47.5 - 48.4) 60.0
Martinsson_RISE_task5_2 Adaptive prototypical ensemble Martinsson2022 45.4 (44.9 - 45.9) 30.6
Martinsson_RISE_task5_3 Adaptive prototypical ensemble Martinsson2022 19.4 (18.6 - 20.0) 44.6
Martinsson_RISE_task5_4 Adaptive prototypical ensemble Martinsson2022 32.5 (31.7 - 33.1) 13.3
Liu_Surrey_task5_1 Haohe_Liu_S1 Liu2022a 43.1 (42.7 - 43.4) 58.5
Liu_Surrey_task5_2 Haohe_Liu_S2 Liu2022a 48.2 (48.5 - 48.9) 50.0
Liu_Surrey_task5_3 Haohe_Liu_S3 Liu2022a 36.9 (36.5 - 37.2) 40.7
Liu_Surrey_task5_4 Haohe_Liu_S4 Liu2022a 45.5 (45.8 - 46.2) 60.2
Li_QMUL_task5_1 Prototypical Network with ResNet and SpecAugment Li2022 15.5 (15.2 - 15.8) 47.9
Mariajohn_DSPC_task5_1 Prototypical-1 Mariajohn2022 25.7 (25.4 - 25.9) 43.9
Du_NERCSLIP_task5_1 Segment-level embedding learning Du2022a 36.5 (35.6 - 37.0) 68.2
Du_NERCSLIP_task5_2 Frame-level embedding learning 1 Du2022a 60.2 (59.7 - 61.7) 74.4
Du_NERCSLIP_task5_3 event filtering Du2022a 42.9 (42.4 - 43.4) 53.4
Du_NERCSLIP_task5_4 Frame-level embedding learning 2 Du2022a 60.0 (58.5 - 61.5) 74.4
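The F-scores in this table are harmonic means of precision and recall over detected events. A minimal sketch of that formula follows, assuming the counts of true positives, false positives, and false negatives are already known; the challenge's event-matching criteria and its aggregation across datasets are not reproduced here, and `event_f_score` is a hypothetical name.

```python
def event_f_score(tp, fp, fn):
    """F-score from counts of matched and unmatched events."""
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = tp / (tp + fp)  # fraction of predicted events that are correct
    recall = tp / (tp + fn)     # fraction of reference events that were found
    return 2 * precision * recall / (precision + recall)

# Assumed counts for illustration: 8 correct, 2 spurious, 8 missed events.
score = event_f_score(tp=8, fp=2, fn=8)
```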

Dataset wise metrics

Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (CHE dataset) | Event-based F-score (CT dataset) | Event-based F-score (MGE dataset) | Event-based F-score (MS dataset) | Event-based F-score (QU dataset) | Event-based F-score (DC dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 12.3 (11.5 - 12.8) 21.1 7.1 44.1 8.0 9.7 35.0
Baseline_PROTO_task5_1 Baseline Prototypical Network 5.3 ( - ) 42.6 8.0 3.8 11.6 1.6 40.1
Wu_SHNU_task5_1 Continual_learning Wu2022 40.9 (40.5 - 41.3) 65.0 37.2 38.2 38.9 38.1 44.8
Zhang_CQU_task5_1 Zhang_CQU_task5_1 Zhang2022 1.2 (0.9 - 1.3) 30.3 24.6 5.8 1.1 0.3 25.4
Zhang_CQU_task5_2 Zhang_CQU_task5_2 Zhang2022 0.9 (0.0 - 1.0) 26.8 38.3 11.1 0.2 14.6 9.1
Zhang_CQU_task5_3 Zhang_CQU_task5_3 Zhang2022 1.9 (1.0 - 2.0) 29.2 26.0 55.6 0.4 15.8 17.5
Zhang_CQU_task5_4 Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) 29.6 17.6 55.3 0.9 18.2 30.2
Kang_ET_task5_1 FewShot_using_good_embedding_model Kang2022 2.4 (2.4 - 2.4) 11.0 0.7 3.3 3.5 4.3 4.7
Kang_ET_task5_2 FewShot_using_good_embedding_model Kang2022 2.8 (2.8 - 2.9) 8.7 0.9 3.3 3.9 5.3 4.7
Hertkorn_ZF_task5_1 ZF_CNN1 Hertkorn2022 43.4 (42.9 - 43.8) 70.2 37.8 68.4 64.1 22.5 51.2
Hertkorn_ZF_task5_2 ZF_CNN2 Hertkorn2022 44.4 (45.0 - 45.4) 70.3 37.1 63.8 58.6 25.9 57.4
Hertkorn_ZF_task5_3 ZF_CNN3 Hertkorn2022 41.4 (41.9 - 42.3) 66.7 40.0 76.4 74.0 18.2 57.9
Hertkorn_ZF_task5_4 ZF_CNN4 Hertkorn2022 33.8 (32.4 - 34.6) 64.6 15.0 84.9 71.0 21.5 58.8
Zou_PKU_task5_1 TI_1 Yang2022 19.2 (18.9 - 19.5) 33.4 22.8 59.7 44.0 6.8 22.9
Zou_PKU_task5_2 TI_2 Yang2022 18.7 (18.4 - 19.0) 32.9 22.6 60.7 42.7 6.6 22.4
Zou_PKU_task5_3 TI_3 Yang2022 18.9 (18.6 - 19.2) 30.9 24.0 60.9 43.8 6.7 22.1
Zou_PKU_task5_4 TI_4 Yang2022 15.8 (15.4 - 16.1) 43.8 9.3 57.2 30.9 6.3 31.4
Tan_WHU_task5_1 Knowledge trasnfer 75% training 10 iteration adaptive (8) Tan2022 8.1 (7.3 - 8.5) 39.0 43.9 2.4 10.3 15.0 12.7
Tan_WHU_task5_2 Knowledge transfer 90% training 15 iteration Tan2022 16.9 (16.4 - 17.2) 31.5 32.8 8.0 15.3 15.4 39.8
Tan_WHU_task5_3 Knowledge Transfer 90 training (4) Tan2022 17.1 (16.7 - 17.4) 25.5 40.3 8.4 15.7 18.0 28.6
Tan_WHU_task5_4 Knowledge Transfer 90 training adaptive (4) Tan2022 17.2 (16.8 - 17.6) 26.2 40.3 8.4 15.7 18.0 29.6
Liu_BIT-SRCB_task5_1 TI-PN ensemble Liu2022 44.1 (43.6 - 44.5) 54.6 45.7 47.3 51.5 32.4 48.5
Liu_BIT-SRCB_task5_2 TI-PN ensemble_2 Liu2022 41.9 (41.6 - 42.2) 54.6 56.3 47.3 51.5 24.0 48.5
Liu_BIT-SRCB_task5_3 TI_scalable Liu2022 36.8 (36.5 - 37.2) 52.2 41.0 51.6 49.3 22.2 33.6
Liu_BIT-SRCB_task5_4 pretrained TI-PN ensemble Liu2022 44.3 (43.9 - 44.6) 54.6 45.0 48.0 53.9 32.5 47.7
Willbo_RISE_task5_1 willbo_supervised_1 Willbo2022 17.9 (17.6 - 18.2) 43.8 19.1 24.6 20.9 12.2 12.8
Willbo_RISE_task5_2 willbo_supervised_2 Willbo2022 20.4 (20.1 - 20.7) 47.1 17.4 31.1 21.4 12.2 21.9
Willbo_RISE_task5_3 willbo_semi_1 Willbo2022 20.2 (19.9 - 20.5) 44.0 14.8 24.8 24.9 13.9 22.1
Willbo_RISE_task5_4 willbo_semi_2 Willbo2022 21.7 (21.3 - 22.0) 48.8 14.9 31.1 25.9 13.9 25.5
ZGORZYNSKI_SRPOL_task5_1 Siamese Network with fully connected head Zgorzynski2022 28.1 (27.6 - 28.5) 51.0 52.9 13.9 33.4 27.4 33.7
ZGORZYNSKI_SRPOL_task5_2 Siamese Network with fully connected head Zgorzynski2022 16.3 (15.1 - 16.9) 51.2 39.8 4.2 48.4 34.7 46.3
ZGORZYNSKI_SRPOL_task5_3 Siamese Network with fully connected head Zgorzynski2022 29.9 (29.3 - 30.3) 49.7 23.7 15.5 60.9 35.9 41.7
ZGORZYNSKI_SRPOL_task5_4 Siamese Network with fully connected head Zgorzynski2022 33.2 (32.7 - 33.7) 58.8 31.1 19.7 41.1 38.4 40.4
Huang_SCUT_task5_1 Transductive learning and modified central difference convolution Huang2022 18.3 (18.0 - 18.6) 17.9 20.6 65.6 56.0 7.4 22.1
Martinsson_RISE_task5_1 Adaptive prototypical ensemble Martinsson2022 48.0 (47.5 - 48.4) 71.7 48.4 77.6 70.6 24.6 53.1
Martinsson_RISE_task5_2 Adaptive prototypical ensemble Martinsson2022 45.4 (44.9 - 45.9) 56.3 37.6 61.5 70.7 29.5 49.4
Martinsson_RISE_task5_3 Adaptive prototypical ensemble Martinsson2022 19.4 (18.6 - 20.0) 67.1 4.7 65.5 73.3 34.7 45.0
Martinsson_RISE_task5_4 Adaptive prototypical ensemble Martinsson2022 32.5 (31.7 - 33.1) 50.9 13.4 47.8 71.2 34.1 42.5
Liu_Surrey_task5_1 Haohe_Liu_S1 Liu2022a 43.1 (42.7 - 43.4) 81.9 58.4 46.4 48.4 22.8 52.0
Liu_Surrey_task5_2 Haohe_Liu_S2 Liu2022a 48.2 (48.5 - 48.9) 76.9 57.4 48.0 60.7 28.9 56.8
Liu_Surrey_task5_3 Haohe_Liu_S3 Liu2022a 36.9 (36.5 - 37.2) 83.0 52.2 29.1 53.5 18.5 53.7
Liu_Surrey_task5_4 Haohe_Liu_S4 Liu2022a 45.5 (45.8 - 46.2) 80.5 61.8 38.8 47.7 30.3 53.8
Li_QMUL_task5_1 Prototypical Network with ResNet and SpecAugment Li2022 15.5 (15.2 - 15.8) 39.5 35.0 11.9 17.9 6.9 30.7
Mariajohn_DSPC_task5_1 Prototypical-1 Mariajohn2022 25.7 (25.4 - 25.9) 27.4 23.6 55.4 65.5 19.4 14.9
Du_NERCSLIP_task5_1 Segment-level embedding learning Du2022a 36.5 (35.6 - 37.0) 53.6 43.9 43.0 57.5 17.7 46.7
Du_NERCSLIP_task5_2 Frame-level embedding learning 1 Du2022a 60.2 (59.7 - 61.7) 71.7 48.4 89.1 66.3 48.7 57.3
Du_NERCSLIP_task5_3 event filtering Du2022a 42.9 (42.4 - 43.4) 57.4 48.6 62.3 42.4 23.5 52.2
Du_NERCSLIP_task5_4 Frame-level embedding learning 2 Du2022a 60.0 (58.5 - 61.5) 73.3 49.6 91.3 64.4 46.3 57.7

Teams ranking

This table includes only the best-performing system from each submitting team.

Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Development dataset)
Baseline_TempMatch_task5_1 Baseline Template Matching 12.3 (11.5 - 12.8) 3.4
Baseline_PROTO_task5_1 Baseline Prototypical Network 5.3 ( - )
Wu_SHNU_task5_1 Continual_learning Wu2022 40.9 (40.5 - 41.3) 53.9
Zhang_CQU_task5_4 Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) 44.2
Kang_ET_task5_2 FewShot_using_good_embedding_model Kang2022 2.8 (2.8 - 2.9)
Hertkorn_ZF_task5_2 ZF_CNN2 Hertkorn2022 44.4 (45.0 - 45.4) 61.8
Zou_PKU_task5_1 TI_1 Yang2022 19.2 (18.9 - 19.5) 52.0
Tan_WHU_task5_4 Knowledge Transfer 90 training adaptive (4) Tan2022 17.2 (16.8 - 17.6) 54.5
Liu_BIT-SRCB_task5_4 pretrained TI-PN ensemble Liu2022 44.3 (43.9 - 44.6) 64.8
Willbo_RISE_task5_4 willbo_semi_2 Willbo2022 21.7 (21.3 - 22.0) 47.9
ZGORZYNSKI_SRPOL_task5_4 Siamese Network with fully connected head Zgorzynski2022 33.2 (32.7 - 33.7) 57.2
Huang_SCUT_task5_1 Transductive learning and modified central difference convolution Huang2022 18.3 (18.0 - 18.6) 54.6
Martinsson_RISE_task5_1 Adaptive prototypical ensemble Martinsson2022 48.0 (47.5 - 48.4) 60.0
Liu_Surrey_task5_2 Haohe_Liu_S2 Liu2022a 48.2 (48.5 - 48.9) 50.0
Li_QMUL_task5_1 Prototypical Network with ResNet and SpecAugment Li2022 15.5 (15.2 - 15.8) 47.9
Mariajohn_DSPC_task5_1 Prototypical-1 Mariajohn2022 25.7 (25.4 - 25.9) 43.9
Du_NERCSLIP_task5_2 Frame-level embedding learning 1 Du2022a 60.2 (59.7 - 61.7) 74.4
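The per-team selection used for this table can be sketched as follows. The scores are the evaluation F-scores of two teams copied from the systems ranking; deriving the team name by stripping the `_task5_N` suffix is an assumption about the submission-code convention, and `best_per_team` is a hypothetical helper.

```python
# Evaluation-set F-scores for two teams, taken from the systems ranking table.
systems = {
    "Hertkorn_ZF_task5_1": 43.4,
    "Hertkorn_ZF_task5_2": 44.4,
    "Hertkorn_ZF_task5_3": 41.4,
    "Hertkorn_ZF_task5_4": 33.8,
    "Du_NERCSLIP_task5_1": 36.5,
    "Du_NERCSLIP_task5_2": 60.2,
    "Du_NERCSLIP_task5_3": 42.9,
    "Du_NERCSLIP_task5_4": 60.0,
}

def best_per_team(scores):
    """Map each team to its (submission code, F-score) pair with the highest score."""
    best = {}
    for code, f_score in scores.items():
        team = code.rsplit("_task5_", 1)[0]  # assumed naming convention
        if team not in best or f_score > best[team][1]:
            best[team] = (code, f_score)
    return best

team_best = best_per_team(systems)
```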

System characteristics

General characteristics

Rank | Code | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Sampling rate | Data augmentation | Features
Baseline_TempMatch_task5_1 12.3 (11.5 - 12.8) any spectrogram
Baseline_PROTO_task5_1 5.3 ( - ) 22.05 KHz PCEN
Wu_SHNU_task5_1 Wu2022 40.9 (40.5 - 41.3) any Time masking, Frequency masking PCEN
Zhang_CQU_task5_1 Zhang2022 1.2 (0.9 - 1.3) 22.05 KHz Spectrogram
Zhang_CQU_task5_2 Zhang2022 0.9 (0.0 - 1.0) 22.05 KHz Spectrogram
Zhang_CQU_task5_3 Zhang2022 1.9 (1.0 - 2.0) 22.05 KHz Spectrogram
Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) 22.05 KHz Spectrogram
Kang_ET_task5_1 Kang2022 2.4 (2.4 - 2.4) 16 KHz specaugment PCEN
Kang_ET_task5_2 Kang2022 2.8 (2.8 - 2.9) 16 KHz Specaugment PCEN
Hertkorn_ZF_task5_1 Hertkorn2022 43.4 (42.9 - 43.8) any Spectrogram
Hertkorn_ZF_task5_2 Hertkorn2022 44.4 (45.0 - 45.4) any Spectrogram
Hertkorn_ZF_task5_3 Hertkorn2022 41.4 (41.9 - 42.3) any Spectrogram
Hertkorn_ZF_task5_4 Hertkorn2022 33.8 (32.4 - 34.6) any Spectrogram
Zou_PKU_task5_1 Yang2022 19.2 (18.9 - 19.5) 22.05 KHz time and frequency masking, mixup Spectrogram
Zou_PKU_task5_2 Yang2022 18.7 (18.4 - 19.0) 22.05 KHz time and frequency masking, mixup Spectrogram
Zou_PKU_task5_3 Yang2022 18.9 (18.6 - 19.2) 22.05 KHz time and frequency masking, mixup Spectrogram
Zou_PKU_task5_4 Yang2022 15.8 (15.4 - 16.1) 22.05 KHz time masking, frequency masking, mixup Spectrogram
Tan_WHU_task5_1 Tan2022 8.1 (7.3 - 8.5) 22.05 KHz PCEN
Tan_WHU_task5_2 Tan2022 16.9 (16.4 - 17.2) 22.05 KHz PCEN
Tan_WHU_task5_3 Tan2022 17.1 (16.7 - 17.4) 22.05 KHz PCEN
Tan_WHU_task5_4 Tan2022 17.2 (16.8 - 17.6) 22.05 KHz PCEN
Liu_BIT-SRCB_task5_1 Liu2022 44.1 (43.6 - 44.5) 22.05 KHz Specaugment PCEN
Liu_BIT-SRCB_task5_2 Liu2022 41.9 (41.6 - 42.2) 22.05 KHz Specaugment PCEN
Liu_BIT-SRCB_task5_3 Liu2022 36.8 (36.5 - 37.2) 22.05 KHz PCEN
Liu_BIT-SRCB_task5_4 Liu2022 44.3 (43.9 - 44.6) 22.05 KHz Specaugment PCEN
Willbo_RISE_task5_1 Willbo2022 17.9 (17.6 - 18.2) any Mel-spectrogram, PCEN
Willbo_RISE_task5_2 Willbo2022 20.4 (20.1 - 20.7) any Mel-spectrogram, PCEN
Willbo_RISE_task5_3 Willbo2022 20.2 (19.9 - 20.5) any Mel-spectrogram, PCEN
Willbo_RISE_task5_4 Willbo2022 21.7 (21.3 - 22.0) any Mel-spectrogram, PCEN
ZGORZYNSKI_SRPOL_task5_1 Zgorzynski2022 28.1 (27.6 - 28.5) 48 KHz Noise mixing, Random Crop Mel-spectrogram, PCEN
ZGORZYNSKI_SRPOL_task5_2 Zgorzynski2022 16.3 (15.1 - 16.9) 48 KHz Noise mixing Mel-spectrogram
ZGORZYNSKI_SRPOL_task5_3 Zgorzynski2022 29.9 (29.3 - 30.3) 48 KHz Noise mixing Mel-spectrogram
ZGORZYNSKI_SRPOL_task5_4 Zgorzynski2022 33.2 (32.7 - 33.7) 48 KHz Noise mixing Mel-spectrogram
Huang_SCUT_task5_1 Huang2022 18.3 (18.0 - 18.6) 22.05 KHz Specaugment PCEN
Martinsson_RISE_task5_1 Martinsson2022 48.0 (47.5 - 48.4) 22.05 KHz Log-Mel energies, PCEN
Martinsson_RISE_task5_2 Martinsson2022 45.4 (44.9 - 45.9) 22.05 KHz Log-Mel energies, PCEN
Martinsson_RISE_task5_3 Martinsson2022 19.4 (18.6 - 20.0) 22.05 KHz PCEN
Martinsson_RISE_task5_4 Martinsson2022 32.5 (31.7 - 33.1) 22.05 KHz PCEN
Liu_Surrey_task5_1 Liu2022a 43.1 (42.7 - 43.4) 22.05 KHz Dynamic dataloader PCEN, Delta-MFCC
Liu_Surrey_task5_2 Liu2022a 48.2 (48.5 - 48.9) 22.05 KHz Dynamic dataloader PCEN, Delta-MFCC
Liu_Surrey_task5_3 Liu2022a 36.9 (36.5 - 37.2) 22.05 KHz Dynamic dataloader PCEN, Delta-MFCC
Liu_Surrey_task5_4 Liu2022a 45.5 (45.8 - 46.2) 22.05 KHz Dynamic dataloader PCEN, Delta-MFCC
Li_QMUL_task5_1 Li2022 15.5 (15.2 - 15.8) any time masking, frequency masking, time warping PCEN, Spectrogram
Mariajohn_DSPC_task5_1 Mariajohn2022 25.7 (25.4 - 25.9) any time shifting, segment level mirroring Log-Mel spectrogram
Du_NERCSLIP_task5_1 Du2022a 36.5 (35.6 - 37.0) 22.05 KHz SpecAugment PCEN
Du_NERCSLIP_task5_2 Du2022a 60.2 (59.7 - 61.7) 22.05 KHz PCEN
Du_NERCSLIP_task5_3 Du2022a 42.9 (42.4 - 43.4) 22.05 KHz PCEN
Du_NERCSLIP_task5_4 Du2022a 60.0 (58.5 - 61.5) 22.05 KHz PCEN



Machine learning characteristics

Rank | Code | Technical Report | Event-based F-score (Eval) | Classifier | Few-shot approach | Post-processing
Baseline_TempMatch_task5_1 12.3 (11.5 - 12.8) template matching template matching peak picking, threshold
Baseline_PROTO_task5_1 5.3 ( - ) ResNet prototypical threshold
Wu_SHNU_task5_1 Wu2022 40.9 (40.5 - 41.3) Continual Learning prototypical, weight generator threshold
Zhang_CQU_task5_1 Zhang2022 1.2 (0.9 - 1.3) CNN prototypical peak picking, threshold
Zhang_CQU_task5_2 Zhang2022 0.9 (0.0 - 1.0) CNN prototypical peak picking, threshold
Zhang_CQU_task5_3 Zhang2022 1.9 (1.0 - 2.0) CNN prototypical peak picking, threshold
Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) CNN prototypical peak picking, threshold
Kang_ET_task5_1 Kang2022 2.4 (2.4 - 2.4) TDNN Fine tuning
Kang_ET_task5_2 Kang2022 2.8 (2.8 - 2.9) TDNN Fine tuning
Hertkorn_ZF_task5_1 Hertkorn2022 43.4 (42.9 - 43.8) CNN threshold, duration threshold, event stitching
Hertkorn_ZF_task5_2 Hertkorn2022 44.4 (45.0 - 45.4) CNN threshold, duration threshold, event stitching
Hertkorn_ZF_task5_3 Hertkorn2022 41.4 (41.9 - 42.3) CNN threshold, duration threshold, event stitching
Hertkorn_ZF_task5_4 Hertkorn2022 33.8 (32.4 - 34.6) CNN threshold, duration threshold, event stitching
Zou_PKU_task5_1 Yang2022 19.2 (18.9 - 19.5) CNN prototypical threshold, peak picking
Zou_PKU_task5_2 Yang2022 18.7 (18.4 - 19.0) CNN prototypical threshold, peak picking
Zou_PKU_task5_3 Yang2022 18.9 (18.6 - 19.2) CNN prototypical threshold, peak picking
Zou_PKU_task5_4 Yang2022 15.8 (15.4 - 16.1) CNN prototypical threshold, peak picking
Tan_WHU_task5_1 Tan2022 8.1 (7.3 - 8.5) CNN prototypical, transductive inference threshold, minimum event length
Tan_WHU_task5_2 Tan2022 16.9 (16.4 - 17.2) CNN prototypical, transductive inference threshold
Tan_WHU_task5_3 Tan2022 17.1 (16.7 - 17.4) CNN prototypical, transductive inference threshold
Tan_WHU_task5_4 Tan2022 17.2 (16.8 - 17.6) CNN prototypical, transductive inference threshold, minimum event length
Liu_BIT-SRCB_task5_1 Liu2022 44.1 (43.6 - 44.5) CNN prototypical, transductive inference peak picking, threshold, VAD
Liu_BIT-SRCB_task5_2 Liu2022 41.9 (41.6 - 42.2) CNN prototypical, transductive inference peak picking, threshold, VAD
Liu_BIT-SRCB_task5_3 Liu2022 36.8 (36.5 - 37.2) CNN Transductive inference peak picking, threshold
Liu_BIT-SRCB_task5_4 Liu2022 44.3 (43.9 - 44.6) CNN prototypical, transductive inference peak picking, threshold, VAD
Willbo_RISE_task5_1 Willbo2022 17.9 (17.6 - 18.2) ResNet prototypical median filtering, minimum event length, threshold
Willbo_RISE_task5_2 Willbo2022 20.4 (20.1 - 20.7) ResNet prototypical, threshold fitting median filtering, minimum event length, threshold
Willbo_RISE_task5_3 Willbo2022 20.2 (19.9 - 20.5) ResNet prototypical median filtering, minimum event length, threshold
Willbo_RISE_task5_4 Willbo2022 21.7 (21.3 - 22.0) ResNet prototypical, threshold fitting median filtering, minimum event length, threshold
ZGORZYNSKI_SRPOL_task5_1 Zgorzynski2022 28.1 (27.6 - 28.5) CNN Siamese network with fully connected head, fine tuning peak picking, threshold, Gaussian filter
ZGORZYNSKI_SRPOL_task5_2 Zgorzynski2022 16.3 (15.1 - 16.9) CNN Siamese network with fully connected head, fine tuning threshold, Gaussian filter
ZGORZYNSKI_SRPOL_task5_3 Zgorzynski2022 29.9 (29.3 - 30.3) CNN Siamese network with fully connected head, fine tuning threshold, Gaussian filter
ZGORZYNSKI_SRPOL_task5_4 Zgorzynski2022 33.2 (32.7 - 33.7) CNN Siamese network with fully connected head, fine tuning threshold, Gaussian filter
Huang_SCUT_task5_1 Huang2022 18.3 (18.0 - 18.6) transductive learning transductive learning peak picking, threshold
Martinsson_RISE_task5_1 Martinsson2022 48.0 (47.5 - 48.4) Ensemble, CNN prototypical, input size threshold, merging, filter too small, filter too big
Martinsson_RISE_task5_2 Martinsson2022 45.4 (44.9 - 45.9) Ensemble, CNN prototypical, input size threshold, merging, filter too small, filter too big
Martinsson_RISE_task5_3 Martinsson2022 19.4 (18.6 - 20.0) CNN prototypical threshold, merging, filter too small, filter too big
Martinsson_RISE_task5_4 Martinsson2022 32.5 (31.7 - 33.1) CNN prototypical threshold, merging, filter too small, filter too big
Liu_Surrey_task5_1 Liu2022a 43.1 (42.7 - 43.4) CNN, ensemble prototypical threshold, filter by length, split long, remove long
Liu_Surrey_task5_2 Liu2022a 48.2 (48.5 - 48.9) CNN prototypical threshold, filter by length, remove long, padding
Liu_Surrey_task5_3 Liu2022a 36.9 (36.5 - 37.2) CNN prototypical threshold, filter by length, split long, remove long, merge short, padding
Liu_Surrey_task5_4 Liu2022a 45.5 (45.8 - 46.2) CNN prototypical threshold, filter by length, remove long
Li_QMUL_task5_1 Li2022 15.5 (15.2 - 15.8) CNN prototypical peak picking, threshold
Mariajohn_DSPC_task5_1 Mariajohn2022 25.7 (25.4 - 25.9) CNN prototypical threshold
Du_NERCSLIP_task5_1 Du2022a 36.5 (35.6 - 37.0) CNN fine tuning peak picking, threshold
Du_NERCSLIP_task5_2 Du2022a 60.2 (59.7 - 61.7) CNN fine tuning peak picking, threshold
Du_NERCSLIP_task5_3 Du2022a 42.9 (42.4 - 43.4) CNN fine tuning peak picking, threshold
Du_NERCSLIP_task5_4 Du2022a 60.0 (58.5 - 61.5) CNN fine tuning peak picking, threshold

Complexity

Rank | Code | Technical Report | Event-based F-score (Eval) | Model complexity | Training time
Baseline_TempMatch_task5_1 12.3 (11.5 - 12.8)
Baseline_PROTO_task5_1 5.3 ( - )
Wu_SHNU_task5_1 Wu2022 40.9 (40.5 - 41.3) 443520 2.5h
Zhang_CQU_task5_1 Zhang2022 1.2 (0.9 - 1.3) 90min
Zhang_CQU_task5_2 Zhang2022 0.9 (0.0 - 1.0) 90min
Zhang_CQU_task5_3 Zhang2022 1.9 (1.0 - 2.0) 90min
Zhang_CQU_task5_4 Zhang2022 4.3 (3.7 - 4.6) 90min
Kang_ET_task5_1 Kang2022 2.4 (2.4 - 2.4)
Kang_ET_task5_2 Kang2022 2.8 (2.8 - 2.9)
Hertkorn_ZF_task5_1 Hertkorn2022 43.4 (42.9 - 43.8) 54979 6 min/ wav file
Hertkorn_ZF_task5_2 Hertkorn2022 44.4 (45.0 - 45.4) 54979 6 min/ wav file
Hertkorn_ZF_task5_3 Hertkorn2022 41.4 (41.9 - 42.3) 54979 6 min/ wav file
Hertkorn_ZF_task5_4 Hertkorn2022 33.8 (32.4 - 34.6) 54979 6 min/ wav file
Zou_PKU_task5_1 Yang2022 19.2 (18.9 - 19.5) 468627 30 min
Zou_PKU_task5_2 Yang2022 18.7 (18.4 - 19.0) 468627 30 min
Zou_PKU_task5_3 Yang2022 18.9 (18.6 - 19.2) 468627 30 min
Zou_PKU_task5_4 Yang2022 15.8 (15.4 - 16.1) 468627 30 min
Tan_WHU_task5_1 Tan2022 8.1 (7.3 - 8.5) 700k 1h
Tan_WHU_task5_2 Tan2022 16.9 (16.4 - 17.2) 700k 1h
Tan_WHU_task5_3 Tan2022 17.1 (16.7 - 17.4) 700k 1h
Tan_WHU_task5_4 Tan2022 17.2 (16.8 - 17.6) 700k 1h
Liu_BIT-SRCB_task5_1 Liu2022 44.1 (43.6 - 44.5) 9627177 1.5h
Liu_BIT-SRCB_task5_2 Liu2022 41.9 (41.6 - 42.2) 9627177 1.5h
Liu_BIT-SRCB_task5_3 Liu2022 36.8 (36.5 - 37.2) 8757077 1.5h
Liu_BIT-SRCB_task5_4 Liu2022 44.3 (43.9 - 44.6) 9914068 1.5h
Willbo_RISE_task5_1 Willbo2022 17.9 (17.6 - 18.2)
Willbo_RISE_task5_2 Willbo2022 20.4 (20.1 - 20.7)
Willbo_RISE_task5_3 Willbo2022 20.2 (19.9 - 20.5)
Willbo_RISE_task5_4 Willbo2022 21.7 (21.3 - 22.0)
ZGORZYNSKI_SRPOL_task5_1 Zgorzynski2022 28.1 (27.6 - 28.5) 76700357 9h
ZGORZYNSKI_SRPOL_task5_2 Zgorzynski2022 16.3 (15.1 - 16.9) 76700357 9h
ZGORZYNSKI_SRPOL_task5_3 Zgorzynski2022 29.9 (29.3 - 30.3) 76700357 9h
ZGORZYNSKI_SRPOL_task5_4 Zgorzynski2022 33.2 (32.7 - 33.7) 76700357 9h
Huang_SCUT_task5_1 Huang2022 18.3 (18.0 - 18.6) 492206 50min, RTX3090
Martinsson_RISE_task5_1 Martinsson2022 48.0 (47.5 - 48.4) 25994880
Martinsson_RISE_task5_2 Martinsson2022 45.4 (44.9 - 45.9) 25994880
Martinsson_RISE_task5_3 Martinsson2022 19.4 (18.6 - 20.0) 1732992
Martinsson_RISE_task5_4 Martinsson2022 32.5 (31.7 - 33.1) 1732992
Liu_Surrey_task5_1 Liu2022a 43.1 (42.7 - 43.4) 724096 91 min, NVIDIA GeForce 3070
Liu_Surrey_task5_2 Liu2022a 48.2 (48.5 - 48.9) 724096 91 min, NVIDIA GeForce 3070
Liu_Surrey_task5_3 Liu2022a 36.9 (36.5 - 37.2) 724096 91 min, NVIDIA GeForce 3070
Liu_Surrey_task5_4 Liu2022a 45.5 (45.8 - 46.2) 724096 91 min, NVIDIA GeForce 3070
Li_QMUL_task5_1 Li2022 15.5 (15.2 - 15.8) 40 min, Colab pro Tesla p100
Mariajohn_DSPC_task5_1 Mariajohn2022 25.7 (25.4 - 25.9) 2h
Du_NERCSLIP_task5_1 Du2022a 36.5 (35.6 - 37.0) 464531 5 minutes, TeslaP40-24GB
Du_NERCSLIP_task5_2 Du2022a 60.2 (59.7 - 61.7) 469654 1 hour, TeslaV100-32GB
Du_NERCSLIP_task5_3 Du2022a 42.9 (42.4 - 43.4) 12091947 1 hour, TeslaV100-32GB
Du_NERCSLIP_task5_4 Du2022a 60.0 (58.5 - 61.5) 12091947 1 hour, TeslaV100-32GB

Technical reports

BIOACOUSTIC FEW SHOT LEARNING WITH CLASS AUGMENTATION Technical Report

Mariajohn, Aaquila

Abstract

This document details the results and techniques used for the submission for the DCASE 2022 Task 5 challenge. The goal is to identify positive shots of the required sample throughout the audio clip using few-shot learning. Prototypical networks are used for the few-shot learning training and inference models. The lack of data was compensated with augmentations.
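The two augmentations this submission lists, time shifting and segment-level mirroring, might look like the sketch below on a (frequency, time) spectrogram. This is an illustration only: treating the shift as circular and "mirroring" as a reversal along the time axis are both assumptions, since the abstract does not define them.

```python
import numpy as np

def time_shift(spec, shift):
    """Circularly shift a (freq, time) spectrogram along the time axis."""
    return np.roll(spec, shift, axis=1)

def mirror_segment(spec):
    """Reverse a (freq, time) segment in time (assumed meaning of mirroring)."""
    return spec[:, ::-1]

# Toy spectrogram with 3 frequency bins and 4 time frames.
spec = np.arange(12, dtype=float).reshape(3, 4)
shifted = time_shift(spec, 1)
mirrored = mirror_segment(spec)
```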

System characteristics
Data augmentation time shifting, segment level mirroring
System embeddings False
Subsystem count False
External data usage directly as additional training data

FEW-SHOT EMBEDDING LEARNING AND EVENT FILTERING FOR BIOACOUSTIC EVENT DETECTION Technical Report

Tang, Jigang and Xueyang, Zhang and Gao, Tian and Liu, Diyuan and Fang, Xin and Pan, Jia and Wang, Qing and Du, Jan and Xu, Kele and Pan, Qinghua
iFLYTEK Research Institute

Abstract

In this technical report, we describe our submission system for DCASE2022 Task 5: few-shot bioacoustic event detection. We propose several methods to improve the representational ability of embeddings under limited positive samples, including segment-level and frame-level embedding learning strategies, model adaptation, and an embedding-guided event filtering approach. The event filtering task is independently trained on each test file to improve the discrimination of embeddings between similar events. The proposed system is evaluated on the official validation set, and the best overall F-measure score is 74.4%.

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION: DON'T WASTE INFORMATION Technical Report

Hertkorn, Michael
ZF Friedrichshafen AG

Abstract

In the past, a lot of attention has been dedicated to finding a good neural network architecture, mainly by adopting large NN architectures from image processing [1]. The parameters of the fixed preprocessing stage, which usually consists of a short-time Fourier transform (STFT) optionally followed by a Mel or Mel-frequency cepstral coefficient (MFCC) transformation, can be made trainable [2]; however, some major parameters stay fixed, such as the window size and the fact that the absolute value of the complex output of the Fourier transform is taken. Also, a learnable frontend is not desirable in a few-shot training setting. This investigation demonstrates the importance of choosing suitable parameters for the acoustic preprocessing. To do this, a standard CNN with a minor tweak is used, and pretraining on the training data is skipped, which means that the model is trained only on the 5 shots provided in the validation and evaluation datasets, similar to the pattern matching baseline.
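To illustrate why the STFT window size matters, the sketch below computes a magnitude spectrogram of a pure tone with two window sizes. It is a generic NumPy example, not the author's preprocessing; the sampling rate, tone frequency, window sizes, and the helper name `magnitude_stft` are all assumptions chosen for illustration.

```python
import numpy as np

def magnitude_stft(signal, win_size, hop):
    """Magnitude STFT with a Hann window; returns an array of shape (freq, time)."""
    window = np.hanning(win_size)
    n_frames = 1 + (len(signal) - win_size) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    # Taking the absolute value discards the phase of the complex spectrum.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 8000                               # assumed sampling rate in Hz
t = np.arange(sr) / sr                  # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone

short_win = magnitude_stft(tone, win_size=128, hop=64)    # fine in time
long_win = magnitude_stft(tone, win_size=1024, hop=512)   # fine in frequency
```

The short window yields many frames with few frequency bins, and the long window yields few frames with many bins, so the choice trades time resolution against frequency resolution.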

System characteristics
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIO-ACOUSTIC EVENT DETECTION BASED ON TRANSDUCTIVE LEARNING AND ADAPTED CENTRAL DIFFERENCE CONVOLUTION Technical Report

Huang, Qisheng and Li, Yanxiong and Cao, Wenchang and Chen, Hao
School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China

Abstract

In this technical report, we present our submitted system for DCASE2022 Task 5: few-shot bio-acoustic event detection. Our system employs a transductive learning strategy, data augmentation, and an adapted version of central difference convolution (CDC). Evaluated on the validation set, our method achieves an overall F-measure score of 41.1%.

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION USING GOOD EMBEDDING MODEL Technical Report

Kang, Taein
Chung-Ang University, Seoul, South Korea

Abstract

Few-shot learning is widely used as a benchmark for meta-learning. It tests how quickly a learning algorithm adapts to test tasks with limited data. Unlike general image few-shot learning, DCASE 2022 Task 5 [1] examines whether a system can detect the corresponding sound in the remainder of an audio recording when five annotations are given. In this paper, we investigate whether an embedding model that has learned bioacoustic information well can perform few-shot learning well even with a simple classifier.

System characteristics
Data augmentation Specaugment, inference-time augmentation
System embeddings False
Subsystem count 5
External data usage AudioSet

FEW-SHOT BIOACOUSTIC EVENT DETECTION USING PROTOTYPICAL NETWORKS WITH RESNET CLASSIFIER Technical Report

Li, Ren and Liang, Jinhua and Phan, Huy
Queen Mary University of London, United Kingdom

Abstract

In this technical report, we describe our submission system for few-shot bioacoustic event detection in DCASE2022 Task 5. Participants are expected to develop a few-shot learning system for detecting mammal and bird sounds in audio recordings. In our system, Prototypical Networks are used to embed spectrograms into an embedding space and learn a non-linear mapping between data samples. We leverage various data augmentation techniques on Mel-spectrograms and introduce a ResNet variant as the classifier. Our experiments demonstrate that the system can achieve an F1-score of 47.88% on the validation data.
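Time masking and frequency masking, two of the augmentations this submission lists, can be sketched as below. This is a generic SpecAugment-style illustration rather than the authors' code: the mask widths, the zero fill value, and the name `mask_spectrogram` are assumptions, and time warping is omitted.

```python
import numpy as np

def mask_spectrogram(spec, max_time_width, max_freq_width, rng):
    """Zero out one random time band and one random frequency band."""
    out = spec.copy()                    # leave the input untouched
    n_freq, n_time = out.shape
    # Time mask: a block of consecutive frames across all frequency bins.
    t_width = rng.integers(0, max_time_width + 1)
    t_start = rng.integers(0, n_time - t_width + 1)
    out[:, t_start : t_start + t_width] = 0.0
    # Frequency mask: a block of consecutive bins across all frames.
    f_width = rng.integers(0, max_freq_width + 1)
    f_start = rng.integers(0, n_freq - f_width + 1)
    out[f_start : f_start + f_width, :] = 0.0
    return out

rng = np.random.default_rng(0)
spec = np.ones((40, 100))                # toy (freq, time) Mel-spectrogram
masked = mask_spectrogram(spec, max_time_width=20, max_freq_width=8, rng=rng)
```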

System characteristics
Data augmentation time warping, time masking, frequency masking
System embeddings False
Subsystem count False
External data usage False

BIT SRCB TEAM'S SUBMISSION FOR DCASE2022 TASK5 - FEW-SHOT BIOACOUSTIC EVENT DETECTION Technical Report

Liu, Miao and Zhang, Jianqian and Wang, Lizhong and Peng, Jiawei and Hu, Chenguang and Li, Kaige and Wang, Jing and Ma, Qiuyue
Beijing Institute of Technology, Beijing, China, Samsung Research China-Beijing (SRC-B), Beijing, China

Abstract

In this technical report, we present our system for Task 5 of the Detection and Classification of Acoustic Scenes and Events 2022 (DCASE2022) challenge, i.e. few-shot bioacoustic event detection. First, per-channel energy normalization (PCEN) is extracted as features. To improve the diversity of the original audio, data augmentation methods such as SpecAugment are adopted. Then, a prototypical network with convolutional neural networks (CNN) and a transductive inference method are used for few-shot detection in our systems. Finally, we use the aforementioned features as inputs to train our CNN model. Moreover, we merge the prediction results of the improved prototypical network and the transductive inference method for better performance. We evaluate the proposed systems with the overall F-measure on the whole evaluation set, and our best F-measure score on the validation set is 64.77%.
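Per-channel energy normalization, the feature used here and by many other submissions, can be sketched in NumPy as below. This follows the commonly published PCEN formula (a first-order smoother followed by adaptive gain control and root compression); the parameter defaults are illustrative assumptions and are not taken from this report.

```python
import numpy as np

def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to a non-negative (freq, time) energy spectrogram."""
    smoothed = np.empty_like(energy)
    smoothed[:, 0] = energy[:, 0]
    # First-order IIR smoother along time, run independently per channel.
    for t in range(1, energy.shape[1]):
        smoothed[:, t] = (1 - s) * smoothed[:, t - 1] + s * energy[:, t]
    # Adaptive gain control followed by root compression.
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r

quiet = np.ones((4, 50))       # toy constant-energy spectrogram
loud = 100.0 * quiet           # same content recorded with 100x more energy
pcen_quiet = pcen(quiet)
pcen_loud = pcen(loud)
```

With these settings the two outputs differ only slightly even though the inputs differ by a factor of 100, which shows the gain normalization that makes PCEN attractive for field recordings with varying loudness.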

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False

SURREY SYSTEM FOR DCASE 2022 TASK 5: FEW-SHOT BIOACOUSTIC EVENT DETECTION WITH SEGMENT-LEVEL METRIC LEARNING Technical Report

Liu, Haohe and Liu, Xubo and Mei, Xinhao and Kong, Qiuqiang and Wang, Wenwu and Plumbley, Mark D
University of Surrey

Abstract

Few-shot audio event detection is a task that detects the occurrence time of a novel sound class given a few examples. In this work, we propose a system based on segment-level metric learning for the DCASE 2022 challenge on few-shot bioacoustic event detection (Task 5). We make better use of the negative data within each sound class to build the loss function, and use transductive inference to gain better adaptation on the evaluation set. For the input feature, we find per-channel energy normalization concatenated with delta mel-frequency cepstral coefficients to be the most effective combination. We also introduce new data augmentation and post-processing procedures for this task. Our final system achieves an F-measure of 68.74 on the DCASE Task 5 validation set, outperforming the baseline performance of 29.5 by a large margin. Our system is fully open-sourced.
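Post-processing of the kind listed for many systems (thresholding followed by filtering events by length) can be sketched as below. This is a generic illustration, not the Surrey team's procedure; the threshold, hop size, minimum duration, and the name `frames_to_events` are assumptions.

```python
def frames_to_events(probs, threshold, hop_seconds, min_duration):
    """Turn frame-level probabilities into (onset, offset) events in seconds."""
    events = []
    start = None
    # A trailing -inf sentinel closes an event that runs to the last frame.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i                              # event begins
        elif p < threshold and start is not None:
            onset, offset = start * hop_seconds, i * hop_seconds
            if offset - onset >= min_duration:     # drop events that are too short
                events.append((onset, offset))
            start = None
    return events

frame_probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.9, 0.9, 0.9]
events = frames_to_events(frame_probs, threshold=0.5, hop_seconds=0.5, min_duration=1.0)
```

Here the single-frame detection at index 4 is discarded as too short, while the two longer runs are kept.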

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False

FEW-SHOT BIOACOUSTIC EVENT DETECTION USING A PROTOTYPICAL NETWORK ENSEMBLE WITH ADAPTIVE EMBEDDING FUNCTIONS Technical Report

Martinsson, John and Willbo, Martin and Pirinen, Aleksis and Mogren, Olof and Sandsten, Maria
Computer Science, RISE Research Institutes of Sweden, Sweden, Centre for Mathematical Sciences, Lund University, Sweden

Abstract

In this report we present our method for the DCASE 2022 challenge on few-shot bioacoustic event detection. We use an ensemble of prototypical neural networks with adaptive embedding functions and show that both the ensemble and the adaptive embedding functions can be used to improve results from an average F-score of 41.3% to an average F-score of 60.0% on the validation dataset.

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False
PDF

A NEW TRANSDUCTIVE FRAMEWORK FOR FEW-SHOT BIOACOUSTIC EVENT DETECTION TASK Technical Report

Tan, Yizhou and Xu, Lifan and Zhu, Chenyang and Li, Shengchen and Ai, Haojun and Shao, Xi
Wuhan University, School of Cyber Science and Engineering, Wuhan, China; Xi’an Jiaotong-Liverpool University, Department of Intelligent Science, School of Advanced Engineering, Suzhou, China; Jiangnan University, School of Artificial Intelligence and Computer Science, Wuxi, China; Nanjing University of Posts and Telecommunications, School of Communication and Information Engineering, Nanjing, China

Abstract

Few-shot learning is introduced to reduce the requirements on data availability in machine learning, especially when labelling is labour-intensive. Few-shot learning algorithms usually suffer from the extraordinary feature distribution of the query class, especially in the few-shot bioacoustic event detection task. In this work, a knowledge transfer technique is introduced into the transductive inference process to restrict the feature distribution of a newly appeared class to a dedicated sub-space, while adapting the feature distribution for existing classes. The proposed system outperforms the traditional few-shot learning system on the development dataset provided by bioacoustic event detection (Task 5) in the DCASE 2022 data challenge. The F-measure score on the validation part of the development dataset reaches 57.40.

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False
PDF

WIDE RESNET MODELS FOR FEW-SHOT SOUND EVENT DETECTION Technical Report

Willbo, Martin and Martinsson, John and Pirinen, Aleksis and Mogren, Olof
Computer Science, RISE Research Institutes of Sweden, Sweden, Centre for Mathematical Sciences, Lund University, Sweden

Abstract

In this technical report we describe our few-shot sound event detection (SED) systems used to generate predictions for the DCASE 2022 Task 5 challenge. At the core of the SED systems is a wider variant of ResNet-18, i.e., each block throughout the depth of the network has more convolutional filters. In addition, for one of the submissions we include what we believe to be a novel approach to semi-supervised learning for prototypical networks. For both the fully supervised and semi-supervised methods we showcase the importance of calibrating the probability thresholds in the few-shot learning tasks, and provide a simple implementation of how to find these.
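One simple way to calibrate a probability threshold is a grid search that keeps the candidate with the highest F-score on labelled data. The sketch below illustrates that idea only. It is not the implementation provided in the report, and the frame-level F1, the candidate grid, and the function names are assumptions.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 label arrays; returns 0.0 when nothing is positive."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def calibrate_threshold(probs, y_true, candidates=None):
    """Return (threshold, f1) for the candidate that maximizes F1."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95.
        candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(y_true, (probs >= c).astype(int)) for c in candidates]
    best = int(np.argmax(scores))  # first maximum wins on ties
    return float(candidates[best]), scores[best]
```

In a few-shot setting the labelled data for calibration is scarce, so a coarse grid like this one is a deliberate choice to limit overfitting the threshold to a handful of frames.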

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False
PDF

FEW-SHOT CONTINUAL LEARNING FOR BIOACOUSTIC EVENT DETECTION Technical Report

Wu, Xiaoxiao and Long, Yanhua
Shanghai Normal University, Shanghai, China

Abstract

In this technical report, we describe our submission system for DCASE2022 Task 5: few-shot bioacoustic event detection. In this submission, a few-shot continual learning framework is used for bioacoustic event detection, in which a trained base classifier can be continuously expanded to detect novel classes with only a few labeled examples at inference time. On the official validation set, the proposed continual learning approach achieves an overall F-measure score of 53.876%.

System characteristics
Data augmentation Time masking,Frequency masking
System embeddings False
Subsystem count False
External data usage False
PDF

IMPROVED PROTOTYPICAL NETWORK WITH DATA AUGMENTATION Technical Report

Dongchao Yang and Helin Wang and Zhongjie Ye and Yuexian Zou
Peking University, School of ECE, Shenzhen, China; Xiaomi Corporation, Beijing, China

Abstract

In this technical report, we describe our few-shot bioacoustic event detection methods submitted to the Detection and Classification of Acoustic Scenes and Events Challenge 2022 Task 5. We follow our previous work, and further improve our model through data augmentation strategies. Specifically, we analyze why prototypical networks do not perform well, and propose to use transductive inference for few-shot learning. Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task, in conjunction with a supervision loss based on the support set. Furthermore, we use multiple data augmentation strategies to improve the feature extractor, including time and frequency masking and mixup. Experimental results indicate that our model performs better than the baseline, with an F1 score of about 51.9% on the evaluation set.
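The objective described in the abstract above combines a supervised loss on the support set with a mutual-information term on the query set. A minimal sketch, assuming softmax outputs are already available, estimates the mutual information as the entropy of the average prediction minus the average entropy of the individual predictions. The function names and the single `weight` parameter are assumptions for this example, and the report's exact weighting may differ.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def mutual_information(query_probs):
    """Empirical I(X; Y) from (n_query, n_classes) softmax outputs."""
    marginal = query_probs.mean(axis=0)        # average prediction
    conditional = entropy(query_probs).mean()  # average per-query entropy
    return entropy(marginal) - conditional

def transductive_loss(support_probs, support_labels, query_probs, weight=1.0, eps=1e-12):
    """Cross-entropy on the support set minus weighted mutual information."""
    picked = support_probs[np.arange(len(support_labels)), support_labels]
    cross_entropy = -np.mean(np.log(picked + eps))
    return cross_entropy - weight * mutual_information(query_probs)
```

Minimizing this loss rewards query predictions that are individually confident (low conditional entropy) yet spread across classes overall (high marginal entropy).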

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False
PDF

SIAMESE NETWORK FOR FEW-SHOT BIOACOUSTIC EVENT DETECTION Technical Report

Zgorzynski, Bartlomiej and Matuszewski, Mateusz
Samsung R&D Institute, Poland

System characteristics
Data augmentation False
System embeddings False
Subsystem count False
External data usage False
PDF

A META-LEARNING FRAMEWORK FOR FEW-SHOT SOUND EVENT DETECTION Technical Report

Zhang, Tianyang and Wang, Yuyang and Wang, Ying
Chongqing University, Shapingba

Abstract

This report presents our submission to the Detection and Classification of Acoustic Scenes and Events 2022 (DCASE2022) challenge, Task 5. This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. The main issue in this task is that only five exemplar vocalisations (shots) of mammals or birds are available. In this paper, we propose a meta-learning framework for the few-shot bioacoustic event detection challenge. Maximizing inter-class distance and minimizing intra-class distance (MIMI) is used as the criterion to fine-tune the embedding network for few-shot tasks. Experimental results indicate that our framework performs better than the baseline, with an F1 score of about 46.51% on the evaluation set.
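The MIMI criterion named in the abstract above can be sketched as a scalar that decreases when classes move apart and when each class becomes more compact. The formulation below (mean squared distance to the own-class centroid minus mean squared distance between centroids) is an assumption chosen for illustration. The report may define the criterion differently, and `mimi_loss` is a hypothetical name.

```python
import numpy as np

def mimi_loss(embeddings, labels):
    """Intra-class spread minus inter-class separation (needs >= 2 classes)."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # Intra-class term: mean squared distance of samples to their own centroid.
    intra = np.mean([
        np.sum((embeddings[labels == c] - centroids[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Inter-class term: mean squared distance over distinct centroid pairs.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    pair_dist = (diffs ** 2).sum(axis=-1)
    n = len(classes)
    inter = pair_dist.sum() / (n * (n - 1))
    return float(intra - inter)
```

With two tight, well-separated clusters the value is strongly negative, while the same points with shuffled labels give a positive value, which is the behaviour a fine-tuning step driven by such a criterion would exploit.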

System characteristics
System embeddings False
Subsystem count False
External data usage False
PDF