Task description
A more detailed task description can be found on the task description page.
Systems ranking
| Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Validation dataset) |
| --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | Baseline Template Matching | | 14.9 (14.0 - 15.3) | |
| | Baseline_PROTO_task5_1 | Baseline Prototypical Network | | 41.6 (41.0 - 42.1) | 52.1 |
| | KAO_NTHU_task5_2 | cosine similarity_Chromagram | kao_nthu2024 | 30.27 (29.42 - 30.81) | 32.2 |
| | KAO_NTHU_task5_3 | cosine similarity_Mel-Spectrograms ∪ Chromagram | kao_nthu2024 | 40.1 (39.2 - 40.7) | 36.2 |
| | KAO_NTHU_task5_1 | cosine similarity_Mel-Spectrograms | kao_nthu2024 | 46.9 (45.6 - 47.9) | 44.2 |
| | Lu_AILab_task5_1 | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | 44.1 |
| | Lu_AILab_task5_2 | Lu_AILab_task5_2 | lu_ailab2024 | 31.4 (31.0 - 31.7) | 56.4 |
| | Lu_AILab_task5_3 | Lu_AILab_task5_3 | lu_ailab2024 | 37.7 (37.3 - 38.1) | 56.4 |
| | Lu_AILab_task5_4 | Lu_AILab_task5_4 | lu_ailab2024 | 43.0 (42.3 - 43.4) | 44.1 |
| | XF_NUDT_task5_1 | Unet-Frame-Level | xf_nudt2024 | 64.1 (63.8 - 64.5) | 66.7 |
| | XF_NUDT_task5_4 | AAPM-Seg-Level | xf_nudt2024 | 28.4 (27.4 - 29.0) | 63.2 |
| | XF_NUDT_task5_3 | pcenBase-Frame-Level | xf_nudt2024 | 61.5 (61.1 - 61.9) | 61.7 |
| | XF_NUDT_task5_2 | logmelBase-Frame-Level | xf_nudt2024 | 65.2 (64.9 - 65.5) | 70.6 |
| | Latifi_IDMT_task5_1 | Cross-correlation baseline | latifi_idmt2024 | 0.7 (0.6 - 0.7) | 47.0 |
| | QianHu_BHEBIT_task5_1 | QianHu_DYXS_task5_1 | qianhu_bhebit2024 | 21.7 (21.1 - 22.2) | 51.4 |
| | QianHu_BIT_task5_3 | QianHu_DYXS_task5_3 | qianhu_bit2024 | 21.7 (21.2 - 22.1) | 51.8 |
| | QianHu_BHEBIT_task5_4 | QianHu_DYXS_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | 55.5 |
| | QianHu_BHEBIT_task5_2 | QianHu_DYXS_task5_2 | qianhu_bhebit2024 | 34.1 (33.6 - 34.5) | 53.7 |
| | Hoffman_ESP_task5_4 | In-context learning | hoffman_esp2024 | 8.1 (7.6 - 8.4) | 44.4 |
| | Hoffman_ESP_task5_2 | In-context learning | hoffman_esp2024 | 3.3 (3.0 - 3.5) | 58.4 |
| | Hoffman_ESP_task5_3 | In-context learning | hoffman_esp2024 | 1.2 (0.9 - 1.3) | 55.9 |
| | Hoffman_ESP_task5_1 | In-context learning | hoffman_esp2024 | 6.5 (6.1 - 6.9) | 53.3 |
| | Bordoux_WUR_task5_1 | AVESproto direct | bordoux_wur2024 | 42.0 (41.3 - 42.5) | 39.3 |
Dataset-wise metrics
| Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (CHE dataset) | Event-based F-score (CT dataset) | Event-based F-score (MGE dataset) | Event-based F-score (MS dataset) | Event-based F-score (QU dataset) | Event-based F-score (DC dataset) | Event-based F-score (CHE23 dataset) | Event-based F-score (CW dataset) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | Baseline Template Matching | | 14.9 (14.0 - 15.3) | 21.1 | 7.2 | 44.1 | 8.0 | 9.7 | 34.9 | 36.1 | 44.2 |
| | Baseline_PROTO_task5_1 | Baseline Prototypical Network | | 41.6 (41.0 - 42.1) | 37.9 | 28.6 | 41.3 | 58.7 | 26.3 | 47.2 | 68.3 | 62.7 |
| | KAO_NTHU_task5_2 | cosine similarity_Chromagram | kao_nthu2024 | 30.27 (29.42 - 30.81) | 78.4 | 14.8 | 52.6 | 41.3 | 16.2 | 45.1 | 33.9 | 47.2 |
| | KAO_NTHU_task5_3 | cosine similarity_Mel-Spectrograms ∪ Chromagram | kao_nthu2024 | 40.1 (39.2 - 40.7) | 64.8 | 16.0 | 60.0 | 52.9 | 32.8 | 54.5 | 54.9 | 57.5 |
| | KAO_NTHU_task5_1 | cosine similarity_Mel-Spectrograms | kao_nthu2024 | 46.9 (45.6 - 47.9) | 92.5 | 16.6 | 83.0 | 59.7 | 44.5 | 61.1 | 71.5 | 70.3 |
| | Lu_AILab_task5_1 | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | 54.6 | 47.6 | 70.5 | 62.8 | 47.7 | 52.1 | 78.0 | 69.0 |
| | Lu_AILab_task5_2 | Lu_AILab_task5_2 | lu_ailab2024 | 31.4 (31.0 - 31.7) | 48.4 | 40.2 | 33.0 | 39.5 | 39.7 | 18.5 | 55.8 | 19.4 |
| | Lu_AILab_task5_3 | Lu_AILab_task5_3 | lu_ailab2024 | 37.7 (37.3 - 38.1) | 38.6 | 48.1 | 63.0 | 59.5 | 47.5 | 21.2 | 71.9 | 21.2 |
| | Lu_AILab_task5_4 | Lu_AILab_task5_4 | lu_ailab2024 | 43.0 (42.3 - 43.4) | 56.7 | 30.1 | 35.5 | 39.5 | 39.6 | 48.9 | 65.8 | 68.2 |
| | XF_NUDT_task5_1 | Unet-Frame-Level | xf_nudt2024 | 64.1 (63.8 - 64.5) | 87.6 | 62.4 | 76.4 | 73.7 | 37.2 | 67.3 | 79.9 | 76.8 |
| | XF_NUDT_task5_4 | AAPM-Seg-Level | xf_nudt2024 | 28.4 (27.4 - 29.0) | 74.1 | 11.1 | 76.0 | 52.1 | 12.3 | 32.9 | 78.4 | 67.7 |
| | XF_NUDT_task5_3 | pcenBase-Frame-Level | xf_nudt2024 | 61.5 (61.1 - 61.9) | 85.9 | 59.7 | 83.8 | 68.6 | 41.1 | 60.5 | 59.9 | 72.4 |
| | XF_NUDT_task5_2 | logmelBase-Frame-Level | xf_nudt2024 | 65.2 (64.9 - 65.5) | 64.1 | 69.4 | 93.1 | 78.9 | 41.2 | 64.2 | 79.7 | 75.5 |
| | Latifi_IDMT_task5_1 | Cross-correlation baseline | latifi_idmt2024 | 0.7 (0.6 - 0.7) | 1.9 | 0.7 | 0.7 | 6.5 | 0.7 | 0.5 | 0.6 | 0.3 |
| | QianHu_BHEBIT_task5_1 | QianHu_DYXS_task5_1 | qianhu_bhebit2024 | 21.7 (21.1 - 22.2) | 36.4 | 21.9 | 62.5 | 38.8 | 5.5 | 40.1 | 58.3 | 59.9 |
| | QianHu_BIT_task5_3 | QianHu_DYXS_task5_3 | qianhu_bit2024 | 21.7 (21.2 - 22.1) | 50.3 | 31.5 | 69.9 | 29.2 | 5.5 | 34.9 | 64.3 | 32.6 |
| | QianHu_BHEBIT_task5_4 | QianHu_DYXS_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | 75.0 | 27.3 | 42.6 | 38.7 | 31.0 | 43.8 | 64.2 | 78.0 |
| | QianHu_BHEBIT_task5_2 | QianHu_DYXS_task5_2 | qianhu_bhebit2024 | 34.1 (33.6 - 34.5) | 28.9 | 26.1 | 60.8 | 40.3 | 18.6 | 39.8 | 54.8 | 59.9 |
| | Hoffman_ESP_task5_4 | In-context learning | hoffman_esp2024 | 8.1 (7.6 - 8.4) | 55.9 | 43.5 | 3.0 | 27.0 | 2.2 | 31.7 | 56.7 | 38.9 |
| | Hoffman_ESP_task5_2 | In-context learning | hoffman_esp2024 | 3.3 (3.0 - 3.5) | 76.4 | 42.1 | 1.8 | 26.4 | 0.6 | 51.7 | 65.9 | 39.9 |
| | Hoffman_ESP_task5_3 | In-context learning | hoffman_esp2024 | 1.2 (0.9 - 1.3) | 39.3 | | 2.1 | 21.6 | 0.2 | 45.0 | 60.6 | 47.1 |
| | Hoffman_ESP_task5_1 | In-context learning | hoffman_esp2024 | 6.5 (6.1 - 6.9) | 68.2 | 39.5 | 4.5 | 36.2 | 1.2 | 41.3 | 72.6 | 39.0 |
| | Bordoux_WUR_task5_1 | AVESproto direct | bordoux_wur2024 | 42.0 (41.3 - 42.5) | 60.6 | 18.1 | 66.4 | 68.4 | 36.0 | 51.0 | 53.9 | 52.5 |
Teams ranking
This table includes only the best-performing system from each submitting team.
| Rank | Submission code | Submission name | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Event-based F-score (Development dataset) |
| --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | Baseline Template Matching | | 14.9 (14.0 - 15.3) | |
| | Baseline_PROTO_task5_1 | Baseline Prototypical Network | | 41.6 (41.0 - 42.1) | |
| | KAO_NTHU_task5_1 | cosine similarity_Mel-Spectrograms | kao_nthu2024 | 46.9 (45.6 - 47.9) | 44.2 |
| | Lu_AILab_task5_1 | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | 44.1 |
| | XF_NUDT_task5_2 | logmelBase-Frame-Level | xf_nudt2024 | 65.2 (64.9 - 65.5) | 70.6 |
| | Latifi_IDMT_task5_1 | Cross-correlation baseline | latifi_idmt2024 | 0.7 (0.6 - 0.7) | 47.0 |
| | QianHu_BHEBIT_task5_4 | QianHu_DYXS_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | 55.5 |
| | Hoffman_ESP_task5_4 | In-context learning | hoffman_esp2024 | 8.1 (7.6 - 8.4) | 44.4 |
| | Bordoux_WUR_task5_1 | AVESproto direct | bordoux_wur2024 | 42.0 (41.3 - 42.5) | 39.3 |
System characteristics
General characteristics
| Rank | Code | Technical Report | Event-based F-score with 95% confidence interval (Evaluation dataset) | Sampling rate | Data augmentation | Features |
| --- | --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | | 14.9 (14.0 - 15.3) | any | | spectrogram |
| | Baseline_PROTO_task5_1 | | 41.6 (41.0 - 42.1) | 22.05 kHz | | PCEN |
| | KAO_NTHU_task5_2 | kao_nthu2024 | 30.27 (29.42 - 30.81) | | | mel-spectrogram, chromagram |
| | KAO_NTHU_task5_3 | kao_nthu2024 | 40.1 (39.2 - 40.7) | | | mel-spectrogram, chromagram |
| | KAO_NTHU_task5_1 | kao_nthu2024 | 46.9 (45.6 - 47.9) | | | mel-spectrogram, chromagram |
| | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | | Gaussian noise, FilterAugment | log-mel |
| | Lu_AILab_task5_2 | lu_ailab2024 | 31.4 (31.0 - 31.7) | | Gaussian noise, FilterAugment | PCEN |
| | Lu_AILab_task5_3 | lu_ailab2024 | 37.7 (37.3 - 38.1) | | Gaussian noise, FilterAugment | PCEN |
| | Lu_AILab_task5_4 | lu_ailab2024 | 43.0 (42.3 - 43.4) | | Gaussian noise, FilterAugment | log-mel |
| | XF_NUDT_task5_1 | xf_nudt2024 | 64.1 (63.8 - 64.5) | | Mixup | log-mel |
| | XF_NUDT_task5_4 | xf_nudt2024 | 28.4 (27.4 - 29.0) | | | FBANK |
| | XF_NUDT_task5_3 | xf_nudt2024 | 61.5 (61.1 - 61.9) | | Mixup | PCEN |
| | XF_NUDT_task5_2 | xf_nudt2024 | 65.2 (64.9 - 65.5) | | Mixup | log-mel |
| | Latifi_IDMT_task5_1 | latifi_idmt2024 | 0.7 (0.6 - 0.7) | any | block mixing | spectrogram |
| | QianHu_BHEBIT_task5_1 | qianhu_bhebit2024 | 21.7 (21.1 - 22.2) | | | delta MFCC & PCEN |
| | QianHu_BIT_task5_3 | qianhu_bit2024 | 21.7 (21.2 - 22.1) | | | delta MFCC & PCEN |
| | QianHu_BHEBIT_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | | | delta MFCC & PCEN |
| | QianHu_BHEBIT_task5_2 | qianhu_bhebit2024 | 34.1 (33.6 - 34.5) | | | delta MFCC & PCEN |
| | Hoffman_ESP_task5_4 | hoffman_esp2024 | 8.1 (7.6 - 8.4) | | mixup, resampling, time reversal, block mixing | spectrogram |
| | Hoffman_ESP_task5_2 | hoffman_esp2024 | 3.3 (3.0 - 3.5) | | mixup, resampling, time reversal, block mixing | spectrogram |
| | Hoffman_ESP_task5_3 | hoffman_esp2024 | 1.2 (0.9 - 1.3) | | mixup, resampling, time reversal, block mixing | spectrogram |
| | Hoffman_ESP_task5_1 | hoffman_esp2024 | 6.5 (6.1 - 6.9) | | mixup, resampling, time reversal, block mixing | spectrogram |
| | Bordoux_WUR_task5_1 | bordoux_wur2024 | 42.0 (41.3 - 42.5) | | | waveform |
Machine learning characteristics
| Rank | Code | Technical Report | Event-based F-score (Eval) | Classifier | Few-shot approach | Post-processing |
| --- | --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | | 14.9 (14.0 - 15.3) | template matching | template matching | peak picking, threshold |
| | Baseline_PROTO_task5_1 | | 41.6 (41.0 - 42.1) | ResNet | prototypical | threshold |
| | KAO_NTHU_task5_2 | kao_nthu2024 | 30.27 (29.42 - 30.81) | supervised learning | cosine similarity based | customized merging [median filtering, time aggregation, ...] |
| | KAO_NTHU_task5_3 | kao_nthu2024 | 40.1 (39.2 - 40.7) | supervised learning | cosine similarity based | customized merging [median filtering, time aggregation, ...] |
| | KAO_NTHU_task5_1 | kao_nthu2024 | 46.9 (45.6 - 47.9) | supervised learning | cosine similarity based | customized merging [median filtering, time aggregation, ...] |
| | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | CNN | fine-tuning | peak picking, threshold |
| | Lu_AILab_task5_2 | lu_ailab2024 | 31.4 (31.0 - 31.7) | CNN | fine-tuning | peak picking, threshold |
| | Lu_AILab_task5_3 | lu_ailab2024 | 37.7 (37.3 - 38.1) | CNN | fine-tuning | peak picking, threshold |
| | Lu_AILab_task5_4 | lu_ailab2024 | 43.0 (42.3 - 43.4) | CNN | fine-tuning | peak picking, threshold |
| | XF_NUDT_task5_1 | xf_nudt2024 | 64.1 (63.8 - 64.5) | CNN | fine-tuning | median filtering, threshold |
| | XF_NUDT_task5_4 | xf_nudt2024 | 28.4 (27.4 - 29.0) | transformer | fine-tuning | median filtering, threshold |
| | XF_NUDT_task5_3 | xf_nudt2024 | 61.5 (61.1 - 61.9) | CNN | fine-tuning | median filtering, threshold |
| | XF_NUDT_task5_2 | xf_nudt2024 | 65.2 (64.9 - 65.5) | CNN | fine-tuning | median filtering, threshold |
| | Latifi_IDMT_task5_1 | latifi_idmt2024 | 0.7 (0.6 - 0.7) | template matching | fine-tuning | peak picking, threshold |
| | QianHu_BHEBIT_task5_1 | qianhu_bhebit2024 | 21.7 (21.1 - 22.2) | CNN | prototypical | threshold = 0.15 |
| | QianHu_BIT_task5_3 | qianhu_bit2024 | 21.7 (21.2 - 22.1) | CNN | prototypical | threshold = 0.16 |
| | QianHu_BHEBIT_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | CNN | prototypical | threshold = 0.15 |
| | QianHu_BHEBIT_task5_2 | qianhu_bhebit2024 | 34.1 (33.6 - 34.5) | | prototypical | |
| | Hoffman_ESP_task5_4 | hoffman_esp2024 | 8.1 (7.6 - 8.4) | transformer | in-context | threshold (data dependent), smoothing (data dependent) |
| | Hoffman_ESP_task5_2 | hoffman_esp2024 | 3.3 (3.0 - 3.5) | transformer | in-context | threshold (data dependent), smoothing (data dependent) |
| | Hoffman_ESP_task5_3 | hoffman_esp2024 | 1.2 (0.9 - 1.3) | transformer | in-context | threshold (data dependent), smoothing (data dependent) |
| | Hoffman_ESP_task5_1 | hoffman_esp2024 | 6.5 (6.1 - 6.9) | transformer | in-context | threshold (data dependent), smoothing (data dependent) |
| | Bordoux_WUR_task5_1 | bordoux_wur2024 | 42.0 (41.3 - 42.5) | transformer-based autoencoder | prototypical | threshold, 70% of annotation length |
Complexity
| Rank | Code | Technical Report | Event-based F-score (Eval) | Model complexity | Training time |
| --- | --- | --- | --- | --- | --- |
| | Baseline_TempMatch_task5_1 | | 14.9 (14.0 - 15.3) | | |
| | Baseline_PROTO_task5_1 | | 41.6 (41.0 - 42.1) | | |
| | KAO_NTHU_task5_2 | kao_nthu2024 | 30.27 (29.42 - 30.81) | | 120 hours |
| | KAO_NTHU_task5_3 | kao_nthu2024 | 40.1 (39.2 - 40.7) | | 120 hours |
| | KAO_NTHU_task5_1 | kao_nthu2024 | 46.9 (45.6 - 47.9) | | 120 hours |
| | Lu_AILab_task5_1 | lu_ailab2024 | 56.7 (56.2 - 57.1) | 27M | 1 hour (A6000, 48 GB) |
| | Lu_AILab_task5_2 | lu_ailab2024 | 31.4 (31.0 - 31.7) | 27M | 1 hour (A6000, 48 GB) |
| | Lu_AILab_task5_3 | lu_ailab2024 | 37.7 (37.3 - 38.1) | 27M | 1 hour (A6000, 48 GB) |
| | Lu_AILab_task5_4 | lu_ailab2024 | 43.0 (42.3 - 43.4) | 27M | 1 hour (A6000, 48 GB) |
| | XF_NUDT_task5_1 | xf_nudt2024 | 64.1 (63.8 - 64.5) | 880k | 56 minutes |
| | XF_NUDT_task5_4 | xf_nudt2024 | 28.4 (27.4 - 29.0) | 89M | 3 hours |
| | XF_NUDT_task5_3 | xf_nudt2024 | 61.5 (61.1 - 61.9) | 359k | 48 minutes |
| | XF_NUDT_task5_2 | xf_nudt2024 | 65.2 (64.9 - 65.5) | 359k | 48 minutes |
| | Latifi_IDMT_task5_1 | latifi_idmt2024 | 0.7 (0.6 - 0.7) | 12,046,462 | 12 hours (8 GB GPU) |
| | QianHu_BHEBIT_task5_1 | qianhu_bhebit2024 | 21.7 (21.1 - 22.2) | 724k | 4 hours on RTX 4070 |
| | QianHu_BIT_task5_3 | qianhu_bit2024 | 21.7 (21.2 - 22.1) | 726k | 4 hours on RTX 4070 |
| | QianHu_BHEBIT_task5_4 | qianhu_bhebit2024 | 42.5 (41.7 - 43.0) | 724k | 4 hours on RTX 4070 |
| | QianHu_BHEBIT_task5_2 | qianhu_bhebit2024 | 34.1 (33.6 - 34.5) | 724k | 4 hours on RTX 4070 |
| | Hoffman_ESP_task5_4 | hoffman_esp2024 | 8.1 (7.6 - 8.4) | 33.5M unfrozen + 80M frozen | 24 hours, A100 GPU |
| | Hoffman_ESP_task5_2 | hoffman_esp2024 | 3.3 (3.0 - 3.5) | 33.5M unfrozen + 80M frozen | 24 hours, A100 GPU |
| | Hoffman_ESP_task5_3 | hoffman_esp2024 | 1.2 (0.9 - 1.3) | 33.5M unfrozen + 80M frozen | 24 hours, A100 GPU |
| | Hoffman_ESP_task5_1 | hoffman_esp2024 | 6.5 (6.1 - 6.9) | 33.5M unfrozen + 80M frozen | 24 hours, A100 GPU |
| | Bordoux_WUR_task5_1 | bordoux_wur2024 | 42.0 (41.3 - 42.5) | 90M | |
Technical reports
ADAPTABLE INPUT LENGTH USING MODEL TRAINED ON WAVEFORM Technical Report
Bordoux, Valentin
Marine Animal Ecology, Wageningen University
Abstract
This report presents a method for bioacoustic sound event detection using few-shot learning, developed for DCASE 2024 Task 5. Our approach experiments with pretrained models that take waveforms as input. These models serve as feature extractors, and a prototypical loss is used for prediction. Initially, we employed direct predictions with openly available pretrained models. Subsequently, we attempted to fine-tune the models for each file, using only the first five annotations as the training set. The direct prediction system achieved a 40% F-measure score, 12 points under the baseline system proposed by the organizers. Fine-tuning did not improve the model's performance over direct prediction. While our proposed method can be applied directly without extensive parameter tuning or additional training, the results indicate that it does not achieve the generalizability required for this challenge when compared to the baseline method. This work suggests how state-of-the-art models, despite their high performance on other datasets or benchmarks, may still perform suboptimally on sound event detection using few-shot learning for certain taxonomic groups present in the DCASE challenge datasets.
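To make the "direct prediction" step concrete, the following is a minimal sketch (our illustration, not the authors' code) of prototypical prediction from precomputed frame embeddings, such as those a waveform model like AVES might produce; the embedding dimension, shot counts, and distance choice are illustrative assumptions.

```python
# Hedged sketch: prototypical prediction from frame embeddings.
# Shapes and values are assumptions for illustration only.
import numpy as np

def prototype(frames: np.ndarray) -> np.ndarray:
    """Mean embedding of a set of frames: (n_frames, dim) -> (dim,)."""
    return frames.mean(axis=0)

def predict(query: np.ndarray, pos_support: np.ndarray, neg_support: np.ndarray) -> np.ndarray:
    """Boolean mask over query frames: True where a frame is closer
    (Euclidean) to the positive prototype than to the negative one."""
    p_pos = prototype(pos_support)
    p_neg = prototype(neg_support)
    d_pos = np.linalg.norm(query - p_pos, axis=1)
    d_neg = np.linalg.norm(query - p_neg, axis=1)
    return d_pos < d_neg

# Random embeddings stand in for real feature-extractor output.
rng = np.random.default_rng(0)
mask = predict(rng.normal(size=(200, 768)),   # query frames
               rng.normal(size=(30, 768)),    # frames from the 5 positive shots
               rng.normal(size=(50, 768)))    # frames from negative regions
```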
System characteristics
System embeddings | AVES (wav2vec2) |
Subsystem count | False |
External data usage | False |
Toward in-context bioacoustic sound event detection
Hoffman, Benjamin and Robinson, David
Earth Species Project
Abstract
We introduce an in-context learning approach to bioacoustic sound event detection. Our approach consists of a large pre-trained transformer model which, when prompted with a small amount of labeled audio, directly predicts detection labels on unlabeled audio. To train our model, we constructed a large audio database, which we used to generate acoustic scenes with temporally fine-grained detection labels. On the validation set for the 2024 DCASE Few-shot bioacoustic event detection challenge, our best-performing submission achieves an average F1 score of 0.584, improving on the challenge baseline by 0.063.
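A prompting interface of this general kind could look roughly like the sketch below; the module, dimensions, and label encoding are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of an in-context detector: support features plus their frame
# labels are concatenated in time with the query features, and a transformer
# predicts frame-wise detection logits for the query portion.
import torch
import torch.nn as nn

class InContextDetector(nn.Module):
    def __init__(self, feat_dim=128, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.label_emb = nn.Embedding(3, d_model)  # 0 = negative, 1 = positive, 2 = unlabeled query
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, support_feats, support_labels, query_feats):
        # support_feats: (B, Ts, F); support_labels: (B, Ts) in {0, 1}; query_feats: (B, Tq, F)
        query_labels = torch.full(query_feats.shape[:2], 2, dtype=torch.long,
                                  device=query_feats.device)
        feats = torch.cat([support_feats, query_feats], dim=1)
        labels = torch.cat([support_labels, query_labels], dim=1)
        x = self.proj(feats) + self.label_emb(labels)   # inject the labels into the prompt
        x = self.encoder(x)
        logits = self.head(x).squeeze(-1)
        return logits[:, support_feats.shape[1]:]       # keep only the query-frame logits

model = InContextDetector()
out = model(torch.randn(1, 300, 128), torch.randint(0, 2, (1, 300)), torch.randn(1, 500, 128))
```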
System characteristics
Data augmentation | mixup, resampling, time reversal, block mixing |
System embeddings | ATST-FRAME |
Subsystem count | False |
External data usage | True |
Cosine similarity based Few-shot Bioacoustic Event Detection with Automatic Frequency Range Identification in Mel-Spectrograms Technical Report
Kao, Sheng-Lun and Liu, Yi-Wen
Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Abstract
In response to the Few-shot Bioacoustic Event Detection challenge, we have developed a detection system comprising three key components. First, an algorithm has been devised for automatically identifying frequency ranges of the positive (POS) signal within the mel-spectrogram. Second, the cosine similarity between POS and negative (NEG) events is computed across the entire audio file. Third, predictions of POS events are made based on the results of cosine similarity. Remarkably, this approach does not rely on any training data from the development dataset, external data, or pretrained models. The proposed system achieved an F1-score of 44.187% on the 2023 validation set.
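As a rough illustration of this pipeline (not the authors' implementation), the sketch below band-limits a mel-spectrogram to the bins occupied by the POS shots and then scores sliding windows by cosine similarity against an averaged POS template; the margin, window handling, and toy data are all assumptions.

```python
# Hedged sketch: frequency-range restriction + cosine-similarity scoring.
import numpy as np

def band_limit(mel, pos_segments, margin_db=6.0):
    """Keep only mel bins whose mean POS energy is within margin_db of the peak bin."""
    pos = np.concatenate(pos_segments, axis=1)             # (n_mels, total_pos_frames)
    profile = 10 * np.log10(pos.mean(axis=1) + 1e-10)
    keep = profile >= profile.max() - margin_db
    return mel[keep], [seg[keep] for seg in pos_segments]

def cosine_scores(mel, template):
    """Cosine similarity between the flattened template and every window of the file."""
    width = template.shape[1]
    t = template.ravel()
    t = t / (np.linalg.norm(t) + 1e-10)
    scores = []
    for start in range(mel.shape[1] - width + 1):
        w = mel[:, start:start + width].ravel()
        scores.append(float(t @ w / (np.linalg.norm(w) + 1e-10)))
    return np.array(scores)

# Toy example: a 64-bin mel-spectrogram with two POS shots of 20 frames each.
rng = np.random.default_rng(0)
mel = rng.random((64, 1000))
shots = [mel[:, 100:120], mel[:, 400:420]]
mel_b, shots_b = band_limit(mel, shots)
template = np.mean(np.stack(shots_b), axis=0)
scores = cosine_scores(mel_b, template)
```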
System characteristics
System embeddings | CNN |
Subsystem count | False |
External data usage | False |
CROSS-ADAPT: CROSS-DATASET GENERATION AND DOMAIN ADAPTATION TECHNIQUE FOR FEW-SHOT LEARNING Technical Report
Bidarouni, Amir Latifi and Abeßer, Jakob
Semantic Music Technologies Group, Fraunhofer IDMT, Ilmenau, Germany
Latifi_IDMT_task5_1
Abstract
Bioacoustic monitoring is an invaluable tool for understanding wildlife well-being. However, the scarcity of annotated data for effective model training, coupled with domain shifts resulting from data recorded at various sensor locations with diverse acoustic environments, poses significant challenges for deep learning-based audio classification systems. In this paper, we propose a novel cross-dataset data augmentation technique designed to effectively use the limited annotated data available, exemplified by the few-shot learning task 5 of the DCASE challenge. Furthermore, we employ Instance-wise Feature Projection-based Domain Adaptation (IFPDA) to mitigate the domain shifts caused by variations in recording locations or devices. We use a modified ResNet model architecture in a multitask learning setting, which combines multi-class species classification on a patch level and binary classification for frame-level sound event detection.
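Of the augmentations listed for this system, block mixing is the simplest to illustrate. The following is a hedged numpy sketch with an assumed block size and mixing rule, not the authors' exact augmentation.

```python
# Hedged sketch: block mixing between two spectrograms of the same shape.
import numpy as np

def block_mix(spec_a, spec_b, max_frac=0.5, rng=None):
    """Blend a random time-frequency block of spec_b into spec_a."""
    rng = rng or np.random.default_rng()
    n_freq, n_time = spec_a.shape
    h = rng.integers(1, max(2, int(n_freq * max_frac)))    # block height (bins)
    w = rng.integers(1, max(2, int(n_time * max_frac)))    # block width (frames)
    f0 = rng.integers(0, n_freq - h + 1)
    t0 = rng.integers(0, n_time - w + 1)
    lam = rng.uniform(0.3, 0.7)                             # assumed blend factor
    out = spec_a.copy()
    out[f0:f0 + h, t0:t0 + w] = (lam * spec_a[f0:f0 + h, t0:t0 + w]
                                 + (1 - lam) * spec_b[f0:f0 + h, t0:t0 + w])
    return out

rng = np.random.default_rng(0)
mixed = block_mix(rng.random((128, 400)), rng.random((128, 400)), rng=rng)
```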
System characteristics
Data augmentation | Block mixing |
System embeddings | Modified ResNet50 |
Subsystem count | False |
External data usage | False |
FEW-SHOT BIOACOUSTIC EVENT DETECTION AT THE DCASE 2024 CHALLENGE Technical Report
Liu, Wei and Liu, Hy and Lin, Fl and Liu, Hs and Gao, Tian and Fang, Xin and Liu, Jh
iFLYTEK Research Institute, Hefei, China, and National University of Defense Technology, Changsha, China
XF_NUDT_task5_1 XF_NUDT_task5_2 XF_NUDT_task5_3
Abstract
In this technical report, we describe our submission system for DCASE 2024 Task 5: Few-shot Bioacoustic Event Detection. In previous work, we proposed a frame-level embedding learning system and achieved the best performance in DCASE 2022 Task 5. In this task, we propose several methods to improve the representational capacity of embeddings under limited positive samples. Three methods are proposed, all based on a pre-training and fine-tuning process: the AAPM segment-level embedding learning method, the baseline frame-level embedding learning method, and the Unet-based frame-level embedding learning method. Compared to our previous work, our new system achieved better results on the official 2023 validation set (F-measure 76.8%, No ML). The proposed system was evaluated on the newly released official 2024 validation set, with a best overall F-measure score of 70.56%.
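The frame-level systems from this report share the post-processing listed in the characteristics tables above (median filtering and thresholding). The sketch below shows that generic step under assumed parameter values; it is not the authors' exact pipeline.

```python
# Hedged sketch: turn frame-level probabilities into (onset, offset) events
# via median filtering, thresholding, and merging of consecutive frames.
import numpy as np
from scipy.signal import medfilt

def frames_to_events(probs, hop_s, threshold=0.5, kernel=5):
    """probs: (T,) per-frame probabilities; hop_s: frame hop in seconds."""
    smoothed = medfilt(probs, kernel_size=kernel)   # kernel size must be odd
    active = smoothed > threshold
    events, onset = [], None
    for i, a in enumerate(active):
        if a and onset is None:
            onset = i
        elif not a and onset is not None:
            events.append((onset * hop_s, i * hop_s))
            onset = None
    if onset is not None:                           # close an event running to the end
        events.append((onset * hop_s, len(active) * hop_s))
    return events

events = frames_to_events(np.random.rand(1000), hop_s=0.02)
```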
System characteristics
Data augmentation | Mixup |
Subsystem count | False |
External data usage | False |
FEW-SHOT BIOACOUSTIC EVENT DETECTION WITH FRAME-LEVEL EMBEDDING LEARNING SYSTEM Technical Report
Zhao, Peng Yuan and Lu, Cheng Wei and Zou, Liang
China University of Mining and Technology, Xuzhou, China
Lu_AILab_task5_1 Lu_AILab_task5_2 Lu_AILab_task5_3 Lu_AILab_task5_4
Abstract
In this technical report, we describe our submission system for the few-shot bioacoustic event detection task (DCASE2022 Task 5). Participants are expected to develop a few-shot learning system for detecting mammal and bird sounds in audio recordings. In our system, Prototypical Networks are used to embed spectrograms into an embedding space and learn a non-linear mapping between data samples. We leverage various data augmentation techniques on Mel-spectrograms and introduce a ResNet variant as the classifier. Our experiments demonstrate that the system can achieve an F1-score of 47.88% on the validation data.
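The two augmentations named in this system's characteristics (Gaussian noise addition and FilterAugment) could be sketched as follows; band counts, gain ranges, and SNR are illustrative assumptions rather than the submitted configuration.

```python
# Hedged sketch: Gaussian noise and a FilterAugment-style band-wise gain
# applied to a log-mel spectrogram (mel bins x frames, values in dB).
import numpy as np

def add_gaussian_noise(logmel, snr_db=20.0, rng=None):
    """Add Gaussian noise scaled relative to the spectrogram's spread."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(scale=logmel.std() / (10 ** (snr_db / 20)), size=logmel.shape)
    return logmel + noise

def filter_augment(logmel, n_bands=4, gain_db=6.0, rng=None):
    """Split the mel axis into random bands and apply a random gain to each."""
    rng = rng or np.random.default_rng()
    n_mels = logmel.shape[0]
    edges = np.sort(rng.choice(np.arange(1, n_mels), size=n_bands - 1, replace=False))
    edges = np.concatenate(([0], edges, [n_mels]))
    out = logmel.copy()
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[lo:hi] += rng.uniform(-gain_db, gain_db)
    return out

rng = np.random.default_rng(0)
aug = filter_augment(add_gaussian_noise(np.random.rand(128, 400), rng=rng), rng=rng)
```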
System characteristics
Data augmentation | Gaussian noise, FilterAugment |
System embeddings | False |
Subsystem count | False |
External data usage | False |
LIF-PROTONET: PROTOTYPICAL NETWORK WITH LEAKY INTEGRATE-AND-FIRE NEURON AND SQUEEZE-AND-EXCITATION BLOCKS FOR BIOACOUSTIC EVENT DETECTION Technical report
Sun, Mengkai and Zhang, Haojie and Qian, Kun and Hu, Bin
Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), P. R. China and School of Medical Technology, Beijing Institute of Technology, P. R. China
QianHu_BHEBIT_task5_1 QianHu_BHEBIT_task5_2 QianHu_BHEBIT_task5_3 QianHu_BHEBIT_task5_4
Abstract
In this technical report, we describe our submission system for DCASE2024 Task 5: Few-shot Bioacoustic Event Detection. We propose a metric learning method to construct a novel prototypical network based on Leaky Integrate-and-Fire neurons and Squeeze-and-Excitation (SE) blocks. We make better use of the negative data, which can be used to construct the loss function and provide much more semantic information. Most importantly, we propose to use SE blocks to adaptively recalibrate channel-wise feature responses by explicitly modeling interdependencies between channels, which improves the F-measure to 53.72%. For the input features, we use a combination of per-channel energy normalization (PCEN) and delta mel-frequency cepstral coefficients (∆MFCC); the features are first transformed through Leaky Integrate-and-Fire neurons to mimic brain function. Our system performs better than the official baseline on the DCASE2024 Task 5 validation set. Our final score reaches an F-measure of 55.49%, outperforming the baseline performance.
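A squeeze-and-excitation block of the kind described above is a standard component; the minimal PyTorch sketch below shows the channel recalibration idea with illustrative layer sizes, omitting the surrounding LIF and prototypical components.

```python
# Hedged sketch: a standard squeeze-and-excitation (SE) block that reweights
# channels of a feature map; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average over time-frequency
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (B, C, F, T)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excitation: channel-wise recalibration

se = SEBlock(64)
y = se(torch.randn(2, 64, 40, 100))
```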
System characteristics
System embeddings | CNN |
Subsystem count | False |
External data usage | False |