Bird audio detection


Challenge results

Task description

The task is to design a system that, given a short audio recording, returns a binary decision for the presence/absence of bird sound (bird sound of any kind).

An important goal of this task is generalisation to new conditions. To explore this we provide 3 separate development datasets and 3 evaluation datasets, each recorded under differing conditions. The datasets have different balances of positive/negative cases, different bird species, different background sounds, and different recording equipment.

A more detailed task description can be found on the task description page.

Teams ranking

Table including only the best performing system per submitting team.

Submission name   Technical report   AUC with 95% confidence interval (evaluation datasets)
Lasseck_MfN_1 Lasseck_MfN 89.0 (87.7 - 89.9)
bulbul_DCASE_1 bulbul_DCASE 88.5 (86.9 - 89.1)
SpeechLab_UKY_3 SpeechLab_UKY 83.9 (81.7 - 84.7)
JiananSong_BUPT_1 JiananSong_BUPT 82.1 (80.3 - 83.0)
Himawan_QUT_1 Himawan_QUT 81.7 (80.3 - 82.8)
Bai_NPU_1 Bai_NPU 81.5 (80.1 - 82.8)
Baseline_Surrey_1 Baseline_Surrey 80.9 (79.1 - 82.4)
Berger_JKU_1 Berger_JKU 80.8 (79.2 - 82.5)
Mukherjee_IITKgp_2 Mukherjee_IITKgp 80.7 (79.5 - 82.3)
Yu_LR_2 Yu_LR 80.6 (78.5 - 81.4)
Thakur_IITMANDI_1 Thakur_IITMANDI 79.2 (76.7 - 79.5)
Vesperini_A3Lab_1 Vesperini_A3Lab 78.8 (77.4 - 80.2)
Tao_IITLAB_2 Tao_IITLAB 75.4 (73.2 - 77.1)
skfl_DCASE_1 skfl_DCASE 73.4 (72.0 - 75.3)
smacpy_DCASE_1 smacpy_DCASE 51.7 (50.5 - 52.5)
Jamali_HUT_1 Jamali_HUT 48.9 (46.4 - 49.6)
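
The scores above are areas under the ROC curve (AUC), each with a 95% confidence interval. This page does not say how the intervals were computed, so the following is only a minimal sketch that assumes a percentile bootstrap over files. The function names `roc_auc` and `bootstrap_ci` and the parameters `n_resamples`, `alpha` and `seed` are illustrative, not part of the challenge tooling.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between scores receive the average rank, so a positive/negative tie
    counts as half a win.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed procedure)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Labels are 1 for "bird present" and 0 for "no bird"; scores are made up.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because the AUC depends only on the ordering of the scores, a system can be ranked this way without choosing a decision threshold.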

Prize winners

The two prize winners receive £250 in recognition of their contribution.

1: Highest-scoring open-source/reproducible method award

  • Winner: Liaqat et al (University of Kentucky, USA) - This student team re-implemented the "bulbul" system (last year's winner) and then evaluated various ideas for improving it. Although the individual modifications did not improve the score, an ensemble of the resulting systems led to an improved final score. The tech report gives a discussion of the techniques tried, including a domain adaptation method and signal enhancement.

Code
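
As an illustration of the ensembling idea mentioned above, the sketch below averages the per-file scores of several systems. This page does not describe how the winning ensemble was actually formed, so plain averaging is an assumption, and `average_ensemble` and the file identifiers are hypothetical.

```python
def average_ensemble(system_scores):
    """Average per-file scores from several systems.

    system_scores: list of dicts, each mapping a file id to a score in [0, 1].
    Returns a dict mapping each file id to the mean score across systems.
    """
    if not system_scores:
        raise ValueError("need at least one system")
    file_ids = set(system_scores[0])
    for scores in system_scores[1:]:
        if set(scores) != file_ids:
            raise ValueError("all systems must score the same files")
    n = len(system_scores)
    return {f: sum(s[f] for s in system_scores) / n for f in file_ids}


# Two hypothetical systems scoring two hypothetical clips.
combined = average_ensemble([
    {"clip_a": 0.2, "clip_b": 0.8},
    {"clip_a": 0.4, "clip_b": 0.6},
])
```

Averaging tends to help when the individual systems make partly independent errors, which is consistent with an ensemble improving even though no single modification did.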

2: Judges' award for the method considered by the judges to be the most interesting or innovative.

  • Winner: Vesperini et al (Università Politecnica delle Marche, Italy) - The authors use "capsule networks", a new idea for routing between modules in neural networks. The paper gives a clear introduction to the concept, and it's encouraging that this rather new idea gets respectable performance on the challenge data (78.8%).

Special mention: Berger et al (Johannes Kepler University, Austria) - The authors use a bulbul-like model, and they describe an interesting domain-adaptation technique, which gives them approximately a 1% boost over their base model.

Systems ranking

Table including all systems officially submitted (up to 4 per team).

Submission name   Technical report   AUC with 95% confidence interval (evaluation datasets)
Lasseck_MfN_1 Lasseck_MfN 89.0 (87.7 - 89.9)
bulbul_DCASE_1 bulbul_DCASE 88.5 (86.9 - 89.1)
SpeechLab_UKY_1 SpeechLab_UKY 82.5 (81.0 - 83.5)
JiananSong_BUPT_1 JiananSong_BUPT 82.1 (80.3 - 83.0)
Himawan_QUT_1 Himawan_QUT 81.7 (80.3 - 82.8)
Bai_NPU_1 Bai_NPU 81.5 (80.1 - 82.8)
Baseline_Surrey_1 Baseline_Surrey 80.9 (79.1 - 82.4)
Berger_JKU_1 Berger_JKU 80.8 (79.2 - 82.5)
Yu_LR_1 Yu_LR 80.5 (78.6 - 81.5)
Mukherjee_IITKgp_1 Mukherjee_IITKgp 80.4 (79.0 - 82.0)
Thakur_IITMANDI_1 Thakur_IITMANDI 79.2 (76.7 - 79.5)
Vesperini_A3Lab_1 Vesperini_A3Lab 78.8 (77.4 - 80.2)
Tao_IITLAB_1 Tao_IITLAB 74.9 (73.4 - 76.7)
skfl_DCASE_1 skfl_DCASE 73.4 (72.0 - 75.3)
smacpy_DCASE_1 smacpy_DCASE 51.7 (50.5 - 52.5)
Jamali_HUT_1 Jamali_HUT 48.9 (46.4 - 49.6)
SpeechLab_UKY_2 SpeechLab_UKY 82.7 (79.8 - 83.6)
Himawan_QUT_2 Himawan_QUT 81.3 (80.0 - 82.7)
Bai_NPU_2 Bai_NPU 80.9 (79.5 - 82.2)
Mukherjee_IITKgp_2 Mukherjee_IITKgp 80.7 (79.5 - 82.3)
Yu_LR_2 Yu_LR 80.6 (78.5 - 81.4)
Vesperini_A3Lab_2 Vesperini_A3Lab 75.9 (73.0 - 78.0)
Tao_IITLAB_2 Tao_IITLAB 75.4 (73.2 - 77.1)
Thakur_IITMANDI_2 Thakur_IITMANDI 75.4 (72.1 - 77.6)
Baseline_Surrey_2 Baseline_Surrey 74.8 (72.8 - 76.3)
Berger_JKU_2 Berger_JKU 70.8 (68.2 - 71.8)
JiananSong_BUPT_2 JiananSong_BUPT 51.5 (49.2 - 52.6)
SpeechLab_UKY_3 SpeechLab_UKY 83.9 (81.7 - 84.7)
Bai_NPU_3 Bai_NPU 81.5 (80.1 - 82.8)
Himawan_QUT_3 Himawan_QUT 80.6 (78.7 - 81.5)
Yu_LR_3 Yu_LR 80.0 (77.7 - 80.6)
Tao_IITLAB_3 Tao_IITLAB 74.1 (72.3 - 76.0)
Thakur_IITMANDI_3 Thakur_IITMANDI 72.9 (70.0 - 74.1)
SpeechLab_UKY_4 SpeechLab_UKY 83.6 (81.4 - 84.6)
Bai_NPU_4 Bai_NPU 81.4 (80.0 - 82.7)
Himawan_QUT_4 Himawan_QUT 78.4 (76.8 - 79.9)
Thakur_IITMANDI_4 Thakur_IITMANDI 77.7 (76.2 - 79.7)

Technical reports

CIAIC-BAD SYSTEM FOR DCASE2018 CHALLENGE TASK 3

Bai, Jisheng and Wu, Ru and Wang, Mou and Li, Dexin and Li, Di and Han, Xueyu and Wang, Qian and Liu, Qing and Wang, Bolun and Fu, Zhonghua
Northwestern Polytechnical University, Xi'an, China

Abstract

In this technical report, we present our system for Task 3 of the Detection and Classification of Acoustic Scenes and Events 2018 (DCASE2018) challenge, i.e. bird audio detection (BAD). First, log mel-spectrograms and mel-frequency cepstral coefficients (MFCCs) are extracted as features. To improve the quality of the original audio, some denoising methods are adopted, for example adaptive denoising in Adobe Audition. Then, a convolutional recurrent neural network (CRNN) with a customized activation function is used for detection. Finally, we use the aforementioned features as inputs to train our CRNN model and fuse three subsystems to further improve the performance. We evaluate the proposed systems on the dataset with the area under the ROC curve (AUC) measure, and our best AUC score on the leaderboard dataset is 85.67.

PDF

Bird Audio Detection - DCASE 2018

Franz Berger and William Freillinger and Paul Primus and Wolfgang Reisinger
Johannes Kepler University, Linz

Abstract

In this paper we explore three approaches to bird audio detection. We establish a simple baseline, experiment with handcrafted features and finally move to Convolutional Neural Networks.

PDF

3D CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR BIRD SOUND DETECTION

Ivan Himawan and Michael Towsey and Paul Roe
Queensland University of Technology

Abstract

With the increasing use of high-quality acoustic devices to monitor wildlife populations, it has become imperative to develop techniques for analyzing animals’ calls automatically. Bird sound detection is one example of a long-term monitoring project where data are collected over continuous periods, often covering multiple sites at the same time. Inspired by the success of deep learning approaches in various audio classification tasks, this paper first reviews previous works exploiting deep learning for bird audio detection, and then proposes a novel 3-dimensional (3D) convolutional and recurrent neural network. We employ 3D convolutions for extracting spatial and temporal information simultaneously. In order to leverage the powerful and compact features of 3D convolution, we employ separate RNNs, acting on each filter of the last convolutional layers, rather than stacking the feature maps as in typical combined CNN and RNN architectures.

PDF

Bird Audio Detection using Supervised Weighted NMF

Soroush Jamali and Juan Ahmadpanah and Ghasem Alipoor
Hamedan University of Technology

Abstract

This paper reports on the results of our bird audio detection system, developed for Task 3 of the DCASE 2018 challenge, which is defined as a binary classification problem. Our proposed method is based on supervised non-negative matrix factorization (NMF) of the constant-Q transform (CQT) spectrogram. Two dictionaries are trained over the training data available for the bird and environment classes. Test samples are then linearly decomposed using a combined dictionary, generated by concatenating these two dictionaries. Classification is performed based on the energy of the activations relevant to each class. However, to further improve the classification performance, we propose to weight each activation coefficient according to the contribution of its corresponding basis in constructing each class. A scheme is proposed to extract these contribution weights from the activation coefficients of the training data. The developed system, evaluated over the development dataset of the challenge, results in up to 80% accuracy.
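
The unweighted part of the pipeline described in this abstract can be sketched as follows. This is a toy illustration under several assumptions, not the authors' implementation: the input is taken to be a non-negative magnitude spectrogram (rows are frequency bins, columns are frames), the factorization uses standard multiplicative updates for the Euclidean cost, "energy" is taken to be the sum of squared activations, and the proposed activation weighting is omitted because its details are not given here. All function names are hypothetical.

```python
import math
import random

EPS = 1e-9  # avoids division by zero in the multiplicative updates


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def _scale(m, num, den):
    """Element-wise m * num / den, the core of a multiplicative update."""
    return [[x * n / (d + EPS) for x, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def update_h(v, w, h):
    """One update of the activations H for V ~ W H, with W held fixed."""
    wt = transpose(w)
    return _scale(h, matmul(wt, v), matmul(matmul(wt, w), h))


def update_w(v, w, h):
    """One update of the dictionary W for V ~ W H, with H held fixed."""
    ht = transpose(h)
    return _scale(w, matmul(v, ht), matmul(w, matmul(h, ht)))


def learn_dictionary(v, rank, n_iter=100, seed=0):
    """Learn a dictionary with `rank` unit-norm basis columns from one class."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(n_iter):
        h = update_h(v, w, h)
        w = update_w(v, w, h)
    norms = [math.sqrt(sum(row[k] ** 2 for row in w)) + EPS for k in range(rank)]
    return [[x / n for x, n in zip(row, norms)] for row in w]


def classify(v_test, w_bird, w_env, n_iter=100):
    """Decompose a clip on the concatenated dictionary and compare energies."""
    w = [rb + re for rb, re in zip(w_bird, w_env)]  # concatenate basis columns
    n_bird = len(w_bird[0])
    h = [[1.0] * len(v_test[0]) for _ in w[0]]
    for _ in range(n_iter):
        h = update_h(v_test, w, h)
    e_bird = sum(x * x for row in h[:n_bird] for x in row)
    e_env = sum(x * x for row in h[n_bird:] for x in row)
    return e_bird > e_env, e_bird, e_env
```

Normalising each basis column to unit norm is a design choice made here so that the activation energies of the two classes are on a comparable scale. Pure-Python matrices keep the sketch dependency-free but are only practical for tiny inputs.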

PDF

Bird Audio Detection using Convolutional Neural Networks and Binary Neural Networks

Jinan Song and Shengchen Li
Beijing University of Posts and Telecommunications

Abstract

For the bird audio detection task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2017), we propose an audio classification method for bird species identification using Convolutional Neural Networks (CNNs) and Binarized Neural Networks (BNNs). Although deep learning networks are currently popular in bird audio detection [1], the complex network structure makes it difficult to design the hardware of the detection system. Therefore, after the design of the CNNs, the convolutional layer and the fully connected layer are binarized on the basis of the original network, and both network structures are tested. Finally, the Area Under ROC Curve (AUC) score is used as the evaluation index. The preview scores obtained with CNNs and BNNs are 88.75% and 68.60%, respectively.

PDF

DCASE 2018 Challenge Surrey Cross-Task convolutional neural network baseline

Qiuqiang Kong, Iqbal Turab, Xu Yong, Wenwu Wang and Mark D. Plumbley
Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK

Abstract

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge is a well-known IEEE AASP challenge consisting of several audio classification and sound event detection tasks. The DCASE 2018 challenge includes five tasks: 1) Acoustic scene classification, 2) Audio tagging of Freesound, 3) Bird audio detection, 4) Weakly labeled semi-supervised sound event detection and 5) Multi-channel audio tagging. In this paper we open-source the Python code for all of Tasks 1 - 5 of the DCASE 2018 challenge. The baseline source code contains the implementation of convolutional neural networks (CNNs), including the AlexNetish and the VGGish networks from the image processing area. We investigated how the performance varies from task to task when the configuration of the neural networks is the same. The experiments show that the deeper VGGish network performs better than AlexNetish on Tasks 2 - 5, whereas on Task 1 the VGGish and AlexNetish networks perform similarly. With the VGGish network, we achieve an accuracy of 0.680 on Task 1, a mean average precision (mAP) of 0.928 on Task 2, an area under the curve (AUC) of 0.854 on Task 3, a sound event detection F1 score of 20.8% on Task 4 and an F1 score of 87.75% on Task 5.

System characteristics
Input mono
Sampling rate 44.1kHz
Features log-mel energies
Classifier VGGish 8 layer CNN with global max pooling; AlexNetish 4 layer CNN with global max pooling
PDF

ACOUSTIC BIRD DETECTION WITH DEEP CONVOLUTIONAL NEURAL NETWORKS

Mario Lasseck
Museum fuer Naturkunde, Berlin

Abstract

This paper presents deep learning techniques for acoustic bird detection. Deep Convolutional Neural Networks (DCNNs), originally designed for image classification, are adapted and fine-tuned to detect the presence of birds in audio recordings. Various data augmentation techniques are applied to increase model performance and improve generalization to unknown recording conditions and new habitats. The proposed approach is evaluated in the Bird Audio Detection task, which is part of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018. It provides the best system for the task and surpasses the previous state of the art, achieving an area under the curve (AUC) above 95% on the public challenge leaderboard [1].

PDF

CONVOLUTIONAL RECURRENT NEURAL NETWORK BASED BIRD AUDIO DETECTION

Rajdeep Mukherjee and Dipyaman Banerjee and Kuntal Dey and Niloy Ganguly
Indian Institute of Technology, Kharagpur and IBM Research, New Delhi

Abstract

We propose a Convolutional Recurrent Neural Network (CRNN) based approach, implemented as a Convolutional Neural Network (CNN) followed by a Recurrent Neural Network (RNN), for the task of detecting the presence of birds in audio recordings. As part of the IEEE DCASE 2018 Challenge, we were provided with three separate development datasets containing recordings from three very different bird sound monitoring projects. We performed a stratified 3-way cross-validation mechanism for training our model by considering two datasets for training and the remaining one for validation in each fold in order to generalize our model well when exposed to data from unseen conditions. We obtained an Area Under Curve (AUC) measure of 88.7% on the leaderboard test set. We compare our results with the CNN version of our model which achieves an AUC measure of 87.74% on the same test set.
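
The fold construction described in this abstract (two datasets for training, the remaining one for validation) can be sketched as below. This is a minimal illustration only: the function name and the dataset and file names are hypothetical, and the sketch does not perform any label stratification.

```python
def leave_one_dataset_out(files_by_dataset):
    """Yield (held_out_name, train_files, val_files), one fold per dataset.

    files_by_dataset: dict mapping a dataset name to its list of file ids.
    """
    names = sorted(files_by_dataset)
    for held_out in names:
        train = [f for name in names if name != held_out
                 for f in files_by_dataset[name]]
        val = list(files_by_dataset[held_out])
        yield held_out, train, val


# Hypothetical development data: three datasets with a few clips each.
development = {
    "dataset_a": ["a1.wav", "a2.wav"],
    "dataset_b": ["b1.wav"],
    "dataset_c": ["c1.wav", "c2.wav", "c3.wav"],
}
for held_out, train, val in leave_one_dataset_out(development):
    print(held_out, len(train), len(val))
```

Validating on a dataset that was never used for training mimics the challenge setting, where the evaluation recordings come from conditions not seen during development.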

PDF

DOMAIN TUNING METHODS FOR BIRD AUDIO DETECTION

Sidrah Liaqat and Narjes Bozorg and Neenu Jose and Patrick Conrey and Antony Tamasi and Michael T. Johnson
University of Kentucky Speech and Signal Processing Lab

Abstract

This paper presents several feature extraction and normalization methods implemented for the DCASE 2018 Bird Audio Detection challenge, a binary audio classification task to identify whether a ten second audio segment from a specified dataset contains one or more bird vocalizations. Our baseline system is adapted from the Convolutional Neural Network system of last year’s challenge winner bulbul [1]. We introduce one feature modification, an increase in temporal resolution of the Mel-spectrogram feature matrix, tailored to the fast-changing temporal structure of many song-bird vocalizations. Additionally, we introduce two feature normalization approaches, a front-end signal enhancement method to reduce differences in dataset noise characteristics and an explicit domain adaptation method based on covariance normalization. Overall results show that none of these approaches gave significant improvements for either a within-dataset training/testing paradigm or a cross-dataset training/testing paradigm.

Awards: Highest-scoring open source / reproducible method

PDF

BIRD AUDIO DETECTION FOR DCASE 2018 CHALLENGE TECHNICAL REPORT

Lianjie Tao and Xinxing Chen
Chongqing University

Abstract

The 2018 BAD challenge [1] requires detecting bird sound in 10-second audio clips. The organizers provided three development datasets for training our neural network and three evaluation datasets for evaluating it. The goal of the challenge is to maximize the recognition of bird sounds in the audio.

PDF

LEARNED AGGREGATION IN CNN: ALL-CONV NET FOR BIRD ACTIVITY DETECTION

Anshul Thakur and Arjun Pankajakshan and Padmanabhan Rajan
Indian Institute of Technology Mandi

Abstract

Task 3 of DCASE 2018, i.e. bird activity detection (BAD), deals with identifying the presence or absence of bird vocalizations in a given audio recording. In this submission, we utilize an all-convolutional neural network (all-conv net) for BAD. The network is characterized by the utilization of convolutional operations to implement aggregation/pooling and dense layers. The aggregation operation implemented by convolution helps in capturing the inter feature-map correlations which are ignored in traditional max/average pooling. This helps in learning a function which aggregates the complementary information in various feature maps, leading to better bird activity detection. Building on the all-conv net, we utilize four different derivative systems which provide good validation and preview scores.

PDF

A CAPSULE NEURAL NETWORKS BASED APPROACH FOR BIRD AUDIO DETECTION

Fabio Vesperini and Leonardo Gabrielli and Emanuele Principi and Stefano Squartini
Università Politecnica delle Marche, Ancona

Abstract

We propose a system for bird audio detection based on the innovative CapsNet architecture. It is our contribution to the third task of the DCASE2018 Challenge. The task consists of binary detection of the presence/absence of bird sounds in audio files belonging to different datasets. Spectral acoustic features are extracted from the acoustic signals; subsequently, a deep neural network comprising capsule units is trained by means of supervised learning, using binary annotations of bird song activity as the target vector in combination with the dynamic routing mechanism. This procedure aims to encourage the network to learn global coherence implicitly and to identify part-whole relationships between capsules, thereby improving generalization performance in detecting the presence of bird songs under various environmental conditions. We achieve a harmonic mean of the Area Under ROC Curve (AUC) score equal to 85.08 from the cross-validation performed on the development dataset, while we obtain an AUC equal to 84.43 as preview score from a subset of the unseen evaluation data.

Awards: Judges' award

PDF

DCASE 2018 CHALLENGE TECHNICAL REPORT

Chenchen Yu and Yu Hao and Wenbo Yang and Bo Fu
AI Lab, Lenovo Research

Abstract

For the task of Bird Audio Detection in the DCASE Challenge 2018 [1], we present three approaches that all use convolutional neural networks on Mel-spectrograms. We obtained Area Under Curve (AUC) measures of 0.8610, 0.8548 and 0.8464 on the preview score, which is calculated using approximately 1000 files randomly selected from the Chernobyl and warblrb10k data.

PDF