Acoustic-Based Traffic Monitoring


Challenge results

Task description

This task aims to design an acoustic-based traffic monitoring solution, an essential part of smart city development, to monitor the usage and condition of roadway infrastructure and detect anomalies. The challenge focuses on developing models to count the number of vehicles per vehicle type (car or commercial vehicle) and per direction of travel (left or right).

A more detailed task description can be found on the task description page.

Systems ranking

Columns, in order: Submission label, Name, Technical Report, Official rank, Rank score, followed by six per-location values (loc 1 to loc 6) for each of the following metrics, in this order (cv = commercial vehicle):
Kendall's Tau car left, Kendall's Tau car right, Kendall's Tau cv left, Kendall's Tau cv right, RMSE car left, RMSE car right, RMSE cv left, RMSE cv right.
Some rows, such as Baseline_Bosch_task10 and Cai_NCUT_task10_1, list fewer than 48 metric values, so their values cannot be assigned to columns by position alone.
Baseline_Bosch_task10 Baseline_Bosch Baseline2024dcaseT10 5 5.17 0.47 0.45 0.62 0.46 0.48 0.82 0.48 0.22 0.59 0.25 0.57 0.74 0.23 0.14 0.10 0.09 0.71 0.19 -0.03 0.27 0.44 0.11 0.65 2.45 3.31 1.63 1.70 0.66 1.67 2.69 3.56 1.21 2.21 0.61 1.95 0.73 0.47 0.31 0.55 0.49 0.54 0.78 0.61 0.20 0.73 0.68 0.44
Bai_JLESS_task10_1 Bai_JLESS Bai2024dcaseT10 3 4.44 0.40 0.51 0.62 0.47 0.49 0.82 0.42 0.38 0.60 0.24 0.66 0.72 0.14 0.13 0.23 0.40 0.12 0.73 0.15 0.17 0.38 0.46 0.12 0.71 2.78 3.02 1.58 1.45 0.64 1.76 3.31 2.69 1.17 2.25 0.53 1.91 0.91 0.56 0.30 0.50 0.27 0.50 0.80 0.61 0.18 0.63 0.21 0.46
Betton-Ployon_ACSTB_task10_1 Betton-Ployon_ACSTB Betton2024dcaseT10 9 7.89 0.48 0.11 0.39 0.76 -0.01 0.20 0.46 0.04 0.38 0.78 -0.03 0.26 0.11 0.22 0.06 0.01 0.05 0.10 0.15 -0.04 0.04 0.29 -0.00 0.12 2.41 4.57 2.24 0.92 1.16 5.64 2.75 4.41 2.20 0.94 1.14 5.24 0.76 0.47 0.31 0.55 0.27 1.11 0.74 0.62 0.21 0.56 0.26 1.23
Cai_NCUT_task10_1 Cai_NCUT Cai2024dcaseT10 10 8.14 0.46 0.12 0.57 0.03 0.81 0.46 0.15 0.46 0.22 0.65 0.07 0.01 0.04 0.01 0.56 0.11 0.10 0.17 0.43 2.48 4.37 1.71 2.12 0.93 1.74 2.78 3.38 1.32 2.49 0.84 2.34 0.79 0.48 0.29 0.55 0.24 0.72 0.80 0.64 0.20 0.74 0.22 0.64
Guan_GISP-HEU_task10_1 Guan_HEU_1 Guan2024dcaseT10 2 4.33 0.50 0.61 0.62 0.34 0.56 0.84 0.51 0.54 0.59 0.45 0.48 0.70 0.16 0.19 0.14 -0.08 -0.01 0.72 0.18 0.08 0.33 0.61 0.13 0.57 2.41 2.52 1.62 1.70 0.58 1.63 2.70 2.30 1.15 1.76 0.68 2.11 0.77 0.46 0.32 0.75 0.25 0.52 0.72 0.62 0.20 0.65 0.22 0.55
Guan_GISP-HEU_task10_2 Guan_HEU_2 Guan2024dcaseT10 1 3.98 0.49 0.63 0.62 0.03 0.59 0.84 0.50 0.50 0.60 0.39 0.60 0.72 0.22 0.06 0.15 0.03 0.41 0.67 0.19 0.29 0.36 0.53 0.39 0.56 2.41 2.58 1.62 2.22 0.56 1.53 2.65 2.64 1.17 1.79 0.56 2.01 0.74 0.57 0.44 0.73 0.16 0.56 0.74 0.59 0.20 0.65 0.22 0.56
Guan_GISP-HEU_task10_3 Guan_HEU_3 Guan2024dcaseT10 7 6.27 0.50 0.45 0.60 0.22 0.02 0.63 0.51 0.41 0.55 0.29 -0.09 0.61 0.23 0.26 0.06 0.23 0.12 0.40 0.22 0.11 0.24 0.36 0.28 0.28 2.52 3.09 1.67 1.89 0.93 3.23 2.78 3.16 1.22 1.96 0.93 2.61 0.81 0.63 0.30 0.53 0.22 0.86 0.75 0.61 0.29 0.72 0.21 0.71
Park_KT_task10_1 Park_KT_1 Park2024dcaseT10 8 6.85 0.42 0.45 0.57 0.32 0.51 0.81 0.42 0.03 0.56 0.21 0.40 0.71 0.18 0.00 0.07 -0.24 0.18 0.68 0.13 0.13 0.21 0.43 0.03 0.61 2.63 3.50 1.73 1.91 0.66 1.82 2.93 4.37 1.24 2.51 0.71 1.99 0.99 0.48 0.32 0.59 0.23 0.52 0.90 0.61 0.18 0.65 0.24 0.46
Park_KT_task10_2 Park_KT_2 Park2024dcaseT10 11 8.21 0.42 0.44 0.44 0.34 0.46 0.59 0.43 0.00 0.39 0.06 0.34 0.54 0.22 0.08 0.04 -0.34 0.16 0.24 0.24 -0.02 0.14 0.25 0.05 0.17 2.72 3.76 2.05 1.93 0.69 3.16 2.86 4.57 1.46 2.68 0.72 3.10 0.75 0.46 0.32 0.58 0.25 0.94 0.71 0.62 0.23 0.72 0.26 0.79
Park_KT_task10_3 Park_KT_3 Park2024dcaseT10 6 5.67 0.46 0.57 0.59 0.40 0.62 0.81 0.46 0.08 0.58 0.17 0.60 0.71 0.16 -0.01 0.08 -0.28 0.19 0.72 0.13 0.02 0.15 0.31 0.08 0.63 2.52 2.73 1.69 1.82 0.54 1.77 2.77 3.97 1.20 2.51 0.55 2.02 0.80 0.48 0.29 0.57 0.19 0.53 0.77 0.62 0.18 0.71 0.22 0.46
Takahashi_TMU-NEE_task10_1 Takahashi_TMU Takahashi2024dcaseT10 4 4.77 0.50 0.65 0.62 -0.22 0.42 0.87 0.51 0.62 0.61 0.10 0.28 0.76 0.20 0.15 -0.03 0.16 -0.06 0.82 0.22 0.09 0.01 0.37 -0.09 0.77 2.39 2.63 1.60 3.08 0.69 1.26 2.69 3.71 1.14 2.46 0.81 1.75 0.74 0.49 0.30 0.42 0.21 0.41 0.71 0.63 0.21 0.83 0.29 0.40
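
The two metric families reported in the table, Kendall's Tau and RMSE, compare predicted and reference vehicle counts for one vehicle type and direction at one location. The following is a minimal pure-Python sketch of both, assuming the tau-b variant of Kendall's Tau (which accounts for tied counts); the official evaluation code may use a different variant or aggregate segments differently. The function names and the example counts are illustrative and are not taken from the challenge data.

```python
import math
from itertools import combinations


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted counts."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (assumed variant, handles ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: counted in neither tie term
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    # Undefined when one sequence is constant
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-segment counts for one vehicle type and direction
true_counts = [0, 2, 1, 3]
pred_counts = [0.2, 1.5, 1.1, 2.0]

tau = kendall_tau_b(true_counts, pred_counts)
err = rmse(true_counts, pred_counts)
print(f"tau-b = {tau:.2f}, RMSE = {err:.2f}")
```

Higher Kendall's Tau values indicate better agreement in ordering between predicted and reference counts, while lower RMSE values indicate smaller counting errors.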

Teams ranking

Columns, in order: Submission label, Name, Technical Report, Official rank, Rank score, followed by six per-location values (loc 1 to loc 6) for each of the following metrics, in this order:
Kendall's Tau car left, Kendall's Tau car right, Kendall's Tau cv left, Kendall's Tau cv right, RMSE car left, RMSE car right, RMSE cv left, RMSE cv right.
Baseline_Bosch_task10 Baseline_Bosch Baseline2024dcaseT10 4 5.17 0.47 0.45 0.62 0.46 0.48 0.82 0.48 0.22 0.59 0.25 0.57 0.74 0.23 0.14 0.10 0.09 0.71 0.19 -0.03 0.27 0.44 0.11 0.65 2.45 3.31 1.63 1.70 0.66 1.67 2.69 3.56 1.21 2.21 0.61 1.95 0.73 0.47 0.31 0.55 0.49 0.54 0.78 0.61 0.20 0.73 0.68 0.44
Bai_JLESS_task10_1 Bai_JLESS Bai2024dcaseT10 2 4.44 0.40 0.51 0.62 0.47 0.49 0.82 0.42 0.38 0.60 0.24 0.66 0.72 0.14 0.13 0.23 0.40 0.12 0.73 0.15 0.17 0.38 0.46 0.12 0.71 2.78 3.02 1.58 1.45 0.64 1.76 3.31 2.69 1.17 2.25 0.53 1.91 0.91 0.56 0.30 0.50 0.27 0.50 0.80 0.61 0.18 0.63 0.21 0.46
Betton-Ployon_ACSTB_task10_1 Betton-Ployon_ACSTB Betton2024dcaseT10 6 7.89 0.48 0.11 0.39 0.76 -0.01 0.20 0.46 0.04 0.38 0.78 -0.03 0.26 0.11 0.22 0.06 0.01 0.05 0.10 0.15 -0.04 0.04 0.29 -0.00 0.12 2.41 4.57 2.24 0.92 1.16 5.64 2.75 4.41 2.20 0.94 1.14 5.24 0.76 0.47 0.31 0.55 0.27 1.11 0.74 0.62 0.21 0.56 0.26 1.23
Cai_NCUT_task10_1 Cai_NCUT Cai2024dcaseT10 7 8.14 0.46 0.12 0.57 0.03 0.81 0.46 0.15 0.46 0.22 0.65 0.07 0.01 0.04 0.01 0.56 0.11 0.10 0.17 0.43 2.48 4.37 1.71 2.12 0.93 1.74 2.78 3.38 1.32 2.49 0.84 2.34 0.79 0.48 0.29 0.55 0.24 0.72 0.80 0.64 0.20 0.74 0.22 0.64
Guan_GISP-HEU_task10_2 Guan_HEU_2 Guan2024dcaseT10 1 3.98 0.49 0.63 0.62 0.03 0.59 0.84 0.50 0.50 0.60 0.39 0.60 0.72 0.22 0.06 0.15 0.03 0.41 0.67 0.19 0.29 0.36 0.53 0.39 0.56 2.41 2.58 1.62 2.22 0.56 1.53 2.65 2.64 1.17 1.79 0.56 2.01 0.74 0.57 0.44 0.73 0.16 0.56 0.74 0.59 0.20 0.65 0.22 0.56
Park_KT_task10_3 Park_KT_3 Park2024dcaseT10 5 5.67 0.46 0.57 0.59 0.40 0.62 0.81 0.46 0.08 0.58 0.17 0.60 0.71 0.16 -0.01 0.08 -0.28 0.19 0.72 0.13 0.02 0.15 0.31 0.08 0.63 2.52 2.73 1.69 1.82 0.54 1.77 2.77 3.97 1.20 2.51 0.55 2.02 0.80 0.48 0.29 0.57 0.19 0.53 0.77 0.62 0.18 0.71 0.22 0.46
Takahashi_TMU-NEE_task10_1 Takahashi_TMU Takahashi2024dcaseT10 3 4.77 0.50 0.65 0.62 -0.22 0.42 0.87 0.51 0.62 0.61 0.10 0.28 0.76 0.20 0.15 -0.03 0.16 -0.06 0.82 0.22 0.09 0.01 0.37 -0.09 0.77 2.39 2.63 1.60 3.08 0.69 1.26 2.69 3.71 1.14 2.46 0.81 1.75 0.74 0.49 0.30 0.42 0.21 0.41 0.71 0.63 0.21 0.83 0.29 0.40
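
Comparing the two ranking tables suggests that the teams ranking keeps, for each team, the submission with the lowest rank score and then re-ranks the remaining entries. This selection rule is inferred from the tables rather than stated on this page, so the sketch below is only an illustration. The rank scores are copied from the systems ranking table; using the technical report label as the team key and the function name `best_per_team` are assumptions.

```python
# (submission label, technical report label, rank score) from the systems ranking
systems = [
    ("Baseline_Bosch_task10", "Baseline2024dcaseT10", 5.17),
    ("Bai_JLESS_task10_1", "Bai2024dcaseT10", 4.44),
    ("Betton-Ployon_ACSTB_task10_1", "Betton2024dcaseT10", 7.89),
    ("Cai_NCUT_task10_1", "Cai2024dcaseT10", 8.14),
    ("Guan_GISP-HEU_task10_1", "Guan2024dcaseT10", 4.33),
    ("Guan_GISP-HEU_task10_2", "Guan2024dcaseT10", 3.98),
    ("Guan_GISP-HEU_task10_3", "Guan2024dcaseT10", 6.27),
    ("Park_KT_task10_1", "Park2024dcaseT10", 6.85),
    ("Park_KT_task10_2", "Park2024dcaseT10", 8.21),
    ("Park_KT_task10_3", "Park2024dcaseT10", 5.67),
    ("Takahashi_TMU-NEE_task10_1", "Takahashi2024dcaseT10", 4.77),
]


def best_per_team(entries):
    """Keep the lowest-scoring submission per team, then rank by score."""
    best = {}
    for label, team, score in entries:
        if team not in best or score < best[team][1]:
            best[team] = (label, score)
    ranked = sorted(best.values(), key=lambda item: item[1])
    return [(rank, label, score) for rank, (label, score) in enumerate(ranked, start=1)]


team_ranking = best_per_team(systems)
for rank, label, score in team_ranking:
    print(rank, label, score)
```

Under this assumed rule, the resulting order matches the official ranks listed in the teams ranking table.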

System characteristics

Summary of the submitted system characteristics.

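Columns, in order: Rank, Submission label, Technical Report, Input sampling rate, Acoustic Features, Data Augmentation, Model, Pipeline, System complexity, External Data. Rows list fewer fields than there are columns wherever a cell has no entry.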
5 Baseline_Bosch_task10 Baseline2024dcaseT10 16kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram Acoustic Traffic Simulation CRNN pre-training, fine-tuning
3 Bai_JLESS_task10_1 Bai2024dcaseT10 16kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram CRNN training
9 Betton-Ployon_ACSTB_task10_1 Betton2024dcaseT10 16kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram CRNN training external non-supervised counting algorithm
10 Cai_NCUT_task10_1 Cai2024dcaseT10 96kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram CRNN training
2 Guan_GISP-HEU_task10_1 Guan2024dcaseT10 16kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram PANNs, GAT training pre-trained model
1 Guan_GISP-HEU_task10_2 Guan2024dcaseT10 16kHz Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram SpecAugment PANNs, GAT training pre-trained model
7 Guan_GISP-HEU_task10_3 Guan2024dcaseT10 16kHz Log Mel Spectrogram SpecAugment, phase shifting PANNs, GAT training pre-trained model
8 Park_KT_task10_1 Park2024dcaseT10 16kHz Spectrogram, STHD (Short-Time Homomorphic Deconvolution) Simulation Sound Synthesis CRNN pre-training, transfer learning pre-trained model
11 Park_KT_task10_2 Park2024dcaseT10 16kHz Spectrogram, STHD (Short-Time Homomorphic Deconvolution) Simulation Sound Synthesis CRNN pre-training, transfer learning pre-trained model
6 Park_KT_task10_3 Park2024dcaseT10 16kHz Spectrogram, STHD (Short-Time Homomorphic Deconvolution) Simulation Sound Synthesis CRNN pre-training, transfer learning pre-trained model
4 Takahashi_TMU-NEE_task10_1 Takahashi2024dcaseT10 16kHz Log Power Spectrogram and Cosine-Sine of Phase Difference CRNN training, matching loss
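
Most submissions list Generalized Cross-Correlation with Phase transform (GCC-PHAT) as an input feature. GCC-PHAT whitens the cross-spectrum of two microphone channels so that only phase information remains; the peak of the resulting correlation indicates the time delay between the channels. The sketch below is a minimal pure-Python illustration using a naive DFT on a short synthetic signal. The function names, the test signal, and the 3-sample delay are assumptions for demonstration and do not reproduce the feature extraction of the baseline or of any submission. A practical implementation would use an FFT routine rather than the O(n^2) transform shown here.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform, adequate only for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_phat(sig, ref, eps=1e-12):
    """Return (delay of sig relative to ref in samples, correlation over lags)."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [ai * bi.conjugate() for ai, bi in zip(a, b)]
    # Phase transform: discard magnitude, keep only the phase of the cross-spectrum
    whitened = [c / (abs(c) + eps) for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    # Reorder so that negative lags come first
    half = n // 2
    cc = cc[n - half:] + cc[:n - half]
    lags = list(range(-half, n - half))
    delay = lags[max(range(n), key=lambda i: cc[i])]
    return delay, cc


# Synthetic example: the second channel is the first one delayed by 3 samples
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(32)]
sig = [0.0, 0.0, 0.0] + ref

delay, cc = gcc_phat(sig, ref)
print("estimated delay in samples:", delay)
```

With this sign convention, a positive delay means that `sig` lags behind `ref`. For a vehicle passing a microphone array, the delay changes sign as the vehicle moves from one side to the other, which is why this feature carries direction-of-travel information.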



Technical reports

JLESS SUBMISSION TO DCASE2024 TASK10: AN ACOUSTIC-BASED TRAFFIC MONITORING SOLUTION

Dongzhe Zhang, Jisheng Bai, Jianfeng Chen
Northwestern Polytechnical University, China

Abstract

In this technical report, we describe our proposed system for the traffic monitoring challenge. Our solution addresses the critical need for efficient traffic monitoring systems in smart city development, leveraging the advantages of acoustic sensors. Initially, we review various sensor types used in traffic monitoring, emphasizing the benefits of acoustic sensors such as low cost, power efficiency, and robustness in adverse conditions. Given the challenges of collecting and labeling real-world traffic data, we incorporate synthetic data generated via the pyroadacoustics simulator to enhance system performance. We employ multiple data augmentation techniques to create a balanced and comprehensive training dataset. Our approach also includes detailed metadata integration, which provides sensor location IDs, timestamps, sensor array geometry, and vehicle counts. During the training phase, we implement several strategies to improve the system generalization in real-world environments. Our results demonstrate that the proposed system significantly outperforms baseline models in accurately detecting and classifying traffic events, validating the efficacy of our approach using both real and synthetic data.

System characteristics
Acoustic features: Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram

Sound of Traffic: A Dataset for Acoustic Traffic Identification and Counting

Shabnam Ghaffarzadegan, Luca Bondi, Wei-Cheng Lin, Abinaya Kumar, Ho-Hsiang Wu, Hans-Georg Horst, Samarjit Das
Bosch Research, USA

Abstract

We introduce soundoftraffic, the largest publicly available dataset for traffic identification and counting to date. With over 415 hours of multichannel acoustic traffic data recorded in six different locations, it encompasses varying levels of traffic density and environmental conditions. In this work, we discuss strategies for automatic collection and alignment of large amount of labeled data, leveraging existing asynchronous urban sensors such as radar, cameras, and inductive coils. In addition to the dataset, we propose a simple baseline system for vehicle counting divided by type of the vehicle (passenger vs. commercial vehicle) and direction of travel (right-to-left and left-to-right), a fundamental task for traffic analysis. The dataset and baseline system serve as a starting point for researchers to develop more advanced algorithms and models in this field. The dataset can be accessed at https://zenodo.org/records/10700792 and https://zenodo.org/records/11209838.

System characteristics
Acoustic features: Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram
Data augmentation: Acoustic Traffic Simulation

TRAFFIC COUNTING SYSTEM LEVERAGED WITH A NON-SUPERVISED COUNTING APPROACH

Erwann Betton-Ployon, Abbes Kacem, Jerome Mars
ACOUSTB, France

Abstract

To face the challenges of urban mobility optimisation, safety and disturbance reduction, traffic monitoring flourishes around anthropized areas. Acoustic monitoring can provide a cost-effective traffic counting system, besides using it as a noise monitoring process. One would expect a traffic monitoring system to identify direction and vehicle type while counting pass-bys on audio segments. Main difficulties are related to the variety of sound landscapes and sources near roadways. Generalisation among recording sites is delicate, and the accuracy depends on the amount of labelled data available per site. In this work, we introduce a non-supervised traffic counting algorithm to complement the existing supervised models. Our traffic counting algorithm uses the recording site metadata to estimate a standard GCC-Phat mask for any pass-by. This mask is applied on the cross-correlation signal of the 4 audio channels, permitting a pass-by detection with direction identification. This information is transmitted to the supervised model, which eventually refines its initial output. The addition of our algorithm counting estimation is highly effective on sites with few available labelled data. A significant RMSE reduction is observed when total duration of real labelled data is inferior to 2 hours.

System characteristics
Acoustic features: Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram

DCASE 2024 CHALLENGE TASK10 TECHNICAL REPORT

Zhilong Jiang, Xichang Cai, Ziyi Liu, Menglong Wu
North China University of Technology, China

Abstract

This technical report describes our approach to Task 10 of DCASE 2024: acoustic-based traffic monitoring. In our work, we use the Mel spectrogram and the VGG11 algorithm to extract category features of vehicles in sound, while using the GCC-PHAT algorithm and a CNN to extract directional features of vehicles in sound. Meanwhile, we also optimize the experimental results by continuously adjusting the parameters in the algorithm.

System characteristics
Acoustic features: Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram

FINE-GRAINED AUDIO FEATURE REPRESENTATION WITH PRETRAINED MODEL AND GRAPH ATTENTION FOR TRAFFIC FLOW MONITORING

Shitong Fan, Feiyang Xiao, Shuhan Qi, Qiaoxi Zhu, Wenwu Wang, Jian Guan
Harbin Engineering University, China

Abstract

This technical report describes our submission for DCASE 2024 Challenge Task 10. To enhance audio feature representation for audio event detection, we use pre-trained audio neural networks (PANNs) for audio feature pretraining and a graph attention module (GAT) for audio feature fine-tuning to capture important temporal relations and learn the dependencies among audio features across different time frames. Thus, our method can capture important audio event information in the audio signals, and provide fine-grained audio representation for vehicle type detection. We use this fine-grained feature instead of the feature branch in the original baseline to build our systems. In our systems, we apply the SpecAugment strategy for audio data augmentation and introduce an overall phase shift to explore the directional information. Experimental results indicate that our systems show some improved performance among the six locations in the evaluation, except for location 1.

System characteristics
Acoustic features: Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram
Data augmentation: SpecAugment
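
SpecAugment, listed above as the data augmentation for this submission, masks random bands of a spectrogram along the frequency and time axes (the original method also includes time warping, which is omitted here). The following is a minimal pure-Python sketch of the two masking steps. The function name, mask widths, fill value, and spectrogram size are hypothetical and are not the settings used by the authors.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, rng=None, fill=0.0):
    """Apply one frequency mask and one time mask to a [freq][time] spectrogram.

    Assumes max_freq_mask and max_time_mask do not exceed the spectrogram size.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency masking: blank out f consecutive frequency bins
    f = rng.randint(0, max_freq_mask)
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [fill] * n_time

    # Time masking: blank out t consecutive time frames
    t = rng.randint(0, max_time_mask)
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = fill
    return out


# Hypothetical 8-bin by 10-frame spectrogram filled with ones
spec = [[1.0] * 10 for _ in range(8)]
augmented = spec_augment(spec, rng=random.Random(0))
for row in augmented:
    print(" ".join(f"{v:.0f}" for v in row))
```

Because the masks are drawn anew on every call, each training epoch sees a differently corrupted version of the same recording.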

Deep Acoustic Vehicle Counting Model with Short-Time Homomorphic Deconvolution

Yeonseok Park, TaeWoon Yeo, Baeksan On
KT Corporation, South Korea

Abstract

In the design of urban traffic monitoring solutions aimed at optimizing logistics infrastructure, acoustic vehicle counting models have gained attention for their cost-effectiveness and energy efficiency. While deep learning has proven effective in visual traffic monitoring, its application in the auditory domain remains under-explored due to the limited availability of real-world data. This study proposes the use of Short-Time Homomorphic Deconvolution (STHD) for analyzing sound signals to estimate the direction of vehicle sounds. This algorithm calculates distances between microphones based on sound direction, facilitating the inference of sound direction and movement. We present a strategy for designing and training a deep learning model that leverages features derived from this algorithm. The proposed system simultaneously counts cars and commercial vehicles on a two-lane road under moderate traffic density conditions, accurately identifying their directions of travel.

System characteristics
Acoustic features: Spectrogram, STHD (Short-Time Homomorphic Deconvolution)
Data augmentation: Simulation Sound Synthesis

NEURAL NETWORK TRAINING WITH MATCHING LOSS FOR RANKING FUNCTION

Tomohiro Takahashi, Natsuki Ueno, Yuma Kinoshita, Yukoh Wakabayashi, Nobutaka Ono, Makiho Sukekawa, Seishi Fukuma, Hiroshi Nakagawa
Tokyo Metropolitan University, Japan

Abstract

In this report, we summarize our approach for DCASE 2024 Challenge Task 10, acoustic-based traffic monitoring. Our approach consists of two improvements from the baseline system. One is the introduction of the matching loss for the ranking function to the loss function of the Convolutional Recurrent Neural Network (CRNN), which aims to improve the Kendall’s Tau Rank Correlation (KTRC). The results indicate that it is also effective in improving the Root Mean Square Error (RMSE). The other improvement is a change in the input features. We also report the estimation performance for the development datasets.

System characteristics
Acoustic features: Log Power Spectrogram and Cosine-Sine of Phase Difference
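
The "Cosine-Sine of Phase Difference" feature listed above can be read as encoding the inter-channel phase difference as a (cos, sin) pair, which avoids the wrap-around discontinuity of the raw angle. The sketch below assumes the feature is computed per time-frequency bin from the complex spectral coefficients of two microphone channels; the exact definition in the technical report may differ, and the function name and example values are hypothetical.

```python
import cmath
import math


def phase_difference_features(x1, x2, eps=1e-12):
    """Return (cos, sin) of the phase difference for each pair of complex bins."""
    feats = []
    for a, b in zip(x1, x2):
        cross = a * b.conjugate()  # phase of cross equals phase(a) - phase(b)
        mag = abs(cross) + eps     # eps guards against division by zero
        feats.append((cross.real / mag, cross.imag / mag))
    return feats


# Hypothetical bins: channel 1 has phase 0.9 rad, channel 2 has phase 0.4 rad
channel_1 = [cmath.rect(2.0, 0.9)]
channel_2 = [cmath.rect(0.5, 0.4)]

cos_d, sin_d = phase_difference_features(channel_1, channel_2)[0]
print(f"cos = {cos_d:.3f}, sin = {sin_d:.3f}")
print(f"recovered phase difference = {math.atan2(sin_d, cos_d):.3f} rad")
```

The magnitudes of the two bins cancel out in the normalization, so the pair depends only on the phase difference, which in turn relates to the time delay between the channels.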