Task description
This task aims to design an acoustic-based traffic monitoring solution, an essential part of smart city development, to monitor the usage and condition of roadway infrastructure and to detect anomalies. The challenge focuses on developing models that count vehicles per vehicle type (car or commercial vehicle) and per direction of travel (left or right).
A more detailed description can be found on the task description page.
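The ranking tables on this page score each system per location with Kendall's Tau and RMSE for four count targets (car left, car right, cv left, cv right). Below is a minimal pure-Python sketch of the two metrics. It assumes the tau-b variant of Kendall's Tau (the default in `scipy.stats.kendalltau`), which this page does not specify. `score_location` and its dictionary layout are hypothetical helpers. How the official rank score combines these values is not reproduced here.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # pairs tied in both sequences do not enter tau-b
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        return float("nan")  # undefined, e.g. when one sequence is constant
    return (concordant - discordant) / denom


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference counts."""
    if len(pred) != len(ref) or len(ref) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def score_location(pred_counts, ref_counts):
    """Hypothetical helper: both dicts map a target like "car left" to per-segment counts."""
    scores = {}
    for target, ref in ref_counts.items():
        scores[f"Kendall's Tau {target}"] = kendall_tau_b(pred_counts[target], ref)
        scores[f"RMSE {target}"] = rmse(pred_counts[target], ref)
    return scores
```

Kendall's Tau rewards getting the order of counts across segments right. RMSE penalizes the absolute count error. A system that predicts the same count for every segment therefore has an undefined tau-b.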
Systems ranking
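Submission label | Name | Technical Report | Official rank | Rank score | Kendall's Tau car left loc 1 | Kendall's Tau car left loc 2 | Kendall's Tau car left loc 3 | Kendall's Tau car left loc 4 | Kendall's Tau car left loc 5 | Kendall's Tau car left loc 6 | Kendall's Tau car right loc 1 | Kendall's Tau car right loc 2 | Kendall's Tau car right loc 3 | Kendall's Tau car right loc 4 | Kendall's Tau car right loc 5 | Kendall's Tau car right loc 6 | Kendall's Tau cv left loc 1 | Kendall's Tau cv left loc 2 | Kendall's Tau cv left loc 3 | Kendall's Tau cv left loc 4 | Kendall's Tau cv left loc 5 | Kendall's Tau cv left loc 6 | Kendall's Tau cv right loc 1 | Kendall's Tau cv right loc 2 | Kendall's Tau cv right loc 3 | Kendall's Tau cv right loc 4 | Kendall's Tau cv right loc 5 | Kendall's Tau cv right loc 6 | RMSE car left loc 1 | RMSE car left loc 2 | RMSE car left loc 3 | RMSE car left loc 4 | RMSE car left loc 5 | RMSE car left loc 6 | RMSE car right loc 1 | RMSE car right loc 2 | RMSE car right loc 3 | RMSE car right loc 4 | RMSE car right loc 5 | RMSE car right loc 6 | RMSE cv left loc 1 | RMSE cv left loc 2 | RMSE cv left loc 3 | RMSE cv left loc 4 | RMSE cv left loc 5 | RMSE cv left loc 6 | RMSE cv right loc 1 | RMSE cv right loc 2 | RMSE cv right loc 3 | RMSE cv right loc 4 | RMSE cv right loc 5 | RMSE cv right loc 6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|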
Baseline_Bosch_task10 | Baseline_Bosch | Baseline2024dcaseT10 | 5 | 5.17 | 0.47 | 0.45 | 0.62 | 0.46 | 0.48 | 0.82 | 0.48 | 0.22 | 0.59 | 0.25 | 0.57 | 0.74 | 0.23 | 0.14 | 0.10 | 0.09 | 0.71 | 0.19 | -0.03 | 0.27 | 0.44 | 0.11 | 0.65 | 2.45 | 3.31 | 1.63 | 1.70 | 0.66 | 1.67 | 2.69 | 3.56 | 1.21 | 2.21 | 0.61 | 1.95 | 0.73 | 0.47 | 0.31 | 0.55 | 0.49 | 0.54 | 0.78 | 0.61 | 0.20 | 0.73 | 0.68 | 0.44 | ||
Bai_JLESS_task10_1 | Bai_JLESS | Bai2024dcaseT10 | 3 | 4.44 | 0.40 | 0.51 | 0.62 | 0.47 | 0.49 | 0.82 | 0.42 | 0.38 | 0.60 | 0.24 | 0.66 | 0.72 | 0.14 | 0.13 | 0.23 | 0.40 | 0.12 | 0.73 | 0.15 | 0.17 | 0.38 | 0.46 | 0.12 | 0.71 | 2.78 | 3.02 | 1.58 | 1.45 | 0.64 | 1.76 | 3.31 | 2.69 | 1.17 | 2.25 | 0.53 | 1.91 | 0.91 | 0.56 | 0.30 | 0.50 | 0.27 | 0.50 | 0.80 | 0.61 | 0.18 | 0.63 | 0.21 | 0.46 | |
Betton-Ployon_ACSTB_task10_1 | Betton-Ployon_ACSTB | Betton2024dcaseT10 | 9 | 7.89 | 0.48 | 0.11 | 0.39 | 0.76 | -0.01 | 0.20 | 0.46 | 0.04 | 0.38 | 0.78 | -0.03 | 0.26 | 0.11 | 0.22 | 0.06 | 0.01 | 0.05 | 0.10 | 0.15 | -0.04 | 0.04 | 0.29 | -0.00 | 0.12 | 2.41 | 4.57 | 2.24 | 0.92 | 1.16 | 5.64 | 2.75 | 4.41 | 2.20 | 0.94 | 1.14 | 5.24 | 0.76 | 0.47 | 0.31 | 0.55 | 0.27 | 1.11 | 0.74 | 0.62 | 0.21 | 0.56 | 0.26 | 1.23 | |
Cai_NCUT_task10_1 | Cai_NCUT | Cai2024dcaseT10 | 10 | 8.14 | 0.46 | 0.12 | 0.57 | 0.03 | 0.81 | 0.46 | 0.15 | 0.46 | 0.22 | 0.65 | 0.07 | 0.01 | 0.04 | 0.01 | 0.56 | 0.11 | 0.10 | 0.17 | 0.43 | 2.48 | 4.37 | 1.71 | 2.12 | 0.93 | 1.74 | 2.78 | 3.38 | 1.32 | 2.49 | 0.84 | 2.34 | 0.79 | 0.48 | 0.29 | 0.55 | 0.24 | 0.72 | 0.80 | 0.64 | 0.20 | 0.74 | 0.22 | 0.64 | ||||||
Guan_GISP-HEU_task10_1 | Guan_HEU_1 | Guan2024dcaseT10 | 2 | 4.33 | 0.50 | 0.61 | 0.62 | 0.34 | 0.56 | 0.84 | 0.51 | 0.54 | 0.59 | 0.45 | 0.48 | 0.70 | 0.16 | 0.19 | 0.14 | -0.08 | -0.01 | 0.72 | 0.18 | 0.08 | 0.33 | 0.61 | 0.13 | 0.57 | 2.41 | 2.52 | 1.62 | 1.70 | 0.58 | 1.63 | 2.70 | 2.30 | 1.15 | 1.76 | 0.68 | 2.11 | 0.77 | 0.46 | 0.32 | 0.75 | 0.25 | 0.52 | 0.72 | 0.62 | 0.20 | 0.65 | 0.22 | 0.55 | |
Guan_GISP-HEU_task10_2 | Guan_HEU_2 | Guan2024dcaseT10 | 1 | 3.98 | 0.49 | 0.63 | 0.62 | 0.03 | 0.59 | 0.84 | 0.50 | 0.50 | 0.60 | 0.39 | 0.60 | 0.72 | 0.22 | 0.06 | 0.15 | 0.03 | 0.41 | 0.67 | 0.19 | 0.29 | 0.36 | 0.53 | 0.39 | 0.56 | 2.41 | 2.58 | 1.62 | 2.22 | 0.56 | 1.53 | 2.65 | 2.64 | 1.17 | 1.79 | 0.56 | 2.01 | 0.74 | 0.57 | 0.44 | 0.73 | 0.16 | 0.56 | 0.74 | 0.59 | 0.20 | 0.65 | 0.22 | 0.56 | |
Guan_GISP-HEU_task10_3 | Guan_HEU_3 | Guan2024dcaseT10 | 7 | 6.27 | 0.50 | 0.45 | 0.60 | 0.22 | 0.02 | 0.63 | 0.51 | 0.41 | 0.55 | 0.29 | -0.09 | 0.61 | 0.23 | 0.26 | 0.06 | 0.23 | 0.12 | 0.40 | 0.22 | 0.11 | 0.24 | 0.36 | 0.28 | 0.28 | 2.52 | 3.09 | 1.67 | 1.89 | 0.93 | 3.23 | 2.78 | 3.16 | 1.22 | 1.96 | 0.93 | 2.61 | 0.81 | 0.63 | 0.30 | 0.53 | 0.22 | 0.86 | 0.75 | 0.61 | 0.29 | 0.72 | 0.21 | 0.71 | |
Park_KT_task10_1 | Park_KT_1 | Park2024dcaseT10 | 8 | 6.85 | 0.42 | 0.45 | 0.57 | 0.32 | 0.51 | 0.81 | 0.42 | 0.03 | 0.56 | 0.21 | 0.40 | 0.71 | 0.18 | 0.00 | 0.07 | -0.24 | 0.18 | 0.68 | 0.13 | 0.13 | 0.21 | 0.43 | 0.03 | 0.61 | 2.63 | 3.50 | 1.73 | 1.91 | 0.66 | 1.82 | 2.93 | 4.37 | 1.24 | 2.51 | 0.71 | 1.99 | 0.99 | 0.48 | 0.32 | 0.59 | 0.23 | 0.52 | 0.90 | 0.61 | 0.18 | 0.65 | 0.24 | 0.46 | |
Park_KT_task10_2 | Park_KT_2 | Park2024dcaseT10 | 11 | 8.21 | 0.42 | 0.44 | 0.44 | 0.34 | 0.46 | 0.59 | 0.43 | 0.00 | 0.39 | 0.06 | 0.34 | 0.54 | 0.22 | 0.08 | 0.04 | -0.34 | 0.16 | 0.24 | 0.24 | -0.02 | 0.14 | 0.25 | 0.05 | 0.17 | 2.72 | 3.76 | 2.05 | 1.93 | 0.69 | 3.16 | 2.86 | 4.57 | 1.46 | 2.68 | 0.72 | 3.10 | 0.75 | 0.46 | 0.32 | 0.58 | 0.25 | 0.94 | 0.71 | 0.62 | 0.23 | 0.72 | 0.26 | 0.79 | |
Park_KT_task10_3 | Park_KT_3 | Park2024dcaseT10 | 6 | 5.67 | 0.46 | 0.57 | 0.59 | 0.40 | 0.62 | 0.81 | 0.46 | 0.08 | 0.58 | 0.17 | 0.60 | 0.71 | 0.16 | -0.01 | 0.08 | -0.28 | 0.19 | 0.72 | 0.13 | 0.02 | 0.15 | 0.31 | 0.08 | 0.63 | 2.52 | 2.73 | 1.69 | 1.82 | 0.54 | 1.77 | 2.77 | 3.97 | 1.20 | 2.51 | 0.55 | 2.02 | 0.80 | 0.48 | 0.29 | 0.57 | 0.19 | 0.53 | 0.77 | 0.62 | 0.18 | 0.71 | 0.22 | 0.46 | |
Takahashi_TMU-NEE_task10_1 | Takahashi_TMU | Takahashi2024dcaseT10 | 4 | 4.77 | 0.50 | 0.65 | 0.62 | -0.22 | 0.42 | 0.87 | 0.51 | 0.62 | 0.61 | 0.10 | 0.28 | 0.76 | 0.20 | 0.15 | -0.03 | 0.16 | -0.06 | 0.82 | 0.22 | 0.09 | 0.01 | 0.37 | -0.09 | 0.77 | 2.39 | 2.63 | 1.60 | 3.08 | 0.69 | 1.26 | 2.69 | 3.71 | 1.14 | 2.46 | 0.81 | 1.75 | 0.74 | 0.49 | 0.30 | 0.42 | 0.21 | 0.41 | 0.71 | 0.63 | 0.21 | 0.83 | 0.29 | 0.40 |
Teams ranking
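Submission label | Name | Technical Report | Official rank | Rank score | Kendall's Tau car left loc 1 | Kendall's Tau car left loc 2 | Kendall's Tau car left loc 3 | Kendall's Tau car left loc 4 | Kendall's Tau car left loc 5 | Kendall's Tau car left loc 6 | Kendall's Tau car right loc 1 | Kendall's Tau car right loc 2 | Kendall's Tau car right loc 3 | Kendall's Tau car right loc 4 | Kendall's Tau car right loc 5 | Kendall's Tau car right loc 6 | Kendall's Tau cv left loc 1 | Kendall's Tau cv left loc 2 | Kendall's Tau cv left loc 3 | Kendall's Tau cv left loc 4 | Kendall's Tau cv left loc 5 | Kendall's Tau cv left loc 6 | Kendall's Tau cv right loc 1 | Kendall's Tau cv right loc 2 | Kendall's Tau cv right loc 3 | Kendall's Tau cv right loc 4 | Kendall's Tau cv right loc 5 | Kendall's Tau cv right loc 6 | RMSE car left loc 1 | RMSE car left loc 2 | RMSE car left loc 3 | RMSE car left loc 4 | RMSE car left loc 5 | RMSE car left loc 6 | RMSE car right loc 1 | RMSE car right loc 2 | RMSE car right loc 3 | RMSE car right loc 4 | RMSE car right loc 5 | RMSE car right loc 6 | RMSE cv left loc 1 | RMSE cv left loc 2 | RMSE cv left loc 3 | RMSE cv left loc 4 | RMSE cv left loc 5 | RMSE cv left loc 6 | RMSE cv right loc 1 | RMSE cv right loc 2 | RMSE cv right loc 3 | RMSE cv right loc 4 | RMSE cv right loc 5 | RMSE cv right loc 6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|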
Baseline_Bosch_task10 | Baseline_Bosch | Baseline2024dcaseT10 | 4 | 5.17 | 0.47 | 0.45 | 0.62 | 0.46 | 0.48 | 0.82 | 0.48 | 0.22 | 0.59 | 0.25 | 0.57 | 0.74 | 0.23 | 0.14 | 0.10 | 0.09 | 0.71 | 0.19 | -0.03 | 0.27 | 0.44 | 0.11 | 0.65 | 2.45 | 3.31 | 1.63 | 1.70 | 0.66 | 1.67 | 2.69 | 3.56 | 1.21 | 2.21 | 0.61 | 1.95 | 0.73 | 0.47 | 0.31 | 0.55 | 0.49 | 0.54 | 0.78 | 0.61 | 0.20 | 0.73 | 0.68 | 0.44 | ||
Bai_JLESS_task10_1 | Bai_JLESS | Bai2024dcaseT10 | 2 | 4.44 | 0.40 | 0.51 | 0.62 | 0.47 | 0.49 | 0.82 | 0.42 | 0.38 | 0.60 | 0.24 | 0.66 | 0.72 | 0.14 | 0.13 | 0.23 | 0.40 | 0.12 | 0.73 | 0.15 | 0.17 | 0.38 | 0.46 | 0.12 | 0.71 | 2.78 | 3.02 | 1.58 | 1.45 | 0.64 | 1.76 | 3.31 | 2.69 | 1.17 | 2.25 | 0.53 | 1.91 | 0.91 | 0.56 | 0.30 | 0.50 | 0.27 | 0.50 | 0.80 | 0.61 | 0.18 | 0.63 | 0.21 | 0.46 | |
Betton-Ployon_ACSTB_task10_1 | Betton-Ployon_ACSTB | Betton2024dcaseT10 | 6 | 7.89 | 0.48 | 0.11 | 0.39 | 0.76 | -0.01 | 0.20 | 0.46 | 0.04 | 0.38 | 0.78 | -0.03 | 0.26 | 0.11 | 0.22 | 0.06 | 0.01 | 0.05 | 0.10 | 0.15 | -0.04 | 0.04 | 0.29 | -0.00 | 0.12 | 2.41 | 4.57 | 2.24 | 0.92 | 1.16 | 5.64 | 2.75 | 4.41 | 2.20 | 0.94 | 1.14 | 5.24 | 0.76 | 0.47 | 0.31 | 0.55 | 0.27 | 1.11 | 0.74 | 0.62 | 0.21 | 0.56 | 0.26 | 1.23 | |
Cai_NCUT_task10_1 | Cai_NCUT | Cai2024dcaseT10 | 7 | 8.14 | 0.46 | 0.12 | 0.57 | 0.03 | 0.81 | 0.46 | 0.15 | 0.46 | 0.22 | 0.65 | 0.07 | 0.01 | 0.04 | 0.01 | 0.56 | 0.11 | 0.10 | 0.17 | 0.43 | 2.48 | 4.37 | 1.71 | 2.12 | 0.93 | 1.74 | 2.78 | 3.38 | 1.32 | 2.49 | 0.84 | 2.34 | 0.79 | 0.48 | 0.29 | 0.55 | 0.24 | 0.72 | 0.80 | 0.64 | 0.20 | 0.74 | 0.22 | 0.64 | ||||||
Guan_GISP-HEU_task10_2 | Guan_HEU_2 | Guan2024dcaseT10 | 1 | 3.98 | 0.49 | 0.63 | 0.62 | 0.03 | 0.59 | 0.84 | 0.50 | 0.50 | 0.60 | 0.39 | 0.60 | 0.72 | 0.22 | 0.06 | 0.15 | 0.03 | 0.41 | 0.67 | 0.19 | 0.29 | 0.36 | 0.53 | 0.39 | 0.56 | 2.41 | 2.58 | 1.62 | 2.22 | 0.56 | 1.53 | 2.65 | 2.64 | 1.17 | 1.79 | 0.56 | 2.01 | 0.74 | 0.57 | 0.44 | 0.73 | 0.16 | 0.56 | 0.74 | 0.59 | 0.20 | 0.65 | 0.22 | 0.56 | |
Park_KT_task10_3 | Park_KT_3 | Park2024dcaseT10 | 5 | 5.67 | 0.46 | 0.57 | 0.59 | 0.40 | 0.62 | 0.81 | 0.46 | 0.08 | 0.58 | 0.17 | 0.60 | 0.71 | 0.16 | -0.01 | 0.08 | -0.28 | 0.19 | 0.72 | 0.13 | 0.02 | 0.15 | 0.31 | 0.08 | 0.63 | 2.52 | 2.73 | 1.69 | 1.82 | 0.54 | 1.77 | 2.77 | 3.97 | 1.20 | 2.51 | 0.55 | 2.02 | 0.80 | 0.48 | 0.29 | 0.57 | 0.19 | 0.53 | 0.77 | 0.62 | 0.18 | 0.71 | 0.22 | 0.46 | |
Takahashi_TMU-NEE_task10_1 | Takahashi_TMU | Takahashi2024dcaseT10 | 3 | 4.77 | 0.50 | 0.65 | 0.62 | -0.22 | 0.42 | 0.87 | 0.51 | 0.62 | 0.61 | 0.10 | 0.28 | 0.76 | 0.20 | 0.15 | -0.03 | 0.16 | -0.06 | 0.82 | 0.22 | 0.09 | 0.01 | 0.37 | -0.09 | 0.77 | 2.39 | 2.63 | 1.60 | 3.08 | 0.69 | 1.26 | 2.69 | 3.71 | 1.14 | 2.46 | 0.81 | 1.75 | 0.74 | 0.49 | 0.30 | 0.42 | 0.21 | 0.41 | 0.71 | 0.63 | 0.21 | 0.83 | 0.29 | 0.40 |
System characteristics
Summary of the submitted system characteristics.
Rank | Submission label | Technical Report | Input sampling rate | Acoustic Features | Data Augmentation | Model | Pipeline | System complexity | External Data |
---|---|---|---|---|---|---|---|---|---|
5 | Baseline_Bosch_task10 | Baseline2024dcaseT10 | 16 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | Acoustic Traffic Simulation | CRNN | pre-training, fine-tuning | | |
3 | Bai_JLESS_task10_1 | Bai2024dcaseT10 | 16 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | | CRNN | training | | |
9 | Betton-Ployon_ACSTB_task10_1 | Betton2024dcaseT10 | 16 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | | CRNN | training | | external non-supervised counting algorithm |
10 | Cai_NCUT_task10_1 | Cai2024dcaseT10 | 96 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | | CRNN | training | | |
2 | Guan_GISP-HEU_task10_1 | Guan2024dcaseT10 | 16 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | | PANNs, GAT | training | | pre-trained model |
1 | Guan_GISP-HEU_task10_2 | Guan2024dcaseT10 | 16 kHz | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram | SpecAugment | PANNs, GAT | training | | pre-trained model |
7 | Guan_GISP-HEU_task10_3 | Guan2024dcaseT10 | 16 kHz | Log Mel Spectrogram | SpecAugment, phase shifting | PANNs, GAT | training | | pre-trained model |
8 | Park_KT_task10_1 | Park2024dcaseT10 | 16 kHz | Spectrogram, STHD (Short-Time Homomorphic Deconvolution) | Simulation Sound Synthesis | CRNN | pre-training, transfer learning | | pre-trained model |
11 | Park_KT_task10_2 | Park2024dcaseT10 | 16 kHz | Spectrogram, STHD (Short-Time Homomorphic Deconvolution) | Simulation Sound Synthesis | CRNN | pre-training, transfer learning | | pre-trained model |
6 | Park_KT_task10_3 | Park2024dcaseT10 | 16 kHz | Spectrogram, STHD (Short-Time Homomorphic Deconvolution) | Simulation Sound Synthesis | CRNN | pre-training, transfer learning | | pre-trained model |
4 | Takahashi_TMU-NEE_task10_1 | Takahashi2024dcaseT10 | 16 kHz | Log Power Spectrogram and Cosine-Sine of Phase Difference | | CRNN | training, matching loss | | |
Technical reports
JLESS SUBMISSION TO DCASE2024 TASK10: AN ACOUSTIC-BASED TRAFFIC MONITORING SOLUTION
Dongzhe Zhang, Jisheng Bai, Jianfeng Chen
Northwestern Polytechnical University, China
Bai_JLESS_task10_1
Abstract
In this technical report, we describe our proposed system for the traffic monitoring challenge. Our solution addresses the critical need for efficient traffic monitoring systems in smart city development by leveraging the advantages of acoustic sensors. We first review the sensor types used in traffic monitoring, emphasizing the benefits of acoustic sensors such as low cost, power efficiency, and robustness in adverse conditions. Because real-world traffic data is difficult to collect and label, we incorporate synthetic data generated with the pyroadacoustics simulator to enhance system performance. We employ multiple data augmentation techniques to create a balanced and comprehensive training dataset. Our approach also integrates detailed metadata, including sensor location IDs, timestamps, sensor array geometry, and vehicle counts. During training, we apply several strategies to improve the system's generalization to real-world environments. Our results demonstrate that the proposed system significantly outperforms the baseline models in detecting and classifying traffic events, validating the use of both real and synthetic data.
System characteristics
Acoustic features | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram |
Sound of Traffic: A Dataset for Acoustic Traffic Identification and Counting
Shabnam Ghaffarzadegan, Luca Bondi, Wei-Cheng Lin, Abinaya Kumar, Ho-Hsiang Wu, Hans-Georg Horst, Samarjit Das
Bosch Research, USA
Baseline_Bosch_task10
Abstract
We introduce soundoftraffic, the largest publicly available dataset for traffic identification and counting to date. With over 415 hours of multichannel acoustic traffic data recorded at six different locations, it covers varying levels of traffic density and environmental conditions. In this work, we discuss strategies for automatically collecting and aligning a large amount of labeled data by leveraging existing asynchronous urban sensors such as radar, cameras, and inductive coils. In addition to the dataset, we propose a simple baseline system for counting vehicles by vehicle type (passenger vs. commercial vehicle) and direction of travel (right-to-left and left-to-right), a fundamental task for traffic analysis. The dataset and baseline system serve as a starting point for researchers to develop more advanced algorithms and models in this field. The dataset can be accessed at https://zenodo.org/records/10700792 and https://zenodo.org/records/11209838.
System characteristics
Acoustic features | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram |
Data augmentation | Acoustic Traffic Simulation |
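Generalized Cross-Correlation with Phase Transform (GCC-PHAT) is an input feature of the baseline and of most submissions. Below is a minimal NumPy sketch that computes GCC-PHAT for every microphone pair of a multichannel frame. It is not the baseline's feature extractor. The function names, the lag range `max_lag`, and the absence of framing and windowing are assumptions made for illustration.

```python
import itertools

import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT of x relative to y for lags -max_lag..+max_lag (in samples).

    A peak at a positive lag means x is a delayed copy of y.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = len(x) + len(y)  # zero-padding avoids circular wrap-around
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that output index k corresponds to lag k - max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def pairwise_gcc_phat(frame, max_lag):
    """frame: (channels, samples) array -> (n_pairs, 2 * max_lag + 1) array."""
    pairs = itertools.combinations(range(frame.shape[0]), 2)
    return np.stack([gcc_phat(frame[i], frame[j], max_lag) for i, j in pairs])
```

For a four-channel array this yields six curves per frame. The lag of each peak tracks the time difference of arrival between two microphones, and this difference changes sign as a vehicle passes the array. That is why the feature carries direction-of-travel information.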
TRAFFIC COUNTING SYSTEM LEVERAGED WITH A NON-SUPERVISED COUNTING APPROACH
Erwann Betton-Ployon, Abbes Kacem, Jerome Mars
ACOUSTB, France
Abstract
To address the challenges of optimising urban mobility, improving safety, and reducing disturbance, traffic monitoring is expanding around anthropized areas. Acoustic monitoring can provide a cost-effective traffic counting system while also serving as a noise monitoring process. A traffic monitoring system is expected to identify direction and vehicle type while counting pass-bys in audio segments. The main difficulties stem from the variety of soundscapes and sound sources near roadways. Generalisation across recording sites is delicate, and accuracy depends on the amount of labelled data available per site. In this work, we introduce a non-supervised traffic counting algorithm to complement the existing supervised models. Our algorithm uses the recording site metadata to estimate a standard GCC-PHAT mask for any pass-by. This mask is applied to the cross-correlation signal of the 4 audio channels, enabling pass-by detection with direction identification. This information is passed to the supervised model, which then refines its initial output. Adding our algorithm's count estimates is highly effective at sites with little labelled data: a significant RMSE reduction is observed when the total duration of real labelled data is less than 2 hours.
System characteristics
Acoustic features | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram |
DCASE 2024 CHALLENGE TASK10 TECHNICAL REPORT
Zhilong Jiang, Xichang Cai, Ziyi Liu, Menglong Wu
North China University of Technology, China
Cai_NCUT_task10_1
Abstract
This technical report describes our approach to Task 10 of the DCASE 2024 Challenge: acoustic-based traffic monitoring. We use Mel spectrograms and VGG11 to extract vehicle category features from the audio, and the GCC-PHAT algorithm with a CNN to extract vehicle direction features. We also optimize the experimental results by iteratively tuning the parameters of the algorithms.
System characteristics
Acoustic features | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram |
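Log Mel spectrograms appear in most submissions, and Cai et al. use them as the input to VGG11. Below is a self-contained NumPy sketch of the feature. Only the 16 kHz default comes from the characteristics table. The HTK-style mel formula, the 1024-point FFT, the 320-sample hop, the 64 mel bands, and the function names are assumptions for illustration, not any team's configuration.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # frequency of each FFT bin in Hz
    fb = np.zeros((n_mels, bins.size))
    for m in range(n_mels):
        left, centre, right = hz_points[m:m + 3]
        rising = (bins - left) / (centre - left)
        falling = (right - bins) / (right - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr=16000, n_fft=1024, hop=320, n_mels=64):
    """Mono signal -> (frames, n_mels) log-mel energies."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (frames, n_mels)
    return np.log(mel + 1e-10)  # small floor avoids log(0)
```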
FINE-GRAINED AUDIO FEATURE REPRESENTATION WITH PRETRAINED MODEL AND GRAPH ATTENTION FOR TRAFFIC FLOW MONITORING
Shitong Fan, Feiyang Xiao, Shuhan Qi, Qiaoxi Zhu, Wenwu Wang, Jian Guan
Harbin Engineering University, China
Abstract
This technical report describes our submission to DCASE 2024 Challenge Task 10. To enhance audio feature representation for audio event detection, we use pre-trained audio neural networks (PANNs) to extract pretrained audio features and a graph attention module (GAT) to fine-tune them, capturing important temporal relations and learning the dependencies among audio features across different time frames. Our method can thus capture important audio event information in the audio signals and provide a fine-grained audio representation for vehicle type detection. We use this fine-grained feature in place of the feature branch of the original baseline to build our systems. We also apply the SpecAugment strategy for audio data augmentation and introduce an overall phase shift to explore directional information. Experimental results indicate that our systems improve performance at the evaluation locations, except for location 1.
System characteristics
Acoustic features | Generalized Cross-Correlation with Phase transform and Log Mel Spectrogram |
Data augmentation | SpecAugment |
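Guan et al. list SpecAugment as their data augmentation. Below is a minimal NumPy sketch of SpecAugment's two masking operations, frequency masking and time masking, applied to a (time, frequency) log-mel spectrogram. It omits SpecAugment's time warping. The number of masks, the maximum widths, and the constant fill value are assumptions rather than the submission's settings.

```python
import numpy as np


def spec_augment(spec, rng, max_freq_width=8, max_time_width=20,
                 n_freq_masks=2, n_time_masks=2, fill_value=0.0):
    """Return a copy of spec (time, freq) with random frequency and time bands masked."""
    out = spec.copy()  # never modify the caller's array
    n_time, n_freq = out.shape
    for _ in range(n_freq_masks):
        width = rng.integers(0, min(max_freq_width, n_freq) + 1)
        start = rng.integers(0, n_freq - width + 1)
        out[:, start:start + width] = fill_value  # mask a band of mel bins
    for _ in range(n_time_masks):
        width = rng.integers(0, min(max_time_width, n_time) + 1)
        start = rng.integers(0, n_time - width + 1)
        out[start:start + width, :] = fill_value  # mask a span of frames
    return out
```

Masking is applied on the fly during training, so each epoch sees differently occluded copies of the same clip.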
Deep Acoustic Vehicle Counting Model with Short-Time Homomorphic Deconvolution
Yeonseok Park, TaeWoon Yeo, Baeksan On
KT Corporation, South Korea
Abstract
In the design of urban traffic monitoring solutions aimed at optimizing logistics infrastructure, acoustic vehicle counting models have gained attention for their cost-effectiveness and energy efficiency. While deep learning has proven effective in visual traffic monitoring, its application in the auditory domain remains under-explored due to the limited availability of real-world data. This study proposes the use of Short-Time Homomorphic Deconvolution (STHD) for analyzing sound signals to estimate the direction of vehicle sounds. This algorithm calculates distances between microphones based on sound direction, facilitating the inference of sound direction and movement. We present a strategy for designing and training a deep learning model that leverages features derived from this algorithm. The proposed system simultaneously counts cars and commercial vehicles on a two-lane road under moderate traffic density conditions, accurately identifying their directions of travel.
System characteristics
Acoustic features | Spectrogram, STHD (Short-Time Homomorphic Deconvolution) |
Data augmentation | Simulation Sound Synthesis |
NEURAL NETWORK TRAINING WITH MATCHING LOSS FOR RANKING FUNCTION
Tomohiro Takahashi, Natsuki Ueno, Yuma Kinoshita, Yukoh Wakabayashi, Nobutaka Ono, Makiho Sukekawa, Seishi Fukuma, Hiroshi Nakagawa
Tokyo Metropolitan University, Japan
Takahashi_TMU-NEE_task10_1
Abstract
In this report, we summarize our approach to DCASE 2024 Challenge Task 10, acoustic-based traffic monitoring. Our approach makes two improvements over the baseline system. The first adds a matching loss for the ranking function to the loss function of the Convolutional Recurrent Neural Network (CRNN), with the aim of improving Kendall’s Tau Rank Correlation (KTRC); the results indicate that it also improves the Root Mean Square Error (RMSE). The second is a change in the input features. We also report the estimation performance on the development datasets.
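The abstract names a matching loss for the ranking function but gives no formula, so the sketch below does not reproduce it. It shows a different, generic idea instead: a smooth surrogate of Kendall's Tau that replaces the sign of each pairwise prediction difference with a tanh. The function names and the sharpness parameter `alpha` are hypothetical.

```python
import numpy as np


def soft_kendall_tau(pred, target, alpha=10.0):
    """Smooth Kendall-style concordance between predicted and reference counts.

    sign(pred_i - pred_j) is replaced by tanh(alpha * (pred_i - pred_j)), which is
    differentiable in pred; larger alpha approaches the hard sign.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    i, j = np.triu_indices(len(pred), k=1)  # all unordered pairs with i < j
    target_sign = np.sign(target[i] - target[j])
    informative = target_sign != 0  # pairs tied in the reference carry no order
    if not np.any(informative):
        return float("nan")
    smooth_sign = np.tanh(alpha * (pred[i] - pred[j]))
    return float(np.mean(smooth_sign[informative] * target_sign[informative]))


def soft_kendall_loss(pred, target, alpha=10.0):
    """About 0 for perfectly concordant predictions, about 2 for reversed ones."""
    return 1.0 - soft_kendall_tau(pred, target, alpha)
```

NumPy provides no gradients. For training, the same expressions would be written with the tensors of a deep learning framework and added to the count-regression loss with a weighting factor. That weighting and `alpha` would both be tuning choices.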
System characteristics
Acoustic features | Log Power Spectrogram and Cosine-Sine of Phase Difference |
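The feature entry above pairs a log power spectrogram with the cosine and sine of the phase difference. Below is a NumPy sketch of one plausible reading of that description. It computes per-channel log power spectra and the inter-channel phase difference of every channel relative to a reference channel. The choice of channel 0 as the reference, the STFT settings, and the function names are assumptions, not details from the report.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a mono signal -> (frames, n_fft // 2 + 1) complex."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def log_power_and_ipd(multichannel, ref=0, n_fft=512, hop=256):
    """(C, samples) -> (C + 2 * (C - 1), frames, bins) stacked feature maps."""
    specs = np.stack([stft(ch, n_fft, hop) for ch in multichannel])  # (C, T, F)
    log_power = np.log(np.abs(specs) ** 2 + 1e-10)
    others = [c for c in range(specs.shape[0]) if c != ref]
    ipd = np.angle(specs[others] * np.conj(specs[ref]))  # (C - 1, T, F), radians
    return np.concatenate([log_power, np.cos(ipd), np.sin(ipd)], axis=0)
```

Encoding the phase difference as a cosine and sine pair keeps the feature continuous where the raw angle wraps from pi to -pi.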