Mel-spectrogram features for acoustic vehicle detection and speed estimation

Nikola Bulatovic; Slobodan Djukanovic

arXiv:2204.04013 · cs.LG · April 11, 2022

TL;DR

This paper presents a supervised learning approach that uses mel-spectrogram features for acoustic vehicle detection and speed estimation from single-microphone recordings, reaching an average speed estimation error of 7.87 km/h on on-field recordings from an urban environment.

Contribution

It detects vehicles by predicting the clipped vehicle-to-microphone distance from the mel-spectrogram and uses mel-spectrogram-based features directly for speed estimation, without introducing any intermediate features.
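
Both detection and speed estimation start from the mel-spectrogram of the input audio. The sketch below shows a generic log-mel-spectrogram computation with NumPy, assuming the HTK mel formula, a Hann window, n_fft=1024, hop=512 and 40 mel bands. These parameters are illustrative assumptions, not the authors' settings, and the code is not the paper's feature pipeline. In practice a library routine such as librosa.feature.melspectrogram would typically be used instead.

```python
import numpy as np

# Generic log-mel-spectrogram sketch. All parameters (HTK mel formula, Hann window,
# n_fft=1024, hop=512, n_mels=40) are illustrative assumptions, not the paper's settings.

def hz_to_mel(f):
    # HTK-style mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters spaced uniformly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 edge frequencies: filter i spans points i, i + 1 (peak) and i + 2
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr, n_fft=1024, hop=512, n_mels=40, eps=1e-10):
    """Log-mel power spectrogram of shape (n_mels, n_frames); assumes len(x) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Hann-windowed frames, one per row
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + eps).T                         # eps avoids log(0)

sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000.0 * t)  # 1 s, 1 kHz test tone
S = log_mel_spectrogram(x, sr)
print(S.shape)  # (40, 42)
```

The resulting (mel bands × frames) matrix is the kind of input a supervised model would map to a clipped distance or a speed value. The log compression keeps quiet and loud frames on a comparable scale.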

Findings

01. Average speed estimation error of 7.87 km/h

02. 48.7% average accuracy for speed classification with a 10 km/h discretization interval

03. 91.0% accuracy when an offset of one class is allowed
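
The three findings use two kinds of metric: a mean absolute speed error and a class accuracy that may tolerate a one-class offset. The toy sketch below shows how such numbers can be computed. The binning (class k covers [10k, 10k + 10) km/h) and all function names are assumptions for illustration, and the input values are made up. They are not from the paper's dataset.

```python
# Toy evaluation sketch: mean absolute error of regressed speeds and class accuracy
# with an optional class offset. The binning (class k covers [10k, 10k + 10) km/h)
# is an assumption; the paper's exact class boundaries are not given here.

def speed_class(speed_kmh, interval=10.0):
    """Map a speed to its discretization class index, e.g. 47 km/h -> 4 for 10 km/h bins."""
    return int(speed_kmh // interval)

def mean_absolute_error(true_speeds, pred_speeds):
    """Average absolute speed error in km/h."""
    return sum(abs(t - p) for t, p in zip(true_speeds, pred_speeds)) / len(true_speeds)

def class_accuracy(true_classes, pred_classes, offset=0):
    """Fraction of predictions within `offset` classes of the true class."""
    hits = sum(abs(t - p) <= offset for t, p in zip(true_classes, pred_classes))
    return hits / len(true_classes)

# Toy numbers for illustration only (not from the paper's dataset)
true_speeds = [42.0, 55.0, 61.0, 38.0]
pred_speeds = [45.0, 49.0, 70.0, 38.0]
true_classes = [speed_class(s) for s in true_speeds]  # [4, 5, 6, 3]
pred_classes = [4, 4, 7, 5]                           # e.g. output of a speed classifier

print(mean_absolute_error(true_speeds, pred_speeds))         # 4.5
print(class_accuracy(true_classes, pred_classes))            # 0.25
print(class_accuracy(true_classes, pred_classes, offset=1))  # 0.75
```

Allowing a one-class offset counts predictions in a neighbouring 10 km/h bin as correct. This explains why the tolerant accuracy can be much higher than the exact-class accuracy.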

Abstract

The paper addresses acoustic vehicle detection and speed estimation from single-sensor measurements. We predict the vehicle's pass-by instant by minimizing the clipped vehicle-to-microphone distance, which is predicted from the mel-spectrogram of the input audio in a supervised learning approach. In addition, mel-spectrogram-based features are used directly for vehicle speed estimation, without introducing any intermediate features. The results show that the proposed features can be used for accurate vehicle detection and speed estimation, with an average speed estimation error of 7.87 km/h. If we formulate speed estimation as a classification problem with a 10 km/h discretization interval, the proposed method attains an average accuracy of 48.7% for correct class prediction and 91.0% when an offset of one class is allowed. The proposed method is evaluated on a dataset of 304 urban-environment on-field…
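
The abstract detects a vehicle at the instant where the predicted clipped vehicle-to-microphone distance reaches its minimum. The sketch below illustrates that idea under loudly stated assumptions. The "distance" is taken as the time in seconds to the nearest pass-by instant, clipped at a hypothetical `clip` value. A detection is reported at every local minimum that falls below a hypothetical `threshold`. The function names are illustrative, and the ideal target curve stands in for the output of a trained regression model.

```python
# Hypothetical sketch of clipped-distance-based pass-by detection. Assumptions (not
# taken from the paper): the "distance" is the time (s) to the nearest pass-by
# instant, clipped at `clip`; detections are local minima below `threshold`.

def clipped_distance_target(times, pass_by_times, clip=1.0):
    """Training target: time distance to the nearest pass-by instant, clipped at `clip`."""
    if not pass_by_times:
        return [clip for _ in times]
    return [min(min(abs(t - tp) for tp in pass_by_times), clip) for t in times]

def detect_pass_by(times, predicted, threshold=0.5):
    """Return times where the predicted clipped distance has a local minimum below threshold."""
    events = []
    for i in range(1, len(predicted) - 1):
        # <= on the left and < on the right yields one detection per flat-bottomed dip
        is_min = predicted[i] <= predicted[i - 1] and predicted[i] < predicted[i + 1]
        if is_min and predicted[i] < threshold:
            events.append(times[i])
    return events

times = [k / 10 for k in range(61)]                 # 0.0 to 6.0 s in 0.1 s steps
target = clipped_distance_target(times, [2.0, 4.5])
# A perfect model would predict exactly the target curve
print(detect_pass_by(times, target))                # [2.0, 4.5]
```

Clipping flattens the curve far from any vehicle. This means the model only has to be precise near pass-by events, and the flat regions cannot produce spurious minima below the threshold.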
