Microphone Array Based Surveillance Audio Classification
Dimitri Leandro de Oliveira Silva, Tito Spadini, Ricardo Suyama

TL;DR
This study evaluates classical classifiers and beamforming algorithms for surveillance audio detection using microphone arrays, highlighting the trade-offs between accuracy and computational efficiency under various noise conditions.
Contribution
It compares multiple classifiers and beamforming techniques, identifying effective combinations and analyzing their performance and computational costs in surveillance audio classification.
Findings
SVM with Delay-and-Sum achieved the best accuracy (up to 86.0%), but at a high computational cost (about 402 ms), mainly due to Delay-and-Sum.
The SGD classifier provided comparable accuracy (up to 85.3%) with faster processing (about 165 ms).
Performance varies with the SNR level, and data augmentation improves results.
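To make the SNR conditions concrete, the following is a minimal sketch of how white Gaussian noise could be added to a clean signal at a target SNR. It is not the authors' code: the function name `add_awgn`, the 440 Hz test tone, the 16 kHz sampling rate, and the 10 dB sweep step are all assumptions chosen for illustration. Only the -10 dB to 30 dB range comes from the paper.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (in dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    # Average power of the clean signal.
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Illustrative sweep over the SNR range reported in the paper.
# The 10 dB step and the test tone are assumptions.
rng = np.random.default_rng(0)
t = np.arange(16_000) / 16_000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy_by_snr = {snr: add_awgn(clean, snr, rng) for snr in range(-10, 31, 10)}
```

At -10 dB the noise power is ten times the signal power, while at 30 dB it is one thousandth of it, which is why classifier accuracy can differ sharply across the range.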
Abstract
This work assessed seven classical classifiers and two beamforming algorithms for detecting surveillance sound events. Tests were run with additive white Gaussian noise (AWGN) at SNRs from -10 dB to 30 dB. Data augmentation was also employed to improve the algorithms' performance. The results showed that the combination of SVM and Delay-and-Sum (DaS) achieved the best accuracy (up to 86.0%), but at a high computational cost (about 402 ms), mainly due to DaS. SGD also appears to be a good alternative, since it achieved similarly good accuracy (up to 85.3%) with a shorter processing time (about 165 ms).
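For readers unfamiliar with Delay-and-Sum, the idea can be sketched as follows: each microphone channel is shifted to undo its propagation delay toward the source, and the aligned channels are averaged. This is a simplified time-domain version that assumes the per-channel delays are already known and are whole numbers of samples. The function name `delay_and_sum` and the synthetic three-microphone setup are hypothetical and do not reproduce the paper's implementation.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Align channels by known integer sample delays and average them.

    channels: array of shape (n_mics, n_samples)
    delays:   non-negative integer delay (in samples) of each channel
    Hypothetical helper for illustration; not taken from the paper.
    """
    channels = np.asarray(channels, dtype=float)
    if channels.ndim != 2 or channels.shape[0] != len(delays):
        raise ValueError("expected one delay per microphone channel")
    n_mics, n_samples = channels.shape
    output = np.zeros(n_samples)
    for channel, delay in zip(channels, delays):
        delay = int(delay)
        if delay < 0 or delay >= n_samples:
            raise ValueError("delays must satisfy 0 <= delay < n_samples")
        # Advance the channel by `delay` samples; the tail stays zero.
        output[: n_samples - delay] += channel[delay:]
    return output / n_mics


# Synthetic example: one source reaching three microphones with assumed delays.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
delays = [0, 3, 7]
channels = np.stack(
    [np.concatenate([np.zeros(d), source[: 1000 - d]]) for d in delays]
)
aligned = delay_and_sum(channels, delays)
```

In this noise-free example the output matches the source everywhere except the last few samples, where the shifted channels have run out of data. With independent noise on each channel, averaging the aligned signals reduces the noise power while keeping the source intact, which is the motivation for beamforming ahead of classification.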
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Animal Vocal Communication and Behavior
Methods: Support Vector Machine · Stochastic Gradient Descent
