From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms
Manuel Barusco, Francesco Borsatti, Davide Dalle Pezze, Francesco Paissan, Elisabetta Farella, Gian Antonio Susto

TL;DR
This paper adapts visual anomaly detection algorithms to audio signals, enhancing anomaly localization and explainability in audio anomaly detection systems through spectrogram analysis.
Contribution
It introduces a novel approach that applies vision-based algorithms to audio data, enabling fine-grained temporal-frequency localization of anomalies for improved interpretability.
Findings
Effective detection of audio anomalies on benchmarks
Enhanced explainability through localized anomaly identification
Improved accuracy over existing AAD methods
Abstract
Recent advances in Visual Anomaly Detection (VAD) have introduced sophisticated algorithms leveraging embeddings generated by pre-trained feature extractors. Inspired by these developments, we investigate the adaptation of such algorithms to the audio domain to address the problem of Audio Anomaly Detection (AAD). Unlike most existing AAD methods, which primarily classify anomalous samples, our approach introduces fine-grained temporal-frequency localization of anomalies within the spectrogram, significantly improving explainability. This capability enables a more precise understanding of where and when anomalies occur, making the results more actionable for end users. We evaluate our approach on industrial and environmental benchmarks, demonstrating the effectiveness of VAD techniques in detecting anomalies in audio signals. Moreover, they improve explainability by enabling localized…
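The idea of treating a spectrogram as an image and scoring it patch by patch can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes raw log-spectrogram patches as a stand-in for embeddings from a pre-trained feature extractor, and it uses a nearest-neighbour memory bank of normal patches, a common design in patch-based visual anomaly detection. All function names, the 8 kHz sampling rate, the patch size, and the synthetic signals are hypothetical choices made for illustration only.

```python
import numpy as np

SR = 8000  # assumed sampling rate (Hz), chosen only for this toy example


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT with a Hann window, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T


def patch_features(spec, patch=8):
    """Cut the spectrogram into non-overlapping patch x patch tiles.

    Returns an array of shape (freq_patches, time_patches, patch * patch).
    A real system would use pre-trained embeddings here instead of raw tiles.
    """
    f, t = spec.shape
    f, t = f - f % patch, t - t % patch  # drop the ragged border
    tiles = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(f // patch, t // patch, -1)


def build_memory(normal_specs, patch=8):
    """Collect patch features from normal spectrograms into one memory bank."""
    banks = [patch_features(s, patch).reshape(-1, patch * patch) for s in normal_specs]
    return np.concatenate(banks, axis=0)


def anomaly_map(spec, memory, patch=8):
    """Score each patch by its distance to the nearest normal patch."""
    feats = patch_features(spec, patch)
    rows, cols, dim = feats.shape
    flat = feats.reshape(-1, dim)
    sq_dist = ((flat[:, None, :] - memory[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(sq_dist.min(axis=1)).reshape(rows, cols)


def make_clip(rng, burst=False):
    """Synthetic 1 s clip: a 440 Hz hum plus noise, optionally with a short burst."""
    t = np.arange(SR) / SR
    x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(SR)
    if burst:
        # Hypothetical anomaly: a 3125 Hz tone from 0.4 s to 0.5 s
        x[3200:4000] += 0.5 * np.sin(2 * np.pi * 3125 * t[3200:4000])
    return x


rng = np.random.default_rng(0)
memory = build_memory([log_spectrogram(make_clip(rng)) for _ in range(4)])
scores = anomaly_map(log_spectrogram(make_clip(rng, burst=True)), memory)

# Clip-level score (detection) and the most anomalous patch (localization)
row, col = np.unravel_index(scores.argmax(), scores.shape)
print("clip score:", scores.max())
print("most anomalous patch (frequency row, time column):", row, col)
```

Each row of the resulting map corresponds to a frequency band and each column to a time span, so the map indicates both where in the spectrum and when in the clip the signal departs from the normal recordings. Taking the maximum over the map gives a single clip-level score for plain detection.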
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Music and Audio Processing
