ASD-Diffusion: Anomalous Sound Detection with Diffusion Models
Fengrun Zhang, Xiang Xie, Kai Guo

TL;DR
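This paper introduces ASD-Diffusion, an unsupervised anomalous sound detection method for factory environments. A diffusion model reconstructs the normal pattern of acoustic features from their noise-corrupted versions, a post-processing filter flags inputs that deviate significantly from their reconstruction, and a denoising diffusion implicit model speeds up inference.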
Contribution
The paper presents a new application of diffusion models for unsupervised sound anomaly detection, including a reconstruction-based pipeline and a speed-optimized inference process.
Findings
Outperforms the baseline by 7.75% on the development set of DCASE 2023
Uses diffusion models for reconstructing normal acoustic patterns
Introduces a post-processing anomaly filter that flags inputs deviating significantly from their reconstruction
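The reconstruction-based idea in the findings above can be illustrated with a small sketch. This summary does not describe the paper's actual filter algorithm, so the code below is only a generic stand-in: it assumes a mean-squared-error score between the input features and their reconstruction, and a fixed threshold. The names `anomaly_score`, `is_anomalous`, and `threshold` are hypothetical.

```python
import numpy as np

def anomaly_score(original, reconstructed):
    """Mean squared deviation between input features and their reconstruction.

    Assumption: MSE is used purely for illustration; the paper's own
    post-processing filter may score deviation differently.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))

def is_anomalous(original, reconstructed, threshold):
    """Flag a clip when its reconstruction deviates more than `threshold`.

    `threshold` is a hypothetical tuning parameter, e.g. chosen from the
    score distribution of held-out normal sounds.
    """
    return anomaly_score(original, reconstructed) > threshold
```

The intuition is that a model trained only on normal sounds reconstructs normal inputs closely, so a large deviation suggests the input did not look normal to begin with.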
Abstract
Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. First, in our pipeline, anomalies in acoustic features are reconstructed from their noise-corrupted features into their approximate normal pattern. Second, a post-processing anomaly filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction. Furthermore, a denoising diffusion implicit model is introduced to accelerate inference by using a longer sampling interval in the denoising process. The proposed method is innovative in applying diffusion models to ASD as a new scheme. Experimental results on the development set of DCASE 2023…
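The speed-up described in the abstract, denoising with a longer sampling interval, can be sketched as follows. This is a minimal illustration of the standard deterministic DDIM update on a strided timestep schedule, not the authors' implementation. The linear beta schedule, the function names, and the `eps_model` callable (a stand-in for a trained noise-prediction network) are all assumptions.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def strided_timesteps(start_step, stride):
    """Descending timesteps start_step, start_step - stride, ..., down to >= 0."""
    return list(range(start_step, -1, -stride))

def ddim_reconstruct(x0, eps_model, alpha_bar, start_step, stride, rng):
    """Corrupt x0 to step `start_step`, then denoise with deterministic DDIM.

    `eps_model(x, t)` is a hypothetical noise predictor. A larger `stride`
    means fewer model calls, which is where the inference speed-up comes from.
    """
    # Forward corruption: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise
    noise = rng.standard_normal(x0.shape)
    a_start = alpha_bar[start_step]
    x = np.sqrt(a_start) * x0 + np.sqrt(1.0 - a_start) * noise

    steps = strided_timesteps(start_step, stride)
    for i, t in enumerate(steps):
        a_t = alpha_bar[t]
        eps = eps_model(x, t)
        # Estimate of the clean features implied by the predicted noise
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Jump to the next (earlier) timestep; the last jump lands on x0_pred
        a_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x
```

With `start_step=500`, a stride of 50 needs 11 model calls where a stride of 1 would need 501, at the cost of a coarser denoising trajectory.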
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
