ASD-Diffusion: Anomalous Sound Detection with Diffusion Models

Fengrun Zhang; Xiang Xie; Kai Guo

arXiv:2409.15957 · cs.SD · September 25, 2024

TL;DR

This paper introduces ASD-Diffusion, an unsupervised anomalous sound detection method for factory environments. A diffusion model reconstructs the normal pattern of acoustic features from noise-corrupted inputs, and a denoising diffusion implicit model speeds up inference. The method reports better accuracy than the baseline.

Contribution

The paper presents a new application of diffusion models for unsupervised sound anomaly detection, including a reconstruction-based pipeline and a speed-optimized inference process.

Findings

01. Outperforms the baseline by 7.75% on the DCASE 2023 development set

02. Uses diffusion models to reconstruct normal acoustic patterns

03. Introduces a post-processing filter for anomaly detection

Abstract

Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. First, in our pipeline, anomalies in the acoustic features are reconstructed from their noise-corrupted features into an approximately normal pattern. Second, a post-processing anomaly filter algorithm is proposed to detect anomalies that deviate significantly from the original input after reconstruction. Furthermore, a denoising diffusion implicit model is introduced to accelerate inference by using a longer sampling interval in the denoising process. The proposed method is innovative in applying diffusion models as a new scheme for ASD. Experimental results on the development set of DCASE 2023…
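
The pipeline described in the abstract (corrupt the features with noise, reconstruct them with a diffusion model, then measure how far the reconstruction deviates from the input) can be illustrated with a minimal sketch. It is not the authors' implementation. The linear noise schedule, the function names, the mean-squared deviation score, and the oracle noise predictor standing in for a trained network are all assumptions made for illustration. The reverse pass uses the deterministic DDIM update and visits only every `stride`-th timestep, which is one way to realize a longer sampling interval.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def corrupt(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def ddim_timesteps(t_start, stride):
    """Descending timesteps t_start, t_start - stride, ... (a larger stride means fewer model calls)."""
    return list(range(t_start, -1, -stride))


def ddim_reconstruct(x_t, timesteps, alpha_bar, eps_model):
    """Deterministic DDIM reverse pass (eta = 0) over a strided timestep list."""
    x = x_t
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last listed timestep, step to the clean sample (alpha_bar = 1).
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x


def deviation_score(x0, x_rec):
    """Mean squared deviation between input and reconstruction (a stand-in score)."""
    return float(np.mean((x0 - x_rec) ** 2))


# Toy check with a hypothetical oracle that "knows" the normal pattern.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
normal_pattern = np.zeros((8, 16))  # stand-in for a patch of normal features


def oracle_eps(x, t):
    """Noise a perfect model would predict if every input were the normal pattern."""
    return (x - np.sqrt(alpha_bar[t]) * normal_pattern) / np.sqrt(1.0 - alpha_bar[t])


steps = ddim_timesteps(t_start=300, stride=50)  # 7 model calls instead of 301

anomalous = normal_pattern.copy()
anomalous[2, 5] = 3.0  # inject a localized anomaly

rec_normal = ddim_reconstruct(corrupt(normal_pattern, 300, alpha_bar, rng), steps, alpha_bar, oracle_eps)
rec_anomalous = ddim_reconstruct(corrupt(anomalous, 300, alpha_bar, rng), steps, alpha_bar, oracle_eps)

score_normal = deviation_score(normal_pattern, rec_normal)
score_anomalous = deviation_score(anomalous, rec_anomalous)
```

In this toy setting the reconstruction is pulled back to the normal pattern, so the anomalous input receives the larger deviation score. A real system would replace `oracle_eps` with a network trained on normal sounds only, and the paper's post-processing filter may differ from the plain mean-squared deviation used here.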

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings