Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Bing Han; Anbai Jiang; Xinhu Zheng; Wei-Qiang Zhang; Jia Liu; Pingyi Fan; Yanmin Qian

arXiv:2508.12230 · cs.SD · August 19, 2025

TL;DR

This paper presents a robust anomalous sound detection (ASD) model built on self-supervised pre-trained audio models, using LoRA fine-tuning and a machine-aware adapter to improve performance across diverse datasets.

Contribution

The paper introduces an ASD framework that combines self-supervised pre-training, LoRA adaptation, and a machine-aware adapter to improve generalization and effectiveness in diverse acoustic environments.

Findings

01. Significant performance improvements on all benchmark datasets.
02. Effective mitigation of overfitting with LoRA fine-tuning.
03. Enhanced generalization through machine-aware adapters.

Abstract

Machine anomalous sound detection (ASD) is a valuable technique across various applications. However, its generalization performance is often limited due to challenges in data collection and the complexity of acoustic environments. Inspired by the success of large pre-trained models in numerous fields, this paper introduces a robust ASD model that leverages self-supervised pre-trained models trained on large-scale speech and audio datasets. Although there are inconsistencies between the pre-training datasets and the ASD task, our findings indicate that pre-training still provides substantial benefits for ASD. To mitigate overfitting and retain learned knowledge when fine-tuning with limited data, we explore Fully-Connected Low-Rank Adaptation (LoRA) as an alternative to full fine-tuning. Additionally, we propose a Machine-aware Group Adapter module, which enables the model to capture…
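
The low-rank adaptation idea mentioned in the abstract can be illustrated with a minimal sketch. This is the generic LoRA formulation (a frozen weight matrix plus a trainable low-rank update), not the authors' implementation; the details of their Fully-Connected variant are not given in this excerpt. The names `LoRALinear`, `matvec`, `rank`, and `alpha` are hypothetical and chosen only for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Illustrative linear layer: y = W x + (alpha / rank) * B (A x).

    W stands in for a frozen pre-trained weight; only A and B would be
    trained during fine-tuning. Hypothetical sketch, not the paper's code.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pre-trained weights, shape (d_out, d_in)
        # A starts with small random values, shape (rank, d_in).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, shape (d_out, rank), so the initial update is zero.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B (the only trainable tensors)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    W = [[float(i + j) for j in range(8)] for i in range(8)]
    layer = LoRALinear(W, rank=1)
    print("full parameters:", len(W) * len(W[0]))
    print("trainable LoRA parameters:", layer.trainable_params())
```

Because `B` is initialized to zero, the adapted layer reproduces the pre-trained layer exactly before any training step, and the trainable parameter count grows with `rank * (d_in + d_out)` rather than `d_in * d_out`. This is one plausible reading of why such an approach can help retain learned knowledge when fine-tuning data is limited.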

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Anomaly Detection Techniques and Applications