Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
Kevin Wilkinghoff, Sarthak Yadav, Zheng-Hua Tan

TL;DR
This paper systematically evaluates temporal pooling strategies for training-free anomalous sound detection using pre-trained audio embeddings, introducing adaptive and hybrid pooling methods that outperform traditional mean pooling.
Contribution
The paper introduces relative deviation pooling (RDP) and a hybrid pooling strategy, both of which significantly improve detection performance over standard mean pooling in training-free anomalous sound detection (ASD).
Findings
The proposed pooling methods outperform mean pooling across multiple datasets.
They achieve state-of-the-art results on the DCASE2025 ASD dataset.
They surpass all previously reported trained systems and ensembles.
Abstract
Training-free anomalous sound detection (ASD) based on pre-trained audio embedding models has recently garnered significant attention, as it enables the detection of anomalous sounds using only normal reference data while offering improved robustness under domain shifts. However, existing embedding-based approaches almost exclusively rely on temporal mean pooling, while alternative pooling strategies have so far only been explored for spectrogram-based representations. Consequently, the role of temporal pooling in training-free ASD with pre-trained embeddings remains insufficiently understood. In this paper, we present a systematic evaluation of temporal pooling strategies across multiple state-of-the-art audio embedding models. We propose relative deviation pooling (RDP), an adaptive pooling method that emphasizes informative temporal deviations, and introduce a hybrid pooling strategy…
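To make the setting in the abstract concrete, the following is a minimal sketch of a training-free ASD pipeline with a swappable temporal pooling step. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mean_pool`, `max_pool`, `anomaly_scores`) are hypothetical, the nearest-neighbor cosine-distance scoring is an assumed backend, and the frame-level embeddings are random stand-ins for the output of a pre-trained audio model. RDP and the hybrid strategy are not reproduced, because the truncated abstract does not define them.

```python
import numpy as np

def mean_pool(frames):
    """Baseline: average frame-level embeddings (T, D) over time -> (D,)."""
    return frames.mean(axis=0)

def max_pool(frames):
    """A simple alternative pooling: element-wise maximum over time -> (D,)."""
    return frames.max(axis=0)

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def anomaly_scores(test_clips, reference_clips, pool=mean_pool):
    """Score each test clip by cosine distance to its nearest normal reference.

    Assumption: nearest-neighbor scoring against normal reference clips only.
    No parameters are trained; `pool` is the only component that is swapped.
    """
    ref = l2_normalize(np.stack([pool(c) for c in reference_clips]))
    test = l2_normalize(np.stack([pool(c) for c in test_clips]))
    cos_sim = test @ ref.T               # shape: (n_test, n_reference)
    return 1.0 - cos_sim.max(axis=1)     # higher score = more anomalous

# Synthetic stand-in for frame-level embeddings (50 frames, 16 dimensions).
rng = np.random.default_rng(0)
base = rng.normal(size=16)
reference = [base + 0.1 * rng.normal(size=(50, 16)) for _ in range(20)]
normal_clip = base + 0.1 * rng.normal(size=(50, 16))
odd_clip = -base + 0.1 * rng.normal(size=(50, 16))

scores = anomaly_scores([normal_clip, odd_clip], reference, pool=mean_pool)
print(scores)  # the second clip should receive the higher score
```

Passing `pool=max_pool` instead of `pool=mean_pool` changes only how frames are aggregated, which is the single design axis the paper evaluates while the embedding model and scoring stay fixed.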
Taxonomy
Topics: Music and Audio Processing · Anomaly Detection Techniques and Applications · Speech and Audio Processing
