Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur

TL;DR
This paper introduces GRIDS, a framework that uses local intrinsic dimensionality (LID) to analyze how perturbations deform learned speech representations and how those changes relate to ASR performance.
Contribution
It presents a novel dimensionality-aware analysis method for understanding local geometric changes in self-supervised speech models under perturbations.
Findings
LID increases under all low-SNR perturbations, while at high SNR the LID profiles of benign and adversarial inputs diverge.
At high SNR, benign noise converges toward the clean profile, whereas adversarial inputs retain elevated early-layer LID.
Layer-wise LID features enable effective anomaly detection with high AUROC.
Abstract
Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural and adversarial perturbations. Prior studies rely on representation similarity or global dimensionality, offering limited visibility into local geometric changes. We ask: how do perturbations deform local geometry, and do these shifts track downstream automatic speech recognition (ASR) degradation? To address this, we present GRIDS, a framework using Local Intrinsic Dimensionality (LID) across layer-wise representations in WavLM and wav2vec 2.0. We find that LID increases for all low signal-to-noise ratio (SNR) perturbations and diverges at high SNR: benign noise converges toward the clean profile, while adversarial inputs retain early-layer LID elevation. We show LID elevation co-occurs with increased WER, and that layer-wise LID features…
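The abstract does not say which LID estimator GRIDS uses. As a rough illustration of what a layer-wise LID profile could look like, the sketch below assumes the widely used maximum-likelihood estimator based on k-nearest-neighbor distances. The function names `lid_mle` and `layerwise_lid_profile`, the default `k=20`, and the choice of reference set are all hypothetical and not taken from the paper.

```python
import numpy as np


def lid_mle(queries, reference, k=20):
    """Maximum-likelihood LID estimate for each row of `queries`.

    Assumed estimator (not confirmed by the source):
        LID(x) = -1 / mean_i( log(r_i / r_k) ),
    where r_1 <= ... <= r_k are the distances from x to its k nearest
    neighbours in `reference`. Requires more than k reference points
    that do not coincide with the query.
    """
    queries = np.asarray(queries, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Pairwise Euclidean distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore exact duplicates (e.g. a query that is also in `reference`).
    dists[dists == 0.0] = np.inf
    # The k smallest distances per query, in ascending order.
    knn = np.sort(dists, axis=1)[:, :k]
    r_k = knn[:, -1:]
    mean_log_ratio = np.log(knn / r_k).mean(axis=1)
    return -1.0 / mean_log_ratio


def layerwise_lid_profile(layer_queries, layer_references, k=20):
    """Mean LID per layer.

    Both arguments are sequences with one (n_frames, dim) array per
    layer, e.g. hidden states extracted from a speech model.
    """
    return np.array([
        lid_mle(q, r, k=k).mean()
        for q, r in zip(layer_queries, layer_references)
    ])
```

Under these assumptions, a profile computed on perturbed audio could be compared layer by layer against the profile of the clean audio, and the per-layer values could serve as input features for a detector. The pairwise-distance step uses memory proportional to n_queries × n_reference × dim, so it only suits small batches.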
