Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting
Guillaume Wisniewski (LLF - UMR7110), Séverine Guillaume (LACITO), Clara Rosina Fernández (LACITO)

TL;DR
This paper investigates how anisotropy in pretrained speech models affects keyword spotting, demonstrating that despite anisotropy, models like wav2vec2 effectively identify words and capture phonetic structures.
Contribution
It provides the first detailed analysis of anisotropy's impact on downstream speech tasks, showing robustness of pretrained models in keyword spotting.
Findings
Wav2vec2 embeddings effectively identify words despite anisotropy.
Pretrained speech models capture phonetic structures and generalize across speakers.
Anisotropy does not hinder the utility of speech representations in keyword spotting.
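
Anisotropy is commonly quantified as the average cosine similarity between randomly chosen embeddings: a value near 0 suggests an isotropic space, a value near 1 a strongly anisotropic one. The following is a minimal sketch of that measurement, not the authors' code. The function name `mean_pairwise_cosine`, the sample size, and the synthetic data are assumptions made for illustration; in practice the rows would be frame-level vectors from a model such as wav2vec2.

```python
import numpy as np


def mean_pairwise_cosine(embeddings, n_pairs=10_000, seed=0):
    """Estimate anisotropy as the mean cosine similarity of random pairs.

    embeddings: array of shape (n_vectors, dim).
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Draw random index pairs and drop pairs that compare a vector to itself.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j
    a = embeddings[i[keep]]
    b = embeddings[j[keep]]
    # Normalize rows so the dot product equals the cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


# Synthetic illustration (hypothetical data, not model outputs):
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 64))   # zero-mean Gaussian vectors
anisotropic = isotropic + 5.0             # same vectors with a shared offset

print("isotropic:  ", mean_pairwise_cosine(isotropic))
print("anisotropic:", mean_pairwise_cosine(anisotropic))
```

Adding a shared offset to every vector pushes them all in the same direction, which is one simple way to reproduce the high similarity between random embeddings that characterizes an anisotropic space.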
Abstract
Pretrained speech representations like wav2vec2 and HuBERT exhibit strong anisotropy, leading to high similarity between random embeddings. While widely observed, the impact of this property on downstream tasks remains unclear. This work evaluates the impact of anisotropy on keyword spotting for computational documentary linguistics. Using Dynamic Time Warping, we show that despite anisotropy, wav2vec2 similarity measures effectively identify words without transcription. Our results highlight the robustness of these representations, which capture phonetic structures and generalize across speakers, and underscore the importance of pretraining in learning rich and invariant speech representations.
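
To make the Dynamic Time Warping step concrete, here is a minimal sketch of how two sequences of frame embeddings can be compared. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the local distance, the step pattern, or the normalization, so cosine distance, the three standard steps, and division by the summed lengths are choices made here. The function names `cosine_distance_matrix` and `dtw_cost` are hypothetical, and the sketch aligns two whole sequences, whereas spotting a keyword inside a longer recording would typically need a subsequence variant.

```python
import numpy as np


def cosine_distance_matrix(x, y):
    """Pairwise cosine distances between frames of x (n, dim) and y (m, dim)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def dtw_cost(x, y):
    """Length-normalized DTW alignment cost between two frame sequences."""
    d = cosine_distance_matrix(x, y)
    n, m = d.shape
    # acc[i, j] holds the cheapest cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = d[i - 1, j - 1] + best_previous
    # Normalize so that sequences of different lengths are comparable.
    return acc[n, m] / (n + m)


# Synthetic illustration (hypothetical data, not model outputs):
rng = np.random.default_rng(0)
query = rng.normal(size=(20, 16))        # stand-in for a spoken keyword
slower = np.repeat(query, 2, axis=0)     # same frames, spoken twice as slowly
unrelated = rng.normal(size=(30, 16))    # stand-in for a different word

print("same word, slower:", dtw_cost(query, slower))
print("different word:   ", dtw_cost(query, unrelated))
```

Because DTW can map one frame onto several, the time-stretched copy aligns at almost no cost while the unrelated sequence does not. A low cost between a query and a candidate segment is the kind of signal that lets a word be identified without transcription.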
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
