Mechanistic Interpretability of ASR models using Sparse Autoencoders
Dan Pluth, Zachary Nicholas Houghton, Yu Zhou, and Vijay K. Gurbani

TL;DR
This paper applies Sparse Autoencoders to a Transformer-based speech recognition model, revealing its encoding of linguistic features and enabling cross-lingual feature manipulation.
Contribution
It demonstrates the first use of Sparse Autoencoders on an automatic speech recognition (ASR) model, uncovering linguistic features encoded in Whisper.
Findings
SAE uncovers diverse linguistic and non-linguistic features in Whisper
Cross-lingual feature steering is demonstrated
Whisper encodes rich linguistic information
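To make the idea of feature steering concrete, here is a minimal sketch of one common way to do it with a sparse autoencoder: nudge each embedding along the decoder direction of a chosen latent feature. This summary does not describe the paper's actual steering procedure, so the dimensions, the random decoder matrix `W_dec`, the `steer` helper, and the feature index are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; a real Whisper encoder and SAE are much larger.
d_model, d_latent = 8, 32

# Stand-in for a trained SAE decoder: row i is the direction in embedding
# space that latent feature i writes to. Here it is random, not trained.
W_dec = rng.normal(size=(d_latent, d_model))

# Stand-in for frame-level embeddings taken from the ASR encoder.
frames = rng.normal(size=(4, d_model))


def steer(x, feature_idx, strength):
    """Shift every frame along one feature's unit-norm decoder direction."""
    direction = W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + strength * direction


# Push all frames toward (hypothetical) feature 5 with strength 3.0.
steered = steer(frames, feature_idx=5, strength=3.0)
```

In a real experiment, the steered embeddings would be passed on through the rest of the model to see whether the output changes in the way the feature's interpretation predicts.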
Abstract
Understanding the internal workings of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in domains that affect the public at large, such as industry, academia, finance, and health. While these models have advanced rapidly, their internal mechanisms remain largely a mystery. Techniques such as Sparse Autoencoders (SAEs) have emerged to help understand these mechanisms by projecting dense representations into a sparse vector. While existing research has demonstrated the viability of SAEs in interpreting text-based Large Language Models (LLMs), there are no equivalent studies that demonstrate the application of an SAE to audio processing models like Automatic Speech Recognizers (ASRs). In this work, an SAE is applied to Whisper, a Transformer-based ASR, training a high-dimensional sparse latent space on frame-level embeddings extracted from…
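The projection of dense representations into a sparse vector can be sketched as follows. This is a generic ReLU autoencoder with an L1 sparsity penalty, a common SAE formulation; the abstract as quoted here does not specify the paper's exact variant, so the architecture, the dimensions, the random stand-in embeddings, and the `l1_coeff` value are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the latent space is wider than the embedding so that
# each frame can be described by a few active features out of many.
d_model = 8
d_latent = 32
n_frames = 16

# Stand-in for frame-level embeddings; real ones would come from the ASR.
frames = rng.normal(size=(n_frames, d_model))

# Untrained parameters, initialized randomly for illustration only.
W_enc = rng.normal(scale=0.1, size=(d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(scale=0.1, size=(d_latent, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map dense embeddings to non-negative latent activations (ReLU)."""
    return np.maximum(0.0, x @ W_enc + b_enc)


def decode(z):
    """Reconstruct dense embeddings from latent activations."""
    return z @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the latents."""
    z = encode(x)
    x_hat = decode(z)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity


latents = encode(frames)
loss = sae_loss(frames)
```

Training would minimize `sae_loss` over many frames; the L1 term is what pushes most latent activations to zero, which is the property that makes individual features easier to inspect.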
