Mechanistic Interpretability of ASR models using Sparse Autoencoders

Dan Pluth, Zachary Nicholas Houghton, Yu Zhou, and Vijay K. Gurbani

arXiv:2605.12225·cs.CL·May 13, 2026

TL;DR

This paper applies Sparse Autoencoders to a Transformer-based speech recognition model, revealing its encoding of linguistic features and enabling cross-lingual feature manipulation.

Contribution

The paper applies Sparse Autoencoders to Whisper, an automatic speech recognition model, and uncovers linguistic features in its representations. The authors report that no equivalent prior studies exist for ASR models.

Findings

01. SAE uncovers diverse linguistic and non-linguistic features in Whisper

02. Cross-lingual feature steering is demonstrated

03. Whisper encodes rich linguistic information

Abstract

Understanding the internal workings of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in domains that affect the public at large, such as industry, academia, finance, and health. While these models have advanced rapidly, their internal mechanisms remain largely a mystery. Techniques such as Sparse Autoencoders (SAEs) have emerged to help explain these mechanisms by projecting dense representations into a sparse vector. While existing research has demonstrated the viability of SAEs for interpreting text-based Large Language Models (LLMs), there are no equivalent studies that apply an SAE to audio processing models such as Automatic Speech Recognizers (ASRs). In this work, an SAE is applied to Whisper, a Transformer-based ASR, training a high-dimensional sparse latent space on frame-level embeddings extracted from…
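To make the idea of "projecting dense representations into a sparse vector" concrete, the following is a minimal sketch of a generic ReLU sparse autoencoder with an L1 sparsity penalty. It is not the authors' implementation: the excerpt does not state the architecture, latent size, loss, or which Whisper layer the embeddings come from, so the class name `TinySAE`, the dimensions, and the coefficient `l1_coeff` are all illustrative assumptions. It uses only the Python standard library for readability; a practical version would likely use a tensor library.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


class TinySAE:
    """Hypothetical toy sparse autoencoder (ReLU encoder, linear decoder)."""

    def __init__(self, d_model, d_latent, seed=0):
        rng = random.Random(seed)
        # Encoder maps a dense embedding (d_model) to a wider latent (d_latent).
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_latent)]
        self.b_enc = [0.0] * d_latent
        # Decoder maps the latent back to the embedding space.
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_latent)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU.
        centered = [a - b for a, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, f):
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparsity.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(a) for a in f)


# Stand-in for one frame-level embedding; real Whisper embeddings are far larger.
rng = random.Random(1)
frame = [rng.gauss(0, 1) for _ in range(8)]
sae = TinySAE(d_model=8, d_latent=32)
features = sae.encode(frame)
active = sum(1 for a in features if a > 0)
print(f"{active} of {len(features)} latent features are active")
```

With untrained random weights, roughly half of the latents fire. Training against the loss above is what would push most activations to zero, so that each remaining active latent can be inspected as a candidate feature.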
