Probing self-attention in self-supervised speech models for cross-linguistic differences
Sai Gopinath, Joselyn Rodriguez

TL;DR
This paper investigates how self-attention in a small self-supervised speech model (TERA) differs across training languages. It finds diverse attention patterns and phonological learning even at small scale, which bears on whether such speech representations are language-independent.
Contribution
It provides an in-depth analysis of attention patterns in a small speech transformer, highlighting language-specific differences and phonological information learning.
Findings
Attention heads vary from diagonal to global patterns regardless of language
Models learn important phonological information during pretraining
Diagonal heads are crucial for phoneme classification across languages
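One way to make the diagonal-versus-global distinction in these findings concrete is to score each head by how far, on average, a frame attends from its own position. The sketch below is an illustration only, not the metric used in the paper: the function name `mean_attention_distance`, the toy sequence length, and the two synthetic heads are all assumptions made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_attention_distance(attn):
    """Attention-weighted mean of |i - j|, divided by (T - 1).

    Hypothetical score for illustration: values near 0 indicate a
    diagonal head, larger values indicate attention spread across
    the whole sequence.
    """
    T = len(attn)
    if T < 2:
        return 0.0
    total = sum(
        w * abs(i - j)
        for i, row in enumerate(attn)
        for j, w in enumerate(row)
    )
    return total / (T * (T - 1))


# Toy attention maps (assumed, not taken from any trained model).
T = 6
# Sharp logits on the diagonal give an almost entirely diagonal head.
diagonal_head = [
    softmax([10.0 if i == j else 0.0 for j in range(T)]) for i in range(T)
]
# Equal logits give a uniform, fully global head.
global_head = [softmax([0.0] * T) for _ in range(T)]

diagonal_score = mean_attention_distance(diagonal_head)
global_score = mean_attention_distance(global_head)
print(f"diagonal head score: {diagonal_score:.4f}")
print(f"global head score:   {global_score:.4f}")
```

Under this score a uniform head of length T evaluates to (T + 1) / (3T), so heads from real models could be ranked between the two extremes. Any threshold for labelling a head "diagonal" would be a further choice the summary does not specify.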
Abstract
Speech models have gained traction thanks to an increase in accuracy from novel transformer architectures. While this impressive increase in performance across automatic speech recognition (ASR) benchmarks is noteworthy, there is still much that is unknown about the use of attention mechanisms for speech-related tasks. For example, while it is assumed that these models are learning language-independent (i.e., universal) speech representations, there has not yet been an in-depth exploration of what it would mean for the models to be language-independent. In the current paper, we explore this question within the realm of self-attention mechanisms of one small self-supervised speech transformer model (TERA). We find that even with a small model, the attention heads learned are diverse, ranging from almost entirely diagonal to almost entirely global regardless of the training language. We…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
Methods: Softmax · Attention Is All You Need
