Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers

Tzu-Quan Lin; Hsi-Chun Cheng; Hung-yi Lee; Hao Tang

arXiv:2506.21712 · cs.CL · June 30, 2025

TL;DR

This paper investigates how self-supervised speech Transformers encode speaker information. It identifies feed-forward neurons correlated with speaker traits and demonstrates their importance by showing that protecting them during pruning preserves performance on speaker-related tasks.

Contribution

The study identifies neurons in the feed-forward layers that are correlated with speaker information and shows that protecting these neurons during pruning preserves performance on speaker-related tasks.

Findings

01 Neurons correlated with speaker traits can be identified via clustering.

02 Protecting speaker-related neurons preserves speaker task performance.

03 Clusters correspond to phonetic and gender classes.
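
As an illustration of the first finding, the sketch below shows one way neurons could be associated with clusters. It is a minimal example under stated assumptions, not the paper's procedure: the function name `cluster_neurons`, the `top_k` parameter, and the selection criterion (the gap between a neuron's mean activation inside a cluster and its overall mean) are all hypothetical, and the toy activations are made up for demonstration.

```python
import numpy as np

def cluster_neurons(acts, labels, top_k=2):
    """Map each cluster id to the top_k neurons whose mean activation inside
    the cluster most exceeds their mean over all frames.

    acts:   (frames, neurons) feed-forward activations
    labels: (frames,) cluster id per frame, e.g. from k-means
    NOTE: this ranking criterion is an assumption made for illustration.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels)
    overall = acts.mean(axis=0)
    result = {}
    for c in np.unique(labels):
        # How much more active each neuron is on this cluster's frames.
        gap = acts[labels == c].mean(axis=0) - overall
        top = np.argsort(gap)[::-1][:top_k]
        result[int(c)] = sorted(int(i) for i in top)
    return result

# Toy data: neurons 0 and 1 fire on cluster 0, neurons 2 and 3 on cluster 1.
toy_labels = np.array([0, 0, 0, 1, 1, 1])
toy_acts = np.zeros((6, 5))
toy_acts[:3, [0, 1]] = 1.0
toy_acts[3:, [2, 3]] = 1.0
toy_result = cluster_neurons(toy_acts, toy_labels)
```

In this toy setup each cluster is mapped to the two neurons that are active only on its frames; real activations would be far noisier and would need a more careful criterion.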

Abstract

In recent years, the impact of self-supervised speech Transformers has extended to speaker-related applications. However, little research has explored how these models encode speaker information. In this work, we address this gap by identifying neurons in the feed-forward layers that are correlated with speaker information. Specifically, we analyze neurons associated with k-means clusters of self-supervised features and i-vectors. Our analysis reveals that these clusters correspond to broad phonetic and gender classes, making them suitable for identifying neurons that represent speakers. By protecting these neurons during pruning, we can significantly preserve performance on speaker-related tasks, demonstrating their crucial role in encoding speaker information.
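
The idea of protecting neurons during pruning can be sketched as follows. This is a minimal example under stated assumptions: the abstract does not specify the pruning criterion, so plain magnitude pruning (L2 norm of each neuron's input weights) stands in here, and the names `prune_mask`, `w_in`, `sparsity`, and `protected` are hypothetical.

```python
import numpy as np

def prune_mask(w_in, sparsity, protected=()):
    """Return a boolean keep-mask over feed-forward neurons (rows of w_in).

    The lowest-norm neurons are pruned first, but indices in `protected`
    are never pruned. NOTE: magnitude pruning is an assumed stand-in,
    not necessarily the criterion used in the paper.
    """
    w_in = np.asarray(w_in, dtype=float)
    n = w_in.shape[0]
    scores = np.linalg.norm(w_in, axis=1)        # one score per neuron
    is_protected = np.zeros(n, dtype=bool)
    is_protected[np.asarray(list(protected), dtype=int)] = True
    candidates = np.flatnonzero(~is_protected)   # neurons allowed to be pruned
    n_prune = min(int(round(sparsity * n)), candidates.size)
    order = candidates[np.argsort(scores[candidates])]
    keep = np.ones(n, dtype=bool)
    keep[order[:n_prune]] = False
    return keep

# Four toy neurons with weight norms 0.1, 0.2, 3 and 4.
toy_w = np.array([[0.1, 0.0], [0.0, 0.2], [3.0, 0.0], [0.0, 4.0]])
plain_mask = prune_mask(toy_w, sparsity=0.5)
protected_mask = prune_mask(toy_w, sparsity=0.5, protected={0})
```

With neuron 0 protected, the same sparsity target is met by pruning the next-weakest unprotected neurons instead, which is the mechanism the abstract relies on to keep speaker-related neurons intact.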

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and dialogue systems