What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen

TL;DR
This paper investigates how fine-tuning a Wav2vec2-based model for accent identification enhances phoneme and prosody representations, revealing layer-specific encoding changes and accent-specific features through systematic probing analyses.
Contribution
It provides a detailed layer-wise analysis of SSL model representations, showing how accent identification fine-tuning enriches phonetic and prosodic encoding in specific layers.
Findings
The top 2 Transformer layers learn richer phoneme and prosody representations after AID fine-tuning.
Strong accent-specific phoneme representations are found in layer 9.
The changes brought by AID fine-tuning share some similarities with the effects of fine-tuning on an Automatic Speech Recognition task.
Abstract
This study is focused on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model, brought by an accent identification (AID) fine-tuning task. This problem is addressed based on model probing. Specifically, we conduct a systematic layer-wise analysis of the representations of the Transformer layers on a phoneme correlation task, and a novel word-level prosody prediction task. We compare the probing performance of the pre-trained and fine-tuned SSL models. Results show that the AID fine-tuning task steers the top 2 layers to learn richer phoneme and prosody representation. These changes share some similarities with the effects of fine-tuning with an Automatic Speech Recognition task. In addition, we observe strong accent-specific phoneme representations in layer 9. To sum up, this study provides insights into the…
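As a rough illustration of what a layer-wise probing comparison looks like in practice, the sketch below fits one small linear probe per layer and reports held-out accuracy for each. It is not the paper's probing setup: the ridge-regression probe, the function names (`probe_accuracy`, `layerwise_probe`, `synthetic_layers`), and the synthetic features are all assumptions made for illustration. In a real experiment the per-layer features would come from the Transformer layers of the pre-trained and fine-tuned models, and the labels from phoneme or prosody annotations.

```python
import numpy as np


def probe_accuracy(train_x, train_y, test_x, test_y, n_classes, ridge=1e-3):
    """Fit a ridge-regularized linear probe on one-hot targets; return test accuracy."""
    def add_bias(x):
        # Append a constant column so the probe can learn an offset.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    xtr, xte = add_bias(train_x), add_bias(test_x)
    targets = np.eye(n_classes)[train_y]  # one-hot encoding of the labels
    # Closed-form ridge solution: W = (X^T X + ridge * I)^{-1} X^T Y
    gram = xtr.T @ xtr + ridge * np.eye(xtr.shape[1])
    weights = np.linalg.solve(gram, xtr.T @ targets)
    preds = (xte @ weights).argmax(axis=1)
    return float((preds == test_y).mean())


def layerwise_probe(layer_feats, labels, n_classes, train_frac=0.8, seed=0):
    """Probe every layer with the same train/test split.

    layer_feats: list of arrays of shape (n_samples, dim), one per layer.
    Returns a list with one held-out accuracy per layer.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    cut = int(train_frac * len(labels))
    tr, te = order[:cut], order[cut:]
    return [
        probe_accuracy(f[tr], labels[tr], f[te], labels[te], n_classes)
        for f in layer_feats
    ]


def synthetic_layers(labels, n_classes, dim, signal_per_layer, seed):
    """Hypothetical placeholder features (NOT real model activations).

    Each layer is a class centroid scaled by a chosen signal strength plus
    unit Gaussian noise, so a larger signal means more linearly decodable labels.
    """
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(n_classes, dim))
    return [
        s * centroids[labels] + rng.normal(size=(len(labels), dim))
        for s in signal_per_layer
    ]


# Synthetic demo only: the signal strengths are arbitrary and the resulting
# numbers say nothing about the paper's actual findings.
demo_labels = np.random.default_rng(0).integers(0, 5, size=600)
feats_a = synthetic_layers(demo_labels, 5, 16, [0.0, 0.5, 1.0], seed=1)
feats_b = synthetic_layers(demo_labels, 5, 16, [0.0, 0.5, 3.0], seed=1)
scores_a = layerwise_probe(feats_a, demo_labels, n_classes=5)
scores_b = layerwise_probe(feats_b, demo_labels, n_classes=5)
for layer, (a, b) in enumerate(zip(scores_a, scores_b)):
    print(f"layer {layer}: model A probe acc={a:.2f}, model B probe acc={b:.2f}")
```

Comparing the two per-layer score lists side by side is the basic pattern behind a pre-trained versus fine-tuned comparison: a layer whose probe score rises after fine-tuning is one where the probed information has become easier to read out linearly.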
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Absolute Position Encodings · Softmax · Byte Pair Encoding · Residual Connection · Adam · Dropout
