When Less Is More? Diagnosing ASR Predictions in Sardinian via Layer-Wise Decoding
Domenico De Cristofaro, Alessandro Vietti, Marianne Pouplier, Aleese Block

TL;DR
This paper examines how phoneme predictions evolve across the encoder layers of a pretrained Wav2Vec2 model for Campidanese Sardinian. It shows that truncating the upper transformer layers can improve phoneme recognition and that layer-wise decoding can serve as a diagnostic tool.
Contribution
It applies a layer-wise decoding strategy to a pretrained Wav2Vec2 model and shows that, for Campidanese Sardinian, a low-resource language, the best Phoneme Error Rate is reached two layers before the final layer rather than at the final layer.
Findings
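Intermediate layers yield lower Phoneme Error Rates (PER) than the final layer, with the best result two layers before the end.
Intermediate-layer predictions better preserve segmental identity and reduce certain classes of phonological errors.
Overgeneration and regressive errors, where a correct intermediate-layer prediction is overwritten by an error at the final layer, are identified as limitations of final-layer predictions.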
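The error rates in these findings can be illustrated with a minimal sketch, assuming the common definition of PER as the Levenshtein distance between reference and predicted phoneme sequences divided by the reference length. The paper's exact scoring setup is not stated on this page, the function names are hypothetical, and the phoneme strings are toy examples, not data from the study.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER = edit distance / reference length (assumed standard definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example (not from the paper): one inserted phoneme, i.e. overgeneration.
reference = ["k", "a", "z", "a"]
overgenerated = ["k", "a", "z", "a", "a"]
print(phoneme_error_rate(reference, overgenerated))  # 0.25
```

Under this definition an inserted segment costs as much as a substitution, which is why overgeneration at the final layer shows up directly in PER.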
Abstract
Recent studies have shown that intermediate layers in multilingual speech models often encode more phonetically accurate representations than the final output layer. In this work, we apply a layer-wise decoding strategy to a pretrained Wav2Vec2 model to investigate how phoneme-level predictions evolve across encoder layers, focusing on Campidanese Sardinian, a low-resource language. We show that truncating upper transformer layers leads to improved Phoneme Error Rates (PER), with the best performance achieved not at the final layer, but two layers earlier. Through fine-grained alignment analysis, we find that intermediate predictions better preserve segmental identity, avoid overgeneration, and reduce certain classes of phonological errors. We also introduce the notion of regressive errors, cases where correct predictions at intermediate layers are overwritten by errors at the final…
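The layer-wise decoding idea described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes the same linear output head is applied to every layer's hidden states and that decoding is greedy CTC (per-frame argmax, collapse repeats, drop blanks). All function names, the toy vocabulary, and the toy hidden states are hypothetical. The regressive-error check is a coarse utterance-level simplification; the paper reports a fine-grained alignment analysis.

```python
def ctc_greedy_decode(logits, vocab, blank=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            out.append(vocab[i])
        prev = i
    return out


def project(hidden, head):
    """Apply a linear head (D x V weights, no bias) to T x D hidden states."""
    return [[sum(h * w for h, w in zip(frame, col)) for col in zip(*head)]
            for frame in hidden]


def decode_every_layer(layer_hidden, head, vocab, blank=0):
    """Decode each layer with the same head (an assumption of this sketch)."""
    return [ctc_greedy_decode(project(h, head), vocab, blank)
            for h in layer_hidden]


def is_regressive(intermediate, final, reference):
    """Utterance-level check: intermediate layer correct, final layer wrong."""
    return intermediate == reference and final != reference


# Toy setup (hypothetical): 3-symbol vocabulary, identity head, 5 frames.
vocab = ["<blank>", "a", "k"]
head = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
intermediate_h = [[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
final_h = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

decoded = decode_every_layer([intermediate_h, final_h], head, vocab)
print(decoded)  # [['k', 'a'], ['k', 'a', 'a']]
print(is_regressive(decoded[0], decoded[1], ["k", "a"]))  # True
```

In the toy case the final layer inserts an extra vowel that the intermediate layer did not produce, which is the kind of overgeneration and regression the abstract describes. With a real model, the per-layer hidden states would come from the encoder rather than from hand-written lists.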
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Language and cultural evolution
