Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models

Parismita Gogoi; Sishir Kalita; Wendy Lalhminghlui; Viyazonuo Terhiija; Moakala Tzudir; Priyankoo Sarmah; S. R. M. Prasanna

arXiv:2506.03606·eess.AS·June 5, 2025

Open Access

TL;DR

This paper evaluates SSL speech models (wav2vec 2.0) for tone recognition in three low-resource North-East Indian languages (Angami, Ao, and Mizo), analyzing which layers carry the most tone information and which linguistic factors affect accuracy.

Contribution

It provides the first detailed analysis of SSL model performance for tone recognition in low-resource tonal languages, highlighting layer importance and linguistic influences.

Findings

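01 Middle layers are the most effective for tone recognition, whether the model was pre-trained on tonal or non-tonal languages.

02 Performance varies across languages: best for Mizo, worst for Angami.

03 Tone inventory, tone types, and dialectal variation influence recognition accuracy.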

Abstract

This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North-East India: Angami, Ao, and Mizo. We evaluate four wav2vec 2.0 base models that were pre-trained on both tonal and non-tonal languages. We analyze tone-wise performance across the layers for all three languages and compare the different models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, regardless of the pre-training language, i.e., tonal or non-tonal. We have also found that the tone inventory, tone types, and dialectal variations affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in…
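To make the idea of a layer-wise analysis concrete, here is a minimal sketch of probing each layer's embeddings with a simple classifier and comparing accuracies. The abstract does not say which classifier or feature pooling the authors used, so everything here is an assumption for illustration: the nearest-centroid probe, the function names (`class_centroids`, `predict_tone`, `layerwise_accuracy`), and the data layout `feats[layer][sample]`. In practice the vectors would be per-layer hidden states from a wav2vec 2.0 model, pooled over each tone-bearing unit; here they are plain lists of floats.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_centroids(features: Sequence[Vector],
                    labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors belonging to each tone class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, tone in zip(features, labels):
        if tone not in sums:
            sums[tone] = [0.0] * len(vec)
            counts[tone] = 0
        sums[tone] = [s + v for s, v in zip(sums[tone], vec)]
        counts[tone] += 1
    return {tone: [s / counts[tone] for s in total]
            for tone, total in sums.items()}


def predict_tone(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Return the tone whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda tone: sq_dist(centroids[tone], vec))


def layerwise_accuracy(train_feats: Sequence[Sequence[Vector]],
                       train_labels: Sequence[str],
                       test_feats: Sequence[Sequence[Vector]],
                       test_labels: Sequence[str]) -> List[float]:
    """Fit one probe per layer and return its test accuracy.

    train_feats[layer][sample] and test_feats[layer][sample] are feature
    vectors (hypothetical layout, chosen for this sketch).
    """
    accuracies = []
    for layer_train, layer_test in zip(train_feats, test_feats):
        centroids = class_centroids(layer_train, train_labels)
        correct = sum(predict_tone(centroids, vec) == tone
                      for vec, tone in zip(layer_test, test_labels))
        accuracies.append(correct / len(test_labels))
    return accuracies
```

Calling `layerwise_accuracy` yields one accuracy per layer; the index of the largest value is the layer whose embeddings this particular probe separates best. A stronger probe (for example a linear classifier) could be swapped in without changing the per-layer loop.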

Taxonomy

Topics: Phonetics and Phonology Research · Animal Vocal Communication and Behavior · Speech Recognition and Synthesis

Methods: Balanced Selection