The Tonogenesis Continuum in Tibetan: A Computational Investigation

Siyu Liang; Zhaxi Zerong

arXiv:2510.22485·cs.CL·October 28, 2025

TL;DR

This study measures how much automatic speech recognition (ASR) performance degrades when pitch is flattened, and uses that sensitivity to show that closely related Tibetan varieties form a continuum from non-tonal to tonal systems.

Contribution

The paper introduces a computational approach that quantifies the functional role of pitch at different stages of tonogenesis, and provides empirical evidence that the transition across Tibetan dialects is gradual.

Findings

01

Atonal Amdo dialects tolerate pitch removal best

02

Fully tonal U-Tsang dialects show severe degradation with pitch flattening

03

Kham dialects, at an intermediate stage of tonogenesis, fall between these two extremes

Abstract

Tonogenesis, the historical process by which segmental contrasts evolve into lexical tone, has traditionally been studied through comparative reconstruction and acoustic phonetics. We introduce a computational approach that quantifies the functional role of pitch at different stages of this sound change by measuring how pitch manipulation affects automatic speech recognition (ASR) performance. By analyzing sensitivity to pitch flattening across a set of closely related Tibetan languages, we find evidence of a tonogenesis continuum: atonal Amdo dialects tolerate pitch removal best, fully tonal U-Tsang varieties show severe degradation, and intermediate Kham dialects fall measurably between these extremes. These gradient effects demonstrate how ASR models implicitly learn the shifting functional load of pitch as languages transition from consonant-based to tone-based…
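The measurement idea in the abstract can be illustrated with a minimal sketch. It assumes that sensitivity to pitch flattening is summarized as the increase in a token-level error rate between ASR output on original audio and ASR output on pitch-flattened audio. The exact metric used in the paper is not stated on this page, so `error_rate` and `pitch_sensitivity` are hypothetical names, the transcripts are made-up placeholder tokens rather than real data, and the pitch-flattening and ASR steps themselves are not shown.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Corpus-level token error rate: total edits / total reference tokens."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return edits / total


def pitch_sensitivity(refs, hyps_original, hyps_flattened):
    """Increase in error rate after pitch flattening (assumed metric, not the paper's)."""
    return error_rate(refs, hyps_flattened) - error_rate(refs, hyps_original)


# Placeholder transcripts (hypothetical tokens, not results from the paper).
refs = ["a b c d"]
hyps_original = ["a b c d"]   # ASR output on unmodified audio
hyps_flattened = ["a x c"]    # ASR output after pitch flattening
print(pitch_sensitivity(refs, hyps_original, hyps_flattened))
```

Under this assumption, a variety in which pitch carries little lexical information would yield a score near zero, while a fully tonal variety would yield a larger one. Comparing the scores across dialect groups is what would expose a gradient.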
