Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition

Chao Wang; Yuqing Cai; Renzeng Duojie; Jin Zhang; Yutong Liu; Nyima Tashi

arXiv:2511.09085 · cs.CL · November 13, 2025

Open Access

TL;DR

This paper introduces a context-aware dynamic chunking approach for streaming Tibetan speech recognition, improving accuracy and latency by adaptively adjusting chunk sizes and incorporating linguistic and language model information.

Contribution

It presents a novel adaptive chunking mechanism combined with linguistic units and external language models for Tibetan speech recognition.

Findings

01 Achieved a word error rate (WER) of 6.23% on the test set

02 48.15% relative improvement over the fixed-chunk baseline

03 Significantly reduced recognition latency
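
The first two findings can be related with a short calculation. The sketch below assumes that "relative improvement" means relative WER reduction, (baseline - system) / baseline, which is the usual convention but is not spelled out on this page. The function names are illustrative, and the baseline value it prints is implied by that assumption, not a figure quoted from the paper.

```python
def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent: (baseline - system) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer * 100.0


def implied_baseline(system_wer, relative_reduction_pct):
    """Invert the formula above to recover the baseline WER (assumption-based)."""
    if not 0 <= relative_reduction_pct < 100:
        raise ValueError("relative reduction must be in [0, 100)")
    return system_wer / (1.0 - relative_reduction_pct / 100.0)


# Values taken from the findings: 6.23% WER, 48.15% relative improvement.
baseline = implied_baseline(6.23, 48.15)
print(f"implied fixed-chunk baseline WER: {baseline:.2f}%")
print(f"round-trip relative reduction: {relative_reduction(baseline, 6.23):.2f}%")
```

Under this reading, the fixed-chunk baseline would sit at roughly 12% WER, so the reported gain corresponds to nearly halving the error rate.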

Abstract

In this work, we propose a streaming speech recognition framework for Amdo Tibetan, built upon a hybrid CTC/Attention architecture with a context-aware dynamic chunking mechanism. The proposed strategy adaptively adjusts chunk widths based on encoding states, enabling flexible receptive fields, cross-chunk information exchange, and robust adaptation to varying speaking rates, thereby alleviating the context truncation problem of fixed-chunk methods. To further capture the linguistic characteristics of Tibetan, we construct a lexicon grounded in its orthographic principles, providing linguistically motivated modeling units. During decoding, an external language model is integrated to enhance semantic consistency and improve recognition of long sentences. Experimental results show that the proposed framework achieves a word error rate (WER) of 6.23% on the test set, yielding a 48.15%…
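
To make the idea of variable chunk widths concrete, here is a minimal sketch of a chunk-wise self-attention mask of the kind commonly used in streaming encoders. It is not the paper's mechanism: the abstract says chunk widths are chosen from encoding states, but this page does not describe how, so the widths are simply passed in. The function name `chunk_attention_mask` and the `left_chunks` parameter are hypothetical.

```python
def chunk_attention_mask(chunk_widths, left_chunks=-1):
    """Build a T x T boolean mask for chunk-wise attention.

    mask[i][j] is True when frame i may attend to frame j. A frame sees
    every frame in its own chunk plus up to `left_chunks` preceding
    chunks (-1 means all preceding chunks). Future chunks stay hidden,
    which is what keeps the encoder streamable.
    """
    # Map each frame index to the index of the chunk that contains it.
    frame_to_chunk = []
    for chunk_idx, width in enumerate(chunk_widths):
        if width <= 0:
            raise ValueError("chunk widths must be positive")
        frame_to_chunk.extend([chunk_idx] * width)

    total = len(frame_to_chunk)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        ci = frame_to_chunk[i]
        # Oldest chunk that frame i is allowed to look back to.
        first = 0 if left_chunks < 0 else max(0, ci - left_chunks)
        for j in range(total):
            mask[i][j] = first <= frame_to_chunk[j] <= ci
    return mask


# Three chunks of unequal width (2, 3, and 1 frames), unlimited left context.
mask = chunk_attention_mask([2, 3, 1])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With a fixed-chunk method every entry of `chunk_widths` would be the same; letting the widths vary per utterance is what gives the flexible receptive field the abstract refers to, for example wider chunks for slow speech and narrower ones for fast speech.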

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research