Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
An-Ci Peng, Kuan-Tang Huang, Tien-Hong Lo, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

TL;DR
This paper introduces a dialect-aware RNN-T framework for Taiwanese Hakka speech recognition, effectively disentangling dialectal variation from linguistic content to improve accuracy across scripts and dialects.
Contribution
It presents the first unified model that jointly handles dialectal variability and dual writing systems in low-resource Hakka ASR, leveraging parameter-efficient strategies.
Findings
Achieved 57.00% relative error reduction on Hanzi ASR
Achieved 40.41% relative error reduction on Pinyin ASR
First systematic study of Hakka dialectal effects on ASR
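The relative error reductions above presumably follow the usual definition: the drop in error rate divided by the baseline error rate. A minimal sketch of that arithmetic is shown here. The function name and the error rates are hypothetical placeholders, since this summary reports only the relative reductions and not the underlying error rates.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error reduction, in percent, of new_error versus baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates for illustration only; they are not taken from the paper.
example_rer = relative_error_reduction(baseline_error=20.0, new_error=15.0)
print(f"{example_rer:.2f}% relative error reduction")
```

A relative reduction is not the same as an absolute one: in the toy numbers above the error rate falls by 5 points in absolute terms, which is a quarter of the baseline.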
Abstract
Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often struggle in this setting because they tend to conflate essential linguistic content with dialect-specific variation along both phonological and lexical dimensions. To address these challenges, we propose a unified framework built on the Recurrent Neural Network Transducer (RNN-T). Central to our approach are dialect-aware modeling strategies designed to disentangle dialectal "style" from linguistic "content", which enhances the model's capacity to learn robust and generalized representations. Additionally, the framework employs parameter-efficient prediction networks to concurrently model ASR…
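To make the idea of dialect conditioning in an RNN-T concrete, here is a toy, dependency-free sketch. It makes strong assumptions: the summary does not say how the paper injects dialect information, so the additive dialect embedding below is only one common option, not the authors' method. All names, sizes, and random weights are hypothetical. The joint function is the standard RNN-T form, combining every encoder frame with every prediction-network state.

```python
import math
import random


def condition_on_dialect(encoder_frames, dialect_embeddings, dialect_id):
    """Add one dialect embedding to every encoder frame (assumed additive conditioning)."""
    emb = dialect_embeddings[dialect_id]
    return [[x + e for x, e in zip(frame, emb)] for frame in encoder_frames]


def joint_logits(encoder_frames, predictor_states, out_weights):
    """Standard RNN-T joint: map each (frame, label-state) pair to vocabulary logits."""
    lattice = []
    for frame in encoder_frames:
        row = []
        for state in predictor_states:
            hidden = [math.tanh(f + s) for f, s in zip(frame, state)]
            row.append([sum(h * w for h, w in zip(hidden, col)) for col in out_weights])
        lattice.append(row)
    return lattice


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random toy weights or features; stands in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes, all hypothetical: frames, label states, hidden dim, vocab size, dialect count.
T, U, D, V, NUM_DIALECTS = 4, 3, 8, 5, 2
frames = rand_matrix(T, D)                    # stand-in for encoder output
states = rand_matrix(U, D)                    # stand-in for prediction-network output
dialect_table = rand_matrix(NUM_DIALECTS, D)  # one embedding per dialect
out_weights = rand_matrix(V, D)               # joint output projection

conditioned = condition_on_dialect(frames, dialect_table, dialect_id=1)
lattice = joint_logits(conditioned, states, out_weights)
print(len(lattice), len(lattice[0]), len(lattice[0][0]))
```

In this sketch the dialect signal enters only on the acoustic side, so the label-side states stay dialect-independent. Whether the paper conditions the encoder, the prediction network, or both cannot be determined from this summary.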
Taxonomy
Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Natural Language Processing Techniques
