TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

Cheng-Yeh Yang; Chien-Chun Wang; Li-Wei Chen; Hung-Shin Lee; Hsin-Min Wang; Berlin Chen

arXiv:2602.22039 · eess.AS · February 26, 2026

Open Access

TL;DR

This paper introduces TG-ASR, a translation-guided learning framework with a novel cross-attention mechanism, to improve low-resource Taiwanese Hokkien speech recognition by leveraging multilingual translation embeddings.

Contribution

The paper proposes the parallel gated cross-attention (PGCA) mechanism for integrating auxiliary-language embeddings into the ASR decoder, and it provides a new Taiwanese Hokkien speech corpus for low-resource language research.

Findings

01. Achieved a 14.77% relative reduction in character error rate.
02. Demonstrated effective cross-linguistic semantic guidance.
03. Identified optimal auxiliary languages for ASR enhancement.
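
For readers unfamiliar with the metric, a relative reduction in character error rate (CER) is measured against the baseline CER rather than as a difference in percentage points. The sketch below shows the arithmetic. The function name `relative_cer_reduction` and the example values are hypothetical and are not taken from the paper.

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the relative CER reduction as a fraction of the baseline CER."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline CER of 20.0% and an improved CER of 17.0%.
reduction = relative_cer_reduction(20.0, 17.0)
print(f"relative reduction: {reduction:.2%}")
```

A relative figure such as 14.77% therefore says nothing about the absolute CER by itself; the baseline value is needed to recover it.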

Abstract

Low-resource automatic speech recognition (ASR) remains challenging, primarily because transcribed data are scarce for many languages. Taiwanese Hokkien exemplifies the problem: although spoken content is abundant in television dramas and online videos, transcriptions are scarce, and most available subtitles are in Mandarin only. To address this gap, we introduce TG-ASR, a translation-guided ASR framework for Taiwanese Hokkien drama speech recognition that uses multilingual translation embeddings to improve recognition in low-resource settings. The framework is centered on the parallel gated cross-attention (PGCA) mechanism, which adaptively integrates embeddings from multiple auxiliary languages into the ASR decoder. This mechanism facilitates robust cross-linguistic semantic…
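
The abstract describes PGCA only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes one single-head cross-attention branch per auxiliary language, all run in parallel from the same decoder state, with each branch scaled by a scalar sigmoid gate and added residually. The function names (`cross_attend`, `pgca_step`), the gate parameterization, the language keys, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_attend(query, memory):
    """Scaled dot-product attention of one query vector over a list of memory vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    dim = len(memory[0])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]


def pgca_step(decoder_state, aux_embeddings, gate_weights):
    """Assumed fusion rule: every branch attends from the same decoder state,
    is scaled by its own sigmoid gate, and is added to the state residually."""
    fused = list(decoder_state)
    for lang, memory in aux_embeddings.items():
        context = cross_attend(decoder_state, memory)
        # Assumed gate: a linear score over the concatenation [state; context].
        gate = sigmoid(dot(gate_weights[lang], decoder_state + context))
        fused = [f + gate * c for f, c in zip(fused, context)]
    return fused


# Toy inputs; the language keys and all values are placeholders.
state = [0.5, -0.2]
aux = {
    "lang_a": [[1.0, 0.0], [0.0, 1.0]],  # two translation-token embeddings
    "lang_b": [[0.3, 0.3]],              # one translation-token embedding
}
gates = {"lang_a": [0.1, 0.1, 0.1, 0.1], "lang_b": [0.0, 0.0, 0.0, 0.0]}
fused = pgca_step(state, aux, gates)
print(fused)
```

Under these assumptions, a gate near zero lets the decoder ignore an unhelpful auxiliary language, which is one way to read "adaptively integrates" in the abstract.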

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research