TL;DR
This paper presents WARDEN, a specialized two-stage system that transcribes the endangered Wardaman language and translates it into English using only 6 hours of annotated audio, outperforming larger models in this low-resource setting.
Contribution
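The paper introduces a two-stage approach to transcription and translation for a low-resource language. Audio is first transcribed into phonemes, and the phonemic transcription is then translated into English. The approach is strengthened by phonemic initialization and domain-specific knowledge.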
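The separation into two stages can be pictured as two independent models composed in sequence. The sketch below is a minimal illustration under stated assumptions: the names `TwoStagePipeline`, `transcribe` and `translate` are hypothetical and do not come from the paper, and each stage is just any callable with the right input and output types. It only shows how the stages connect, not how WARDEN's models are built.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage types (not from the paper):
# stage 1 maps raw audio to a phonemic transcription,
# stage 2 maps that transcription to English text.
Transcriber = Callable[[bytes], str]
Translator = Callable[[str], str]


@dataclass
class TwoStagePipeline:
    transcribe: Transcriber
    translate: Translator

    def __call__(self, audio: bytes) -> tuple[str, str]:
        # Run stage 1, then feed its output into stage 2.
        phonemes = self.transcribe(audio)
        english = self.translate(phonemes)
        # Return both so the intermediate transcription can be inspected.
        return phonemes, english
```

Keeping the stages separate means each model can be trained or swapped on its own, which is the property the paper relies on when a single end-to-end model is not trainable from 6 hours of data.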
Findings
WARDEN outperforms larger models with only 6 hours of data.
Phonemic initialization accelerates transcription model fine-tuning.
Domain-specific dictionary improves translation accuracy.
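The summary does not say how the dictionary is fed to the translation model. One common pattern, shown here only as an assumption and not as WARDEN's method, is to look up each transcribed word in a bilingual dictionary and prepend the matching glosses to the translation input. The function names and the `Dictionary:`/`Wardaman:`/`English:` input layout are hypothetical, and the entries in the test are placeholder strings, not real Wardaman words.

```python
def gloss_hints(transcription: str, dictionary: dict[str, str]) -> list[str]:
    """Collect 'word = gloss' hints for dictionary words in the transcription.

    Hypothetical helper: each known word is listed once, in order of first use.
    """
    hints: list[str] = []
    seen: set[str] = set()
    for word in transcription.split():
        if word in dictionary and word not in seen:
            hints.append(f"{word} = {dictionary[word]}")
            seen.add(word)
    return hints


def build_translation_input(transcription: str, dictionary: dict[str, str]) -> str:
    """Prepend dictionary glosses to the transcription (assumed input format)."""
    hints = gloss_hints(transcription, dictionary)
    header = "Dictionary: " + ("; ".join(hints) if hints else "(none)")
    return f"{header}\nWardaman: {transcription}\nEnglish:"
```

Deduplicating the hints keeps the input short when a word repeats, which matters if the translation model has a limited context length.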
Abstract
This paper introduces WARDEN, an early language model system that transcribes Wardaman, an endangered Australian Indigenous language, and translates it into English. The main challenge is the lack of large-scale training data: we have only 6 hours of annotated audio. Training a single model for both transcription and translation is common practice for high-resource language pairs with large datasets (such as English to French), but it is not viable for Wardaman to English. To address this low-resource challenge, we design WARDEN with separate transcription and translation models: WARDEN first converts Wardaman audio into a phonemic transcription, and then translates that transcription into English. We further propose two techniques to improve performance. For transcription, we initialize the Wardaman token from Sundanese, a language that…
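Initializing a new language token from an existing one usually means copying the existing token's embedding vector into the new token's row before fine-tuning, rather than starting from a random vector. The sketch below illustrates that general idea with plain Python lists. It assumes vocabulary ids equal row indices in the embedding table; the function name `add_token_from_donor` and the token strings `<|wardaman|>` and `<|su|>` are hypothetical placeholders, and the abstract does not say which model or tokenizer WARDEN uses.

```python
def add_token_from_donor(
    vocab: dict[str, int],
    embeddings: list[list[float]],
    new_token: str,
    donor_token: str,
) -> int:
    """Add `new_token` whose embedding starts as a copy of `donor_token`'s.

    Assumes vocab ids are row indices into `embeddings` and that
    len(embeddings) == len(vocab). Returns the new token's id.
    """
    if new_token in vocab:
        raise ValueError(f"{new_token!r} is already in the vocabulary")
    donor_row = embeddings[vocab[donor_token]]
    # Copy the row so later updates to the new token do not alias the donor.
    embeddings.append(list(donor_row))
    new_id = len(embeddings) - 1
    vocab[new_token] = new_id
    return new_id
```

For example, with `vocab = {"<|en|>": 0, "<|su|>": 1}` and a two-row embedding table, calling `add_token_from_donor(vocab, embeddings, "<|wardaman|>", "<|su|>")` appends a third row equal to the Sundanese row and returns id 2. Fine-tuning then moves the new row away from its starting point.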
