Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis
Yu-Siang Lan, Chia-Sheng Liu, Yi-Chang Chen, Po-Chun Hsu, Allyson Chiu, Shun-Wen Lin, Da-shan Shiu, Yuan-Fu Liao

TL;DR
This paper introduces Breeze Taigi, a standardized benchmarking framework for Taiwanese Hokkien (Taigi) speech recognition and synthesis. It uses parallel Taiwanese Mandarin resources and synthetic data to develop and evaluate models.
Contribution
It presents a reproducible evaluation methodology, curated datasets, and baseline models for Taigi speech technology, facilitating cross-system comparison and future research.
Findings
Achieved 30.13% CER on the benchmark
Fine-tuned a Whisper model on 10,000 hours of synthetic data
Provided open datasets and baseline models for Taigi speech tasks
Abstract
Taiwanese Hokkien (Taigi) presents unique opportunities for advancing speech technology methodologies that can generalize to diverse linguistic contexts. We introduce Breeze Taigi, a comprehensive framework centered on standardized benchmarks for evaluating Taigi speech recognition and synthesis systems. Our primary contribution is a reproducible evaluation methodology that leverages parallel Taiwanese Mandarin resources. We provide 30 carefully curated Mandarin-Taigi audio pairs from Taiwan's Executive Yuan public service announcements with normalized ground truth transcriptions. We establish Character Error Rate (CER) as the standard metric and implement normalization procedures to enable fair cross-system comparisons. To demonstrate the benchmark's utility and provide reference implementations, we develop speech recognition and synthesis models through a methodology that leverages…
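The CER metric named in the abstract can be illustrated with a minimal Python sketch. It assumes CER is the character-level Levenshtein distance divided by the reference length, which is the usual definition. The `normalize` function is a hypothetical stand-in (Unicode NFKC folding, lowercasing, and removal of punctuation and whitespace), not the paper's actual normalization procedure. The function names and the sample sentences are illustrative only.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalization (an assumption, not the paper's procedure):
    NFKC-fold, lowercase, then drop whitespace and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from the reference
                    curr[j - 1] + 1,     # insertion into the reference
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, computed after normalization."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(cer("今天天氣很好。", "今天天氣真好"))
```

Applying the same normalization to both reference and hypothesis is what keeps punctuation or width differences between systems from being counted as recognition errors. A corpus-level score is often computed by summing edit distances and reference lengths across utterances before dividing. Whether the benchmark averages per utterance or pools in this way is not stated in the text above.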
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Face recognition and analysis
