Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis

Yu-Siang Lan; Chia-Sheng Liu; Yi-Chang Chen; Po-Chun Hsu; Allyson Chiu; Shun-Wen Lin; Da-shan Shiu; Yuan-Fu Liao

arXiv:2603.19259·cs.CL·March 23, 2026

Open Access

TL;DR

This paper introduces Breeze Taigi, a standardized benchmarking framework for Taiwanese Hokkien (Taigi) speech recognition and synthesis. It uses parallel Taiwanese Mandarin resources and synthetic data to build and evaluate reference models.

Contribution

The paper contributes a reproducible evaluation methodology, curated datasets, and baseline models for Taigi speech technology, enabling cross-system comparison and supporting future research.

Findings

1. Achieved 30.13% CER on the benchmark

2. Fine-tuned Whisper model on 10,000 hours of synthetic data

3. Provided open datasets and baseline models for Taigi speech tasks

Abstract

Taiwanese Hokkien (Taigi) presents unique opportunities for advancing speech technology methodologies that can generalize to diverse linguistic contexts. We introduce Breeze Taigi, a comprehensive framework centered on standardized benchmarks for evaluating Taigi speech recognition and synthesis systems. Our primary contribution is a reproducible evaluation methodology that leverages parallel Taiwanese Mandarin resources. We provide 30 carefully curated Mandarin-Taigi audio pairs from Taiwan's Executive Yuan public service announcements with normalized ground truth transcriptions. We establish Character Error Rate (CER) as the standard metric and implement normalization procedures to enable fair cross-system comparisons. To demonstrate the benchmark's utility and provide reference implementations, we develop speech recognition and synthesis models through a methodology that leverages…
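The abstract names Character Error Rate (CER) with text normalization as the standard metric but does not spell out the procedure here. The following is a minimal sketch of how such a metric is commonly computed, not the paper's implementation. The `normalize` rules (NFKC folding, lowercasing, dropping punctuation and whitespace) and all function names are assumptions chosen for illustration.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalization: NFKC-fold, lowercase, drop punctuation and whitespace.

    The benchmark's actual normalization rules are not given in this excerpt.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, computed on normalized text."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumed rules, a hypothesis that differs from the reference only in punctuation scores a CER of 0, which is the kind of cross-system fairness that normalization is meant to provide. How scores are aggregated across the 30 audio pairs (per-utterance average versus pooled edits over pooled reference length) is not stated in this excerpt.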

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Face recognition and analysis