Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu

TL;DR
This technical report describes a first attempt at a Taiwanese Mandarin spoken language model for real-time, multi-turn speech-to-speech interaction, built on a decoder-only transformer and trained with synthesized dialogues and adjustments for real-time use.
Contribution
It introduces a novel end-to-end spoken LLM tailored for Taiwanese Mandarin with real-time, multi-turn conversational capabilities and a dedicated evaluation platform.
Findings
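Targets real-time, full-duplex speech-to-speech interaction in Taiwanese Mandarin (simultaneous speaking and listening)
Developed a platform for evaluating conversational fluency and response coherence in multi-turn dialogues
Trained an end-to-end, decoder-only transformer model for spoken dialogue using synthesized dialogue data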
Abstract
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
Taxonomy
Topics: Natural Language Processing Techniques
