TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
Xingchen Song, Mengtao Xing, Changwei Ma, Shengqiang Li, Di Wu, Binbin Zhang, Fuping Pan, Dinghao Zhou, Yuekai Zhang, Shun Lei, Zhendong Peng, Zhiyong Wu

TL;DR
TouchTTS introduces a simplified, cost-effective TTS framework leveraging a noise-robust tokenizer and unified architecture, enabling easier deployment and potential task unification with ASR.
Contribution
It presents a novel simplified TTS pipeline using S3Tokenizer and replaces complex modules with an LLM-based backbone, reducing data and deployment costs.
Findings
Retains over 50% of the original data with the simplified pipeline
Reduces deployment costs by unifying TTS and ASR architectures
Demonstrates effective TTS performance with less complex data processing
Abstract
It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., speech denoising, speech enhancement, speaker diarization, and punctuation models), which themselves demand high-quality training data and are rarely open-sourced. Even with state-of-the-art models, issues persist, such as incomplete background noise removal and misalignment between punctuation and actual speech pauses. Moreover, the stringent filtering strategies often retain only 10-30% of the original data, significantly impeding data scaling efforts. In this work, we leverage a noise-robust audio tokenizer (S3Tokenizer) to design a simplified yet effective TTS data processing pipeline that maintains data quality while substantially…
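To make the retention figures concrete, the short sketch below compares how much audio survives filtering at the 10-30% rates the abstract attributes to stringent pipelines and at the 50% floor listed under Findings. The corpus size of 100,000 hours and the helper name `retained_hours` are illustrative assumptions, not values or code from the paper.

```python
def retained_hours(raw_hours: float, retention_rate: float) -> float:
    """Return the hours of audio left after filtering at the given rate."""
    if not 0.0 <= retention_rate <= 1.0:
        raise ValueError("retention_rate must be between 0 and 1")
    return raw_hours * retention_rate


# Hypothetical raw corpus size; the paper's actual data volume is not assumed here.
RAW_HOURS = 100_000.0

# Retention range the abstract reports for stringent filtering strategies.
strict_low = retained_hours(RAW_HOURS, 0.10)
strict_high = retained_hours(RAW_HOURS, 0.30)

# Lower bound on retention listed under Findings for the simplified pipeline.
simplified_floor = retained_hours(RAW_HOURS, 0.50)

print(f"stringent filtering: {strict_low:.0f} to {strict_high:.0f} hours kept")
print(f"simplified pipeline: at least {simplified_floor:.0f} hours kept")
```

Under these assumptions, the same raw corpus yields noticeably more usable training audio with the simplified pipeline, which is the data-scaling argument the abstract makes.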
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Personal Information Management and User Behavior
