FormalASR: End-to-End Spoken Chinese to Formal Text

Wanyi Ning; Yinshang Guo; Haitao Qian; Jiyuan Cheng; Weiyuan Feng; Yufei Zhang

arXiv:2605.19266·cs.CL·May 20, 2026

TL;DR

FormalASR introduces end-to-end models that convert spoken Chinese directly into formal written text, cutting CER by up to 37.4% over baselines without a post-processing LLM.

Contribution

The paper presents two compact end-to-end spoken-to-formal transcription models (0.6B and 1.7B) and two large-scale datasets, WenetSpeech-Formal and Speechio-Formal, aimed at high-quality, on-device Chinese speech recognition for formal writing.

Findings

01 Achieves up to 37.4% CER reduction over baselines

02 Improves ROUGE-L and BERTScore metrics

03 Requires no post-processing LLM at deployment
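For readers unfamiliar with the headline metric, the following is a minimal sketch of how a character error rate (CER) and a relative CER reduction are conventionally computed. It is not the paper's evaluation code: the function names are hypothetical, the text normalization used by the authors (punctuation, spacing, numerals) is not stated on this page, and the numbers in the usage note are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # add a hypothesis character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, model_cer: float) -> float:
    """Fraction of the baseline CER removed by the model."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - model_cer) / baseline_cer
```

As an illustration with made-up values, a baseline CER of 0.10 and a model CER of 0.0626 give `relative_reduction(0.10, 0.0626)` of about 0.374, which is how a "37.4% reduction" would be read under this convention. In a spoken-to-formal setting the reference would be the formal rewrite rather than the verbatim transcript.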

Abstract

Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR+LLM pipeline for post-editing, but this design increases latency and memory cost and is difficult to deploy on-device. We present FormalASR, two compact end-to-end models (0.6B and 1.7B) that directly transcribe spoken Chinese into formal written text. To enable this setting, we build WenetSpeech-Formal and Speechio-Formal, two large-scale spoken-to-formal datasets constructed by LLM-based rewriting and quality filtering. We then apply supervised fine-tuning to Qwen3-ASR at both scales. Experiments on WenetSpeech-Formal and Speechio-Formal show that FormalASR achieves up to 37.4% relative…
