WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Chengyou Wang; Mingchen Shao; Jingbin Hu; Zeyu Zhu; Hongfei Xue; Bingshen Mu; Xin Xu; Xingyi Duan; Binbin Zhang; Pengcheng Zhu; Chuang Ding; Xiaojun Zhang; Hui Bu; Lei Xie

arXiv:2601.11027 · cs.SD · January 21, 2026

TL;DR

This paper introduces WenetSpeech-Wu, an open-source speech corpus for the Wu dialect of Chinese, together with benchmarks and models built on it, to address the scarcity of resources for this underrepresented dialect.

Contribution

The work provides the first large-scale Wu dialect speech dataset, standardized benchmarks, and open-source models, fostering a unified ecosystem for Wu dialect speech technologies.

Findings

01 Approximately 8,000 hours of annotated Wu dialect speech released as an open-source corpus

02 Standardized benchmark (WenetSpeech-Wu-Bench) established for multiple Wu dialect tasks

03 Open-source models demonstrate competitive performance

Abstract

Speech processing for low-resource dialects remains a fundamental challenge in developing inclusive and robust speech technologies. Despite its linguistic significance and large speaker population, the Wu dialect of Chinese has long been hindered by the lack of large-scale speech data, standardized evaluation benchmarks, and publicly available models. In this work, we present WenetSpeech-Wu, the first large-scale, multi-dimensionally annotated open-source speech corpus for the Wu dialect, comprising approximately 8,000 hours of diverse speech data. Building upon this dataset, we introduce WenetSpeech-Wu-Bench, the first standardized and publicly accessible benchmark for systematic evaluation of Wu dialect speech processing, covering automatic speech recognition (ASR), Wu-to-Mandarin translation, speaker attribute prediction, speech emotion recognition, text-to-speech (TTS) synthesis,…
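As a rough illustration of how the ASR track of such a benchmark might be scored, the sketch below computes character error rate (CER) between a reference transcript and a model hypothesis. The abstract as shown here does not state which metric WenetSpeech-Wu-Bench uses; CER is a common choice for Chinese ASR and is an assumption, as are the function names and the example sentences.

```python
# Minimal CER sketch. Assumption: the benchmark's actual metric and text
# normalization are not given in the abstract; this is only illustrative.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Whitespace is dropped before comparison (a hypothetical normalization choice).
    """
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference transcript must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the dataset.
    print(cer("今朝天气蛮好", "今朝天气好"))
```

A real evaluation would also need the benchmark's own normalization rules (punctuation, numerals, character variants), which are not specified here.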

Code & Models

Datasets

ASLP-lab/WenetSpeech-Wu
dataset · 104 downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Phonetics and Phonology Research