Controllable Accent Normalization via Discrete Diffusion

Qibing Bai; Yuhan Du; Tom Ko; Shuai Wang; Yannan Wang; Haizhou Li

arXiv:2603.14275 · eess.AS · March 17, 2026

TL;DR

This paper introduces DLM-AN, a controllable accent normalization system built on masked discrete diffusion. It allows accent strength to be adjusted and improves speech naturalness, as evaluated on multi-accent English data.

Contribution

The paper presents a discrete diffusion approach to controllable accent normalization. It reuses selected source speech tokens to set accent strength and predicts a duration ratio to adjust the overall rhythm of the output.

Findings

01. Achieves the lowest word error rate among all compared systems
02. Provides smooth, interpretable control over accent strength
03. Maintains competitive accent reduction quality

Abstract

Existing accent normalization methods do not typically offer control over accent strength, yet many applications, such as language learning and dubbing, require tunable accent retention. We propose DLM-AN, a controllable accent normalization system built on masked discrete diffusion over self-supervised speech tokens. A Common Token Predictor identifies source tokens that likely encode native pronunciation; these tokens are selectively reused to initialize the reverse diffusion process. This provides a simple yet effective mechanism for controlling accent strength: reusing more tokens preserves more of the original accent. DLM-AN further incorporates a flow-matching Duration Ratio Predictor that automatically adjusts the total duration to better match the native rhythm. Experiments on multi-accent English data show that DLM-AN achieves the lowest word error rate among all compared systems…
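
The token-reuse control described in the abstract can be illustrated with a minimal Python sketch. It assumes the Common Token Predictor outputs one score per source token (the probability that the token already encodes native pronunciation) and that reuse is governed by a single fraction, `reuse_ratio`. The scoring interface, `reuse_ratio`, `init_reverse_diffusion`, and the `MASK_ID` value are all hypothetical, and the paper's actual selection rule may differ (for example, a probability threshold instead of a top-k fraction). Only the initialization step is shown; the reverse diffusion that fills the masked positions is not.

```python
from typing import List, Sequence

MASK_ID = -1  # hypothetical id for the [MASK] token of the masked discrete diffusion model


def init_reverse_diffusion(
    source_tokens: Sequence[int],
    common_scores: Sequence[float],
    reuse_ratio: float,
) -> List[int]:
    """Build the initial, partially masked sequence for reverse diffusion.

    The top `reuse_ratio` fraction of source positions, ranked by a
    hypothetical Common Token Predictor score, are copied from the source.
    Every other position starts as MASK_ID and is left for the model to fill.
    """
    if len(source_tokens) != len(common_scores):
        raise ValueError("one score per source token is required")
    if not 0.0 <= reuse_ratio <= 1.0:
        raise ValueError("reuse_ratio must lie in [0, 1]")

    n_reuse = round(reuse_ratio * len(source_tokens))
    # Rank positions from most to least likely to be "common"; sorted() is
    # stable, so tied scores keep their original source order.
    ranked = sorted(range(len(source_tokens)), key=lambda i: -common_scores[i])
    keep = set(ranked[:n_reuse])
    return [tok if i in keep else MASK_ID for i, tok in enumerate(source_tokens)]


if __name__ == "__main__":
    src = [12, 7, 7, 40, 3, 3, 18, 25]  # toy speech-token ids
    scores = [0.95, 0.10, 0.80, 0.30, 0.99, 0.60, 0.05, 0.70]  # toy predictor scores
    for r in (0.0, 0.5, 1.0):
        print(r, init_reverse_diffusion(src, scores, r))
```

In this sketch, `reuse_ratio = 0.0` masks every position, so the model would regenerate the whole utterance (strongest normalization), while `reuse_ratio = 1.0` returns the source tokens unchanged. Intermediate values trade off between the two, in line with the abstract's statement that reusing more tokens preserves more of the original accent. The duration adjustment made by the Duration Ratio Predictor is not modeled here.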

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders