CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents

Wen-Chin Huang; Nicholas Sanders; Erica Cooper

arXiv:2603.14328 · cs.SD · March 17, 2026

TL;DR

The paper introduces CodecMOS-Accent, a comprehensive MOS benchmark dataset for evaluating neural codecs and TTS models across diverse English accents, highlighting the relationship between speaker and accent similarity and the effectiveness of objective metrics.

Contribution

It provides a new large-scale dataset with subjective evaluations for assessing neural audio codecs and accented TTS, facilitating more human-centric speech synthesis research.

Findings

01 Strong correlation between speaker and accent similarity.

02 Objective metrics can predict perceptual quality.

03 Listeners exhibit perceptual bias based on shared accent.
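
As an illustration of how the first finding could be checked, the sketch below aggregates per-listener ratings into a per-system mean opinion score (MOS) for each dimension and then computes the Pearson correlation between speaker-similarity and accent-similarity MOS. The ratings, system names, dimension labels, and the 1 to 5 rating scale are all hypothetical assumptions made for this example; they are not taken from the dataset, and this is not necessarily the analysis the authors ran.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical ratings: (system, listener, dimension, score).
# A 1-5 scale is assumed here; the source does not state the scale.
ratings = [
    ("sys_a", "l1", "speaker_sim", 4), ("sys_a", "l2", "speaker_sim", 5),
    ("sys_a", "l1", "accent_sim", 4), ("sys_a", "l2", "accent_sim", 4),
    ("sys_b", "l1", "speaker_sim", 3), ("sys_b", "l2", "speaker_sim", 3),
    ("sys_b", "l1", "accent_sim", 3), ("sys_b", "l2", "accent_sim", 2),
    ("sys_c", "l1", "speaker_sim", 2), ("sys_c", "l2", "speaker_sim", 1),
    ("sys_c", "l1", "accent_sim", 2), ("sys_c", "l2", "accent_sim", 2),
]


def mos_by_system(rows, dimension):
    """Average all scores for one dimension, grouped by system."""
    buckets = defaultdict(list)
    for system, _listener, dim, score in rows:
        if dim == dimension:
            buckets[system].append(score)
    return {system: mean(scores) for system, scores in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


speaker_mos = mos_by_system(ratings, "speaker_sim")
accent_mos = mos_by_system(ratings, "accent_sim")

# Align the two MOS dictionaries on a shared, sorted list of systems.
systems = sorted(speaker_mos)
r = pearson([speaker_mos[s] for s in systems],
            [accent_mos[s] for s in systems])
print(f"system-level Pearson r = {r:.3f}")
```

The correlation here is computed at the system level, after averaging over listeners. An utterance-level or listener-level analysis would group the ratings differently and may give a different picture.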

Abstract

We present the CodecMOS-Accent dataset, a mean opinion score (MOS) benchmark designed to evaluate neural audio codec (NAC) models and the large language model (LLM)-based text-to-speech (TTS) models trained on them, particularly on non-standard speech such as accented speech. The dataset comprises 4,000 codec resynthesis and TTS samples from 24 systems, featuring 32 speakers spanning ten accents. A large-scale subjective test was conducted to collect 19,600 annotations from 25 listeners across three dimensions: naturalness, speaker similarity, and accent similarity. This dataset not only represents an up-to-date study of recent speech synthesis system performance but also reveals insights including a tight relationship between speaker and accent similarity, the predictive power of objective metrics, and a perceptual bias when listeners share the speaker's accent. This…

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders