AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

Jinzuomu Zhong; Korin Richmond; Zhiba Su; Siqi Sun

arXiv:2409.09098 · cs.SD · February 6, 2026

TL;DR

This paper introduces AccentBox, a two-stage zero-shot accent generation system that improves accent fidelity and control in TTS by combining accent identification and speaker-agnostic accent embeddings.

Contribution

It unifies Foreign Accent Conversion (FAC), accented TTS, and zero-shot TTS (ZS-TTS) in a novel two-stage pipeline, achieving state-of-the-art accent identification and higher accent fidelity in zero-shot settings.

Findings

01. Achieves a 0.56 F1 score on accent identification for unseen speakers.

02. Outperforms previous methods in accent fidelity for inherent- and cross-accent zero-shot generation.

03. Enables generation of unseen accents with high fidelity.
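The first finding reports accent identification as an F1 score. The page does not say which averaging is used, so the sketch below assumes macro-averaged F1 (the unweighted mean of per-accent F1 scores), a common choice for multi-class classification with imbalanced classes. The function name `macro_f1` and the accent labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Macro-averaged F1 over every accent label seen in y_true or y_pred.

    Assumption: macro averaging (unweighted mean of per-accent F1); the
    paper's exact averaging scheme is not stated on this page.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one label")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct accent prediction
        else:
            fp[p] += 1          # predicted accent was wrong
            fn[t] += 1          # true accent was missed
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Usage with hypothetical accent labels: per-class F1 is 2/3 and 4/5.
print(round(macro_f1(["us", "us", "in", "in"], ["us", "in", "in", "in"]), 4))  # → 0.7333
```

Macro averaging weights every accent equally, so a model cannot score well by handling only the most frequent accents.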

Abstract

While recent Zero-Shot Text-to-Speech (ZS-TTS) models have achieved high naturalness and speaker similarity, they fall short in accent fidelity and control. To address this issue, we propose zero-shot accent generation, which unifies Foreign Accent Conversion (FAC), accented TTS, and ZS-TTS in a novel two-stage pipeline. In the first stage, we achieve state-of-the-art (SOTA) performance on Accent Identification (AID), with a 0.56 F1 score on unseen speakers. In the second stage, we condition a ZS-TTS system on pretrained speaker-agnostic accent embeddings extracted by the AID model. The proposed system achieves higher accent fidelity in inherent- and cross-accent generation, and enables unseen accent generation.
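The two-stage design can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's confirmed architecture: stage 1 is stood in for by mean-pooling frame-level AID features into utterance embeddings and averaging those per accent into a normalized accent embedding; stage 2 conditions a text encoder by concatenating a speaker embedding (from the reference audio) and an accent embedding onto every encoder frame. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a sequence of equal-length vectors into one vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def accent_centroids(utt_embeddings: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Stage 1 (hypothetical): one normalized embedding per accent,
    the mean of that accent's utterance-level AID embeddings."""
    return {accent: l2_normalize(mean_pool(embs)) for accent, embs in utt_embeddings.items()}


def condition_text_encoding(
    text_frames: Sequence[Vector], speaker_emb: Vector, accent_emb: Vector
) -> List[Vector]:
    """Stage 2 (hypothetical): append speaker and accent embeddings to
    every text-encoder frame before decoding."""
    cond = list(speaker_emb) + list(accent_emb)
    return [list(frame) + cond for frame in text_frames]


# Toy usage: a speaker embedding paired with a different accent's centroid
# (cross-accent generation under the assumptions above).
centroids = accent_centroids({"Indian": [[1.0, 0.0], [0.0, 1.0]], "Scottish": [[0.0, 2.0]]})
out = condition_text_encoding([[0.1], [0.2]], speaker_emb=[0.3, 0.4], accent_emb=centroids["Scottish"])
print(out)  # → [[0.1, 0.3, 0.4, 0.0, 1.0], [0.2, 0.3, 0.4, 0.0, 1.0]]
```

Keeping the speaker and accent vectors separate is what makes cross-accent control possible in this sketch: swapping only the accent vector leaves the speaker conditioning untouched, which is the motivation for extracting speaker-agnostic accent embeddings in the first stage.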
