X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Qixi Zheng; Yuxiang Zhao; Tianrui Wang; Wenxi Chen; Kele Xu; Yikang Li; Qinyuan Chen; Xipeng Qiu; Kai Yu; Xie Chen

arXiv:2604.12456 · eess.AS · April 23, 2026

TL;DR

X-VC is a zero-shot streaming voice conversion system that operates in the latent space of a pretrained neural codec, targeting high-quality, low-latency conversion to unseen target speakers in interactive applications.

Contribution

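The paper proposes one-step conversion in codec latent space. A dual-conditioning acoustic converter jointly models source codec latents and frame-level acoustic conditions derived from target reference speech, while adaptive normalization injects utterance-level target speaker information.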

Findings

01 Achieves the best streaming word error rate (WER) in English and Chinese

02 Demonstrates strong speaker similarity in various settings

03 Offers a lower real-time factor than baselines
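
For readers unfamiliar with the metric in the last finding, real-time factor (RTF) is conventionally the ratio of processing time to the duration of the audio processed, so values below 1 mean faster than real time. The sketch below assumes that standard definition; this page does not state the paper's measurement protocol, and the function name `real_time_factor` is hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (standard RTF definition).

    Assumption: this is the conventional definition, not necessarily the
    exact protocol used in the paper.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


# Illustrative numbers only, not results from the paper:
# converting 2.0 s of audio in 0.5 s of compute.
rtf = real_time_factor(0.5, 2.0)
print(f"RTF = {rtf:.2f}")
```

A streaming system needs an RTF below 1 to keep up with incoming audio, which is why a lower value matters for interactive use.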

Abstract

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems for interactive scenarios remains challenging because high-fidelity speaker transfer and low-latency streaming inference are difficult to achieve simultaneously. In this work, we present X-VC, a zero-shot streaming VC system that performs one-step conversion in the latent space of a pretrained neural codec. X-VC uses a dual-conditioning acoustic converter that jointly models source codec latents and frame-level acoustic conditions derived from target reference speech, while injecting utterance-level target speaker information through adaptive normalization. To reduce the mismatch between training and inference, we train the model with generated paired…
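
The abstract says utterance-level speaker information is injected through adaptive normalization but does not give the formulation. As a rough illustration of the general technique, the sketch below normalizes each latent frame across its channels and then applies a scale and shift that would, in a real model, be predicted from a target speaker embedding. The layer-norm form, the `(1 + scale)` modulation, and all names here are assumptions for illustration, not the paper's confirmed design.

```python
import math
from typing import List, Sequence


def adaptive_layer_norm(
    frames: Sequence[Sequence[float]],
    scale: Sequence[float],
    shift: Sequence[float],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Normalize each frame over channels, then modulate per channel.

    `scale` and `shift` stand in for values a learned projection would
    produce from an utterance-level speaker embedding (assumption).
    """
    out = []
    for frame in frames:
        if not (len(frame) == len(scale) == len(shift)):
            raise ValueError("frame, scale and shift must share one length")
        mean = sum(frame) / len(frame)
        var = sum((x - mean) ** 2 for x in frame) / len(frame)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Zero scale and shift reduce this to plain layer normalization.
        out.append(
            [(x - mean) * inv_std * (1.0 + g) + b
             for x, g, b in zip(frame, scale, shift)]
        )
    return out


# Two hypothetical latent frames with three channels each.
latents = [[0.2, -1.0, 0.5], [1.5, 0.3, -0.7]]
converted = adaptive_layer_norm(latents, scale=[0.1, 0.0, -0.2], shift=[0.0, 0.5, 0.0])
print(converted)
```

In this sketch the statistics are computed per frame, so the operation needs no future context, which is compatible with streaming inference.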

Code & Models

Repositories

Jerrister/X-VC (GitHub)

Models

chenxie95/X-VC (Hugging Face model, 50 downloads, 3 likes)
