PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models

Rajarshi Roy; Jonathan Raiman; Sang-gil Lee; Teodor-Dumitru Ene; Robert Kirby; Sungwon Kim; Jaehyeon Kim; Bryan Catanzaro

arXiv:2602.06053·cs.CL·February 9, 2026

Open Access · 4 Models

TL;DR

PersonaPlex is a duplex speech model with role and voice control. It conditions on a text prompt for the role and a speech sample for the voice, is trained on synthetic prompt-conversation pairs, and is evaluated on an extension of Full-Duplex-Bench to multi-role customer service scenarios.

Contribution

Introduces PersonaPlex, a duplex conversational speech model with hybrid system prompts that combine text-based role conditioning with voice cloning from speech samples, enabling personalized, multi-role speech interactions.

Findings

01. Achieves strong role-conditioned behavior and voice similarity.

02. Surpasses state-of-the-art in role adherence and naturalness.

03. Effective in multi-role customer service scenarios.

Abstract

Recent advances in duplex speech models have enabled natural, low-latency speech-to-speech interactions. However, existing models are restricted to a fixed role and voice, limiting their ability to support structured, role-driven real-world applications and personalized interactions. In this work, we introduce PersonaPlex, a duplex conversational speech model that incorporates hybrid system prompts, combining role conditioning with text prompts and voice cloning with speech samples. PersonaPlex is trained on a large-scale synthetic dataset of paired prompts and user-agent conversations, generated with open-source large language models (LLMs) and text-to-speech (TTS) models. To evaluate role conditioning in real-world settings, we extend the Full-Duplex-Bench benchmark beyond a single assistant role to multi-role customer service scenarios. Experiments show that PersonaPlex achieves…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech Recognition and Synthesis