PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models
Rajarshi Roy, Jonathan Raiman, Sang-gil Lee, Teodor-Dumitru Ene, Robert Kirby, Sungwon Kim, Jaehyeon Kim, Bryan Catanzaro

TL;DR
PersonaPlex is a duplex speech model that adds role and voice control, supporting personalized, multi-role conversations with high naturalness and responsiveness. It is trained on synthetic data and evaluated on an extended version of the Full-Duplex-Bench benchmark.
Contribution
Introduces PersonaPlex, a duplex conversational speech model that uses hybrid system prompts (a text prompt for role conditioning and a speech sample for voice cloning) to support personalized, multi-role speech interactions.
Findings
Achieves strong role-conditioned behavior and voice similarity.
Surpasses the state of the art in role adherence and naturalness.
Effective in multi-role customer service scenarios.
Abstract
Recent advances in duplex speech models have enabled natural, low-latency speech-to-speech interactions. However, existing models are restricted to a fixed role and voice, limiting their ability to support structured, role-driven real-world applications and personalized interactions. In this work, we introduce PersonaPlex, a duplex conversational speech model that incorporates hybrid system prompts, combining role conditioning with text prompts and voice cloning with speech samples. PersonaPlex is trained on a large-scale synthetic dataset of paired prompts and user-agent conversations, generated with open-source large language models (LLMs) and text-to-speech (TTS) models. To evaluate role conditioning in real-world settings, we extend the Full-Duplex-Bench benchmark beyond a single assistant role to multi-role customer service scenarios. Experiments show that PersonaPlex achieves…
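The abstract names two ingredients: a hybrid system prompt (text for the role, a speech sample for the voice) and synthetic user-agent conversations rendered with TTS. The Python sketch below shows one plausible way to package such a training example. Every name in it (`HybridSystemPrompt`, `Utterance`, `render_duplex`, `TrainingExample`), the 16 kHz sample rate, and the two-channel layout are illustrative assumptions, not the authors' actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_RATE = 16_000  # assumed sample rate for illustration; not from the paper


@dataclass
class HybridSystemPrompt:
    """Hypothetical container: a text role prompt plus a voice sample."""
    role_text: str             # persona / role description (text prompt)
    voice_sample: list[float]  # reference waveform for voice cloning


@dataclass
class Utterance:
    speaker: str        # "user" or "agent"
    start_s: float      # onset on the shared conversation timeline, in seconds
    audio: list[float]  # mono waveform, e.g. produced by a TTS model


def render_duplex(utterances: list[Utterance],
                  sample_rate: int = SAMPLE_RATE) -> list[list[float]]:
    """Lay utterances out on two channels (0 = user, 1 = agent).

    Keeping the speakers on separate channels preserves overlapping
    speech such as interruptions, instead of mixing it into one stream.
    """
    if not utterances:
        raise ValueError("need at least one utterance")
    channel = {"user": 0, "agent": 1}
    starts = [round(u.start_s * sample_rate) for u in utterances]
    length = max(s + len(u.audio) for s, u in zip(starts, utterances))
    out = [[0.0] * length for _ in range(2)]
    for s, u in zip(starts, utterances):
        track = out[channel[u.speaker]]
        for i, x in enumerate(u.audio):
            track[s + i] += x  # add, so same-speaker overlaps also accumulate
    return out


@dataclass
class TrainingExample:
    """Hypothetical pairing of a hybrid prompt with a two-channel dialogue."""
    prompt: HybridSystemPrompt
    conversation: list[list[float]]  # [user_channel, agent_channel]


example = TrainingExample(
    prompt=HybridSystemPrompt(
        role_text="You are a polite airline customer service agent.",
        voice_sample=[0.0] * SAMPLE_RATE,  # placeholder 1 s reference clip
    ),
    conversation=render_duplex([
        Utterance("user", 0.0, [1.0] * 8000),
        Utterance("agent", 0.4, [1.0] * 8000),  # starts before the user finishes
    ]),
)
print(len(example.conversation), len(example.conversation[0]))  # 2 14400
```

Storing the user and the agent on separate channels, rather than in one mixed track, keeps the information about who is speaking at each moment, which is what a full-duplex model needs in order to learn overlaps such as interruptions and backchannels.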
