PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment

Jihwan Oh; Soowon Oh; Murad Aghazada; Minchan Jeong; Sungnyun Kim; Se-Young Yun

arXiv:2604.08986·cs.CL·April 13, 2026

TL;DR

This paper introduces PerMix-RLVR, a training strategy for large language models that balances robustness to persona variations with the ability to faithfully adopt specific personas, addressing a key challenge in persona prompting.

Contribution

The paper proposes PerMix-RLVR, a novel persona-mixed reinforcement learning approach that mitigates the robustness-fidelity trade-off in persona-sensitive language models.
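As a rough illustration of what "persona-mixed" training with a verifiable reward could look like, the sketch below prefixes each task question with a randomly sampled persona and scores a response by exact match on its final line. This is only a toy under stated assumptions, not the paper's procedure: the persona pool, the names `mix_persona` and `verifiable_reward`, and the exact-match rule are all hypothetical and do not come from the source.

```python
import random

# Hypothetical persona pool; the personas actually used in the paper are not given here.
PERSONAS = [
    "You are a meticulous mathematician.",
    "You are a cheerful kindergarten teacher.",
    "You are a grumpy pirate.",
]


def mix_persona(question, rng, personas=PERSONAS):
    """Prefix a task question with one persona sampled uniformly at random."""
    persona = rng.choice(personas)
    return f"{persona}\n\n{question}"


def verifiable_reward(response, reference):
    """Return 1.0 if the last non-empty line of the response equals the reference answer.

    Exact match on the final line is an assumed stand-in for an outcome-based
    verifiable reward; a real checker would be task-specific.
    """
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return 1.0 if lines and lines[-1] == reference.strip() else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the sampled persona is reproducible
    print(mix_persona("What is 2 + 3?", rng))
    print(verifiable_reward("Arr, let me think.\n5", "5"))
```

Note that the reward in this sketch looks only at the final answer, so a response that ignores the persona scores the same as one that adopts it. That is one way to picture the robustness-fidelity trade-off the paper describes.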

Findings

01. PerMix-RLVR improves persona stability score (PSS) by +21.2% on MATH500.

02. PerMix-RLVR enhances persona fidelity by +11.4% on PersonaGym.

03. RLVR reduces persona sensitivity but can degrade persona expressivity.

Abstract

Persona prompting has been widely adopted to steer the behavior of large language models (LLMs) and improve their instruction performance by assigning specific characters. However, identifying an optimal persona is time-consuming, and its impact on output quality remains poorly understood. Prior work has mainly addressed this issue at the prompt level via inference-time strategies, which incur additional computation. In this work, we avoid inference-time prompt search by tackling persona sensitivity during training, aiming to train models that adapt their behavior to diverse personas while preserving task performance. In particular, we find that reinforcement learning with verifiable rewards (RLVR) systematically reduces sensitivity to persona prompts, but also reveals an inherent trade-off of outcome-based optimization: while RLVR improves robustness on tasks with verifiable goals, it can also…
