PersonaGym: Evaluating Persona Agents and LLMs
Vinay Samuel, Henry Peng Zou, Yue Zhou, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Ameet Deshpande, Karthik Narasimhan, Vishvak Murahari

TL;DR
This paper introduces PersonaGym, a novel dynamic evaluation framework for assessing how well persona agents and large language models adhere to assigned personas across diverse settings, revealing significant room for improvement.
Contribution
The paper presents PersonaGym and PersonaScore, pioneering tools for comprehensive, large-scale evaluation of persona agents' fidelity and performance.
Findings
GPT-4.1 achieved exactly the same PersonaScore as LLaMA-3-8b despite being a more recent and more advanced closed-source model.
Increasing model size does not necessarily improve persona adherence.
Significant opportunities exist for enhancing persona agent capabilities.
Abstract
Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare. However, evaluating how faithfully these agents adhere to their personas remains a significant challenge, particularly in free-form settings that demand consistency across diverse, persona-relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, a human-aligned automatic metric grounded in decision theory that enables comprehensive large-scale evaluation. Our evaluation of 10 leading LLMs across 200 personas and 10,000 questions reveals significant opportunities for advancement. For example, GPT-4.1 had exactly the same PersonaScore as LLaMA-3-8b despite being a more recent and advanced closed-source model. Importantly, increased model…
Taxonomy
Topics: Persona Design and Applications · Information Systems Theories and Implementation · Innovative Human-Technology Interaction
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Transformer · GPT-4 · Adam · Cosine Annealing · Linear Layer
