PersonaGym: Evaluating Persona Agents and LLMs

Vinay Samuel; Henry Peng Zou; Yue Zhou; Shreyas Chaudhari; Ashwin Kalyan; Tanmay Rajpurohit; Ameet Deshpande; Karthik Narasimhan; Vishvak Murahari

arXiv:2407.18416·cs.CL·September 8, 2025·3 cites

TL;DR

This paper introduces PersonaGym, a novel dynamic evaluation framework for assessing how well persona agents and large language models adhere to assigned personas across diverse settings, revealing significant room for improvement.

Contribution

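The paper presents PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, a human-aligned automatic metric grounded in decision theory. Together they enable large-scale evaluation of how faithfully persona agents adhere to their assigned personas.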

Findings

01

GPT-4.1 achieved exactly the same PersonaScore as LLaMA-3-8b, despite being a more recent and more advanced closed-source model.

02

Increasing model size does not necessarily improve persona adherence.

03

Significant opportunities exist for enhancing persona agent capabilities.

Abstract

Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare. However, evaluating how faithfully these agents adhere to their personas remains a significant challenge, particularly in free-form settings that demand consistency across diverse, persona-relevant environments. We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, a human-aligned automatic metric grounded in decision theory that enables comprehensive large-scale evaluation. Our evaluation of 10 leading LLMs across 200 personas and 10,000 questions reveals significant advancement opportunities. For example, GPT-4.1 had the exact same PersonaScore as LLaMA-3-8b despite being a more recent and advanced closed-source model. Importantly, increased model…
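The abstract does not say how PersonaScore combines individual judgments into one number. As a rough illustration only, the sketch below assumes each persona agent receives rubric scores (assumed to be on a 1 to 5 scale) on several evaluation tasks, averages them within each task, then averages across tasks and finally across personas to get a model-level score. The task names in `TASKS` and the functions `persona_score` and `model_score` are hypothetical, and the unweighted mean is an assumption for illustration, not the paper's decision-theoretic formulation.

```python
from statistics import mean

# Hypothetical task names, for illustration only; the abstract does not list them.
TASKS = ["expected_action", "linguistic_habits", "persona_consistency",
         "toxicity_control", "action_justification"]


def persona_score(scores: dict[str, list[float]]) -> float:
    """Score one persona agent: mean within each task, then mean across tasks.

    `scores` maps a task name to the rubric scores (assumed 1-5) the agent
    received on that task's questions. Every task must have at least one score.
    """
    missing = [t for t in TASKS if not scores.get(t)]
    if missing:
        raise ValueError(f"no scores for tasks: {missing}")
    return mean(mean(scores[t]) for t in TASKS)


def model_score(per_persona: list[dict[str, list[float]]]) -> float:
    """Score a model: unweighted mean of its persona-level scores."""
    return mean(persona_score(p) for p in per_persona)


if __name__ == "__main__":
    example = {t: [4, 5] for t in TASKS}   # 4.5 on every task...
    example["toxicity_control"] = [3, 3]   # ...except 3.0 on one
    print(persona_score(example))          # prints 4.2
```

Unweighted averaging treats every task as equally important; a weighted mean would be the natural variant if some tasks were meant to count more than others.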

Code & Models

Repositories

vsamuel2003/cie (PyTorch)

Videos

PersonaGym: Evaluating Persona Agents and LLMs · Underline

Taxonomy

Topics: Persona Design and Applications · Information Systems Theories and Implementation · Innovative Human-Technology Interaction

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Transformer · GPT-4 · Adam · Cosine Annealing · Linear Layer