Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

Harshita Chopra; Kshitish Ghate; Aylin Caliskan; Tadayoshi Kohno; Chirag Shah; Natasha Jaques

arXiv:2605.12894 · cs.AI · May 14, 2026

TL;DR

This paper introduces Persona Policies (PPol), an LLM-driven method to generate diverse, realistic user personas for more robust evaluation and training of LLM agents, addressing the limitations of homogeneous simulators.

Contribution

The paper presents a novel, automated approach that creates diverse, human-like user personas through evolutionary program search guided by multi-objective fitness scores.
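To make the idea of a search guided by multi-objective fitness scores concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy setting: each candidate is a set of three hypothetical numeric persona traits (`terseness`, `impatience`, `reluctance`), the two fitness objectives (`realism`, `diversity`) are invented proxies, and Gaussian mutation stands in for the LLM-driven program edits the paper describes. Selection keeps the Pareto front, that is, candidates that no other candidate beats on every objective.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical persona-generator parameters, each in [0, 1].
    terseness: float
    impatience: float
    reluctance: float


def fitness(c):
    """Return toy (realism, diversity) scores; both are assumptions to maximize."""
    traits = (c.terseness, c.impatience, c.reluctance)
    # Toy proxy: moderate trait values count as more "realistic".
    realism = 1.0 - sum(abs(t - 0.5) for t in traits) / 1.5
    # Toy proxy: a wider spread between traits counts as more "diverse".
    diversity = max(traits) - min(traits)
    return (realism, diversity)


def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population):
    """Keep only candidates whose scores are not dominated by any other candidate."""
    scored = [(c, fitness(c)) for c in population]
    return [c for c, f in scored if not any(dominates(g, f) for _, g in scored)]


def mutate(c, rng, scale=0.1):
    """Gaussian perturbation, clipped to [0, 1]; a stand-in for LLM-proposed edits."""
    def clip(v):
        return min(1.0, max(0.0, v))

    return Candidate(
        clip(c.terseness + rng.gauss(0.0, scale)),
        clip(c.impatience + rng.gauss(0.0, scale)),
        clip(c.reluctance + rng.gauss(0.0, scale)),
    )


def evolve(generations=20, pop_size=16, seed=0):
    rng = random.Random(seed)
    population = [
        Candidate(rng.random(), rng.random(), rng.random()) for _ in range(pop_size)
    ]
    for _ in range(generations):
        parents = pareto_front(population)
        # Refill the population with mutated copies of non-dominated parents.
        children = [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return pareto_front(population)


if __name__ == "__main__":
    for candidate in evolve():
        print(candidate, fitness(candidate))
```

Returning a Pareto front rather than a single best candidate is one simple way to keep trade-offs between objectives visible; whether the paper selects candidates this way is not stated in the summary above.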

Findings

1. PPol-generated personas achieve 80.4% human-likeness in evaluations.

2. Agents trained with PPol show a 17% increase in task success on out-of-distribution behaviors.

3. PPol improves simulator diversity and robustness across retail and airline domains.

Abstract

Large Language Model (LLM) agents are increasingly deployed in settings where they interact with a wide variety of people, including users who are unclear, impatient, or reluctant to share information. However, collecting real interaction data at scale remains expensive. The field has turned to LLM-based user simulators as stand-ins, but these simulators inherit the behavior of their underlying models: cooperative and homogeneous. As a result, agents that appear strong in simulation often fail under the unseen, diverse communication patterns of real users. To narrow this gap, we introduce Persona Policies (PPol), a plug-and-play control layer that induces realistic behavioral variation in user simulators while preserving the original task goals. Rather than hand-crafting personas, we cast persona generation as an LLM-driven evolutionary program search that optimizes a Python generator…
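The notion of a plug-and-play control layer that varies behavior while preserving the task goal can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Persona` class, the `apply_persona` function, and the prompt wording are hypothetical and are not taken from the paper. The point is only that the persona is appended to an unchanged base simulator prompt, so the task goal text is never rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical persona description: a label plus communication-style rules.
    name: str
    style_rules: tuple


def apply_persona(base_prompt: str, persona: Persona) -> str:
    """Append style rules to the simulator prompt without altering the task goal."""
    rules = "\n".join(f"- {rule}" for rule in persona.style_rules)
    return (
        f"{base_prompt}\n\n"
        f"Behavioral style ({persona.name}):\n{rules}\n"
        "Keep the task goal above unchanged; vary only how you communicate."
    )


if __name__ == "__main__":
    base = "You are a customer. Goal: return a pair of shoes bought last week."
    impatient = Persona(
        name="impatient",
        style_rules=("Reply in short sentences.", "Share details only when asked."),
    )
    print(apply_persona(base, impatient))
```

Because the base prompt is passed through verbatim, the same persona can be layered onto any simulator prompt, which is one reading of "plug-and-play" under these assumptions.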
