Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents
Harshita Chopra, Kshitish Ghate, Aylin Caliskan, Tadayoshi Kohno, Chirag Shah, Natasha Jaques

TL;DR
This paper introduces Persona Policies (PPol), an LLM-driven method to generate diverse, realistic user personas for more robust evaluation and training of LLM agents, addressing the limitations of homogeneous simulators.
Contribution
The paper presents an automated approach that creates diverse, human-like user personas through evolutionary program search guided by multi-objective fitness scores.
Findings
PPol-generated personas achieve 80.4% human-likeness in evaluations.
Agents trained with PPol show a 17% increase in task success on out-of-distribution behaviors.
PPol improves simulator diversity and robustness across retail and airline domains.
Abstract
Large Language Model (LLM) agents are increasingly deployed in settings where they interact with a wide variety of people, including users who are unclear, impatient, or reluctant to share information. However, collecting real interaction data at scale remains expensive. The field has turned to LLM-based user simulators as stand-ins, but these simulators inherit the behavior of their underlying models: cooperative and homogeneous. As a result, agents that appear strong in simulation often fail under the unseen, diverse communication patterns of real users. To narrow this gap, we introduce Persona Policies (PPol), a plug-and-play control layer that induces realistic behavioral variation in user simulators while preserving the original task goals. Rather than hand-crafting personas, we cast persona generation as an LLM-driven evolutionary program search that optimizes a Python generator…
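The abstract frames persona generation as an evolutionary search over a Python generator, scored on multiple objectives. The loop below is a minimal sketch of that general idea, not the authors' implementation. Several parts are assumptions made for illustration: the trait names, the `TARGET_MEANS` values, and the two fitness proxies are hypothetical, the objectives are combined by a simple weighted sum, and random perturbation of generator parameters stands in for the LLM-driven program edits described in the paper.

```python
import random

# Hypothetical behavioral traits in [0, 1]; illustrative only, not from the paper.
TRAITS = ("vagueness", "impatience", "reluctance")
# Hypothetical "realistic" trait averages used by the realism proxy below.
TARGET_MEANS = {"vagueness": 0.4, "impatience": 0.3, "reluctance": 0.5}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def make_generator(params):
    """Build a persona generator from per-trait (mean, spread) parameters."""
    def generate(rng, n):
        return [
            {t: clamp(rng.gauss(*params[t]), 0.0, 1.0) for t in TRAITS}
            for _ in range(n)
        ]
    return generate


def fitness(params, n=30, weights=(0.5, 0.5)):
    """Weighted sum of two proxy objectives, each in [0, 1]."""
    rng = random.Random(0)  # fixed seed: the score depends only on params
    vecs = [[p[t] for t in TRAITS] for p in make_generator(params)(rng, n)]
    # Diversity proxy: mean pairwise L1 distance, normalized per trait.
    pairs = [(a, b) for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    diversity = sum(
        sum(abs(x - y) for x, y in zip(a, b)) for a, b in pairs
    ) / (len(pairs) * len(TRAITS))
    # Realism proxy: closeness of sample trait means to TARGET_MEANS.
    means = [sum(v[j] for v in vecs) / n for j in range(len(TRAITS))]
    realism = 1.0 - sum(
        abs(m - TARGET_MEANS[t]) for m, t in zip(means, TRAITS)
    ) / len(TRAITS)
    return weights[0] * diversity + weights[1] * realism


def mutate(params, rng, step=0.1):
    """Random perturbation; a stand-in for an LLM proposing a program edit."""
    return {
        t: (
            clamp(params[t][0] + rng.gauss(0, step), 0.0, 1.0),
            clamp(params[t][1] + rng.gauss(0, step), 0.01, 0.5),
        )
        for t in TRAITS
    }


def evolve(generations=20, pop_size=8, elite=2, seed=0):
    """Elitist search: keep the top candidates, refill with mutated copies."""
    rng = random.Random(seed)
    population = [
        {t: (rng.random(), 0.05) for t in TRAITS} for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:elite]
        children = [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history + [fitness(best)]
```

Calling `evolve()` returns the best parameter set found and the best score per generation. Because the top candidates are carried over unchanged and scoring is deterministic, that score never decreases. A Pareto-based selection could replace the weighted sum when the objectives should not be traded off at a fixed rate.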
