Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors

Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar, Marwa Abdulhai, Sergey Levine, Michel Galley, Dilek Hakkani-Tür

arXiv:2605.07847·cs.CL·May 11, 2026

TL;DR

This paper introduces a method to measure the difference between real and simulated user behaviors, revealing significant gaps and suggesting ways to improve user simulators for AI training.

Contribution

It presents a novel approach to quantify and analyze the distributional gap between real and simulated user behaviors using clustering and divergence metrics.
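The page does not name the divergence metrics used. As one hedged illustration, assume real and simulated behaviors have been quantized into the same K clusters, giving histograms P = (p_1, ..., p_K) for real users and Q = (q_1, ..., q_K) for a simulator. The Jensen-Shannon divergence (JSD) is a common choice for comparing such histograms: it stays finite even when a simulator never visits a cluster that real users do, a case where KL(P || Q) is infinite. It is shown here as an assumed example, not as the paper's confirmed metric.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
\mathrm{KL}(P \,\|\, M) = \sum_{k=1}^{K} p_k \log_2 \frac{p_k}{m_k}

\mathrm{JSD}(P, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M), \qquad 0 \le \mathrm{JSD}(P, Q) \le 1

% Worked example: real users split evenly over two clusters, the simulator uses only one.
% P = (1/2, 1/2), Q = (1, 0), so M = (3/4, 1/4)
\mathrm{KL}(P \,\|\, M) = \tfrac{1}{2}\log_2\tfrac{2}{3} + \tfrac{1}{2}\log_2 2 = 1 - \tfrac{1}{2}\log_2 3
\mathrm{KL}(Q \,\|\, M) = \log_2\tfrac{4}{3} = 2 - \log_2 3
\mathrm{JSD}(P, Q) = \tfrac{3}{2} - \tfrac{3}{4}\log_2 3 \approx 0.311
```

In the same example, KL(P || Q) would be infinite because q_2 = 0 while p_2 = 1/2.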

Findings

1. Large distributional gaps exist between real and simulated user behaviors.
2. Combining multiple simulators can better approximate real user behavior.
3. Behavioral patterns captured by simulators can be interpreted via TF-IDF analysis.
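The findings do not say how simulators are combined. A minimal sketch, assuming conversations are pooled from several simulators in fixed proportions, so that the pooled histogram over a shared set of behavior clusters is the weighted average of each simulator's histogram. The helpers `mixture` and `js_divergence` and every histogram value are hypothetical, chosen only to show how two simulators with opposite biases can offset each other.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (between 0 and 1) for two histograms."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Zero-mass bins of x contribute nothing; y = m is positive wherever x is.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mixture(histograms, weights):
    """Weighted average of per-simulator cluster histograms (weights sum to 1)."""
    k = len(histograms[0])
    return [sum(w * h[j] for h, w in zip(histograms, weights)) for j in range(k)]


# Hypothetical histograms over three shared behavior clusters.
real = [0.40, 0.35, 0.25]
sim_a = [0.80, 0.20, 0.00]  # over-represents cluster 0, never reaches cluster 2
sim_b = [0.10, 0.50, 0.40]  # under-represents cluster 0
combo = mixture([sim_a, sim_b], [0.5, 0.5])  # -> roughly [0.45, 0.35, 0.20]

for name, hist in [("sim_a", sim_a), ("sim_b", sim_b), ("mixture", combo)]:
    print(name, round(js_divergence(real, hist), 3))
```

In this toy case the equal-weight mixture's divergence from `real` is smaller than either simulator's alone, because each simulator fills in clusters the other under-represents.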

Abstract

As user simulators are increasingly used for interactive training and evaluation of AI assistants, it is essential that they represent the diverse behaviors of real users. While existing works train user simulators to generate human-like responses, whether they capture the broad and heterogeneous distribution of real user behaviors remains an open question. In this work, we introduce a method to measure the distributional gap between real and simulated user behaviors, validated through a human study and ablations. Given a dataset of real and simulated conversations, our method extracts representations of user behavior from each conversation, quantizes them into discrete distributions via clustering, then computes divergence metrics. We provide the first systematic evaluation of 24 LLM-based user simulators on coding and writing tasks, and reveal a large distributional gap from real…
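The pipeline in the abstract (extract representations, quantize them by clustering, compute a divergence) can be sketched end to end. This is an illustration under stated assumptions, not the authors' implementation. Each conversation is assumed to already be a fixed-length vector, since the extraction step is not reproduced here. Plain k-means is swapped in as the clustering method. Real and simulated vectors are clustered jointly so both distributions share the same bins, and Jensen-Shannon divergence stands in for the unnamed divergence metrics. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means: start from k sampled points, then alternate assign/update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coords) / len(group) for coords in zip(*group)]
    return centroids


def histogram(points, centroids):
    """Fraction of points assigned to each cluster: the quantized distribution."""
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts[j] / len(points) for j in range(len(centroids))]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical histograms, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distributional_gap(real, simulated, k, seed=0):
    """Cluster real and simulated vectors jointly, then compare their histograms."""
    centroids = kmeans(real + simulated, k, seed=seed)
    return js_divergence(histogram(real, centroids), histogram(simulated, centroids))


# Toy 2-D "behavior vectors": real users show two behaviors, the simulator only one.
real = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
simulated = [[0.0, 0.1], [0.1, 0.1], [0.05, 0.0], [0.0, 0.05]]
gap = distributional_gap(real, simulated, k=2)
print(round(gap, 3))  # → 0.311
```

Clustering the pooled vectors, rather than the real vectors alone, is one design choice: it lets a behavior that only the simulator exhibits form its own cluster and count toward the gap. The number of clusters `k` sets how fine-grained the comparison is.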
