Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors
Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar, Marwa Abdulhai, Sergey Levine, Michel Galley, Dilek Hakkani-Tür

TL;DR
This paper introduces a method to measure the distributional gap between real and simulated user behaviors. It finds large gaps for current LLM-based simulators and points to ways of improving the user simulators used to train and evaluate AI assistants.
Contribution
It presents a novel approach to quantify and analyze the distributional gap between real and simulated user behaviors using clustering and divergence metrics.
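Here is one way to make "quantize, then compare" concrete. Write $\phi$ for the behavior representation of a conversation and $c(\cdot) \in \{1, \dots, K\}$ for its cluster assignment. Real conversations $r_1, \dots, r_{N_r}$ and simulated ones $s_1, \dots, s_{N_s}$ then induce discrete distributions $P$ and $Q$ over the $K$ clusters. Jensen-Shannon divergence is shown below as one standard divergence for such distributions. The notation is ours, and the paper may use different or additional metrics.

```latex
P(k) = \frac{1}{N_r}\sum_{i=1}^{N_r} \mathbb{1}\!\left[c(\phi(r_i)) = k\right],
\qquad
Q(k) = \frac{1}{N_s}\sum_{j=1}^{N_s} \mathbb{1}\!\left[c(\phi(s_j)) = k\right]

\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, R) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, R),
\qquad R = \tfrac{1}{2}(P + Q),
\qquad \mathrm{KL}(P \,\|\, R) = \sum_{k=1}^{K} P(k)\,\log_2 \frac{P(k)}{R(k)}
```

With base-2 logarithms, JSD lies in [0, 1]. It is 0 when the two cluster histograms match and 1 when real and simulated conversations never fall in the same cluster.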
Findings
Large distributional gaps exist between real and simulated user behaviors.
Combining multiple simulators can better approximate real user behavior.
Behavioral patterns captured by simulators can be interpreted via TF-IDF analysis.
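A toy calculation can illustrate why combining simulators helps. If each simulator covers a different subset of behavior clusters, a weighted mixture of their cluster histograms can sit closer to the real distribution than either one alone. The sketch below assumes the cluster histograms are already computed. It uses Jensen-Shannon divergence and a simple grid search over one mixture weight. The histograms are made-up numbers, the function names are hypothetical, and the way the paper actually combines simulators may differ.

```python
import math

# Hypothetical illustration: mixing cluster histograms of two simulators.
# JSD and the grid search are illustrative choices, not the paper's procedure.


def jsd(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_k = 0 contribute nothing; m_k > 0 whenever x_k > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mixture(dists, weights):
    """Weighted average of cluster histograms (weights are assumed to sum to 1)."""
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(len(dists[0]))]


# Made-up histograms over the same 4 behavior clusters.
real = [0.4, 0.3, 0.2, 0.1]
sim_a = [0.7, 0.3, 0.0, 0.0]  # over-represents clusters 0-1, never reaches 2-3
sim_b = [0.0, 0.2, 0.5, 0.3]  # covers the tail clusters but misses cluster 0

# Try weights 0.0, 0.1, ..., 1.0 on sim_a and keep the closest mixture.
best_w, best_gap = min(
    ((w, jsd(real, mixture([sim_a, sim_b], [w, 1 - w]))) for w in (i / 10 for i in range(11))),
    key=lambda t: t[1],
)
print(f"A alone: {jsd(real, sim_a):.3f}  B alone: {jsd(real, sim_b):.3f}  "
      f"best mix (w={best_w:.1f}): {best_gap:.3f}")
```

With these numbers, the best mixture's divergence is well below that of either simulator alone. This happens because each simulator fills clusters the other leaves empty.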
Abstract
As user simulators are increasingly used for interactive training and evaluation of AI assistants, it is essential that they represent the diverse behaviors of real users. While existing works train user simulators to generate human-like responses, whether they capture the broad and heterogeneous distribution of real user behaviors remains an open question. In this work, we introduce a method to measure the distributional gap between real and simulated user behaviors, validated through a human study and ablations. Given a dataset of real and simulated conversations, our method extracts representations of user behavior from each conversation, quantizes them into discrete distributions via clustering, then computes divergence metrics. We provide the first systematic evaluation of 24 LLM-based user simulators on coding and writing tasks, and reveal a large distributional gap from real…
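The extract, quantize, and compare procedure described in the abstract can be sketched in plain Python. This is a minimal illustration under our own assumptions. Each conversation's behavior representation is taken to be a fixed-length numeric vector; the paper's extraction step is not reproduced here. k-means stands in for the clustering step, and Jensen-Shannon divergence stands in for the divergence metrics. All function names and the toy data are hypothetical.

```python
import math
import random

# Assumption: each conversation's user behavior is already a numeric vector;
# k-means and Jensen-Shannon divergence are illustrative choices, not the paper's spec.


def nearest(p, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns k centroids fitted to `points`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(group) for dim in zip(*group)]
    return centroids


def cluster_histogram(points, centroids, eps=1e-9):
    """Fraction of points per cluster, with a tiny pseudo-count per bin."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    total = len(points) + eps * len(centroids)
    return [(n + eps) / total for n in counts]


def jsd(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if supports are disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def behavior_gap(real, simulated, k=4, seed=0):
    """Cluster the pooled vectors, then compare the two cluster histograms."""
    centroids = kmeans(real + simulated, k, seed=seed)
    return jsd(cluster_histogram(real, centroids),
               cluster_histogram(simulated, centroids))


# Toy 2-D "behavior vectors": real users show two behavior modes.
rng = random.Random(1)


def blob(cx, cy, n):
    return [[rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)] for _ in range(n)]


real = blob(0, 0, 100) + blob(5, 5, 100)
sim_narrow = blob(0, 0, 200)                     # simulator stuck in one mode
sim_matched = blob(0, 0, 100) + blob(5, 5, 100)  # simulator covering both modes
print(behavior_gap(real, sim_narrow), behavior_gap(real, sim_matched))
```

The clusters are fitted on the pooled real and simulated vectors, so both histograms share the same bins and can be compared directly. The simulator that never produces the second behavior mode ends up with a much larger gap than the one that covers both modes.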
