Controllable User Simulation

Guy Tennenholtz; Ofer Meshi; Amir Globerson; Uri Shalit; Jihwan Jeong; Craig Boutilier

arXiv:2605.11519 · cs.AI · May 13, 2026

TL;DR

This paper formalizes controllable user simulation as a causal inference problem, identifying biases in existing methods and proposing mitigations to improve evaluation of conversational agents.

Contribution

It introduces a causal framework for controllable simulation, analyzes biases in supervised fine-tuning, and proposes practical training methods for unbiased, robust simulators.

Findings

01

Standard fine-tuning introduces look-ahead bias and reduces diversity.

02

Causally grounded simulators eliminate bias and maintain natural variance.

03

Proposed methods improve zero-shot generalization to unseen behaviors.

Abstract

Using offline datasets to evaluate conversational agents often fails to cover rare scenarios or to support testing new policies. This has motivated the use of controllable user simulators for targeted, counterfactual evaluation, typically implemented by prompting or fine-tuning large language models. In this work, we formalize controllable simulation as a causal inference problem. By bridging natural language evaluation with off-policy evaluation methodology, we show that the standard practice of training simulators via supervised fine-tuning on post-hoc trajectory labels yields a structurally biased model. Specifically, these labels are inextricably coupled to the data-generating behavior policy, injecting a look-ahead bias that breaks causal consistency. Furthermore, we prove that under policy shift this failure causes the variance of evaluation metrics to explode geometrically, a…
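The look-ahead bias described in the abstract can be illustrated with a toy model. The sketch below uses assumed numbers and is not the construction from the paper. A hypothetical user type (patient or impatient) and a single agent action jointly determine a post-hoc "satisfied" label. The names `is_satisfied` and `p_impatient_given_satisfied`, along with the policy probabilities, are invented for illustration. The code computes the user distribution that a simulator fit on (label, user) pairs would learn, under two different data-generating policies.

```python
from fractions import Fraction

# Hypothetical toy setup (not from the paper): a latent user type and one
# agent action jointly determine a post-hoc trajectory label.
P_IMPATIENT = Fraction(1, 2)  # prior over user types, fixed across policies


def is_satisfied(impatient: bool, good_action: bool) -> bool:
    """Post-hoc label computed from the whole trajectory: patient users are
    always satisfied, impatient users only if the agent acts well."""
    return (not impatient) or good_action


def p_impatient_given_satisfied(p_good: Fraction) -> Fraction:
    """P(user is impatient | label = satisfied) when the logs come from a
    policy that picks the good action with probability p_good. This is the
    conditional a simulator fine-tuned on (label -> user) pairs would fit."""
    joint_impatient = Fraction(0)  # P(impatient, satisfied)
    p_satisfied = Fraction(0)      # P(satisfied)
    for impatient, p_user in ((True, P_IMPATIENT), (False, 1 - P_IMPATIENT)):
        for good, p_action in ((True, p_good), (False, 1 - p_good)):
            if is_satisfied(impatient, good):
                p_satisfied += p_user * p_action
                if impatient:
                    joint_impatient += p_user * p_action
    return joint_impatient / p_satisfied


behavior_policy = Fraction(9, 10)  # assumed logging policy: usually acts well
target_policy = Fraction(1, 10)    # assumed new policy: usually acts badly

learned = p_impatient_given_satisfied(behavior_policy)  # fit on logged data
shifted = p_impatient_given_satisfied(target_policy)    # implied by new policy
print(learned)  # 9/19
print(shifted)  # 1/11
```

Under these assumptions, the same control ("satisfied") corresponds to a 9/19 share of impatient users when the logs come from the mostly-good behavior policy, but only 1/11 under the mostly-bad target policy. A simulator fit on the logged labels therefore keeps reproducing the behavior policy's user population when it is used to evaluate the new policy. In this toy model, conditioning on a pre-interaction attribute such as the user type itself would not depend on the policy at all.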