Mind the Sim-to-Real Gap & Think Like a Scientist
Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

TL;DR
This paper studies when and how a planner should supplement a cheap but biased pre-trained simulator with real experiments in sequential decision problems, and proposes a simulation-aided experimental policy, Fisher-SEP.
Contribution
It introduces a decomposition of simulation error, analyzes the value gap in policies, and proposes Fisher-SEP, a novel simulation-aided experimental policy.
Findings
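Simulation error decomposes into a calibration-deployment shift, which randomization can identify, and a parametric residual, which no further interaction can reduce.
The value gap splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not; the latter stays bounded away from zero under passive learning.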
Fisher-SEP minimizes posterior predictive variance, improving decision-making in case studies.
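To make the notion of simulator value error concrete, the sketch below evaluates one fixed policy under a "real" and a "simulated" transition matrix and checks the classic simulation-lemma identity and bound. It is only a toy illustration: it uses the standard simulation lemma, not the paper's extended version, and the three-state chain, rewards, discount factor, and all variable names are hypothetical choices made for this example.

```python
import numpy as np

def policy_value(P, r, gamma):
    """Exact value of a fixed policy: solve (I - gamma * P) V = r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical 3-state Markov chain induced by one fixed policy.
gamma = 0.9
r = np.array([0.0, 1.0, 0.5])            # per-state reward
P_real = np.array([[0.8, 0.1, 0.1],      # assumed "true" dynamics
                   [0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4]])
P_sim = np.array([[0.7, 0.2, 0.1],       # assumed miscalibrated simulator
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])

V_real = policy_value(P_real, r, gamma)
V_sim = policy_value(P_sim, r, gamma)
value_error = V_sim - V_real

# Classic identity:
#   V_sim - V_real = gamma * (I - gamma * P_real)^{-1} (P_sim - P_real) V_sim
identity_rhs = gamma * np.linalg.solve(
    np.eye(3) - gamma * P_real, (P_sim - P_real) @ V_sim
)

# Classic worst-case bound: gamma * eps * R_max / (1 - gamma)^2,
# where eps is the largest row-wise L1 distance between the two models.
eps_l1 = np.abs(P_sim - P_real).sum(axis=1).max()
bound = gamma * eps_l1 * np.abs(r).max() / (1 - gamma) ** 2

print("value error per state:", value_error)
print("max |error|:", np.abs(value_error).max(), "<= bound:", bound)
```

The identity shows that the error is driven entirely by the transition mismatch on states the policy's dynamics can reach, which is the kind of quantity the findings above split into identifiable and irreducible parts.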
Abstract
Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely…
