Model-Free Assessment of Simulator Fidelity via Quantile Curves
Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang

TL;DR
This paper introduces a model-free statistical approach to quantifying the discrepancy between real and simulated systems across scenarios: it constructs confidence sets for each system's latent parameters and uses them to estimate a distribution-level risk profile of the discrepancy.
Contribution
It develops a robust, model-agnostic method to assess simulator fidelity using quantile functions of a discrepancy proxy, applicable to diverse output types.
Findings
Effectively evaluates four major LLMs against human data.
Provides a distribution-level risk profile for simulator assessment.
Supports statistical inference and comparison across different simulators.
Abstract
As generative AI models are increasingly used to simulate real-world systems, quantifying the "sim-to-real" gap is critical. For each input setting of interest, which we call a scenario (such as a survey question or operating condition), the real and simulated systems are associated with unobserved latent population parameters, and their discrepancy varies across scenarios. A fundamental challenge is that, for any given scenario, this discrepancy cannot be observed directly, since both systems are accessible only through finite samples, often of heterogeneous sizes across scenarios. Standard predictive inference methods are therefore ill-suited, as they quantify uncertainty in observable outputs rather than latent population parameters. To address this, we construct confidence sets for these latent parameters and use them to derive a robust proxy for the sim-to-real…
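The idea of turning per-scenario confidence sets into a discrepancy proxy, then summarizing the proxy across scenarios with a quantile curve, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: outcomes are binary, the latent parameter is a mean, the confidence sets are Hoeffding intervals, and the function names (`hoeffding_radius`, `discrepancy_upper_bound`, `empirical_quantile`) are hypothetical.

```python
import math

def hoeffding_radius(n, alpha):
    """Half-width of a two-sided (1 - alpha) Hoeffding interval
    for the mean of n samples bounded in [0, 1]."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def discrepancy_upper_bound(real, sim, alpha=0.05):
    """Conservative proxy for |p - q| in one scenario.

    real, sim: lists of 0/1 outcomes from the real and simulated systems
    (sample sizes may differ). Each mean gets a (1 - alpha/2) interval,
    so by a union bound both intervals hold jointly with probability at
    least 1 - alpha, and then |p - q| <= |p_hat - q_hat| + r_real + r_sim.
    """
    p_hat = sum(real) / len(real)
    q_hat = sum(sim) / len(sim)
    slack = hoeffding_radius(len(real), alpha / 2) + hoeffding_radius(len(sim), alpha / 2)
    # Two means in [0, 1] can never differ by more than 1.
    return min(1.0, abs(p_hat - q_hat) + slack)

def empirical_quantile(values, level):
    """Smallest value v such that at least a `level` fraction of values are <= v."""
    ordered = sorted(values)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]

if __name__ == "__main__":
    # Synthetic scenarios (illustrative only): (real outcomes, simulated outcomes).
    scenarios = [
        ([1] * 120 + [0] * 80, [1] * 300 + [0] * 200),
        ([1] * 40 + [0] * 60, [1] * 350 + [0] * 150),
        ([1] * 90 + [0] * 10, [1] * 400 + [0] * 100),
    ]
    proxies = [discrepancy_upper_bound(r, s) for r, s in scenarios]
    for level in (0.5, 0.9):
        print(level, empirical_quantile(proxies, level))
```

Scenarios with few samples get wide intervals and therefore large proxy values, which is how heterogeneous sample sizes enter the curve in this sketch. The paper's actual construction and its guarantees may differ.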
