STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Hossein Goli, Michael Gimelfarb, Nathan Samuel de Lara, Haruki Nishimura, Masha Itkina, Florian Shkurti

TL;DR
STITCH-OPE is a model-based framework built on denoising diffusion for off-policy evaluation in high-dimensional, long-horizon problems. It reduces variance and improves estimation accuracy relative to existing methods.
Contribution
The paper presents a diffusion-guided trajectory stitching method for long-horizon off-policy evaluation, with theoretical variance reduction guarantees.
Findings
Outperforms existing methods on D4RL and OpenAI Gym benchmarks.
Achieves lower mean squared error and higher correlation in estimates.
Demonstrates exponential variance reduction in theoretical analysis.
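For context on why variance is the central concern in long-horizon OPE, the sketch below illustrates how the variance of a trajectory-level importance weight grows exponentially with the horizon. It is a toy setting, not the paper's analysis: the behavior policy is assumed to be N(0, 1), the target policy N(shift, 1), both state-independent, and all function names are hypothetical.

```python
import math
import random


def step_ratio(action, shift):
    """Density ratio N(shift, 1) / N(0, 1) at `action`: one importance weight factor."""
    return math.exp(shift * action - 0.5 * shift ** 2)


def trajectory_weight(horizon, shift, rng):
    """Product of per-step ratios along one trajectory sampled from the behavior policy."""
    weight = 1.0
    for _ in range(horizon):
        action = rng.gauss(0.0, 1.0)  # assumed behavior policy: N(0, 1)
        weight *= step_ratio(action, shift)
    return weight


def exact_weight_variance(horizon, shift):
    """Closed-form variance of the trajectory weight in this toy setting.

    Each per-step ratio has mean 1 and second moment exp(shift**2) under the
    behavior policy, and the steps are independent, so the product has
    variance exp(shift**2 * horizon) - 1.
    """
    return math.exp(shift ** 2 * horizon) - 1.0
```

Calling `exact_weight_variance(h, 0.5)` for increasing `h` shows the exponential growth directly, and averaging many `trajectory_weight` samples at a short horizon gives a value close to 1, as expected for an unbiased weight.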
Abstract
Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations…
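The idea of guiding a denoising process with the score function of a target policy can be sketched in one dimension. This is a generic score-guidance toy, not STITCH-OPE's actual guidance rule: the "pre-trained" model is replaced by the analytic score of unit-Gaussian behavior data, the target policy is an assumed state-independent Gaussian, and every name and parameter below is hypothetical.

```python
import math
import random


def behavior_score(x, t):
    """Analytic stand-in for a trained diffusion score network.

    For unit-Gaussian behavior data under variance-preserving noising, the
    noised marginal stays N(0, 1) at every step, so the score is -x.
    """
    return -x


def target_policy_score(x, mean=2.0, std=1.0):
    """Gradient of log N(x; mean, std**2) in x (assumed target policy)."""
    return -(x - mean) / std ** 2


def guided_sample(rng, guidance_scale=1.0, num_steps=200,
                  beta_min=1e-4, beta_max=0.05):
    """Run a DDPM-style reverse process with the target score added as guidance."""
    betas = [beta_min + (beta_max - beta_min) * i / (num_steps - 1)
             for i in range(num_steps)]
    x = rng.gauss(0.0, 1.0)  # start from the N(0, 1) prior
    for t in reversed(range(num_steps)):
        beta = betas[t]
        # Guidance: behavior-model score plus scaled target-policy score.
        score = behavior_score(x, t) + guidance_scale * target_policy_score(x)
        x = (x + beta * score) / math.sqrt(1.0 - beta)
        if t > 0:  # no noise is added on the final denoising step
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x
```

With `guidance_scale=0.0` the samples stay distributed like the behavior data, while a positive scale pulls them toward the target policy's mean. The samples settle between the two means rather than on the target itself, which reflects that guidance of this form blends the behavior model with the target policy.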
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning
Methods: Diffusion
