STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

Hossein Goli; Michael Gimelfarb; Nathan Samuel de Lara; Haruki Nishimura; Masha Itkina; Florian Shkurti

arXiv:2505.20781 · cs.RO · May 28, 2025

TL;DR

STITCH-OPE introduces a model-based framework built on guided diffusion for off-policy evaluation in high-dimensional, long-horizon problems, significantly reducing variance and improving estimation accuracy.

Contribution

The paper presents a novel diffusion-guided trajectory-stitching method for long-horizon off-policy evaluation, backed by theoretical guarantees of variance reduction.
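One way to picture trajectory stitching: instead of generating an entire long-horizon rollout at once, a generator produces short windows, and each window starts where the previous one ended. The sketch below shows only that chaining logic, under stated assumptions. `sample_segment` is a hypothetical placeholder (a Gaussian random walk, not a diffusion model), and the exact way STITCH-OPE conditions and joins segments is not reproduced here.

```python
import numpy as np


def sample_segment(start_state, length, rng):
    """Hypothetical stand-in for a guided diffusion sampler.

    Returns a (length, state_dim) array whose first row is start_state.
    A Gaussian random walk is used purely for illustration; it is not
    a diffusion model.
    """
    steps = rng.normal(scale=0.1, size=(length - 1, start_state.shape[0]))
    return np.vstack([start_state, start_state + np.cumsum(steps, axis=0)])


def stitch_trajectory(s0, horizon, segment_len, rng):
    """Chain short segments end to end into a trajectory of `horizon` states.

    Each new segment is sampled from the last state of the previous one,
    so the generator only ever has to produce short windows.
    """
    if segment_len < 2:
        raise ValueError("segment_len must be at least 2")
    pieces = [s0[None, :]]
    current, total = s0, 1
    while total < horizon:
        segment = sample_segment(current, segment_len, rng)
        new_states = segment[1:]  # drop the repeated start state
        pieces.append(new_states)
        total += new_states.shape[0]
        current = new_states[-1]
    return np.vstack(pieces)[:horizon]
```

In this sketch, the horizon a generator must cover in one pass is `segment_len`, not `horizon`. That is the intuition behind stitching for long-horizon problems: a short-window model never has to extrapolate over the full rollout.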

Findings

01. Outperforms existing methods on D4RL and OpenAI Gym benchmarks.
02. Achieves lower mean squared error and higher correlation between estimated and true policy values.
03. Theoretical analysis shows an exponential reduction in variance.
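For context on the variance claim, a short calculation shows why importance weighting blows up over long horizons. The notation here is an assumption introduced for illustration: $\mu$ is the behavior policy, $\pi$ the target policy, $\rho_t$ the per-step importance ratio, and $\mu$ is assumed to cover the support of $\pi$. The ratios are also treated as independent across steps, which is a simplification.

```latex
\rho_t = \frac{\pi(a_t \mid s_t)}{\mu(a_t \mid s_t)}, \qquad
W_H = \prod_{t=1}^{H} \rho_t, \qquad
\mathbb{E}_{\mu}[\rho_t] = 1, \quad \mathbb{E}_{\mu}[\rho_t^2] = 1 + \sigma^2

\operatorname{Var}_{\mu}(W_H)
  = \mathbb{E}_{\mu}[W_H^2] - \big(\mathbb{E}_{\mu}[W_H]\big)^2
  = \prod_{t=1}^{H} \mathbb{E}_{\mu}[\rho_t^2] - 1
  = (1 + \sigma^2)^{H} - 1
```

For example, with $\sigma^2 = 0.1$ and $H = 100$, the variance of the trajectory weight is $1.1^{100} - 1$, roughly $1.4 \times 10^4$, even though each individual ratio is only mildly spread out.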

Abstract

Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or compounding errors from learned dynamics models. To address these challenges, we propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE in high-dimensional state and action spaces. Starting with a diffusion model pre-trained on the behavior data, STITCH-OPE generates synthetic trajectories from the target policy by guiding the denoising process using the score function of the target policy. STITCH-OPE proposes two technical innovations…
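The abstract's central mechanism is to steer a diffusion model pre-trained on behavior data using the target policy's score. This resembles classifier guidance. The sketch below is a generic classifier-guided DDPM reverse step, not the authors' sampler. Several choices are assumptions made for illustration: the linear-Gaussian target policy, guiding only the action dimensions, evaluating the policy score directly on the noisy sample, and names such as `guided_ddpm_step` and `target_policy_score`.

```python
import numpy as np


def target_policy_score(states, actions, K, sigma):
    """grad_a log pi(a | s) for a hypothetical linear-Gaussian target policy
    pi(a | s) = N(K s, sigma^2 I). states: (T, ds), actions: (T, da)."""
    return -(actions - states @ K.T) / sigma**2


def guided_ddpm_step(x_t, t, eps_model, alphas, alpha_bars, ds, K, sigma, scale, rng):
    """One reverse DDPM step on a trajectory x_t of shape (T, ds + da),
    with classifier-style guidance applied to the action columns only."""
    eps = eps_model(x_t, t)  # noise predicted by the (assumed) behavior model
    score = np.zeros_like(x_t)
    score[:, ds:] = target_policy_score(x_t[:, :ds], x_t[:, ds:], K, sigma)
    # Classifier guidance: shift the noise estimate against the guidance score.
    eps_hat = eps - np.sqrt(1.0 - alpha_bars[t]) * scale * score
    beta_t = 1.0 - alphas[t]
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(beta_t) * rng.normal(size=x_t.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, ds, da = 16, 2, 1
    betas = np.linspace(1e-4, 0.02, 50)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    K = np.array([[0.5, -0.2]])  # hypothetical target-policy gain, shape (da, ds)
    zero_eps = lambda x, t: np.zeros_like(x)  # placeholder for a trained model
    x = rng.normal(size=(T, ds + da))
    for t in reversed(range(len(betas))):
        x = guided_ddpm_step(x, t, zero_eps, alphas, alpha_bars, ds, K, 0.5, 1.0, rng)
    print(x.shape)
```

In this sketch, the pre-trained model stays responsible for the overall trajectory distribution, while the policy score nudges sampled actions toward what the target policy would choose. The `scale` parameter trades off fidelity to the behavior data against agreement with the target policy.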

Videos

STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation (SlidesLive)

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning

Methods: Diffusion