Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

Hao Wang; Joshua Bowden; Colton Crosby; Somil Bansal

arXiv:2605.11479·cs.RO·May 13, 2026

TL;DR

This paper introduces a liveness-based Bellman operator for offline policy evaluation of robotic manipulation policies from sparse rewards. The resulting conservative value function is robust to finite-horizon truncation and reflects task progress more accurately.

Contribution

It proposes a novel framework that interprets policy evaluation as a task-completion problem, providing theoretical guarantees and practical improvements over classical methods.

Findings

01. The method reduces truncation bias in finite-horizon evaluations.

02. It outperforms classical baselines like TD(0) and Monte Carlo methods.

03. Empirical results show improved accuracy in reflecting task progress.

Abstract

Policy evaluation is a fundamental component of the development and deployment pipeline for robotic policies. In modern manipulation systems, this problem is particularly challenging: rewards are often sparse, the task progression of evaluation rollouts is often non-monotonic because the policies exhibit recovery behaviors, and evaluation rollouts are necessarily of finite length. This finite length introduces truncation bias, breaking the infinite-horizon assumptions underlying standard methods that rely on the Bellman equation and the principle of optimality. In this work, we propose a framework for offline policy evaluation from sparse rewards based on a liveness-based Bellman operator. Our formulation interprets policy evaluation as a task-completion problem and yields a conservative fixed-point value function that is robust to finite-horizon truncation. We analyze the theoretical properties of the…
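The truncation bias described in the abstract can be illustrated with a small sketch. This is not the paper's liveness-based operator, whose definition is not given here. It only shows the problem for a plain Monte Carlo discounted return under sparse rewards. The helper names, the discount factor, the rollout length, and the success step are all hypothetical values chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Monte Carlo discounted return of one reward sequence (hypothetical helper)."""
    g = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sparse_rollout(success_step, length):
    """Sparse reward: 1.0 at the step where the task completes, 0.0 elsewhere."""
    return [1.0 if t == success_step else 0.0 for t in range(length)]


gamma = 0.9          # assumed discount factor
success_step = 5     # assumed step at which the task is completed

full = sparse_rollout(success_step, length=10)
truncated = full[:4]  # the same rollout, cut off before the task completes

full_return = discounted_return(full, gamma)
truncated_return = discounted_return(truncated, gamma)

print(f"full rollout return:      {full_return:.5f}")
print(f"truncated rollout return: {truncated_return:.5f}")
```

The truncated rollout contains no reward at all, so its Monte Carlo return is zero even though the policy was on its way to completing the task. This gap between the two estimates is the truncation bias in this toy setting.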
