Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
Hao Wang, Joshua Bowden, Colton Crosby, Somil Bansal

TL;DR
This paper introduces a liveness-based Bellman operator for offline policy evaluation in robotic manipulation, effectively addressing finite-horizon truncation bias and improving task progress estimation.
Contribution
It proposes a novel framework that interprets policy evaluation as a task-completion problem, providing theoretical guarantees and practical improvements over classical methods.
Findings
The method reduces truncation bias in finite-horizon evaluations.
It outperforms classical baselines like TD(0) and Monte Carlo methods.
Empirical results show improved accuracy in reflecting task progress.
Abstract
Policy evaluation is a fundamental component of the development and deployment pipeline for robotic policies. In modern manipulation systems, this problem is particularly challenging: rewards are often sparse, the task progression of evaluation rollouts is often non-monotonic because the policies exhibit recovery behaviors, and evaluation rollouts are necessarily of finite length. This finite length introduces truncation bias, breaking the infinite-horizon assumptions underlying standard methods that rely on Bellman equations or the principle of optimality. In this work, we propose a framework for offline policy evaluation from sparse rewards based on a liveness-based Bellman operator. Our formulation interprets policy evaluation as a task-completion problem and yields a conservative fixed-point value function that is robust to finite-horizon truncation. We analyze the theoretical properties of the…
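The truncation bias described in the abstract can be illustrated with a small numerical sketch. This is not the paper's method or operator: it only shows how a classical discounted Monte Carlo return loses a sparse success reward when a rollout is cut short. The discount factor, rollout length, success step, and the helper name `discounted_return` are all hypothetical values chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence, accumulated backward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Assumed toy setup: a sparse reward of 1.0 arrives at step 10, 0.0 elsewhere.
gamma = 0.9
success_step = 10
full_rollout = [0.0] * success_step + [1.0]

# The evaluation budget ends at step 8, before the task completes.
truncated_rollout = full_rollout[:8]

full_value = discounted_return(full_rollout, gamma)            # equals gamma ** 10
truncated_value = discounted_return(truncated_rollout, gamma)  # no reward observed

# Negative bias: the truncated estimate undervalues a policy that would succeed.
truncation_bias = truncated_value - full_value
```

In this toy case the truncated estimate is exactly zero even though the policy would have succeeded two steps later, which is the kind of finite-horizon effect the abstract says the proposed formulation is designed to be robust to.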
