Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

TL;DR
This paper introduces an off-policy evaluation (OPE) method for POMDPs based on future-dependent value functions, circumventing the curse of horizon and enabling consistent estimation with general function approximation.
Contribution
It develops a new model-free OPE approach with future-dependent value functions, a Bellman equation with instrumental variables, and a minimax learning method, extending to dynamics learning and spectral methods.
Findings
The PAC result implies that the proposed OPE estimator is consistent when futures and histories contain sufficient information.
The method effectively addresses the curse of horizon in POMDPs.
Connections to spectral learning methods are established.
Abstract
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles as classical value functions in fully-observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is consistent as long as futures and histories contain…
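The Bellman equation described above is a conditional moment restriction with history proxies as instruments, so a small numerical sketch may help make it concrete. The block below is not the authors' implementation. It assumes linear function classes, under which a minimax problem of this kind is commonly reduced to an instrumental-variable style linear system. The moment condition written in the docstring, the feature maps `phi` (on futures) and `psi` (on histories), the importance ratio `mu`, and all function names are illustrative assumptions rather than details taken from the text.

```python
import numpy as np


def linear_fdvf_weights(psi_h, phi_f, phi_f_next, mu, reward, gamma):
    """Fit theta for a linear future-dependent value function g(F) = phi(F)^T theta.

    Assumed empirical moment condition (hypothetical form, for illustration):
        mean( psi(H) * [ mu * (R + gamma * phi(F')^T theta) - phi(F)^T theta ] ) = 0
    where mu is the evaluation/behavior policy ratio for the logged action.
    """
    n = len(reward)
    # A = (1/n) * sum_i psi(H_i) (phi(F_i) - gamma * mu_i * phi(F'_i))^T
    A = psi_h.T @ (phi_f - gamma * mu[:, None] * phi_f_next) / n
    # b = (1/n) * sum_i mu_i * R_i * psi(H_i)
    b = psi_h.T @ (mu * reward) / n
    # Least squares handles the case of more instruments than value features.
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta


def policy_value(theta, phi_f_initial):
    """Average the fitted value function over futures observed at the initial step."""
    return float(np.mean(phi_f_initial @ theta))


# Synthetic sanity check: rewards are constructed so that theta_true makes the
# residual zero for every sample, hence the solver should recover theta_true.
rng = np.random.default_rng(0)
n, d_h, d_f, gamma = 500, 4, 3, 0.9
psi_h = rng.normal(size=(n, d_h))        # history-proxy features (instruments)
phi_f = rng.normal(size=(n, d_f))        # future-proxy features at time t
phi_f_next = rng.normal(size=(n, d_f))   # future-proxy features at time t+1
mu = rng.uniform(0.5, 1.5, size=n)       # importance ratios
theta_true = np.array([1.0, -2.0, 0.5])
reward = ((phi_f - gamma * mu[:, None] * phi_f_next) @ theta_true) / mu

theta_hat = linear_fdvf_weights(psi_h, phi_f, phi_f_next, mu, reward, gamma)
value_hat = policy_value(theta_hat, phi_f)
```

The synthetic data only checks the linear algebra: it says nothing about statistical performance on a real POMDP, where the quality of the estimate would depend on how informative the chosen future and history proxies are.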
