Cross-fitted Proximal Learning for Model-Based Reinforcement Learning
Nishanth Venkatesh, Andreas A. Malikopoulos

TL;DR
This paper introduces a cross-fitted estimation method for bridge functions in confounded POMDPs, improving data efficiency and theoretical understanding in model-based reinforcement learning.
Contribution
It develops a K-fold cross-fitted extension of the two-stage bridge estimator, enhancing estimation efficiency and providing theoretical error bounds.
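As a rough illustration of the K-fold cross-fitting pattern named above, the sketch below fits a nuisance model on the out-of-fold data and evaluates it only on the held-out fold, so every sample receives a prediction from a model that never saw it. This is a generic sketch, not the paper's estimator: the linear least-squares nuisance model and the names `kfold_indices` and `cross_fitted_predictions` are assumptions made for illustration, and the bridge-function stage is not reproduced.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def cross_fitted_predictions(X, y, k=5, seed=0):
    """Return out-of-fold predictions of y given X, plus the folds used.

    Hypothetical nuisance model: ordinary least squares with an intercept.
    Any other regressor could be substituted here.
    """
    n = X.shape[0]
    folds = kfold_indices(n, k, seed)
    preds = np.empty(n)
    for held_out in folds:
        # Fit the nuisance model on every fold except the held-out one.
        train = np.setdiff1d(np.arange(n), held_out)
        A_train = np.column_stack([np.ones(train.size), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Evaluate only on the held-out fold.
        A_test = np.column_stack([np.ones(held_out.size), X[held_out]])
        preds[held_out] = A_test @ coef
    return preds, folds
```

Unlike a single split, which holds out one portion of the data for evaluation only, this rotation lets every sample contribute to both fitting and evaluation, which is the usual motivation for cross-fitting.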
Findings
The cross-fitted estimator outperforms single-split methods in data efficiency.
An oracle-comparator bound is derived for the cross-fitted estimator.
The estimation error is decomposed into nuisance and empirical averaging components.
Abstract
Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational data may be biased. This challenge is especially pronounced in partially observable systems, where latent factors may jointly affect actions, rewards, and future observations. Recent work has shown that policy evaluation in such confounded partially observable Markov decision processes (POMDPs) can be reduced to estimating reward-emission and observation-transition bridge functions satisfying conditional moment restrictions (CMRs). In this paper, we study the statistical estimation of these bridge functions. We formulate bridge learning as a CMR problem with nuisance objects given by a conditional…
