Provably Good Batch Reinforcement Learning Without Great Exploration
Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

TL;DR
This paper introduces a conservative modification to batch reinforcement learning algorithms that guarantees near-optimal policies within the explored data, even without strong concentrability assumptions, improving reliability in high-stakes applications.
Contribution
It proposes a conservative Bellman backup method that provides stronger performance guarantees and can identify near-optimal policies within the data support without requiring strong distributional assumptions.
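To make the idea concrete, here is a minimal tabular sketch of a conservative backup. It is an illustration under stated assumptions, not the authors' implementation: the function name `conservative_q_iteration`, the count threshold `min_count`, and the choice to assign value 0 to poorly supported state-action pairs are all hypothetical. It also assumes rewards are nonnegative, so that 0 is a pessimistic value.

```python
import numpy as np

def conservative_q_iteration(transitions, n_states, n_actions,
                             gamma=0.9, min_count=2, n_iters=200):
    """Tabular Q-iteration that backs up only through well-supported (s, a) pairs.

    transitions: iterable of (s, a, r, s_next) tuples from the batch.
    min_count:   hypothetical support threshold (an assumption of this sketch).
    Assumes rewards >= 0, so treating unsupported pairs as 0 is pessimistic.
    """
    counts = np.zeros((n_states, n_actions))
    reward_sum = np.zeros((n_states, n_actions))
    next_counts = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s_next in transitions:
        counts[s, a] += 1
        reward_sum[s, a] += r
        next_counts[s, a, s_next] += 1

    # A pair is "supported" if the batch contains it often enough.
    supported = counts >= min_count
    safe = np.maximum(counts, 1)          # avoid division by zero
    r_hat = reward_sum / safe             # empirical mean reward
    p_hat = next_counts / safe[:, :, None]  # empirical transition model

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Conservative step: unsupported pairs contribute value 0 to the max.
        v = np.where(supported, q, 0.0).max(axis=1)
        q = r_hat + gamma * (p_hat @ v)

    q = np.where(supported, q, 0.0)
    # Greedy policy restricted to supported actions (defaults to action 0
    # in states where no action is supported).
    policy = np.where(supported, q, -np.inf).argmax(axis=1)
    return q, policy, supported
```

With `min_count=1` every observed pair counts as supported and the loop reduces to ordinary model-based Q-iteration on the empirical MDP, so a single lucky sample can dominate the greedy policy. Raising the threshold keeps the learned policy inside the well-covered part of the data.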
Findings
The modified algorithm finds approximately best policies within the data support.
It outperforms existing batch RL methods in standard benchmarks.
The approach is robust even when traditional assumptions do not hold.
Abstract
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the risk of learning policies with overly optimistic estimates of their future performance. Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes. Theoretical work that provides strong guarantees on the performance of the output policy relies on a strong concentrability assumption, which makes it unsuitable for cases where the ratio between the state-action distributions of the behavior policy and some candidate policies is large. This is because in the traditional analysis, the error…
Taxonomy
Topics: Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms · Advanced Control Systems Optimization
