Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

Nan Jiang; Akshay Krishnamurthy; Alekh Agarwal; John Langford; Robert E. Schapire

arXiv:1610.09512·cs.LG·December 2, 2016·153 cites

TL;DR

This paper introduces a new model, contextual decision processes, and a complexity measure, the Bellman rank, which enables sample-efficient reinforcement learning with rich observations and function approximation, with PAC guarantees.

Contribution

The paper defines the Bellman rank as a complexity measure and develops a new RL algorithm that learns near-optimal policies sample-efficiently when the Bellman rank is low.

Findings

01. Bellman rank is naturally small in many well-studied RL settings

02. The proposed algorithm learns near-optimal policies with polynomial sample complexity

03. Sample complexity is independent of the number of unique observations

Abstract

This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model, called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings. Our second contribution is a new reinforcement learning algorithm that engages in systematic exploration to learn contextual decision processes with low Bellman rank. Our algorithm provably learns near-optimal behavior with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The approach uses Bellman error minimization with optimistic exploration and provides new…
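The abstract names the Bellman rank without defining it. The block below is a sketch of the definition as it is usually stated for this setting. None of the notation appears on this page, so every symbol is an assumption to check against the full paper: a candidate value-function class \mathcal{F}, horizon H, contexts x_h, actions a_h, rewards r_h, the greedy policy \pi_f induced by f, and a rank bound M.

```latex
% Assumed notation (not given on this page): \mathcal{F}, H, x_h, a_h, r_h, \pi_f, M.
% Average Bellman error of candidate f at level h, when the first h-1 actions
% follow a roll-in policy \pi and the actions at levels h and h+1 follow \pi_f:
\[
  \mathcal{E}(f, \pi, h)
  = \mathbb{E}\!\left[\, f(x_h, a_h) - r_h - f(x_{h+1}, a_{h+1})
      \;\middle|\; a_{1:h-1} \sim \pi,\ a_{h:h+1} \sim \pi_f \right].
\]
% Bellman rank at most M: for every level h = 1, \dots, H, the matrix indexed
% by roll-in policies \pi_{f'} (rows) and candidates f (columns) has low rank.
\[
  \operatorname{rank}\!\Big( \big[\, \mathcal{E}(f, \pi_{f'}, h) \,\big]_{f',\, f \in \mathcal{F}} \Big) \le M
  \quad \text{for all } h \in \{1, \dots, H\}.
\]
```

Read this way, the rank bound M stands in for the number of unique observations when measuring difficulty, which is consistent with the abstract's claim that sample complexity does not depend on the number of unique observations.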

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems · Bayesian Modeling and Causal Inference