Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

Ruosong Wang; Simon S. Du; Lin F. Yang; Sham M. Kakade

arXiv:2005.00527·cs.LG·July 10, 2020·23 cites

Open Access

TL;DR

This paper shows that, contrary to a prior conjecture, long horizon reinforcement learning in the tabular, episodic setting can be as sample-efficient as short horizon learning when values are appropriately normalized: the sample complexity scales only logarithmically with the horizon.

Contribution

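The work refutes the conjecture that sample complexity must depend polynomially on the horizon, showing that the dependence can be logarithmic. It introduces two techniques: an $ε$-net construction for optimal policies and the Online Trajectory Synthesis algorithm for evaluating all policies in a given policy class.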

Findings

01

Sample complexity scales logarithmically with the horizon.

02

Introduces an $ε$-net for optimal policies whose log-covering number scales only logarithmically with the horizon.

03

Proposes the Online Trajectory Synthesis algorithm for adaptive policy evaluation.

Abstract

Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is $ε$ near to that of the optimal value, where the value is measured by the normalized cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon -- a conjecture which is consistent with all known sample complexity upper bounds. This work refutes this conjecture, proving that tabular, episodic reinforcement learning is…
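The normalized measure described in the abstract can be illustrated with a minimal sketch. It assumes per-step rewards in [0, 1] and normalizes by dividing the cumulative reward by the horizon; this is one common convention and not necessarily the exact normalization used in the paper. The function names `normalized_return` and `is_eps_optimal` are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def normalized_return(rewards: Sequence[float]) -> float:
    """Cumulative reward of one episode divided by its horizon H = len(rewards).

    Assumption: each per-step reward lies in [0, 1], so the result lies in [0, 1].
    """
    if not rewards:
        raise ValueError("an episode needs at least one step")
    return sum(rewards) / len(rewards)


def is_eps_optimal(value: float, optimal_value: float, eps: float) -> bool:
    """True if `value` is within `eps` of `optimal_value` (both normalized)."""
    return optimal_value - value <= eps


# Two episodes with the same reward rate but very different horizons.
short_episode = [1.0, 0.0] * 5    # H = 10
long_episode = [1.0, 0.0] * 500   # H = 1000

print(normalized_return(short_episode), normalized_return(long_episode))
```

Under this normalization both episodes have the same value, so a fixed accuracy target $ε$ means the same thing at every horizon, which is what makes sample complexities comparable across horizons.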

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems