Provable Reinforcement Learning with a Short-Term Memory
Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

TL;DR
This paper introduces a new subclass of POMDPs in which a recent history of bounded length suffices for decision making, and provides sample complexity bounds and algorithms that exploit this short-term memory for efficient reinforcement learning.
Contribution
It defines a new POMDP subclass in which latent states can be decoded from a short-term memory, and develops algorithms whose sample complexity bounds scale exponentially with the memory length rather than with the horizon or the size of the observation space.
Findings
Short-term memory suffices for RL in the new POMDP subclass.
Sample complexity scales exponentially with memory length, not horizon.
Algorithms are effective in both tabular and rich-observation settings.
Abstract
Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of…
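The "frame stacking" idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only shows how an agent might keep the most recent observations and actions, up to a memory length (called m here), and use them as a hashable key, for example to index a tabular value function. The class name `ShortTermMemory` and its methods are hypothetical names chosen for this illustration.

```python
from collections import deque
from typing import Hashable, Tuple

# Hypothetical helper for illustration; not taken from the paper.
MemoryKey = Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...]]


class ShortTermMemory:
    """Keeps the last m observations and the m - 1 actions between them."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("memory length m must be at least 1")
        self.m = m
        # deque(maxlen=k) silently drops the oldest item once k items are held.
        self._obs: deque = deque(maxlen=m)
        self._acts: deque = deque(maxlen=m - 1)

    def reset(self, first_obs: Hashable) -> MemoryKey:
        """Start a new episode with only the first observation in memory."""
        self._obs.clear()
        self._acts.clear()
        self._obs.append(first_obs)
        return self.key()

    def step(self, action: Hashable, next_obs: Hashable) -> MemoryKey:
        """Record the action just taken and the observation that followed."""
        self._acts.append(action)
        self._obs.append(next_obs)
        return self.key()

    def key(self) -> MemoryKey:
        """Hashable summary of the recent history, usable as a dict key."""
        return (tuple(self._obs), tuple(self._acts))


mem = ShortTermMemory(m=2)
mem.reset("o1")
mem.step("a1", "o2")
print(mem.step("a2", "o3"))  # the window now holds only o2, a2, o3
```

With O distinct observations and A distinct actions, a full window of this form can take at most O^m · A^(m-1) values, so a table indexed by such keys grows exponentially in m but does not grow with the episode length. This counting argument is only an intuition for the summary above, not a statement of the paper's bounds.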
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
