Provable Reinforcement Learning with a Short-Term Memory

Yonathan Efroni; Chi Jin; Akshay Krishnamurthy; Sobhan Miryoosefi

arXiv:2202.03983 · cs.LG · February 9, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a new subclass of POMDPs where recent history of limited length suffices for decision making, providing bounds on sample complexity and algorithms that leverage short-term memory for efficient reinforcement learning.

Contribution

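It defines a new POMDP subclass in which latent states can be decoded from a short recent history of length $m$, and develops algorithms whose sample complexity scales exponentially with the memory length rather than the horizon and, in the rich-observation setting, does not depend on the number of observations.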

Findings

01. Short-term memory suffices for RL in the new POMDP subclass.

02. Sample complexity scales exponentially with memory length, not horizon.

03. Algorithms are effective in both tabular and rich-observation settings.

Abstract

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of…
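
To make the "frame stacking" idea mentioned in the abstract concrete, the following is a minimal Python sketch of a sliding window that keeps only the most recent $m$ observations and the actions taken between them. It is an illustration only, not the paper's algorithm: the class name `ShortTermMemory`, its methods, and the choice to include intermediate actions in the window are assumptions made for this example.

```python
from collections import deque
from typing import Deque, Hashable, Tuple

# Hypothetical helper for illustration; not taken from the paper.
Key = Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...]]


class ShortTermMemory:
    """Sliding window over the last m observations and the m - 1 actions between them."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        # deque(maxlen=k) silently drops the oldest entry once k items are stored.
        self._obs: Deque[Hashable] = deque(maxlen=m)
        self._acts: Deque[Hashable] = deque(maxlen=m - 1)

    def reset(self, first_obs: Hashable) -> Key:
        """Start a new episode with a single observation in memory."""
        self._obs.clear()
        self._acts.clear()
        self._obs.append(first_obs)
        return self.key()

    def step(self, action: Hashable, next_obs: Hashable) -> Key:
        """Record the action just taken and the observation that followed."""
        self._acts.append(action)
        self._obs.append(next_obs)
        return self.key()

    def key(self) -> Key:
        """Hashable summary of the window, usable as a surrogate state."""
        return (tuple(self._obs), tuple(self._acts))


memory = ShortTermMemory(m=2)
k0 = memory.reset("o1")       # (("o1",), ())
k1 = memory.step("a1", "o2")  # (("o1", "o2"), ("a1",))
k2 = memory.step("a2", "o3")  # (("o2", "o3"), ("a2",)); "o1" and "a1" are forgotten
```

Under the decodability assumption described in the abstract, a policy that conditions on such a window has enough information to recover the latent state; the number of distinct windows grows exponentially in $m$, which matches the intuition for why the bounds scale with the memory length rather than the horizon.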

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research