Reducing Planning Complexity of General Reinforcement Learning with Non-Markovian Abstractions
Sultan J. Majeed, Marcus Hutter

TL;DR
This paper introduces a new non-Markovian abstraction for general reinforcement learning (GRL) that reduces the complexity of planning: it admits much tighter upper bounds on the number of states a surrogate MDP needs for near-optimal planning.
Contribution
The paper proposes a novel non-MDP abstraction that improves on the existing Extreme State Aggregation (ESA) framework by proving much tighter upper bounds on the number of states required for planning in GRL.
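Why do state-count bounds for aggregation tend to grow exponentially in the number of actions? A minimal illustration, assuming rewards in [0, 1] (so every Q-value lies in [0, 1/(1-γ)]) and an aggregation that buckets the full vector of A action-values at width ε: the number of grid cells, and hence of possible aggregated states, is the per-action bucket count raised to the power A. The function `q_grid_cells` is hypothetical, and this generic grid count is not the exact ESA bound or the paper's improved bound.

```python
import math
from fractions import Fraction


def q_grid_cells(num_actions: int, eps: float, gamma: float) -> int:
    """Count the cells of a grid that buckets all A action-values at width eps.

    Assumes rewards in [0, 1], so every Q-value lies in [0, 1/(1 - gamma)].
    Parsing the inputs via str() into Fractions keeps values like 0.1 exact,
    avoiding float round-off in the bucket count.
    """
    eps_q, gamma_q = Fraction(str(eps)), Fraction(str(gamma))
    if eps_q <= 0 or not 0 <= gamma_q < 1 or num_actions < 1:
        raise ValueError("need eps > 0, 0 <= gamma < 1, num_actions >= 1")
    bins_per_action = math.ceil(1 / (eps_q * (1 - gamma_q)))
    # One cell per combination of buckets across all A actions: exponential in A.
    return bins_per_action ** num_actions


for a in (2, 4, 8):
    print(a, q_grid_cells(a, eps=0.1, gamma=0.9))
# 2 10000
# 4 100000000
# 8 10000000000000000
```

With ε = 0.1 and γ = 0.9 there are 100 buckets per action, so each extra action multiplies the cell count by 100.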
Findings
New non-MDP abstraction with improved upper bounds
Bound reduced from exponential to near-logarithmic in the number of actions
Action-sequentialization further tightens the bound
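Action-sequentialization replaces one choice among A actions with a sequence of smaller choices. A minimal sketch of one common way to do this, assuming a binary encoding of the action index: each step chooses between only two options, and ceil(log2 A) steps recover the original action. The helper names `sequentialize` and `desequentialize` are hypothetical, and the paper's construction (including how rewards and discounting are handled on the intermediate steps) may differ.

```python
def sequentialize(action: int, num_actions: int) -> list[int]:
    """Encode one choice among num_actions as a sequence of binary decisions.

    The sequence length is ceil(log2(num_actions)), computed exactly with
    bit_length(); bits are ordered most significant first.
    """
    if not 0 <= action < num_actions:
        raise ValueError("action out of range")
    depth = max(1, (num_actions - 1).bit_length())
    return [(action >> (depth - 1 - i)) & 1 for i in range(depth)]


def desequentialize(bits: list[int], num_actions: int) -> int:
    """Decode a sequence of binary decisions back into the original action."""
    action = 0
    for bit in bits:
        action = (action << 1) | bit
    # When num_actions is not a power of two, some bit strings map to no action.
    if action >= num_actions:
        raise ValueError("bit string does not correspond to a valid action")
    return action


print(len(sequentialize(0, 1000)))  # 10 binary decisions replace one 1000-way choice
print(sequentialize(5, 8))          # [1, 0, 1]
```

Intuitively, any per-step quantity that grows with the number of available actions now grows with 2 instead of A; the price is roughly log2 A times as many decision steps.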
Abstract
The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from the ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every possible domain, e.g. bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, near-optimal policies in GRL are in general functions of the complete history, which hinders not only learning but also planning. The usual workaround for planning is to give the agent a Markovian abstraction of the underlying process, so that it can use any MDP planning algorithm to find a near-optimal policy. The Extreme State Aggregation (ESA) framework has extended this idea to non-Markovian abstractions without compromising on the possibility of planning through a (surrogate) MDP. A…
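The planning route described in the abstract (abstract the history, plan in a surrogate MDP, then act on histories through the abstraction) can be sketched as follows. Everything here is a hypothetical illustration: the abstraction `phi` (keep only the latest observation), the two-state toy MDP, and the helpers `value_iteration` and `lift` are assumptions, and ESA's actual construction of the surrogate MDP from a non-Markovian abstraction is not reproduced.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = int
# A history is a tuple of (action, observation, reward) steps; it only grows.
History = Tuple[Tuple[Action, str, float], ...]
# Surrogate MDP: mdp[s][a] lists (probability, next_state, reward) triples.
SurrogateMDP = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def phi(history: History) -> State:
    """Hypothetical abstraction: map a history to its latest observation."""
    return history[-1][1] if history else "left"


def value_iteration(mdp: SurrogateMDP, gamma: float, tol: float = 1e-8):
    """Standard value iteration; returns state values and a greedy policy."""
    values = {s: 0.0 for s in mdp}
    while True:
        q = {
            s: {a: sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for a, outcomes in acts.items()}
            for s, acts in mdp.items()
        }
        new_values = {s: max(qs.values()) for s, qs in q.items()}
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            break
    greedy = {s: max(qs, key=qs.get) for s, qs in q.items()}
    return values, greedy


def lift(abstraction: Callable[[History], State], greedy: Dict[State, Action]):
    """Turn a policy on abstract states into a policy on full histories."""
    return lambda history: greedy[abstraction(history)]


# Toy surrogate MDP: action 1 switches side, action 0 stays;
# staying on "right" pays reward 1.
mdp: SurrogateMDP = {
    "left": {0: [(1.0, "left", 0.0)], 1: [(1.0, "right", 0.0)]},
    "right": {0: [(1.0, "right", 1.0)], 1: [(1.0, "left", 0.0)]},
}
values, greedy = value_iteration(mdp, gamma=0.9)
policy = lift(phi, greedy)
print(greedy)                        # {'left': 1, 'right': 0}
print(policy(((1, "right", 0.0),)))  # 0
```

The planner only ever sees the finite abstract state space, even though the histories the lifted policy is applied to never repeat.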
