Extreme State Aggregation Beyond MDPs

Marcus Hutter

arXiv:1407.3341 · cs.AI · July 15, 2014

TL;DR

This paper extends the theory of state aggregation in reinforcement learning to environments beyond Markov Decision Processes, showing that value functions and policies can be effectively approximated even when the reduced process isn't an MDP.

Contribution

It generalizes existing aggregation results by demonstrating that value functions and policies of an associated MDP can solve more general RL problems beyond MDP assumptions.

Findings

01 Value functions and policies of an associated MDP solve the original problem.

02 Upper bounds on state space size are established for all RL problems.

03 RL algorithms for MDPs perform well beyond MDP environments.

Abstract

We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and more generally feature reinforcement learning is concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds…
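To make the idea of mapping raw states to aggregated states concrete, here is a minimal Python sketch of classical state aggregation on a toy problem. It is not the paper's construction: the raw process here is itself a small MDP, the map `phi`, the uniform within-block weighting, and all function names are illustrative assumptions, and the toy numbers are chosen so that the aggregation is exact. The abstract's claim concerns the harder case where the reduced process need not be an MDP at all.

```python
import numpy as np

def aggregate_mdp(P, R, phi, n_agg):
    """Build a small MDP over aggregated states.

    P[s, a, t] are raw transition probabilities, R[s, a] raw expected rewards,
    phi[s] is the aggregated state of raw state s. Raw states inside one
    aggregated state are weighted uniformly (an assumption of this sketch).
    """
    n_raw = P.shape[0]
    membership = np.zeros((n_raw, n_agg))
    membership[np.arange(n_raw), phi] = 1.0          # membership[s, x] = 1 iff phi[s] == x
    B = (membership / membership.sum(axis=0)).T      # B[x, s]: uniform weight within block x
    P_agg = np.einsum("xs,sat,ty->xay", B, P, membership)
    R_agg = B @ R
    return P_agg, R_agg

def q_value_iteration(P, R, gamma, n_iter=1000):
    """Plain Q-value iteration for a finite MDP with discount gamma."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(n_iter):
        V = Q.max(axis=1)
        Q = R + gamma * (P @ V)                      # (S, A, S) @ (S,) -> (S, A)
    return Q

# Toy raw MDP: 4 raw states, 2 actions, aggregated into 2 blocks {0, 1} and {2, 3}.
phi = np.array([0, 0, 1, 1])
P = np.zeros((4, 2, 4))
P[0, 0] = [0.5, 0.5, 0.0, 0.0]; P[0, 1] = [0.0, 0.0, 1.0, 0.0]
P[1, 0] = [1.0, 0.0, 0.0, 0.0]; P[1, 1] = [0.0, 0.0, 0.3, 0.7]
P[2, 0] = [0.0, 0.0, 0.0, 1.0]; P[2, 1] = [0.0, 1.0, 0.0, 0.0]
P[3, 0] = [0.0, 0.0, 0.5, 0.5]; P[3, 1] = [0.5, 0.5, 0.0, 0.0]
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
gamma = 0.9

P_agg, R_agg = aggregate_mdp(P, R, phi, n_agg=2)
Q_agg = q_value_iteration(P_agg, R_agg, gamma)       # solve the 2-state MDP
Q_raw = q_value_iteration(P, R, gamma)               # solve the 4-state MDP for comparison
policy_raw = Q_agg.argmax(axis=1)[phi]               # lift the aggregated policy to raw states
```

In this toy case the raw q-values depend on the raw state only through `phi`, so solving the two-state aggregated MDP and lifting its greedy policy recovers the raw solution.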

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques