
TL;DR
This paper extends the theory of state aggregation in reinforcement learning to environments beyond Markov Decision Processes, showing that value functions and policies can be effectively approximated even when the reduced process isn't an MDP.
Contribution
It generalizes existing aggregation results by showing that the value functions and policies of an associated MDP can solve RL problems that do not satisfy MDP assumptions.
Findings
Value functions and policies of an associated MDP solve the original problem, provided the solution can be approximately represented as a function of the reduced states.
Upper bounds on state space size are established for all RL problems.
The results may help explain why RL algorithms designed for MDPs can perform well in non-MDP environments.
Abstract
We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and more generally feature reinforcement learning is concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds…
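The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes the raw process is itself a small finite MDP (the paper allows arbitrary histories), and it averages uniformly over the raw states mapped to each reduced state. The names `aggregate_mdp`, `value_iteration`, `phi`, and the toy transition tables are all hypothetical, chosen only for illustration.

```python
def aggregate_mdp(P, R, phi, n_reduced):
    """Build a reduced MDP by averaging over the raw states in each group.

    P[a][s][t]: raw transition probabilities, R[a][s]: expected rewards,
    phi[s]: reduced state assigned to raw state s.
    Assumptions of this sketch: uniform weighting within each group, and
    every reduced state has at least one raw state mapped to it.
    """
    n_actions, n_raw = len(P), len(phi)
    members = [[s for s in range(n_raw) if phi[s] == x] for x in range(n_reduced)]
    P_red = [[[0.0] * n_reduced for _ in range(n_reduced)] for _ in range(n_actions)]
    R_red = [[0.0] * n_reduced for _ in range(n_actions)]
    for a in range(n_actions):
        for x, group in enumerate(members):
            w = 1.0 / len(group)
            for s in group:
                R_red[a][x] += w * R[a][s]
                for t in range(n_raw):
                    P_red[a][x][phi[t]] += w * P[a][s][t]
    return P_red, R_red


def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Standard value iteration; returns state values and a greedy policy."""
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[a][x] + gamma * sum(P[a][x][y] * V[y] for y in range(n_states))
              for x in range(n_states)] for a in range(n_actions)]
        V_new = [max(Q[a][x] for a in range(n_actions)) for x in range(n_states)]
        if max(abs(v - u) for v, u in zip(V_new, V)) < tol:
            policy = [max(range(n_actions), key=lambda a: Q[a][x])
                      for x in range(n_states)]
            return V_new, policy
        V = V_new


# Toy problem: 4 raw states, 2 actions. Action 0 stays in the current group,
# action 1 switches group; reward 1 is earned in raw states 2 and 3.
stay = [[.5, .5, 0, 0], [.5, .5, 0, 0], [0, 0, .5, .5], [0, 0, .5, .5]]
switch = [[0, 0, .5, .5], [0, 0, .5, .5], [.5, .5, 0, 0], [.5, .5, 0, 0]]
P = [stay, switch]
R = [[0, 0, 1, 1], [0, 0, 1, 1]]
phi = [0, 0, 1, 1]  # raw states {0, 1} -> reduced 0, {2, 3} -> reduced 1

P_red, R_red = aggregate_mdp(P, R, phi, n_reduced=2)
V, policy = value_iteration(P_red, R_red)
# Lift the reduced policy back to raw states through phi.
lifted_policy = [policy[phi[s]] for s in range(len(phi))]
print(lifted_policy)  # [1, 1, 0, 0]
```

In this toy case the raw states inside each group behave identically, so the reduced process is exactly an MDP and the lifted policy is optimal for the raw problem. The paper's claim concerns the harder case where the reduced process is not an MDP, which this sketch does not attempt to demonstrate.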
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
