Selecting the State-Representation in Reinforcement Learning
Odalric-Ambrym Maillard, Rémi Munos, Daniil Ryabko

TL;DR
This paper addresses the challenge of selecting an appropriate state-representation in reinforcement learning. It proposes an algorithm that, without prior knowledge of which candidate model is correct, obtains as much reward as the optimal policy for the correct model, up to a sublinear regret.
Contribution
It introduces a novel algorithm for state-representation selection in RL that is guaranteed to obtain near-optimal reward without knowing beforehand which of the candidate models is Markovian or what the parameters of the resulting MDP are.
Findings
Achieves regret of order T^{2/3} over horizon T.
Competes with the optimal policy for the best of the Markovian models among the candidates.
Provides theoretical regret guarantees for the proposed method.
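To make the T^{2/3} finding concrete, the regret can be written in one standard form. This is a sketch under stated assumptions: here ρ* denotes the optimal average reward of the MDP induced by the best Markovian model and r_t the reward received at step t; the paper's exact notation and constants may differ.

```latex
\Delta(T) = T\,\rho^{*} - \sum_{t=1}^{T} r_t, \qquad
\Delta(T) = O\!\left(T^{2/3}\right) \;\Longrightarrow\; \frac{\Delta(T)}{T} = O\!\left(T^{-1/3}\right) \to 0 .
```

Sublinear regret therefore means the average reward per step approaches ρ*. For instance, at T = 10^6 the per-step term T^{-1/3} equals 10^{-2}, up to constants.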
Abstract
The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing either which of the models is the correct one or what the probabilistic characteristics of the resulting MDP are, the goal is to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves this, with a regret of order T^{2/3}, where T is the horizon time.
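The notion of a "model" in the abstract, a function that maps past observations to a finite set of states, can be illustrated with a short sketch. The two candidate models below (`last_observation`, `last_two_observations`) and the helper `induced_states` are hypothetical examples chosen for illustration, not taken from the paper. For simplicity the history contains observations only; a full RL history would also include actions and rewards.

```python
from typing import Callable, Hashable, List, Sequence

# A "model": maps the history of past observations to an element of a finite set (the state).
Model = Callable[[Sequence[int]], Hashable]


def last_observation(history: Sequence[int]) -> Hashable:
    """Hypothetical model: the state is the most recent observation."""
    return history[-1] if history else None


def last_two_observations(history: Sequence[int]) -> Hashable:
    """Hypothetical model: the state is the tuple of the (up to) two most recent observations."""
    return tuple(history[-2:])


def induced_states(model: Model, observations: Sequence[int]) -> List[Hashable]:
    """State sequence obtained by applying the model to every non-empty prefix of the history."""
    return [model(observations[: t + 1]) for t in range(len(observations))]


if __name__ == "__main__":
    obs = [0, 1, 1, 0, 1]
    for name, phi in [("last", last_observation), ("last two", last_two_observations)]:
        states = induced_states(phi, obs)
        print(name, states, "distinct states:", len(set(states)))
```

Each candidate model induces its own state process from the same observation stream. Which of them (if any) makes that process Markovian depends on the environment; the setting only assumes that at least one candidate does, and the learner is not told which.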
