Selecting the State-Representation in Reinforcement Learning

Odalric-Ambrym Maillard; Rémi Munos; Daniil Ryabko

arXiv:1302.2552·cs.LG·February 12, 2013·30 cites

TL;DR

This paper addresses the problem of selecting an appropriate state-representation in reinforcement learning. It proposes an algorithm that, without knowing in advance which candidate model is correct, collects nearly as much reward as the optimal policy for the correct model, with regret of order T^{2/3}.

Contribution

It introduces a novel algorithm for state-representation selection in RL that guarantees near-optimal reward without knowing the true model beforehand.

Findings

01

Achieves regret of order T^{2/3} over horizon T.

02

Competes with the optimal policy of the best correct state-representation among the given models.

03

Provides theoretical guarantees for the proposed method.

Abstract

The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Knowing neither which of the models is the correct one nor the probabilistic characteristics of the resulting MDP, one is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves this, with a regret of order T^{2/3}, where T is the time horizon.
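To make the notion of a "model" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes histories are plain sequences of integer observations; the names `phi_last`, `phi_last_two`, `state_sequence`, and `regret_rate` are hypothetical and chosen only for illustration. It shows two candidate state-representations applied to the same observation stream, and why a regret of order T^{2/3} implies a vanishing per-step regret.

```python
from typing import Callable, Hashable, List, Sequence

# Assumption for illustration: a history is a sequence of integer observations.
History = Sequence[int]
# A model maps a history to an element of a finite state set.
Model = Callable[[History], Hashable]


def phi_last(history: History) -> Hashable:
    """Hypothetical model: the state is the most recent observation."""
    return history[-1] if history else None


def phi_last_two(history: History) -> Hashable:
    """Hypothetical model: the state is the last two observations."""
    return tuple(history[-2:])


def state_sequence(model: Model, observations: Sequence[int]) -> List[Hashable]:
    """Apply a model to every prefix of the observation stream."""
    return [model(observations[: t + 1]) for t in range(len(observations))]


def regret_rate(horizon: int) -> float:
    """Per-step regret if total regret grows exactly like T^(2/3), constants ignored."""
    return horizon ** (2 / 3) / horizon


observations = [0, 1, 1, 0, 1]
states_last = state_sequence(phi_last, observations)
states_last_two = state_sequence(phi_last_two, observations)

print(states_last)
print(states_last_two)
print(regret_rate(10**3), regret_rate(10**6))
```

The two models induce different state sequences from the same observations. In the setting of the abstract, at least one such model is assumed to yield Markovian state dynamics, and the learner does not know which one. Because T^{2/3} / T = T^{-1/3}, the per-step regret shrinks as the horizon grows.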

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques