Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs

Davide Maran; Alberto Maria Metelli; Matteo Papini; Marcello Restelli

arXiv:2410.24071·cs.LG·November 1, 2024

TL;DR

This paper introduces a new class of Markov Decision Processes, Locally Linearizable MDPs, and a no-regret reinforcement learning algorithm for it, Cinderella, which handles continuous state and action spaces with regret that is polynomial in the time horizon $H$.

Contribution

The paper defines Locally Linearizable MDPs, a representation class that generalizes existing ones such as Linear MDPs and MDPs with low inherent Bellman error, and presents Cinderella, a no-regret RL algorithm with state-of-the-art bounds applicable to a broad range of continuous MDPs.

Findings

01

Cinderella achieves sublinear regret in Locally Linearizable MDPs.

02

All known feasible MDPs are representable as Locally Linearizable MDPs.

03

The approach generalizes and improves upon previous RL methods for continuous environments.

Abstract

Achieving the no-regret property for Reinforcement Learning (RL) problems in continuous state and action-space environments is one of the major open problems in the field. Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes. Furthermore, many structural assumptions are known to suffer from a provably unavoidable exponential dependence on the time horizon $H$ in the regret, which makes any possible solution unfeasible in practice. In this paper, we identify local linearity as the feature that makes Markov Decision Processes (MDPs) both learnable (sublinear regret) and feasible (regret that is polynomial in $H$). We define a novel MDP representation class, namely Locally Linearizable MDPs, generalizing other representation classes like Linear MDPs and MDPs with low inherent Bellman error. Then, i) we introduce Cinderella, a…
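
For readers unfamiliar with the terms "learnable" and "feasible" used in the abstract, the following is a sketch of the standard episodic regret definition. It is not taken from this page: the symbols $K$ (number of episodes), $\pi_k$ (policy played in episode $k$), $s_1^k$ (initial state of episode $k$), $V_1^{*}$ and $V_1^{\pi_k}$ (optimal and achieved value functions), and the exponent $\alpha$ are assumptions introduced here for illustration, and the paper's own notation may differ.

$$
R(K) = \sum_{k=1}^{K} \left( V_1^{*}(s_1^k) - V_1^{\pi_k}(s_1^k) \right),
\qquad
\text{learnable (no-regret):}\quad \lim_{K \to \infty} \frac{R(K)}{K} = 0 .
$$

Under this reading, "feasible" asks in addition that the bound grows only polynomially in the horizon, for example $R(K) \le \mathrm{poly}(H) \cdot K^{\alpha}$ with $\alpha < 1$, as opposed to a bound of the form $c^{H} \cdot K^{\alpha}$ with $c > 1$, which is sublinear in $K$ but exponential in $H$.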

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics