Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

TL;DR
This paper introduces a new class of Markov Decision Processes called Locally Linearizable MDPs, which enables the development of a no-regret reinforcement learning algorithm, Cinderella, that is effective in continuous environments with polynomial regret bounds.
Contribution
The paper defines Locally Linearizable MDPs, a class that generalizes existing representation classes such as Linear MDPs and MDPs with low inherent Bellman error, and presents Cinderella, a no-regret RL algorithm with state-of-the-art bounds that applies to a broad range of continuous MDPs.
Findings
Cinderella achieves sublinear regret in Locally Linearizable MDPs.
All known feasible MDPs are representable as Locally Linearizable MDPs.
The approach generalizes and improves upon previous RL methods for continuous environments.
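For reference, the quantity behind "sublinear regret" can be written in the standard episodic formulation. This is a sketch of the usual definition, assuming K episodes of horizon H, an initial state s_1^k in episode k, and a policy π_k played in that episode. The paper's own notation may differ.

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{\star}(s_1^k) \;-\; V_1^{\pi_k}(s_1^k) \Big),
\qquad
\text{no-regret (sublinear):}\quad \lim_{K \to \infty} \frac{\mathrm{Regret}(K)}{K} = 0 .
```

Learnability corresponds to the average regret vanishing as K grows. Feasibility, in the sense used in the abstract, additionally requires the regret bound to grow only polynomially in H rather than exponentially.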
Abstract
Achieving the no-regret property for Reinforcement Learning (RL) problems in continuous state- and action-space environments is one of the major open problems in the field. Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes. Furthermore, many structural assumptions are known to suffer from a provably unavoidable exponential dependence on the time horizon in the regret, which makes any possible solution infeasible in practice. In this paper, we identify local linearity as the feature that makes Markov Decision Processes (MDPs) both learnable (sublinear regret) and feasible (regret that is polynomial in the time horizon). We define a novel MDP representation class, namely Locally Linearizable MDPs, generalizing other representation classes like Linear MDPs and MDPs with low inherent Bellman error. Then, i) we introduce Cinderella, a…
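To build intuition for why local linearity helps, the sketch below approximates a smooth nonlinear function on [0, 1] by fitting a separate linear model in each cell of a uniform partition. This is a hypothetical toy illustration only. It is not the paper's definition of Locally Linearizable MDPs, nor the Cinderella algorithm. The names `piecewise_linear_fit` and `target`, and the choice of sin(2πx) as the function, are assumptions made for the example.

```python
import numpy as np


def target(x):
    # Hypothetical smooth, globally nonlinear function used for illustration.
    return np.sin(2 * np.pi * x)


def piecewise_linear_fit(f, n_regions, n_samples=200):
    """Fit an independent linear model a + b*x on each of n_regions
    equal-width cells of [0, 1]; return the max absolute error over all cells."""
    edges = np.linspace(0.0, 1.0, n_regions + 1)
    max_err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        x = np.linspace(lo, hi, n_samples)
        y = f(x)
        # Local feature map phi(x) = [1, x]; least-squares parameters per cell.
        phi = np.column_stack([np.ones_like(x), x])
        theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
        err = np.max(np.abs(phi @ theta - y))
        max_err = max(max_err, err)
    return max_err


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, piecewise_linear_fit(target, n))
```

As the number of cells grows, the worst-case error of the local linear fits shrinks (roughly quadratically in the cell width for twice-differentiable functions), even though no single global linear model fits the function well.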
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics
