No-Regret Reinforcement Learning in Smooth MDPs

Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

arXiv:2402.03792 · cs.LG · February 7, 2024 · 1 citation

TL;DR

This paper introduces a new smoothness assumption for MDPs, $\nu$-smoothness, and proposes two algorithms whose regret guarantees improve on the state of the art for reinforcement learning with continuous state and/or action spaces.

Contribution

The paper proposes a novel $\nu$-smoothness assumption for MDPs and develops two algorithms with improved regret guarantees for continuous RL problems.

Findings

01. Both algorithms achieve the best regret guarantees compared to the state of the art.

02. Legendre-Eleanor is more general but computationally inefficient.

03. Legendre-LSVI is computationally efficient but applies to a smaller class of problems.

Abstract

Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu$-smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu$-smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, Legendre-Eleanor, achieves the no-regret property under weaker assumptions but is…
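
To make the idea of an orthogonal feature map based on Legendre polynomials more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's construction: it assumes inputs rescaled to $[-1, 1]^d$, uses a plain tensor product of one-dimensional polynomials, and the names `legendre_features`, `tensor_features`, and `DEGREE` are hypothetical. The check at the end uses Gauss-Legendre quadrature to confirm that the normalized one-dimensional features are orthonormal on $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import legendre

DEGREE = 3  # hypothetical maximum polynomial degree, chosen for illustration


def legendre_features(x, degree):
    """Orthonormal Legendre features of inputs in [-1, 1].

    Returns an array of shape (len(x), degree + 1) whose n-th column is
    sqrt((2n + 1) / 2) * P_n(x), so that columns are orthonormal in L2([-1, 1]).
    """
    vander = legendre.legvander(np.asarray(x, dtype=float), degree)
    scale = np.sqrt((2 * np.arange(degree + 1) + 1) / 2.0)
    return vander * scale


def tensor_features(z, degree):
    """Tensor-product features for a single point z in [-1, 1]^d.

    The output has (degree + 1) ** d entries, one per multi-index.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    feats = np.ones(1)
    for coord in z:
        one_dim = legendre_features(coord, degree).ravel()
        feats = np.outer(feats, one_dim).ravel()
    return feats


# Orthonormality check: with degree + 1 Gauss-Legendre nodes the quadrature
# is exact for polynomials up to degree 2 * degree + 1, which covers every
# product of two features.
nodes, weights = legendre.leggauss(DEGREE + 1)
phi = legendre_features(nodes, DEGREE)
gram = phi.T @ (weights[:, None] * phi)

# Features of a hypothetical 2-dimensional state-action point.
example = tensor_features([0.1, -0.3], DEGREE)
```

The feature count grows as `(DEGREE + 1) ** d`, so this plain tensor-product form is only practical in low dimension. How the paper chooses the degree and handles the dimension is not covered by the truncated abstract above.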

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research