MetaCURL: Non-stationary Concave Utility Reinforcement Learning

Bianca Marin Moreno (UGA; Thoth; EDF R&D; FiME Lab); Margaux Brégère (LPSM; EDF R&D); Pierre Gaillard (UGA; Thoth); Nadia Oudjane (EDF R&D; FiME Lab)

arXiv:2405.19807 · cs.LG · May 31, 2024

TL;DR

MetaCURL is a novel algorithm designed for non-stationary Markov decision processes, enabling effective reinforcement learning with convex performance criteria despite environment changes and partial information.

Contribution

It introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. A meta-algorithm runs multiple instances of a black-box CURL algorithm over different intervals and aggregates their outputs with a sleeping-expert framework to adapt to changes in the environment.
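To illustrate what "aggregating instances with sleeping experts" can look like, here is a minimal sketch of generic exponentially weighted aggregation over experts that may be asleep. It is not MetaCURL's actual weighting scheme. The class `SleepingExpertsAggregator`, its methods, the learning rate `eta`, and the use of fixed expert outputs with a linear loss are all hypothetical, chosen only to show the mechanism: each expert stands for one black-box instance, is "awake" only while that instance runs, and is scored against the mixture's own loss.

```python
import numpy as np


class SleepingExpertsAggregator:
    """Exponentially weighted aggregation over experts that may be asleep.

    Hypothetical sketch: each expert stands for one black-box CURL instance,
    awake only on the interval where that instance is running.
    """

    def __init__(self, eta: float):
        self.eta = eta        # learning rate of the exponential weights
        self.log_w = []       # one log-weight per expert ever started

    def start_expert(self, prior: float) -> int:
        # A newly launched instance joins with a fixed prior weight.
        self.log_w.append(float(np.log(prior)))
        return len(self.log_w) - 1

    def weights(self, awake):
        # Normalized weights over the awake experts only (stable softmax).
        lw = np.array([self.log_w[i] for i in awake])
        w = np.exp(lw - lw.max())
        return w / w.sum()

    def aggregate(self, awake, outputs):
        # Convex combination of the awake experts' outputs, e.g. occupancy measures.
        w = self.weights(awake)
        return w @ np.asarray(outputs, dtype=float), w

    def update(self, awake, w, losses):
        # Awake experts are scored relative to the mixture loss; sleeping experts
        # are untouched, which amounts to charging them the mixture's own loss.
        losses = np.asarray(losses, dtype=float)
        mix_loss = float(w @ losses)
        for i, loss in zip(awake, losses):
            self.log_w[i] -= self.eta * (loss - mix_loss)


# Toy usage: two awake experts with fixed outputs and a linear loss <g, mu>.
agg = SleepingExpertsAggregator(eta=1.0)
a = agg.start_expert(prior=0.5)
b = agg.start_expert(prior=0.5)
c = agg.start_expert(prior=0.5)           # launched but kept asleep below
outputs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
g = np.array([0.0, 1.0])                  # second coordinate is costly
for t in range(10):
    mu_hat, w = agg.aggregate([a, b], outputs)
    agg.update([a, b], w, [g @ o for o in outputs])
```

After ten rounds almost all awake weight sits on expert `a`, whose output avoids the costly coordinate, while the sleeping expert `c` keeps its prior weight and can be woken later without having been penalized. With a convex objective one would typically feed each expert a linearized loss such as the gradient of the objective at the aggregate dotted with that expert's output; that choice is an assumption of this sketch.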

Findings

1. Achieves optimal dynamic regret without prior knowledge of the MDP changes.
2. Handles fully adversarial losses, not just stochastic ones.
3. Manages non-stationarity with sleeping experts, even under partial information about the dynamics.
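For reference, a hedged formalization of the CURL objective and of dynamic regret, in notation assumed for this summary rather than copied from the paper: episodes $t = 1,\dots,T$, played policy $\pi^t$, transition kernel $p^t$, convex loss $F_t$, and $\mu^{\pi,p}$ the state-action distribution induced by policy $\pi$ under dynamics $p$.

```latex
\begin{aligned}
&\text{CURL at episode } t: \quad \min_{\pi}\; F_t\big(\mu^{\pi,p^t}\big), \qquad F_t \text{ convex}, \\
&R_T^{\mathrm{dyn}} = \sum_{t=1}^{T} F_t\big(\mu^{\pi^t,p^t}\big) \;-\; \sum_{t=1}^{T} \min_{\pi} F_t\big(\mu^{\pi,p^t}\big).
\end{aligned}
```

With a linear loss $F_t(\mu) = \langle \ell_t, \mu \rangle$ this reduces to the expected cumulative loss of classical RL. The comparator is re-optimized at every episode, which is what makes the regret "dynamic" rather than measured against one fixed policy.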

Abstract

We explore online learning in episodic loop-free Markov decision processes in non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple instances of black-box algorithms over different intervals, aggregating their outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the…
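To make "convex performance criteria in state-action distributions induced by agent policies" concrete, the sketch below computes the state-action occupancy of a policy in a small tabular loop-free (layered) MDP and evaluates a convex objective on it. The array layout (`mu0`, `P`, `pi`), the function names `occupancy` and `F`, and the squared-distance objective are assumptions made for illustration, not the paper's setup. The final lines show that `F` is not linear in the occupancy, which is why it cannot in general be rewritten as an expected sum of per-step rewards handled by Bellman equations.

```python
import numpy as np


def occupancy(mu0, P, pi):
    """State-action occupancy of a policy in a layered (loop-free) MDP.

    mu0: (X,) initial state distribution
    P:   (N, X, A, X) transitions, P[n, x, a, y] = Pr(state y at n+1 | x, a at n)
    pi:  (N, X, A) policy, pi[n, x, a] = Pr(action a | state x at step n)
    Returns mu of shape (N, X, A) with mu[n, x, a] = Pr(x_n = x, a_n = a).
    """
    N, X, A = pi.shape
    mu = np.zeros((N, X, A))
    rho = mu0                                   # state distribution at step n
    for n in range(N):
        mu[n] = rho[:, None] * pi[n]            # joint law of (state, action) at step n
        rho = np.einsum("xa,xay->y", mu[n], P[n])  # push forward through the dynamics
    return mu


def F(mu, target):
    # Hypothetical convex objective: squared distance to a target occupancy,
    # in the spirit of distribution-matching / imitation-style CURL losses.
    return float(np.sum((mu - target) ** 2))


rng = np.random.default_rng(0)
N, X, A = 3, 2, 2
P = rng.random((N, X, A, X))
P /= P.sum(axis=-1, keepdims=True)              # normalize next-state distributions
mu0 = np.array([1.0, 0.0])

pi1 = np.full((N, X, A), 0.5)                   # uniform random policy
pi2 = np.zeros((N, X, A))
pi2[..., 0] = 1.0                               # always take action 0
mu1, mu2 = occupancy(mu0, P, pi1), occupancy(mu0, P, pi2)
target = occupancy(mu0, P, np.eye(A)[np.ones((N, X), dtype=int)])  # always action 1

# Jensen gap: for a linear objective this would be exactly zero.
mix = 0.5 * mu1 + 0.5 * mu2
gap = 0.5 * F(mu1, target) + 0.5 * F(mu2, target) - F(mix, target)
```

For fixed dynamics the set of occupancy measures is convex, so `mix` is itself induced by some policy; the strictly positive `gap` then shows that the objective depends on the whole distribution rather than additively on each state-action pair.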

Videos

MetaCURL: Non-stationary Concave Utility Reinforcement Learning· slideslive

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Data Stream Mining Techniques · Electricity Theft Detection Techniques
