MetaCURL: Non-stationary Concave Utility Reinforcement Learning
Bianca Marin Moreno (UGA, Thoth, EDF R&D, FiME Lab), Margaux Brégère (LPSM, EDF R&D), Pierre Gaillard (UGA, Thoth), Nadia Oudjane (EDF R&D, FiME Lab)

TL;DR
MetaCURL is an algorithm for Concave Utility Reinforcement Learning (CURL) in non-stationary Markov decision processes (MDPs). It optimizes convex performance criteria over state-action distributions while losses and transition probabilities change and the MDP is only partially known.
Contribution
The paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. A meta-algorithm runs multiple instances of a black-box algorithm over different intervals and aggregates their outputs through a sleeping expert framework to handle environment changes.
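To make the sleeping-expert idea concrete, here is a minimal sketch of a generic exponential-weights update in which only the experts that are awake in a round are aggregated and updated. It is not the paper's exact procedure: the function name `sleeping_hedge_step`, the learning rate `eta`, the scalar `outputs` (standing in for the policies or distributions each instance would produce), and the toy intervals are all assumptions made for illustration.

```python
import math

def sleeping_hedge_step(weights, active, outputs, losses, eta=0.5):
    """One round of a generic sleeping-experts exponential-weights update.

    weights: positive floats, one per expert
    active:  bools, True if the expert is awake this round
    outputs: floats, expert outputs (ignored for sleeping experts)
    losses:  floats in [0, 1] (ignored for sleeping experts)
    Returns (aggregated_output, new_weights).
    """
    awake = [i for i, a in enumerate(active) if a]
    if not awake:
        raise ValueError("at least one expert must be awake")
    # Normalize weights over the awake experts only.
    total = sum(weights[i] for i in awake)
    probs = {i: weights[i] / total for i in awake}
    aggregated = sum(probs[i] * outputs[i] for i in awake)
    mix_loss = sum(probs[i] * losses[i] for i in awake)
    # Awake experts are updated relative to the mixture loss;
    # sleeping experts keep their weights unchanged.
    new_weights = list(weights)
    for i in awake:
        new_weights[i] = weights[i] * math.exp(-eta * (losses[i] - mix_loss))
    return aggregated, new_weights

# Hypothetical toy run: expert k is awake on rounds [start, end).
intervals = [(0, 4), (0, 2), (2, 4)]
weights = [1.0, 1.0, 1.0]
for t in range(4):
    active = [start <= t < end for start, end in intervals]
    outputs = [0.0, 1.0, 2.0]   # placeholder outputs
    losses = [0.5, 0.1, 0.9]    # placeholder losses
    aggregated, weights = sleeping_hedge_step(weights, active, outputs, losses)
```

Restricting the normalization to awake experts is what lets instances started on different intervals compete only while they are running, which is the role the sleeping expert framework plays in the description above.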
Findings
Achieves optimal dynamic regret without prior knowledge of MDP changes.
Handles fully adversarial losses, not just stochastic ones.
Manages non-stationarity by aggregating experts in partial-information settings.
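For readers unfamiliar with the term, dynamic regret compares the learner with the best policy of each episode rather than with a single fixed policy. One common way to write it is sketched below; the notation is an assumption made here and is not taken from this page: $F^t$ is the convex objective at episode $t$, $p^t$ the transition kernel, $\pi^t$ the policy played, and $\mu^{\pi,p}$ the state-action distribution induced by policy $\pi$ under $p$.

```latex
% Dynamic regret over T episodes (notation assumed for illustration)
R_T \;=\; \sum_{t=1}^{T} F^t\!\left(\mu^{\pi^t,\,p^t}\right)
      \;-\; \sum_{t=1}^{T} \min_{\pi} F^t\!\left(\mu^{\pi,\,p^t}\right)
```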
Abstract
We explore online learning in episodic loop-free Markov decision processes in non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple black-box algorithm instances over different intervals, aggregating outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data Stream Mining Techniques · Electricity Theft Detection Techniques
