Performative Reinforcement Learning with Linear Markov Decision Process

Debmalya Mandal and Goran Radanovic

arXiv:2411.05234·cs.LG·March 18, 2025

Open Access

TL;DR

This paper extends performative reinforcement learning to linear Markov decision processes, demonstrating convergence of regularized policy optimization and empirical saddle point methods under finite samples, with applications to multi-agent systems.

Contribution

The paper generalizes performative RL results from tabular to linear MDPs, introducing a convergence analysis that does not rely on strong convexity and an empirical saddle point algorithm.

Findings

01

Repeated optimization converges to performatively stable policies.

02

Empirical saddle point method converges under bounded coverage.

03

Framework applies to multi-agent systems.
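The repeated-optimization idea in the first finding can be illustrated with a small sketch. This is a toy under loud assumptions, not the paper's algorithm: it uses a single-state problem (a bandit) instead of a linear MDP, a hypothetical linear reward response `r0 + eps * B @ pi`, and an L2 regularizer with weight `lam`. All names (`reward_response`, `repeated_retraining`, `B`, `eps`, `lam`) are made up for illustration. In this toy the regularized best response is a Euclidean projection onto the simplex, so the retraining map is a contraction whenever `eps * ||B|| / lam < 1`, and the iterates settle at a policy that is optimal for the rewards it induces.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)

def reward_response(pi, r0, B, eps):
    """Hypothetical environment: rewards shift linearly with the deployed policy."""
    return r0 + eps * B @ pi

def repeated_retraining(r0, B, eps, lam, iters=200, tol=1e-10):
    """Deploy a policy, observe the induced rewards, re-optimize, repeat."""
    n = len(r0)
    pi = np.full(n, 1.0 / n)                  # start from the uniform policy
    for t in range(iters):
        r = reward_response(pi, r0, B, eps)
        # argmax_p <p, r> - (lam / 2) * ||p||^2 over the simplex
        # equals the projection of r / lam onto the simplex.
        new_pi = project_simplex(r / lam)
        if np.linalg.norm(new_pi - pi) < tol:
            return new_pi, t + 1
        pi = new_pi
    return pi, iters

rng = np.random.default_rng(0)
r0 = rng.uniform(size=4)
B = rng.normal(size=(4, 4))
B /= np.linalg.norm(B, 2)                     # normalize to spectral norm 1
eps, lam = 0.2, 1.0                           # contraction factor eps / lam = 0.2
pi_stable, steps = repeated_retraining(r0, B, eps, lam)
print(pi_stable, steps)
```

The returned `pi_stable` is a fixed point of the retraining map up to the tolerance, which is the toy analogue of a performatively stable policy. Raising `eps` or lowering `lam` so that the ratio exceeds 1 removes the contraction guarantee in this sketch.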

Abstract

We study the setting of performative reinforcement learning, where the deployed policy affects both the reward and the transition of the underlying Markov decision process. Prior work (MTR23) addressed this problem in the tabular setting and established last-iterate convergence of repeated retraining, with an iteration complexity that depends explicitly on the number of states. In this work, we generalize the results to linear Markov decision processes, the primary theoretical model of large-scale MDPs. The main challenge with linear MDPs is that the regularized objective is no longer strongly convex, and we want a bound that scales with the dimension of the features rather than the number of states, which can be infinite. Our first result shows that repeatedly optimizing a regularized objective converges to a performatively stable policy. In the absence of strong…

Taxonomy

Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training