Performative Reinforcement Learning with Linear Markov Decision Process
Debmalya Mandal and Goran Radanovic

TL;DR
This paper extends performative reinforcement learning to linear Markov decision processes, demonstrating convergence of regularized policy optimization and empirical saddle point methods under finite samples, with applications to multi-agent systems.
Contribution
It generalizes performative RL results from tabular to linear MDPs, introducing new convergence analysis without strong convexity and an empirical saddle point algorithm.
Findings
Repeated optimization converges to performatively stable policies.
The empirical saddle point method converges with finite samples under a bounded coverage condition.
The framework applies to multi-agent systems.
Abstract
We study the setting of performative reinforcement learning, where the deployed policy affects both the reward and the transition of the underlying Markov decision process. Prior work (MTR23) has addressed this problem in the tabular setting and established last-iterate convergence of repeated retraining, with iteration complexity explicitly depending on the number of states. In this work, we generalize the results to linear Markov decision processes, the primary theoretical model of large-scale MDPs. The main challenge with linear MDPs is that the regularized objective is no longer strongly convex, and we want a bound that scales with the dimension of the features rather than the number of states, which can be infinite. Our first result shows that repeatedly optimizing a regularized objective converges to a performatively stable policy. In the absence of strong…
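The idea of repeated retraining toward a performatively stable point can be illustrated with a deliberately tiny sketch. It is not the paper's algorithm or setting: it assumes a degenerate single-state problem with two actions (so the occupancy measure is just the action distribution and transitions play no role), a hypothetical linear reward response `rewards`, and an L2-regularized objective whose maximizer has a closed form. The names `rewards`, `retrain`, `repeated_retraining`, and the constants `base`, `eps`, `lam` are all illustrative assumptions.

```python
def rewards(p, base=(1.0, 0.5), eps=0.5):
    """Hypothetical performative response (an assumption, not from the paper):
    each action's reward drops linearly as the deployed policy plays it more.
    p is the probability of action 1 under the deployed policy."""
    return (base[0] - eps * p, base[1] - eps * (1.0 - p))


def retrain(p_deployed, lam=1.0):
    """Best response to the environment induced by p_deployed.

    With rewards (r1, r2) frozen at the deployed policy, maximize
        r1*p + r2*(1-p) - (lam/2) * (p**2 + (1-p)**2)   over p in [0, 1].
    The objective is concave in p; setting its derivative to zero gives
        p = 1/2 + (r1 - r2) / (2*lam),
    which is then clipped to the feasible interval.
    """
    r1, r2 = rewards(p_deployed)
    p = 0.5 + (r1 - r2) / (2.0 * lam)
    return min(1.0, max(0.0, p))


def repeated_retraining(p0=0.5, iters=50):
    """Deploy, observe the induced rewards, re-optimize, and repeat."""
    history = [p0]
    for _ in range(iters):
        history.append(retrain(history[-1]))
    return history


if __name__ == "__main__":
    hist = repeated_retraining()
    p_final = hist[-1]
    print("final policy P(action 1):", p_final)
    # A performatively stable policy is a fixed point of retraining.
    print("stability gap:", abs(retrain(p_final) - p_final))
```

In this toy, the retraining map has slope -eps/lam, so it is a contraction whenever the reward sensitivity eps is smaller than the regularization strength lam, and the iterates settle at the fixed point 1/2 + (base[0] - base[1]) / (2 * (lam + eps)). This only mirrors the qualitative picture of stability under repeated optimization; the paper's linear-MDP analysis, where strong convexity is unavailable, is not reproduced here.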
Taxonomy
Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training
