Performative Policy Gradient: Optimality in Performative Reinforcement Learning

Debabrota Basu; Udvas Das; Brahim Driss; Uddalak Mukherjee

arXiv:2512.20576 · cs.LG · February 3, 2026

Open Access

TL;DR

This paper introduces the Performative Policy Gradient (PePG) algorithm for reinforcement learning settings in which the deployed policy shifts the environment's dynamics. It proves that PePG converges to performatively optimal policies and reports that it outperforms existing performative RL algorithms in standard environments.

Contribution

The paper extends performative RL by developing the first policy gradient algorithm that guarantees convergence to performatively optimal policies, with theoretical proofs and empirical validation.

Findings

01

PePG converges to performatively optimal policies under softmax parametrisation.

02

PePG outperforms existing performative RL algorithms in standard environments.

03

The performance difference lemma and the policy gradient theorem are extended to performative RL.

Abstract

Post-deployment machine learning algorithms often influence the environments they act in, and thus shift the underlying dynamics that the standard reinforcement learning (RL) methods ignore. While designing optimal algorithms in this performative setting has recently been studied in supervised learning, the RL counterpart remains under-explored. In this paper, we prove the performative counterparts of the performance difference lemma and the policy gradient theorem in RL, and further introduce the Performative Policy Gradient algorithm (PePG). PePG is the first policy gradient algorithm designed to account for performativity in RL. Under softmax parametrisation, and also with and without entropy regularisation, we prove that PePG converges to performatively optimal policies, i.e. policies that remain optimal under the distribution shifts induced by themselves. Thus, PePG significantly…
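To make the notion of performative optimality concrete, the following is a minimal sketch on a hypothetical one-state, two-action problem. It is not the paper's PePG algorithm or its experimental setup. The assumption, introduced here purely for illustration, is that each action's reward falls linearly with the probability the policy assigns to it (`BASE`, `COST`, `rewards`, and `ascend` are all made-up names). Under a softmax parametrisation, it compares exact gradient ascent that treats the rewards as fixed with gradient ascent that also differentiates through the policy's effect on the rewards.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): reward of action a is
# r_a(pi) = BASE[a] - COST[a] * pi[a], so the policy shifts its own rewards.
BASE = np.array([1.0, 0.5])
COST = np.array([1.0, 0.0])


def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract max for numerical stability
    return z / z.sum()


def rewards(pi):
    return BASE - COST * pi


def value(pi):
    # Expected reward of pi in the environment that pi itself induces.
    return float(pi @ rewards(pi))


def ascend(theta0, performative, lr=0.5, steps=2000):
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        pi = softmax(theta)
        g = rewards(pi)  # dJ/dpi when the rewards are treated as fixed
        if performative:
            # Extra term from the rewards' dependence on pi:
            # d/dpi_b of sum_a pi_a * r_a(pi) adds -COST[b] * pi[b].
            g = g - COST * pi
        # Chain rule through the softmax: dJ/dtheta_k = pi_k * (g_k - pi . g).
        theta += lr * pi * (g - pi @ g)
    return softmax(theta)


pi_naive = ascend([1.0, 0.0], performative=False)
pi_perf = ascend([1.0, 0.0], performative=True)
print("ignoring the shift:", pi_naive.round(3), "value", round(value(pi_naive), 4))
print("modelling the shift:", pi_perf.round(3), "value", round(value(pi_perf), 4))
```

In this toy problem the first run settles where both actions pay the same under the induced rewards, while the second settles at a different policy with a higher value in the environment it induces. That gap is the kind of difference the abstract's notion of performative optimality is concerned with.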

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques