ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners
Xiangge Huang, Jingyuan Li, Jiaqing Xie

TL;DR
This paper investigates how reinforcement learning explanations influence the convergence to Stackelberg equilibrium in two-player no-regret games, establishing conditions under which equilibrium is achieved and utility bounds are maintained.
Contribution
The paper demonstrates that players can reach Stackelberg equilibrium under a no-regret constraint, and it provides bounds on the follower's utility difference and on total game utility.
Findings
Players reach Stackelberg equilibrium when the follower uses a reward-average or transform-reward-average strategy.
A strict upper bound is established on the difference in follower utility with and without the no-regret constraint.
Total game utility remains bounded in constant-sum scenarios.
Abstract
With a no-regret follower, will the players in a two-player Stackelberg game still reach a Stackelberg equilibrium? We first show that when the follower's strategy is either reward-average or transform-reward-average, the two players always reach the Stackelberg equilibrium. We then extend this result to show that the players can achieve the Stackelberg equilibrium in the two-player game under the no-regret constraint. We also establish a strict upper bound on the difference in the follower's utility with and without the no-regret constraint. Moreover, in constant-sum two-player Stackelberg games with no-regret action sequences, we show that the total optimal utility of the game remains bounded.
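To make the setting in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes a small bimatrix game with invented payoffs, restricts the leader to pure commitments, and uses Hedge (multiplicative weights) as a stand-in no-regret follower, since this summary does not define the reward-average or transform-reward-average strategies. The names `pure_stackelberg` and `hedge_follower` are hypothetical.

```python
import numpy as np

# Hypothetical payoffs: rows are leader actions, columns are follower actions.
LEADER = np.array([[2.0, 4.0], [1.0, 3.0]])
FOLLOWER = np.array([[1.0, 0.0], [0.0, 1.0]])


def pure_stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action, leader_utility) when the
    leader commits to a pure action and the follower best-responds."""
    best = None
    for i in range(leader_payoff.shape[0]):
        # Follower best responses to leader action i; ties are broken in the
        # leader's favour (the "strong" Stackelberg convention, an assumption).
        top = follower_payoff[i].max()
        ties = np.flatnonzero(follower_payoff[i] == top)
        j = int(max(ties, key=lambda c: leader_payoff[i, c]))
        value = float(leader_payoff[i, j])
        if best is None or value > best[2]:
            best = (i, j, value)
    return best


def hedge_follower(follower_payoff, leader_action, rounds=2000, eta=0.1, seed=0):
    """Simulate a Hedge follower against a fixed leader action and return the
    empirical frequency of each follower action."""
    n = follower_payoff.shape[1]
    log_w = np.zeros(n)
    counts = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        # Softmax of the log-weights, shifted for numerical stability.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        counts[rng.choice(n, p=p)] += 1
        # Full-information update: every action's payoff is observed.
        log_w += eta * follower_payoff[leader_action]
    return counts / rounds


leader_a, follower_a, leader_u = pure_stackelberg(LEADER, FOLLOWER)
freq = hedge_follower(FOLLOWER, leader_a)
```

In this toy game the leader commits to action 1, the follower's best response is action 1, and the leader earns 3, which is more than the 2 it would get by playing its dominant action 0 without commitment. Against that fixed commitment, the Hedge follower's empirical play concentrates on the same best response. Mixed leader commitments, which generally require solving linear programs, are left out of this sketch.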
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques · Reinforcement Learning in Robotics
