Policy Iteration for Two-Player General-Sum Stochastic Stackelberg Games
Mikoto Kudo, Youhei Akimoto

TL;DR
This paper introduces a policy iteration algorithm for two-player general-sum stochastic Stackelberg games (SSGs) that guarantees monotone improvement in the leader's performance under the best-response follower and, when the leader is myopic, converges to the Pareto front.
Contribution
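The paper derives a policy improvement theorem for SSGs under the best-response follower and builds a policy iteration algorithm on it that guarantees monotone improvement in the leader's performance. It also introduces Pareto-optimality as an extension of the optimality notion of the stationary Stackelberg equilibrium (SSE) and proves convergence to the Pareto front when the leader is myopic.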
Findings
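The proposed method guarantees monotone improvement in the leader's performance under the best-response follower.
It converges to the Pareto front when the leader is myopic.
Existing policy gradient and value iteration approaches for SSGs lack this monotone improvement guarantee, so their performance is not guaranteed when their limits are not SSEs, which do not necessarily exist.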
Abstract
We address two-player general-sum stochastic Stackelberg games (SSGs), where the leader's policy is optimized considering the best-response follower whose policy is optimal for its reward under the leader. Existing policy gradient and value iteration approaches for SSGs do not guarantee monotone improvement in the leader's policy under the best-response follower. Consequently, their performance is not guaranteed when their limits are not stationary Stackelberg equilibria (SSEs), which do not necessarily exist. In this paper, we derive a policy improvement theorem for SSGs under the best-response follower and propose a novel policy iteration algorithm that guarantees monotone improvement in the leader's performance. Additionally, we introduce Pareto-optimality as an extended optimality of the SSE and prove that our method converges to the Pareto front when the leader is myopic.
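To make the setting in the abstract concrete, the following is a minimal sketch of the leader's objective under a best-response follower in a tiny tabular game. It is not the paper's policy iteration algorithm: it simply enumerates deterministic leader policies by brute force. The function names, the nested-list layout `P[s][a][b][s']`, the shared discount factor `gamma`, the restriction to deterministic policies, and the tie-breaking rule (lowest follower action index) are all assumptions made for illustration.

```python
from itertools import product

def follower_best_response(P, r_F, pi_L, gamma, iters=500):
    """Value iteration on the MDP the follower faces once pi_L is fixed.
    Ties are broken toward the lowest action index (an assumption)."""
    n_states, n_b = len(P), len(P[0][0])

    def q(s, b, V):
        a = pi_L[s]
        return r_F[s][a][b] + gamma * sum(p * v for p, v in zip(P[s][a][b], V))

    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(q(s, b, V) for b in range(n_b)) for s in range(n_states)]
    return [max(range(n_b), key=lambda b: q(s, b, V)) for s in range(n_states)]

def leader_value(P, r_L, pi_L, pi_F, gamma, iters=500):
    """Iterative policy evaluation of the leader's return under (pi_L, pi_F)."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            r_L[s][pi_L[s]][pi_F[s]]
            + gamma * sum(p * v for p, v in zip(P[s][pi_L[s]][pi_F[s]], V))
            for s in range(n_states)
        ]
    return V

def best_deterministic_leader_policy(P, r_L, r_F, gamma, s0=0):
    """Brute-force search: score each leader policy against its best response."""
    n_states, n_a = len(P), len(P[0])
    best = None
    for pi_L in product(range(n_a), repeat=n_states):
        pi_F = follower_best_response(P, r_F, pi_L, gamma)
        v = leader_value(P, r_L, pi_L, pi_F, gamma)[s0]
        if best is None or v > best[0]:
            best = (v, list(pi_L), pi_F)
    return best

# Hypothetical one-state game (self-loop), two actions per player.
P = [[[[1.0], [1.0]], [[1.0], [1.0]]]]   # P[s][a][b] = distribution over s'
r_L = [[[2.0, 4.0], [1.0, 3.0]]]         # leader reward r_L[s][a][b]
r_F = [[[1.0, 0.0], [0.0, 1.0]]]         # follower reward r_F[s][a][b]

value, pi_L, pi_F = best_deterministic_leader_policy(P, r_L, r_F, gamma=0.5)
print(value, pi_L, pi_F)
```

In this toy game, leader action 0 pays the leader more than action 1 against either fixed follower action, yet committing to action 1 is better once the follower's best response is taken into account. That dependence of the follower's policy on the leader's policy is what a policy improvement argument for SSGs has to handle.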
Taxonomy
Topics: Risk and Portfolio Optimization · Stochastic processes and financial applications
Methods: Stochastic Steady-state Embedding
