Policy Iteration for Two-Player General-Sum Stochastic Stackelberg Games

Mikoto Kudo; Youhei Akimoto

arXiv:2405.06689 · cs.GT · March 17, 2026

TL;DR

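This paper proposes a policy iteration algorithm for two-player general-sum stochastic Stackelberg games that guarantees monotone improvement in the leader's performance under the best-response follower and, when the leader is myopic, converges to the Pareto front. Existing policy gradient and value iteration approaches offer no such monotone-improvement guarantee.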

Contribution

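The paper derives a policy improvement theorem for stochastic Stackelberg games (SSGs) under the best-response follower and builds on it a policy iteration algorithm that guarantees monotone improvement in the leader's performance. It also introduces Pareto-optimality as an extension of the optimality of stationary Stackelberg equilibria (SSEs) and proves convergence to the Pareto front when the leader is myopic.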

Findings

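1. The proposed method guarantees monotone improvement in the leader's performance under the best-response follower.
2. It converges to the Pareto front when the leader is myopic.
3. Existing policy gradient and value iteration approaches for SSGs lack this monotone-improvement guarantee, so their performance is not guaranteed when their limits are not stationary Stackelberg equilibria (SSEs).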

Abstract

We address two-player general-sum stochastic Stackelberg games (SSGs), where the leader's policy is optimized considering the best-response follower whose policy is optimal for its reward under the leader. Existing policy gradient and value iteration approaches for SSGs do not guarantee monotone improvement in the leader's policy under the best-response follower. Consequently, their performance is not guaranteed when their limits are not stationary Stackelberg equilibria (SSEs), which do not necessarily exist. In this paper, we derive a policy improvement theorem for SSGs under the best-response follower and propose a novel policy iteration algorithm that guarantees monotone improvement in the leader's performance. Additionally, we introduce Pareto-optimality as an extended optimality of the SSE and prove that our method converges to the Pareto front when the leader is myopic.
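To make the notion of a best-response follower concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a small tabular game with discounting, a follower who chooses its action knowing the leader's policy but not the leader's sampled action, and ties broken toward the lowest action index. The function names (`follower_best_response`, `evaluate_leader`), the data layout `P[s][a][b][s2]`, and the numbers in the demo game are all hypothetical and chosen only for illustration. With the leader's policy fixed, the follower faces an ordinary MDP, which is solved here by value iteration; the leader's value is then evaluated under the resulting policy pair.

```python
def _backup(P, r, leader_policy, V, s, b, gamma):
    """Expected one-step return in state s for follower action b,
    averaging over the leader's (possibly stochastic) action."""
    total = 0.0
    for a, prob_a in enumerate(leader_policy[s]):
        expected_next = sum(p * V[s2] for s2, p in enumerate(P[s][a][b]))
        total += prob_a * (r[s][a][b] + gamma * expected_next)
    return total


def follower_best_response(P, r_f, leader_policy, gamma=0.9,
                           tol=1e-10, max_iters=10_000):
    """Value iteration on the MDP induced by a fixed leader policy.
    Returns a deterministic follower policy and the follower's values."""
    n_states, n_b = len(P), len(P[0][0])
    V = [0.0] * n_states
    for _ in range(max_iters):
        V_new = [max(_backup(P, r_f, leader_policy, V, s, b, gamma)
                     for b in range(n_b)) for s in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if diff < tol:
            break
    policy = []
    for s in range(n_states):
        q = [_backup(P, r_f, leader_policy, V, s, b, gamma) for b in range(n_b)]
        # max() returns the first maximizer, so ties go to the lowest index
        # (an assumption of this sketch, not a rule taken from the paper).
        policy.append(max(range(n_b), key=lambda b: q[b]))
    return policy, V


def evaluate_leader(P, r_l, leader_policy, follower_policy, gamma=0.9,
                    tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation of the leader's value for a fixed policy pair."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        V_new = [_backup(P, r_l, leader_policy, V, s, follower_policy[s], gamma)
                 for s in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if diff < tol:
            break
    return V


# Hypothetical two-state game: P[s][a][b] is a distribution over next states.
P = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]],
    [[[0.7, 0.3], [0.4, 0.6]], [[0.3, 0.7], [0.0, 1.0]]],
]
r_f = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [2.0, 0.0]]]  # follower rewards
r_l = [[[3.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [0.0, 3.0]]]  # leader rewards
leader_policy = [[0.5, 0.5], [1.0, 0.0]]  # per-state distribution over leader actions

br_policy, follower_values = follower_best_response(P, r_f, leader_policy)
leader_values = evaluate_leader(P, r_l, leader_policy, br_policy)
print("follower best response:", br_policy)
print("leader values under best response:", leader_values)
```

The leader's optimization problem in an SSG is to choose `leader_policy` so that the values returned by `evaluate_leader` are high given that the follower responds as in `follower_best_response`. How ties are broken and what the follower observes are modelling choices that vary across the literature, so the conventions above should be read as assumptions of this sketch.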

Taxonomy

Topics: Risk and Portfolio Optimization · Stochastic processes and financial applications

Methods: Stochastic Steady-state Embedding