Team Variance Optimization of n-Player Stochastic Games with Separately Controlled Chains
Li Xia

TL;DR
This paper studies n-player stochastic games in which each player controls only their own internal state, and proposes a sensitivity-based optimization approach with a bilevel algorithm to find equilibrium policies that minimize team variance.
Contribution
It introduces a sensitivity analysis framework and a bilevel optimization algorithm for decentralized policy optimization in team-based stochastic games whose variance objective is neither additive nor Markovian.
Findings
Derived difference and derivative formulas for team variance.
Proved existence of stationary pure Nash equilibrium.
Demonstrated algorithm convergence and effectiveness in smart grid energy management.
Abstract
In this paper, we study a subclass of n-player stochastic games in which each player has their own internal state, controlled only by their own action, and all players share a common objective called team variance, which measures the total variation of the random rewards of all players. It is assumed that players cannot observe each other's states or actions. Thus, the players' internal chains are controlled separately by their own actions and are coupled only through the team variance objective. Since the variance metric is neither additive nor Markovian, the dynamic programming principle fails in this problem. We study this problem from the viewpoint of sensitivity-based optimization. A difference formula and a derivative formula for team variance with respect to policy perturbations are derived, which provide sensitivity information to guide decentralized optimization. The existence of a stationary…
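To make the objective concrete, here is a minimal Python sketch of one plausible reading of team variance: the steady-state variance of the summed reward when every player follows a fixed stationary policy. The abstract does not give the exact definition, so the function names (`stationary_distribution`, `team_variance`), the assumption that each player's reward depends only on their own state, and the use of a product-form joint distribution for the independent chains are all illustrative assumptions, not the paper's formulation.

```python
import itertools
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic transition matrix P (rows sum to 1)."""
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the normalization sum(pi) = 1
    # and solve the resulting consistent overdetermined system by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def team_variance(chains, rewards):
    """Steady-state variance of the summed reward over independent chains.

    ASSUMPTION (hypothetical, not taken from the paper): each player's reward
    depends only on their own state, and the joint steady-state distribution
    is the product of the individual stationary distributions.
    """
    pis = [stationary_distribution(P) for P in chains]
    mean = 0.0
    second_moment = 0.0
    # Enumerate every joint state (s_1, ..., s_n) of the n players.
    for joint in itertools.product(*[range(len(pi)) for pi in pis]):
        prob = np.prod([pis[i][s] for i, s in enumerate(joint)])
        total = sum(rewards[i][s] for i, s in enumerate(joint))
        mean += prob * total
        second_moment += prob * total ** 2
    return second_moment - mean ** 2


# Two players, each with a 2-state chain induced by some fixed policy.
P1 = np.array([[0.5, 0.5], [0.5, 0.5]])
P2 = np.array([[0.9, 0.1], [0.1, 0.9]])
r1 = [0.0, 2.0]
r2 = [1.0, 3.0]

v = team_variance([P1, P2], [r1, r2])
print(v)  # approximately 2.0
```

The mean term inside the variance depends on the policies of all players at once, which is one way to see why the objective is not additive over time steps and why a policy change by one player shifts the value seen by the others.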
