Smoothing Policy Iteration for Zero-sum Markov Games
Yangang Ren, Yao Lyu, Wenxuan Wang, Shengbo Eben Li, Zeyang Li, Jingliang Duan

TL;DR
This paper introduces the smoothing policy iteration (SPI) algorithm for zero-sum Markov Games, replacing the max operator with a smooth approximation to efficiently find equilibrium policies in complex, large-scale action spaces.
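The smooth approximation named here can be illustrated with a weighted LogSumExp, (1/λ) log Σ_i w_i exp(λ x_i). The sketch below assumes that standard definition, with w a probability vector. The function name `wlse`, the values `x`, the weights `w`, and the λ values are hypothetical and chosen only for illustration. The point it shows is that the result stays at or below the hard maximum and approaches it as λ grows.

```python
import numpy as np


def wlse(x, w, lam):
    """Weighted LogSumExp: (1/lam) * log(sum_i w_i * exp(lam * x_i)).

    Assumes w is a nonnegative weight vector summing to 1 and lam > 0.
    This is an illustrative sketch, not the paper's implementation.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = x.max()  # subtract the max before exponentiating, for numerical stability
    return m + np.log(np.sum(w * np.exp(lam * (x - m)))) / lam


x = np.array([1.0, 3.0, 2.5])   # hypothetical values, e.g. Q-values over adversary actions
w = np.array([0.2, 0.3, 0.5])   # hypothetical weights (a probability vector)
for lam in (1.0, 10.0, 100.0):
    print(lam, wlse(x, w, lam), x.max())
```

With these inputs, the printed WLSE values increase toward max(x) = 3.0 as λ grows. This is why a large λ makes the smoothed operator a close stand-in for the exact maximum.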
Contribution
The paper proposes the SPI algorithm, which uses the weighted LogSumExp (WLSE) function to approximate the maximum operator, proves its convergence, and extends it to a model-based actor-critic method (SaAC) for improved robustness and training stability.
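The convergence claim can be made concrete on a small tabular game. The following is a minimal sketch under stated assumptions. It iterates a smoothed minimax Bellman operator in which the maximum over adversary actions is replaced by WLSE and the protagonist minimizes over its own actions. It is a value-iteration-style fixed-point iteration, not the paper's full SPI procedure.

Everything specific is hypothetical: the random game, its sizes, γ = 0.9, λ = 20, the uniform adversary weights `adv_w`, the pure-strategy min over protagonist actions, and the names `smoothed_backup` and `gaps`.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nU, nW = 4, 3, 3          # hypothetical sizes: states, protagonist actions, adversary actions
gamma, lam = 0.9, 20.0        # assumed discount factor and WLSE parameter

cost = rng.uniform(0.0, 1.0, size=(nS, nU, nW))   # hypothetical cost c(s, u, w)
P = rng.uniform(size=(nS, nU, nW, nS))
P /= P.sum(axis=-1, keepdims=True)                # transition probabilities P(s' | s, u, w)
adv_w = np.full((nS, nW), 1.0 / nW)               # adversary weights per state (uniform, assumed)


def smoothed_backup(V):
    """Apply the smoothed minimax operator once: min over u of WLSE over w."""
    Q = cost + gamma * P @ V                      # Q[s, u, w] = c + gamma * E[V(s')]
    m = Q.max(axis=2, keepdims=True)              # stabilizing shift per (s, u)
    weighted = adv_w[:, None, :] * np.exp(lam * (Q - m))
    soft_max = m[..., 0] + np.log(weighted.sum(axis=2)) / lam  # WLSE over adversary actions
    return soft_max.min(axis=1)                   # protagonist minimizes cost


V = np.zeros(nS)
gaps = []                                         # sup-norm change per iteration
for _ in range(200):
    V_new = smoothed_backup(V)
    gaps.append(np.max(np.abs(V_new - V)))
    V = V_new
print(V, gaps[-1])
```

For normalized weights, WLSE is monotone and satisfies WLSE(x + c) = WLSE(x) + c. Together with the non-expansive min, this makes the smoothed operator a γ-contraction in the ∞-norm. Each entry of `gaps` is therefore at most γ times the previous one, and the iteration converges to a unique fixed point.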
Findings
SPI accurately approximates worst-case value functions.
SaAC stabilizes training and enhances adversarial robustness.
The proposed algorithms are effective in both tabular and function approximation settings.
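Why a WLSE-based operator can approximate worst-case values accurately can be seen from elementary bounds on a single smoothed maximization. The derivation below is standard and is offered as an illustration. It is not the paper's own ∞-norm error analysis, whose exact statement is not reproduced in this summary.

The following are assumptions: the weights form a probability vector, every weight is at least some w_min > 0, V* denotes the fixed point of the exact (hard-max) operator, V_smooth denotes the fixed point of the WLSE operator, and both operators are γ-contractions.

```latex
\mathrm{WLSE}_\lambda(x; w) = \frac{1}{\lambda}\log\sum_{i} w_i\, e^{\lambda x_i},
\qquad w_i \ge 0,\ \sum_i w_i = 1,\ \lambda > 0.

% With i^* = \arg\max_i x_i:
%   \sum_i w_i e^{\lambda x_i} \le e^{\lambda x_{i^*}}  and  \sum_i w_i e^{\lambda x_i} \ge w_{i^*} e^{\lambda x_{i^*}}, hence
\max_i x_i + \frac{1}{\lambda}\log w_{i^*}
  \;\le\; \mathrm{WLSE}_\lambda(x; w) \;\le\; \max_i x_i .

% If w_i \ge w_{\min} > 0, each smoothed backup is within
% \varepsilon = -\tfrac{1}{\lambda}\log w_{\min} of the exact backup, so
\| V_{\mathrm{smooth}} - V^{*} \|_\infty
  \;\le\; \frac{\varepsilon}{1-\gamma}
  \;=\; -\frac{\log w_{\min}}{\lambda\,(1-\gamma)} .
```

The last line follows from ‖V_smooth − V*‖ ≤ ‖T_smooth V_smooth − T V_smooth‖ + ‖T V_smooth − T V*‖ ≤ ε + γ‖V_smooth − V*‖. The gap therefore shrinks like 1/λ.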
Abstract
Zero-sum Markov Games (MGs) have been an efficient framework for multi-agent systems and robust control, in which a minimax problem is constructed to solve for the equilibrium policies. At present, this formulation is well studied in tabular settings, where the maximum operator is typically solved exactly to calculate the worst-case value function. However, it is non-trivial to extend such methods to complex tasks, as finding the maximum over large-scale action spaces is usually cumbersome. In this paper, we propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately, where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies. Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces. We also prove the convergence…
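One reading of "the adversarial policy serves as the weight function to enable efficient sampling" is the identity WLSE = (1/λ) log E_{a∼π}[exp(λ Q(a))] when the weights are the adversarial policy π. Under that reading, the expectation can be estimated from sampled actions instead of enumerating the whole action space.

The sketch below is under assumptions, not the paper's estimator. The action-space size, the stand-in Q-values, the random policy, λ = 5, the sample count, and the names `wlse_exact` and `wlse_sampled` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, lam = 10_000, 5.0                      # hypothetical large discrete action space
q = rng.uniform(0.0, 1.0, size=n_actions)         # stand-in for Q(s, a) over adversary actions
logits = rng.normal(size=n_actions)
pi = np.exp(logits - logits.max())
pi /= pi.sum()                                    # hypothetical adversarial policy, used as WLSE weights


def wlse_exact(q, pi, lam):
    """Exact WLSE by enumerating every action (what sampling is meant to avoid)."""
    m = q.max()
    return m + np.log(np.sum(pi * np.exp(lam * (q - m)))) / lam


def wlse_sampled(q, pi, lam, n_samples, rng):
    """Monte Carlo estimate of (1/lam) * log E_{a~pi}[exp(lam * q(a))]."""
    a = rng.choice(len(q), size=n_samples, p=pi)  # sample actions from the adversarial policy
    z = lam * q[a]
    m = z.max()                                   # log-mean-exp with a stabilizing shift
    return (m + np.log(np.mean(np.exp(z - m)))) / lam


exact = wlse_exact(q, pi, lam)
estimate = wlse_sampled(q, pi, lam, 20_000, rng)
print(exact, estimate, q.max())
```

Taking the log of a sample mean makes the estimator slightly biased downward (by Jensen's inequality), but it is consistent as the number of samples grows. Its cost depends on the sample count rather than on the size of the action space.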
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
