Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles
Jiesong Lian, Yucong Huang, Chengdong Ma, Mingzhi Wang, Ying Wen, Long Hu, Yixue Hao

TL;DR
Fusion-PSRO introduces Nash Policy Fusion to improve policy initialization in PSRO, leveraging past policies and dynamic weighting to better approximate Nash Equilibrium in zero-sum games.
Contribution
The paper proposes Nash Policy Fusion, a method for PSRO that improves policy initialization and convergence to NE by combining past policies with adaptive weighting.
Findings
Achieves lower exploitability on benchmark games
Improves policy population quality over iterations
Mitigates the shortcomings of initializing a new policy from scratch or from a single historical policy
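Exploitability, the metric in the first finding, can be illustrated on a small two-player zero-sum matrix game. The sketch below is not taken from the paper: it assumes the common definition in which exploitability is the sum of what each player could gain by switching to a best response, and the function name `exploitability` and the rock-paper-scissors payoff matrix are illustrative choices.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives the negative.
    This definition is an assumption for illustration; some papers divide by 2.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure row against col_strategy.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Row player's expected payoff when the column player picks each pure column.
    col_values = [
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # Best row response maximizes; best column response minimizes the row payoff.
    return max(row_values) - min(col_values)


# Rock-paper-scissors, a standard non-transitive game.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

nash_gap = exploitability(RPS, uniform, uniform)
rock_gap = exploitability(RPS, always_rock, uniform)
```

Under this definition the uniform profile is unexploitable, while a row player who always plays rock can be exploited by a column player who always plays paper.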
Abstract
For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective framework for solving such games. However, current methods initialize a new policy from scratch or inherit a single historical policy in Best Response (BR), missing the opportunity to leverage past policies to generate a better BR. In this paper, we propose Fusion-PSRO, which employs Nash Policy Fusion to initialize a new policy for BR training. Nash Policy Fusion serves as an implicit guiding policy that starts exploration on the current Meta-NE, thus providing a closer approximation to BR. Moreover, it insightfully captures a weighted moving average of past policies, dynamically adjusting these weights based on the Meta-NE in each…
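The abstract describes the fused initialization as a weighted average of past policies with weights driven by the Meta-NE. The abstract is truncated here, so the exact rule is not stated; the sketch below is only one plausible reading, in which policy parameters are averaged with weights proportional to their Meta-NE probabilities. The function `fuse_policies`, the dictionary parameter layout, and the optional `top_k` cutoff are all hypothetical and not confirmed by the source.

```python
def fuse_policies(policy_params, meta_ne, top_k=None):
    """Weighted parameter average of past policies (illustrative sketch only).

    policy_params: list of dicts mapping a parameter name to a list of floats.
    meta_ne: Meta-NE probability assigned to each policy in the population.
    top_k: hypothetical option to fuse only the k most probable policies.
    """
    if len(policy_params) != len(meta_ne):
        raise ValueError("need one Meta-NE probability per policy")
    # Rank policies by Meta-NE probability, highest first.
    indices = sorted(range(len(meta_ne)), key=lambda i: meta_ne[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    total = sum(meta_ne[i] for i in indices)
    if total <= 0:
        raise ValueError("selected policies have zero total probability")
    # Renormalize so the fusion weights sum to 1 over the selected policies.
    weights = {i: meta_ne[i] / total for i in indices}
    fused = {}
    for name, reference in policy_params[0].items():
        fused[name] = [
            sum(weights[i] * policy_params[i][name][k] for i in indices)
            for k in range(len(reference))
        ]
    return fused


# Two toy policies with a single parameter vector each.
population = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}]
meta_ne = [0.25, 0.75]

init_all = fuse_policies(population, meta_ne)
init_top1 = fuse_policies(population, meta_ne, top_k=1)
```

In a PSRO loop, a result such as `init_all` would serve as the starting point for Best Response training in place of a random or single inherited policy; because the Meta-NE changes each iteration, the weights change with it.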
Taxonomy
Topics: Access Control and Trust
Methods: Balanced Selection
