Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu; Peng Zhang; Ruochuan Shi; Yuanheng Zhu; Dongbin Zhao; Yang Liu; Dong Wang; Cesare Alippi

arXiv:2511.00811 · cs.LG · December 15, 2025

TL;DR

This paper introduces an Equilibrium Policy Generalization framework that enables reinforcement learning agents to perform zero-shot generalization across different graph structures in pursuit-evasion games, improving real-time applicability.

Contribution

The paper presents a novel framework for cross-graph zero-shot generalization in pursuit-evasion games, including a dynamic programming algorithm for equilibrium computation and mechanisms for scalability.
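To give a feel for what dynamic programming on a graph pursuit-evasion game can look like, here is a minimal sketch. It is not the paper's algorithm: it assumes a simplified, hypothetical setting with one pursuer, one evader, perfect information, and alternating moves (pursuer first), and it computes the worst-case number of pursuer moves until capture by iterating a minimax Bellman update. The function name `capture_times` and the adjacency-dict input are illustrative choices, not taken from the source.

```python
import math
from itertools import product


def capture_times(adj):
    """Worst-case capture time for every (pursuer, evader) start pair.

    adj maps each node to an iterable of its neighbours (undirected graph).
    Assumed rules (illustrative only): the pursuer moves first, then the
    evader; either player may stay in place; capture happens when both
    occupy the same node. Returns math.inf where the evader can escape forever.
    """
    nodes = list(adj)
    # Closed neighbourhood: a player may move to a neighbour or stay put.
    closed = {v: set(adj[v]) | {v} for v in nodes}
    # Start from "never captured" everywhere except already-captured states.
    T = {(p, e): (0 if p == e else math.inf) for p, e in product(nodes, nodes)}

    changed = True
    while changed:
        changed = False
        for p, e in product(nodes, nodes):
            if p == e:
                continue
            best = math.inf
            for p2 in closed[p]:  # pursuer minimises
                if p2 == e:
                    cost = 0  # pursuer steps onto the evader
                else:
                    # evader maximises over its replies
                    cost = max(T[(p2, e2)] for e2 in closed[e])
                best = min(best, 1 + cost)
            if best < T[(p, e)]:
                T[(p, e)] = best
                changed = True
    return T


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(capture_times(path)[(0, 3)])   # 3
print(capture_times(cycle)[(0, 2)])  # inf
```

On the 4-node path the evader is cornered after three pursuer moves, while on the 4-cycle a single pursuer can never force capture. The table has one entry per (pursuer, evader) pair, so it grows quickly as more agents are added, which is one reason exact solution becomes expensive.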

Findings

01. EPG achieves zero-shot generalization in unseen graphs.

02. The framework matches fine-tuned policies in performance.

03. Experimental results validate robustness across various scenarios.

Abstract

Equilibrium learning in adversarial games is an important topic widely examined in game theory and reinforcement learning (RL). Pursuit-evasion games (PEGs), an important class of real-world games arising in robotics and security, require exponential time to solve exactly. When the underlying graph structure varies, even state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. To our knowledge, these two generalizability properties are the first to appear in this domain. The…

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Guidance and Control Systems