Riemannian Manifold Learning for Stackelberg Games with Neural Flow Representations
Larkin Liu, Kashif Rasul, Yutong Chao, Jalal Etesami

TL;DR
This paper introduces a novel manifold learning framework using neural flows for online Stackelberg game learning, enabling efficient equilibrium computation with theoretical regret bounds and empirical validation in cybersecurity and supply chains.
Contribution
It proposes a new approach combining Riemannian manifold learning with neural normalizing flows for Stackelberg games, providing theoretical regret guarantees and demonstrating practical effectiveness.
Findings
Effective regret minimization on the learned manifold.
Superior performance over standard baselines.
Applicability to cybersecurity and supply chain domains.
Abstract
We present a novel framework for online learning in Stackelberg general-sum games, where two agents, the leader and follower, engage in sequential turn-based interactions. At the core of this approach is a learned diffeomorphism that maps the joint action space to a smooth spherical Riemannian manifold, referred to as the Stackelberg manifold. This mapping, facilitated by neural normalizing flows, ensures the formation of tractable isoplanar subspaces, enabling efficient techniques for online learning. Leveraging the linearity of the agents' reward functions on the Stackelberg manifold, our construct allows the application of linear bandit algorithms. We then provide a rigorous theoretical basis for regret minimization on the learned manifold and establish bounds on the simple regret for learning Stackelberg equilibrium. This integration of manifold learning into game theory uncovers a…
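As a rough illustration of why a linear reward on a sphere makes linear bandit methods usable, here is a minimal sketch. It is not the paper's algorithm. The learned normalizing-flow diffeomorphism is replaced by a plain normalization onto the unit sphere (`to_sphere`, a hypothetical stand-in that is not invertible), the leader-follower structure is collapsed into a single learner, and `ridge_estimate`, `best_point_on_sphere`, `theta_true`, and the noise level are all assumptions made for the example.

```python
import numpy as np

def to_sphere(x):
    """Hypothetical stand-in for the learned map: scale a nonzero vector to unit norm."""
    return x / np.linalg.norm(x)

def ridge_estimate(Z, r, lam=1.0):
    """Regularized least-squares estimate of the reward parameter.

    Z: (n, d) array of embedded actions, r: (n,) array of observed rewards.
    """
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ r)

def best_point_on_sphere(theta_hat):
    """A linear function <z, theta> over the unit sphere is maximized at theta / ||theta||."""
    return theta_hat / np.linalg.norm(theta_hat)

rng = np.random.default_rng(0)

# Assumed ground-truth reward direction (unit norm, so the best achievable reward is 1).
theta_true = to_sphere(np.array([1.0, -2.0, 0.5]))

# Explore: sample embedded actions on the sphere and observe noisy linear rewards.
Z = np.array([to_sphere(v) for v in rng.normal(size=(200, 3))])
r = Z @ theta_true + 0.01 * rng.normal(size=200)

# Exploit: estimate the parameter and play the maximizer of the estimated reward.
theta_hat = ridge_estimate(Z, r)
z_star = best_point_on_sphere(theta_hat)

# Simple regret: gap between the best achievable reward and the reward of the chosen point.
simple_regret = 1.0 - float(z_star @ theta_true)
```

Under these assumptions the optimization step has a closed form, which is the kind of tractability the abstract attributes to the spherical embedding. In the paper's setting the hard part is learning a map under which the rewards are actually linear, and that part is not shown here.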
Peer Reviews
Decision·Submitted to ICLR 2025
The proposed framework is highly novel to me. It effectively bridges equilibrium learning, manifold learning, and online learning paradigms by establishing theoretical and practical connections. The application of normalizing neural flows shows particular promise for converting equilibrium learning problems into online manifold learning and optimization problems, opening valuable new research directions.
The paper would benefit from a clearer organizational structure, particularly in Section 4. The progression of ideas through Lemmas 4.1-4.4 needs stronger motivation and explicit statements of their individual contributions to the overall narrative. Additionally, the technical foundation could be strengthened by providing clearer connections between consecutive lemmas and their role in building the theoretical framework and by including fundamental definitions from Riemannian geometry, especiall
The paper tries to provide a framework to map Stackelberg games to a bandit problem by using neural normalizing flows and by assuming the reward function is linear on the mapped manifold, allowing the use of bandit results to study the game. The paper could provide regret analysis and empirical results. However, the reviewer is very confused by the motivation of the approach. More details on the weaknesses are given below.
The paper is not very well structured and not well motivated. The contributions are not clear. The problem statement is not clearly stated. It seems like the paper considers a 2-player Stackelberg game and considers bounding a form of regret as a measure of performance. However, Section 2.2 describes a procedure to obtain an embedding on the D-sphere where the reward functions are linear as in (2.16). First of all, it is unclear why people should do this embedding (motivation is not clear). S
The idea of learning an embedding of a Stackelberg game such that the game can be reasoned about and solved in that space more easily is novel and interesting. To accomplish this, the authors leverage normalizing flows; in general, this work requires aggregating ideas and techniques from several typically disparate research areas which makes it original. While the paper focuses on the spherical manifold, the proposed loss function in equation 2.14 provides a way of learning the embedding, not ju
As the paper is currently presented, there was little discussion or motivation early on for why a spherical embedding (or any embedding) would be helpful for constructing an improved no-regret algorithm. The learned mapping is invertible so there's no dimensionality reduction, which is a typical motivation in other work making similar assumptions. Section 2.3 is introduced eventually, but it would seem the assumption of reward functions that are linear in the embedding space is critical for this
Taxonomy
Topics·Statistical Mechanics and Entropy
Methods·Normalizing Flows
