BANGS: Game-Theoretic Node Selection for Graph Self-Training
Fangxin Wang, Kay Liu, Sourav Medya, Philip S. Yu

TL;DR
BANGS is a game-theoretic framework for node selection in graph self-training. It selects nodes as a collective set, using conditional mutual information as the objective, to improve the robustness and performance of GNNs.
Contribution
It unifies node labeling with mutual information maximization using game theory, enabling collective node selection with theoretical robustness guarantees.
Findings
Outperforms existing methods across multiple datasets.
Demonstrates robustness under noisy conditions.
Provides theoretical guarantees for node selection robustness.
Abstract
Graph self-training is a semi-supervised learning method that iteratively selects a set of unlabeled data to retrain the underlying graph neural network (GNN) model and improve its prediction performance. While selecting highly confident nodes has proven effective for self-training, this pseudo-labeling strategy ignores the combinatorial dependencies between nodes and suffers from a local view of the distribution. To overcome these issues, we propose BANGS, a novel framework that unifies the labeling strategy with conditional mutual information as the objective of node selection. Our approach, grounded in game theory, selects nodes in a combinatorial fashion and provides theoretical guarantees for robustness under a noisy objective. More specifically, unlike traditional methods that rank and select nodes independently, BANGS considers nodes as a collective set in the self-training…
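To make the contrast in the abstract concrete, here is a minimal sketch of the confidence-based baseline that ranks and selects nodes independently. It is not the BANGS procedure. The function name `select_confident_nodes`, the probability table, and the budget `k` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def select_confident_nodes(
    probs: Sequence[Sequence[float]],
    unlabeled: Sequence[int],
    k: int,
) -> List[Tuple[int, int, float]]:
    """Return up to k (node, pseudo_label, confidence) triples, most confident first.

    Assumption: probs[node] holds the model's class probabilities for that node.
    """
    scored = []
    for node in unlabeled:
        row = probs[node]
        # Pseudo-label is the class with the highest predicted probability.
        label = max(range(len(row)), key=row.__getitem__)
        scored.append((node, label, row[label]))
    # Each node is scored on its own; dependencies between nodes are ignored.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


# Hypothetical softmax outputs for 5 nodes and 3 classes.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
    [0.05, 0.05, 0.90],
]
unlabeled = [1, 2, 3, 4]  # node 0 is assumed to be labeled already
picked = select_confident_nodes(probs, unlabeled, k=2)
print(picked)
```

Because the ranking uses only each node's own confidence, two selected nodes may carry largely redundant information. That independence is the limitation the abstract attributes to this strategy, and the one a set-level objective is meant to address.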
Peer Reviews
Decision: ICLR 2025 Poster
This paper introduces a new direction in graph self-training by integrating conditional mutual information into the pseudo-labeling process. The proposed method may have the potential to improve the effectiveness of semi-supervised learning on graphs, particularly in scenarios where node dependencies play a significant role. The empirical studies are solid.
1. One main concern is the rationale behind forming a node set for graph self-training from a submodular optimization perspective. The paper argues that pseudo-labels should be evaluated and fed into the model as a set, contrasting with most existing self-training strategies that evaluate each pseudo-label individually. The justification of the traditional strategy is that adding pseudo-labels to the training set satisfies submodularity, allowing for the use of a greedy strategy to achieve an op
The structure of this article is clear and easy to understand. The experiments are detailed, and the analysis of the results is also quite clear. Moreover, the code has been made publicly available.
1. The motivation of the article is unclear and not strong enough. The core work of self-training is to select suitable nodes and assign pseudo-labels. This article focuses on the combinatorial dependencies in node selection; however, the impact of such information on the model's performance is not discussed in depth and lacks supporting experiments.
2. Furthermore, I do not see self-training as an interesting research direction; it seems more like a variant of data augmentation to me. Assignin
The writing is clear. The discussion is comprehensive.
The novelty of the studied problem is limited: the authors study the node selection problem using a novel formulation with mutual information. Technically, the paper relies heavily on (Wang & Jia, 2023). It would be good to demonstrate the authors' unique contribution in the context of (Wang & Jia, 2023): is it a straightforward extension of the referenced work?
Taxonomy
Topics: Advanced Graph Neural Networks · Recommender Systems and Techniques · Data Stream Mining Techniques
Methods: Graph Neural Network · Balanced Selection · Sparse Evolutionary Training
