Instance-Prototype Affinity Learning for Non-Exemplar Continual Graph Learning
Lei Song, Jiaxing Li, Shihan Guan, Youyong Kong

TL;DR
This paper introduces Instance-Prototype Affinity Learning (IPAL), a method for non-exemplar continual graph learning that combines prototype affinity learning with graph structural information to mitigate catastrophic forgetting without storing past data.
Contribution
It proposes a new paradigm using topology-integrated prototypes and affinity distillation to improve continual learning in GNNs without storing past data.
Findings
Outperforms state-of-the-art methods on four benchmark datasets.
Achieves a better balance between plasticity and stability.
Utilizes graph structure to enhance knowledge retention.
Abstract
Graph Neural Networks (GNNs) suffer from catastrophic forgetting, which undermines their capacity to preserve previously acquired knowledge while assimilating new information. Rehearsal-based techniques, which revisit historical examples, are a principal strategy for alleviating this phenomenon. However, memory explosion and privacy infringements impose significant constraints on their utility. Non-Exemplar methods circumvent these issues through Prototype Replay (PR), yet feature drift presents new challenges. In this paper, our empirical findings reveal that Prototype Contrastive Learning (PCL) exhibits less pronounced drift than conventional PR. Drawing upon PCL, we propose Instance-Prototype Affinity Learning (IPAL), a novel paradigm for Non-Exemplar Continual Graph Learning (NECGL). Exploiting graph structural information, we formulate Topology-Integrated Gaussian Prototypes (TIGP),…
Peer Reviews
Decision: Submitted to ICLR 2026
1. Sound theoretical footing for using PCL in NECGL: The authors formalize “feature drift” and prove (via a KL-divergence analysis under Gaussian assumptions) that Prototype Contrastive Learning (PCL) incurs strictly less drift than conventional Prototype Replay (PR). This gives a clear, principled reason to prefer PCL in non-exemplar continual graph learning, not just an empirical hunch. 2. Topology-aware prototypes that actually use the graph: The Topology-Integrated Gaussian Prototypes (TIGP
1. Theory relies on restrictive assumptions and local approximations: The “PCL drifts less than PR” claim is proved under multivariate Gaussian feature distributions with positive-definite covariances, and the derivation uses a first-order Taylor approximation of the encoder’s update (i.e., infinitesimal step). That makes the result sensitive to non-Gaussian embeddings and larger optimization steps typical in practice. Clarifying the conditions (e.g., step size, τ, optimizer) under which the str
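For reference, the review above describes a KL-divergence analysis under multivariate Gaussian feature distributions with positive-definite covariances. The standard closed form for the KL divergence between two such Gaussians is given below. The symbols ($\mu_0, \Sigma_0$ for one distribution, $\mu_1, \Sigma_1$ for the other, $d$ for the feature dimension) are chosen here for illustration; how the paper maps them onto pre- and post-update features, and its exact definition of drift, are not stated in this excerpt.

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\right)
= \frac{1}{2}\left[
    \operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
    + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
    - d
    + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
\right]
```

The inverse and the log-determinant ratio are well defined only when both covariances are positive definite, which is why that assumption appears in the analysis the reviewer describes.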
1. The use of PageRank-based topology integration (TIGP) is a meaningful attempt to incorporate graph structural importance into prototype construction, improving representation of high-impact nodes. 2. The proposed Instance-Prototype Affinity Distillation (IPAD) provides a more flexible alternative to traditional feature distillation, maintaining inter- and intra-class relations without over-constraining the feature space. 3. The Decision Boundary Perception (DBP) mechanism is a thoughtful add
1. It is not entirely clear why PageRank-weighted nodes would yield better class prototypes, as the paper provides limited explanation or analysis for this design choice. High-centrality nodes may not adequately represent peripheral or low-degree nodes, potentially biasing the prototypes toward graph centers. 2. The framework introduces several additional components but does not report training time or memory usage. It remains unclear whether these modules introduce notable computational or mem
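To make the reviewer's concern about PageRank-weighted prototypes concrete, here is a minimal sketch of one way such a prototype could be computed. It is an assumption-laden illustration, not the paper's TIGP formulation, which this excerpt does not give. The function names (`pagerank`, `weighted_gaussian_prototype`), the diagonal-variance simplification, and the toy star graph are all hypothetical.

```python
from typing import Dict, List, Tuple


def pagerank(adj: Dict[int, List[int]], damping: float = 0.85,
             iters: int = 100) -> Dict[int, float]:
    """Power-iteration PageRank; every neighbor must also be a key of adj."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u in adj[v]:
                new[u] += damping * rank[v] / len(adj[v])
        rank = new
    return rank


def weighted_gaussian_prototype(
    features: Dict[int, List[float]],
    labels: Dict[int, int],
    rank: Dict[int, float],
    cls: int,
) -> Tuple[List[float], List[float]]:
    """Return (mean, diagonal variance) of one class, weighted by PageRank.

    Assumption: weights are PageRank scores renormalized within the class.
    """
    members = [v for v in features if labels[v] == cls]
    total = sum(rank[v] for v in members)
    w = {v: rank[v] / total for v in members}
    dim = len(features[members[0]])
    mean = [sum(w[v] * features[v][d] for v in members) for d in range(dim)]
    var = [sum(w[v] * (features[v][d] - mean[d]) ** 2 for v in members)
           for d in range(dim)]
    return mean, var


# Toy star graph: node 0 is the hub, nodes 1-3 are leaves (undirected edges
# are written as two directed edges).
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0], 3: [0.0, 0.0]}
labels = {0: 0, 1: 0, 2: 0, 3: 0}

rank = pagerank(adj)
mean, var = weighted_gaussian_prototype(features, labels, rank, cls=0)
uniform_mean = sum(features[v][0] for v in features) / len(features)
```

In this toy graph the hub carries more PageRank mass than any leaf, so the weighted mean is pulled toward the hub's feature vector relative to `uniform_mean`. That is the bias toward graph centers the reviewer raises.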
1. The paper addresses a meaningful challenge in non-exemplar continual graph learning where privacy and memory constraints prevent rehearsal. 2. The three modules (TIGP, IPAD, DBP) are well-motivated, mutually complementary, and grounded in clear intuition. 3. Extensive experiments across four benchmarks with ablation and sensitivity analyses support the effectiveness of each component.
1. The framework is an evolutionary extension of PCL and prototype replay ideas rather than a fundamentally new paradigm. 2. The paper lacks concrete analysis of computational cost, parameter overhead, or runtime efficiency.
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Face and Expression Recognition
Methods: Contrastive Learning
