Can LLMs Alleviate Catastrophic Forgetting in Graph Continual Learning? A Systematic Study

Ziyang Cheng; Zhixun Li; Yuhan Li; Yixin Song; Kangyi Zhao; Dawei Cheng; Jia Li; Hong Cheng; Jeffrey Xu Yu

arXiv:2505.18697 · cs.LG · September 29, 2025

TL;DR

This paper investigates whether large language models can reduce catastrophic forgetting in graph continual learning, identifies flaws in current evaluation setups, and proposes a simple method that outperforms previous GNN-based approaches.

Contribution

It introduces a realistic evaluation framework for GCL, demonstrates the effectiveness of LLMs in this context, and proposes SimGCL, a new simple method that surpasses state-of-the-art results.

Findings

1. LLMs can significantly mitigate catastrophic forgetting in GCL.
2. Current evaluation setups may lead to task ID leakage.
3. SimGCL outperforms previous methods by around 20% under rehearsal-free constraints.
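
The task ID leakage finding above can be illustrated with a toy example. The sketch below assumes the common class-incremental reading of the problem: if the evaluator restricts predictions to the classes of the task a test sample came from, it reveals the task ID and hides confusion between old and new classes. The logits, class splits, and function names are hypothetical and are not taken from the paper or its repository.

```python
# Toy illustration (hypothetical data): two tasks with two classes each.
TASK_CLASSES = [[0, 1], [2, 3]]   # task 0 -> classes 0, 1; task 1 -> classes 2, 3
SEEN_CLASSES = [0, 1, 2, 3]       # every class seen after the final task

# Each sample: (logits over all four classes, true label, task the sample belongs to).
# The logits mimic a model that is biased toward the most recent classes.
SAMPLES = [
    ([0.2, 0.5, 0.9, 0.1], 1, 0),
    ([0.6, 0.1, 0.7, 0.3], 0, 0),
    ([0.1, 0.2, 0.8, 0.3], 2, 1),
    ([0.3, 0.1, 0.2, 0.9], 3, 1),
]


def predict_with_task_id(logits, task_id):
    """Task-aware testing: argmax only over the classes of the known task."""
    return max(TASK_CLASSES[task_id], key=lambda c: logits[c])


def predict_global(logits):
    """Global testing: argmax over all classes seen so far, no task ID given."""
    return max(SEEN_CLASSES, key=lambda c: logits[c])


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [y for _, y, _ in SAMPLES]
local_acc = accuracy([predict_with_task_id(l, t) for l, _, t in SAMPLES], labels)
global_acc = accuracy([predict_global(l) for l, _, _ in SAMPLES], labels)

print(f"task-aware accuracy: {local_acc:.2f}")
print(f"global accuracy:     {global_acc:.2f}")
```

On this toy data the task-aware evaluator scores every sample as correct, while the global evaluator misclassifies both old-task samples as a new class, so the same model looks far better when the task ID is available at test time.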

Abstract

Real-world data, including graph-structured data, often arrives in a streaming manner, which means that learning systems need to continuously acquire new knowledge without forgetting previously learned information. Although many existing works attempt to address catastrophic forgetting in graph machine learning, they are all based on training from scratch with streaming data. With the rise of pretrained models, an increasing number of studies have leveraged their strong generalization ability for continual learning. Therefore, in this work, we attempt to answer whether large language models (LLMs) can mitigate catastrophic forgetting in Graph Continual Learning (GCL). We first point out that current experimental setups for GCL have significant flaws, as the evaluation stage may lead to task ID leakage. Then, we evaluate the performance of LLMs in more realistic scenarios…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. Critical Re-evaluation of GCL Benchmarks: The paper makes a valuable methodological contribution by identifying and empirically demonstrating a task ID leakage issue in existing Graph Continual Learning (GCL) benchmarks. This flaw, previously overlooked in the community, renders many reported results unreliable. By introducing a corrected global testing setup, the authors establish a fair and realistic evaluation framework for future GCL research.
2. Bridging GCL and Foundation Models: Concept

Weaknesses

1. Limited Theoretical Justification for SimGCL: While SimGCL shows strong empirical results, the paper provides little theoretical or analytical grounding for why the combination of graph prompts, LoRA fine-tuning, and prototype-based classification alleviates forgetting. The approach appears largely empirical, and the mechanism behind its robustness (e.g., whether prototype stability or prompt alignment is the key factor) is not formally analyzed. Adding theoretical reasoning or ablation-based

Reviewer 02 · Rating 2 · Confidence 4

Strengths

The paper explores a novel approach that connects LLMs with the graph continual learning framework. Moreover, it analyzes the commonly used GCL scenario, revealing an interesting limitation and proposing a solution to address this issue.

Weaknesses

The paper presents several issues, mainly related to the clarity of presentation and the design of the experimental framework. Regarding the presentation, the paper does not clearly explain the proposed model or the models used for comparison. As a result, many methodological choices are not properly justified. Going into more detail, in the introduction, the authors claim that there is a lack of investigation into the rationality of the common experimental setup used for GCL. It is worth noting

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper tackles an underexplored question: whether large language models (LLMs) can mitigate catastrophic forgetting in graph continual learning.
2. A benchmark, LLM4GCL, is developed.

Weaknesses

1. My biggest concern lies in the motivation. The primary goal of continual learning is to enable efficient adaptation under limited resources, whereas LLMs are inherently computationally expensive. It is unclear whether the significant computational cost introduced by LLMs can truly justify the efficiency-driven learning protocol that continual learning aims to achieve.
2. Following this point, I believe the comparison should include GCL methods that allocate additional parameters (expansion-b

Code & Models

Repositories

zhixunlee/llm4gcl
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning