Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Yuanyi Wang; Yifan Yang; Su Lu; Yanggan Gu; Pengkai Wang; Wenjun Wang; Zhaoyi Yan; Congkai Xie; Jianmin Wu; Jialun Cao; Shing-Chi Cheung; Hongxia Yang

arXiv:2605.09608 · cs.LG · May 12, 2026

TL;DR

This paper introduces a geometric perspective on LLM continual post-training, identifying geometry conflict as a key factor in forgetting and proposing a new method, GCWM, to improve knowledge retention without replay data.

Contribution

The paper presents a geometry-based framework for understanding and controlling forgetting in LLM continual post-training, together with a data-free update method called GCWM.

Findings

1. GCWM outperforms data-free baselines in continual learning tasks.

2. Forgetting correlates with geometry misalignment between tasks and model state.

3. Compatibility of sequential updates predicts transfer success or interference.

Abstract

Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that: forgetting can be considered as a…
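
The idea of representing a task by its parameter update and comparing the covariance that each update induces can be illustrated with a minimal sketch. This is not GCWM and is not taken from the paper: the function names, the uncentered covariance, and the Frobenius-cosine alignment score are all assumptions chosen for illustration, and the toy 2x2 matrices stand in for real weight matrices.

```python
import math

def update(before, after):
    """Parameter update dW = after - before for one weight matrix (list of rows)."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(after, before)]

def covariance(delta):
    """Uncentered second moment dW^T dW over the columns of the update (assumed form)."""
    cols = len(delta[0])
    return [[sum(row[i] * row[j] for row in delta) for j in range(cols)]
            for i in range(cols)]

def alignment(c1, c2):
    """Cosine similarity of two covariance matrices under the Frobenius inner product.

    Hypothetical compatibility score: 0 means the updates occupy orthogonal
    directions, 1 means their covariances are identical up to scale.
    """
    dot = sum(x * y for r1, r2 in zip(c1, c2) for x, y in zip(r1, r2))
    n1 = math.sqrt(sum(x * x for r in c1 for x in r))
    n2 = math.sqrt(sum(x * x for r in c2 for x in r))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # a zero update has no geometry to compare
    return dot / (n1 * n2)

# Toy weights: one shared base model and three task-specific fine-tunes.
base = [[0.0, 0.0], [0.0, 0.0]]
task_a = [[1.0, 0.0], [0.0, 0.0]]  # moves only along input dimension 0
task_b = [[0.0, 1.0], [0.0, 0.0]]  # moves only along input dimension 1
task_c = [[2.0, 0.0], [0.0, 0.0]]  # same direction as task_a, larger step

cov_a = covariance(update(base, task_a))
cov_b = covariance(update(base, task_b))
cov_c = covariance(update(base, task_c))

score_ab = alignment(cov_a, cov_b)  # orthogonal directions
score_ac = alignment(cov_a, cov_c)  # same direction, different scale
```

In this toy setup `score_ab` is 0 and `score_ac` is 1, because the score ignores update magnitude and compares only the directions the updates occupy. Whether high or low alignment corresponds to transfer or interference is a question the paper addresses; the sketch takes no position on it.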

Code & Models

Repositories

wyy-code/GCWM (GitHub)
