CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz

TL;DR
CapTrack introduces a capability-centric framework to analyze and understand the diverse aspects of forgetting in large language models after post-training, revealing that forgetting affects robustness and behavior beyond just factual knowledge.
Contribution
The paper presents CapTrack, a framework that combines a behavioral taxonomy with an evaluation suite to analyze forgetting in LLMs across model families and post-training methods.
Findings
Forgetting impacts robustness and default behaviors beyond factual knowledge.
Instruction fine-tuning causes significant drift, while preference optimization is more conservative.
No single method mitigates forgetting effectively across all models.
Abstract
Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is known to induce forgetting, especially in the ubiquitous use-case of leveraging third-party pre-trained models, which is typically understood as a loss of parametric or factual knowledge. We argue that this accuracy-centric view is insufficient for modern foundation models and instead define forgetting as systematic model drift that degrades behavior and user experience. In this context, we introduce CapTrack, a capability-centric framework for analyzing forgetting in LLMs that combines a behavioral taxonomy with an evaluation suite built on established benchmarks and targeted adaptations. Using CapTrack, we conduct a large-scale empirical study across post-training algorithms, domains, and model families,…
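To make the capability-centric view of forgetting concrete, the following is a minimal sketch of comparing a base model and a post-trained model per capability rather than by a single accuracy number. It is not the CapTrack implementation: the function names, the capability labels, the tolerance, and all scores are hypothetical placeholders chosen for illustration, and the paper's actual taxonomy and metrics may differ.

```python
from typing import Dict, List


def capability_drift(base: Dict[str, float], tuned: Dict[str, float]) -> Dict[str, float]:
    """Return the score change (tuned minus base) for every capability present in both."""
    shared = base.keys() & tuned.keys()
    return {name: tuned[name] - base[name] for name in sorted(shared)}


def regressions(drift: Dict[str, float], tolerance: float = 0.01) -> List[str]:
    """List capabilities whose score dropped by more than `tolerance`."""
    return [name for name, delta in drift.items() if delta < -tolerance]


# Hypothetical scores in [0, 1]; placeholder values, not results from the paper.
base_scores = {"factual_knowledge": 0.70, "robustness": 0.80, "instruction_following": 0.60}
tuned_scores = {"factual_knowledge": 0.69, "robustness": 0.65, "instruction_following": 0.75}

drift = capability_drift(base_scores, tuned_scores)
flagged = regressions(drift, tolerance=0.02)

for name, delta in drift.items():
    print(f"{name}: {delta:+.2f}")
print("regressed:", flagged)
```

In this toy example the factual score barely moves while the robustness score drops, which is the kind of drift an accuracy-only check on factual benchmarks would miss.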
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education
