Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities

Shankar Padmanabhan; Mustafa Omer Gul; Tanya Goyal

arXiv:2602.16093 · cs.CL · February 19, 2026

Open Access

TL;DR

This paper introduces DiSC, a context-distillation method for continual knowledge adaptation in large language models. DiSC learns new information while preserving the skills acquired during post-training, and it requires no explicit generation steps during training.

Contribution

DiSC is a novel context-distillation approach to continual knowledge adaptation in LLMs that balances acquiring new knowledge against retaining prior capabilities better than existing methods.

Findings

01

DiSC outperforms prior methods in balancing learning and retention.

02

DiSC adapts models effectively across multiple domains.

03

No explicit generation steps are required during training.

Abstract

Post-training endows pretrained LLMs with a variety of desirable skills, including instruction-following and reasoning. However, these post-trained LLMs only encode knowledge up to a cut-off date, necessitating continual adaptation. Unfortunately, existing solutions cannot simultaneously learn new knowledge from an adaptation document corpus and mitigate the forgetting of earlier learned capabilities. To address this, we introduce Distillation via Split Contexts (DiSC), a simple context-distillation-based approach for continual knowledge adaptation. DiSC derives student and teacher distributions by conditioning on distinct segments of the training example and minimizes the KL divergence between these distributions over the shared tokens. This allows us to apply context distillation efficiently, without requiring explicit generation steps during training. We run experiments on four post-trained…
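The split-context objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `split_context_kl` and `toy_model` are hypothetical, and the toy model is a stand-in for an LLM. Several details are assumptions that the abstract does not confirm: which segments the teacher and student see, the KL direction (KL(teacher || student), as is common in context distillation), and averaging over shared tokens.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" maps a token prefix to next-token logits.
LogitsFn = Callable[[Sequence[int]], List[float]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def kl_divergence(p_log: Sequence[float], q_log: Sequence[float]) -> float:
    """KL(p || q) for two log-probability vectors over the same vocabulary."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(p_log, q_log))


def split_context_kl(model: LogitsFn,
                     teacher_context: Sequence[int],
                     student_context: Sequence[int],
                     shared: Sequence[int]) -> float:
    """Mean per-token KL(teacher || student) over the shared tokens.

    Assumption: teacher and student differ only in the segment that precedes
    the shared tokens. Both score tokens already in the training example, so
    no sampling step is needed. In real training the teacher side would be
    a fixed target (no gradient); this sketch only computes the loss value.
    """
    if not shared:
        raise ValueError("need at least one shared token")
    total = 0.0
    for i in range(len(shared)):
        # Distributions for predicting shared[i], given each side's context.
        t_log = log_softmax(model(list(teacher_context) + list(shared[:i])))
        s_log = log_softmax(model(list(student_context) + list(shared[:i])))
        total += kl_divergence(t_log, s_log)
    return total / len(shared)


def toy_model(prefix: Sequence[int], vocab_size: int = 4) -> List[float]:
    """Hypothetical stand-in LM: logit of token v = count of v in the prefix."""
    logits = [0.0] * vocab_size
    for tok in prefix:
        logits[tok] += 1.0
    return logits


if __name__ == "__main__":
    example = [0, 0, 2, 1, 3, 1]
    # Hypothetical split: the teacher sees the first segment, the student does not.
    teacher_ctx, shared = example[:3], example[3:]
    print(split_context_kl(toy_model, teacher_ctx, [], shared))
```

Both distributions are evaluated on tokens already present in the training example, so the loss needs only forward passes over that example rather than sampled teacher continuations. This is the efficiency point the abstract makes. A practical implementation would score all shared positions in one batched forward pass instead of looping per position.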