Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities
Shankar Padmanabhan, Mustafa Omer Gul, Tanya Goyal

TL;DR
This paper introduces DiSC, a context-distillation method for continual knowledge adaptation in large language models that learns new information while preserving existing skills, without requiring explicit generation during training.
Contribution
DiSC is a novel context-distillation approach that improves continual knowledge adaptation in LLMs by balancing the learning of new knowledge against the retention of prior capabilities.
Findings
DiSC outperforms prior methods in balancing learning and retention.
It effectively adapts models across multiple domains.
No explicit generation steps are required during training.
Abstract
Post-training endows pretrained LLMs with a variety of desirable skills, including instruction-following and reasoning, among others. However, these post-trained LLMs only encode knowledge up to a cut-off date, necessitating continual adaptation. Unfortunately, existing solutions cannot simultaneously learn new knowledge from an adaptation document corpus and mitigate the forgetting of earlier learned capabilities. To address this, we introduce Distillation via Split Contexts (DiSC), a simple context-distillation based approach for continual knowledge adaptation. DiSC derives student and teacher distributions by conditioning on distinct segments of the training example and minimizes the KL divergence between these distributions over the shared tokens. This allows us to efficiently apply context distillation without requiring explicit generation steps during training. We run experiments on four post-trained…
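To make the split-context idea concrete, here is a minimal NumPy sketch of a loss of the kind the abstract describes. It rests on several assumptions that this abstract does not state: a causal model `lm` maps a token list to per-position logits; the teacher conditions on one segment of the training example and the student on a different (here shorter) segment, each followed by the same shared span; and the loss is the forward divergence KL(teacher ‖ student) averaged over the shared positions. The names `toy_causal_lm` and `disc_style_loss`, and the toy model itself, are hypothetical illustrations, not the authors' implementation. Both distributions come from ordinary forward passes over tokens already present in the example, which is why no sampling or generation step appears.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def toy_causal_lm(tokens: list[int], W: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an LLM: logits at position t depend only on tokens[0..t]."""
    one_hot = np.eye(W.shape[0])[tokens]               # (T, V)
    counts = np.arange(1, len(tokens) + 1)[:, None]    # (T, 1)
    prefix_mean = np.cumsum(one_hot, axis=0) / counts  # causal running average
    return prefix_mean @ W                             # (T, V) logits


def disc_style_loss(lm, teacher_ctx, student_ctx, shared):
    """KL(teacher || student) over the shared span (illustrative assumption, not the paper's exact loss)."""
    # Keep only the positions that belong to the shared span in each input.
    t_logp = log_softmax(lm(teacher_ctx + shared)[len(teacher_ctx):])
    s_logp = log_softmax(lm(student_ctx + shared)[len(student_ctx):])
    # In an autodiff framework the teacher term would typically be detached
    # (stop-gradient) so that only the student is updated (assumption).
    kl_per_position = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    return float(kl_per_position.mean())


# Usage: split one toy training example into hypothetical segments.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))
example = [3, 1, 4, 1, 5, 9, 2, 6]
shared = example[5:]                      # span scored under both contexts
loss = disc_style_loss(
    lambda toks: toy_causal_lm(toks, W),
    teacher_ctx=example[:5],              # longer segment
    student_ctx=example[3:5],             # different, shorter segment
    shared=shared,
)
```

As a sanity check, passing the same context to teacher and student yields a loss of exactly zero, since both forward passes are then identical; the loss becomes positive only when the two contexts induce different predictions over the shared span.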
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
