Efficient LLM Context Distillation
Rajesh Upadhayaya, Manish Raj Osti, Zachary Smith, Christopher Kottmyer

TL;DR
This paper evaluates context distillation as an efficient method for adapting large language models. It shows that context distillation matches in-context learning in in-domain accuracy, generalizes better out of domain, and requires less data and computation.
Contribution
The paper compares context distillation with in-context learning and fine-tuning, showing that it adapts models effectively and efficiently, especially when only small datasets are available.
Findings
Context distillation achieves in-domain accuracy similar to that of in-context learning (ICL).
It outperforms ICL in out-of-domain generalization.
It requires less data and computation than fine-tuning.
Abstract
Large Language Models (LLMs) demonstrate proficiency across diverse tasks but often require targeted adaptation for specific applications. Various methods have been proposed to facilitate this adaptation, including few-shot fine-tuning, in-context learning, and context distillation. This paper investigates context distillation, a method that extends the utility of task-specific examples by internalizing them, thus augmenting the set of examples available to the model at inference. We compare context distillation with in-context learning (ICL) and few-shot fine-tuning (FT) to determine how effectively it adapts models using minimal in-context examples. Employing matched datasets from Mobach, our experiments use OPT models of various sizes. The results indicate that context distillation effectively adapts models, with…
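The abstract describes context distillation only in words. As a rough illustration of the general idea as it is commonly formulated, the sketch below uses a frozen teacher that sees the in-context examples to produce soft labels. A student that sees only the query is then trained to minimize the KL divergence to those labels, so the examples end up internalized in its weights. This is not the authors' implementation; the paper uses OPT models. Here the label set, `teacher_distribution` (a word-overlap stand-in for a prompted LLM), the bag-of-words student, and the `DEMOS`/`QUERIES` data are all hypothetical.

```python
import math

# Hypothetical toy task (not from the paper): two-way sentiment labels.
LABELS = ["negative", "positive"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over LABELS."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_distribution(demos, query):
    """Hypothetical stand-in for a frozen LLM prompted with `demos + query`.

    Each demonstration votes for its label in proportion to its word overlap
    with the query, so the teacher's prediction depends on the context.
    """
    scores = [0.0] * len(LABELS)
    query_words = set(query.split())
    for text, label in demos:
        overlap = len(query_words & set(text.split()))
        scores[LABELS.index(label)] += overlap
    return softmax(scores)


def student_logits(weights, query):
    """Bag-of-words student: sees only the query, never the demonstrations."""
    logits = [0.0] * len(LABELS)
    for word in query.split():
        row = weights.get(word)
        if row is not None:
            for k in range(len(LABELS)):
                logits[k] += row[k]
    return logits


def distill(demos, queries, epochs=2000, lr=0.1):
    """Train the student to match the teacher's soft labels (minimize KL)."""
    # The teacher is frozen, so its soft targets are computed once, with context.
    targets = {query: teacher_distribution(demos, query) for query in queries}
    weights = {}
    for _ in range(epochs):
        for query in queries:
            probs = softmax(student_logits(weights, query))
            # Gradient of KL(p || softmax(z)) with respect to the logits z is softmax(z) - p.
            grad = [qk - pk for qk, pk in zip(probs, targets[query])]
            for word in query.split():
                row = weights.setdefault(word, [0.0] * len(LABELS))
                for k in range(len(LABELS)):
                    row[k] -= lr * grad[k]
    return weights


DEMOS = [("great movie", "positive"), ("terrible plot", "negative"),
         ("great acting", "positive")]
QUERIES = ["a great film", "a terrible film", "great but terrible"]

if __name__ == "__main__":
    weights = distill(DEMOS, QUERIES)
    for query in QUERIES:
        teacher = teacher_distribution(DEMOS, query)
        student = softmax(student_logits(weights, query))
        print(f"{query!r}: KL(teacher || student) = {kl_divergence(teacher, student):.2e}")
```

After training, the student reproduces the teacher's soft labels on the training queries without ever receiving `DEMOS` as input. In this toy setup it also leans positive on an unseen query such as "great film", because the context's influence is now stored in its word weights.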
Taxonomy
Topics: Advanced Control Systems Optimization · Industrial Automation and Control Systems · Distributed and Parallel Computing Systems
Methods: OPT · Sparse Evolutionary Training
