Context Tuning for In-Context Optimization
Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren

TL;DR
Context Tuning enhances few-shot learning in language models by initializing prompts with task-specific examples, leveraging In-Context Learning, and outperforming traditional prompt methods without fine-tuning model parameters.
Contribution
The paper introduces Context Tuning, a novel prompt initialization method that improves few-shot adaptation by using task-specific demonstrations, without requiring model fine-tuning.
Findings
Outperforms traditional prompt-based adaptation methods.
Achieves accuracy competitive with Test-Time Training.
Demonstrates effectiveness across multiple benchmarks.
Abstract
We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for LLMs, they typically initialize a trainable prompt or prefix with tokens that are irrelevant to the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model's inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training with…
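The initialization idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the embedding table, the linear "frozen model" (`frozen_w`), the squared-error loss, and all function names below are hypothetical stand-ins chosen only to show the contrast between starting a trainable prompt from demonstration-token embeddings and starting it from task-irrelevant random values, while the base parameters stay untouched.

```python
import random


def init_prompt_from_demos(embedding_table, demo_token_ids):
    """Demonstration-based init: copy the frozen embedding rows of the
    demonstration tokens into a new, separately trainable prompt."""
    return [list(embedding_table[t]) for t in demo_token_ids]


def init_prompt_random(num_tokens, dim, rng):
    """Baseline init: small random vectors unrelated to the task."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]


def toy_loss(prompt, frozen_w, target):
    """Hypothetical frozen 'model': mean dot product of prompt vectors with
    a fixed weight vector, scored by squared error against a target."""
    out = sum(sum(p_j * w_j for p_j, w_j in zip(p, frozen_w)) for p in prompt) / len(prompt)
    return (out - target) ** 2, out


def tune_prompt(prompt, frozen_w, target, lr=0.1, steps=20):
    """Gradient descent on the prompt only; frozen_w is never modified."""
    n = len(prompt)
    for _ in range(steps):
        _, out = toy_loss(prompt, frozen_w, target)
        scale = 2.0 * (out - target) / n  # d(loss)/d(p[j]) = scale * w[j]
        for p in prompt:
            for j, w_j in enumerate(frozen_w):
                p[j] -= lr * scale * w_j
    return prompt


# Toy data (all values are illustrative assumptions).
embedding_table = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 0.0, -1.0]]
frozen_w = [0.5, -0.25, 1.0]
demo_ids = [0, 2, 1]  # token ids of the few-shot demonstrations

ct_prompt = tune_prompt(init_prompt_from_demos(embedding_table, demo_ids), frozen_w, target=2.0)
rand_prompt = tune_prompt(init_prompt_random(3, 3, random.Random(0)), frozen_w, target=2.0)
```

In this sketch both prompts are optimized by the same loop, so the only difference is the starting point, which mirrors the contrast the abstract draws between task-specific and irrelevant initialization.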
Peer Reviews
Decision: Submitted to ICLR 2026
- Parameter efficiency and stability. The method keeps model weights frozen and learns only a small set of prompt or prefix parameters, which preserves the base model while enabling task adaptation.
- Consistent gains in few-shot settings. Experiments show improvements over vanilla in-context learning and over standard prompt or prefix tuning on several benchmarks. The gains are robust across a range of demonstration counts and across different base model sizes. But, much of the empirical benef
- Limited novelty in mechanism. The main algorithmic move is to initialize the trainable prompt or prefix from demonstration embeddings, then optimize as in standard prompt or prefix tuning. This is a strong and practical idea, but it is conceptually close to prior methods and may be seen as an improved initialization strategy rather than a new optimization principle.
- Positioning relative to test-time training. The paper reports results for TTT plus CT-KV and presents this combined setting among th
1. **Clear and well-structured presentation.** The paper is well-written and easy to follow. It clearly explains the intuition behind context optimization, the difference between CT-Prompt and CT-KV, and the rationale for efficiency gains. Figures and ablations effectively illustrate the role of leave-one-out masking and token dropout.
2. **Consistent and measurable improvement.** The proposed CT-KV achieves clear and repeatable performance gains over standard ICL, prompt/prefix tuning, and even
1. **Limited novelty.** The method conceptually extends test-time training by performing parameter-efficient adaptation with in-context examples, but mainly replaces LoRA with other PEFT methods such as prompt tuning or prefix tuning. As such, the contribution feels incremental rather than conceptually new. Moreover, there is prior work on few-shot prompt/prefix tuning since 2022 (e.g., studies exploring better initialization and adaptation strategies [1][2][3]), which is not discussed, weakenin
- The draft is well-written and easy to understand, with clear notations and explanations.
- It is well-aligned with related work in the literature, presenting them within an integrated framework.
- The experiments cover a reasonable range of possibilities, addressing different tasks, models, and configurations.
- I appreciate the simplicity of the proposed idea, but it seems somewhat too incremental to merit a full-paper submission. This concern becomes particularly evident when compared with the previous work, TTT, where the only difference—at least as I understand it—lies in whether the model tunes LoRA adapters (possibly randomly initialized) or continuous prompts initialized with demonstration embeddings. Although ICLR does not offer a short-paper track, the contribution and scope of this work appe
Taxonomy
Topics: Healthcare Technology and Patient Monitoring · Context-Aware Activity Recognition Systems · Building Energy and Comfort Optimization
