COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Kartik Sharma, Rakshit S. Trivedi

TL;DR
COLD-Steer is a training-free method that efficiently steers large language models at inference time by approximating the effects of gradient-based fine-tuning, requiring fewer samples and enabling flexible control.
Contribution
The paper introduces an inference-time activation steering framework that approximates the effect of gradient updates without retraining, improving sample efficiency and flexibility.
Findings
Achieves up to 95% steering effectiveness
Uses 50 times fewer samples than baselines
Enables diverse perspective accommodation
Abstract
Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that steers LLM activations by approximating the representational changes that would result from gradient descent on in-context examples. Our key insight is that the effect of fine-tuning on a small set of examples can be efficiently approximated at inference time without actual parameter updates. We formalize this through two complementary approaches: (i) a unit kernel approximation method that updates the activations directly using gradients with respect to them, normalized across examples, and…
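The first approach described above (update activations directly using gradients with respect to them, combined across examples) can be illustrated with a toy sketch. This is not the authors' implementation. The tiny `tanh` readout stands in for an LLM, every name here (`readout_loss_grad`, `steering_vector`, `steer`, `step`) is hypothetical, and "normalized across examples" is read, as an assumption, as averaging the per-example gradients; the paper may define it differently.

```python
import math

def readout_loss_grad(h, W, v, y):
    """Logistic loss of a toy readout v . tanh(W h), and its gradient w.r.t. h.

    Assumption: this small readout stands in for the real model's loss.
    y is +1 or -1.
    """
    z = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in W]
    score = sum(v_i * z_i for v_i, z_i in zip(v, z))
    loss = math.log1p(math.exp(-y * score))
    dscore = -y / (1.0 + math.exp(y * score))  # dL/dscore
    # Chain rule: dL/dh = W^T (dscore * v * (1 - z^2))
    dz = [dscore * v_i * (1.0 - z_i * z_i) for v_i, z_i in zip(v, z)]
    grad = [sum(W[i][j] * dz[i] for i in range(len(W))) for j in range(len(h))]
    return loss, grad

def steering_vector(example_acts, labels, W, v, step=1.0):
    """Negative mean of the per-example activation gradients, scaled by step.

    Assumption: averaging over examples is our reading of 'normalized'.
    """
    total = [0.0] * len(example_acts[0])
    for h, y in zip(example_acts, labels):
        _, g = readout_loss_grad(h, W, v, y)
        total = [t + g_j for t, g_j in zip(total, g)]
    n = len(example_acts)
    return [-step * t / n for t in total]

def steer(h, delta):
    """Add the steering vector to an activation; no parameters are updated."""
    return [h_j + d_j for h_j, d_j in zip(h, delta)]

# Hypothetical usage: two in-context examples labelled +1, one query activation.
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, -1.0]
delta = steering_vector([[0.2, 0.1], [-0.3, 0.4]], [1, 1], W, v, step=0.5)
steered_query = steer([0.0, 0.0], delta)
```

In this toy setting the steered query has a lower loss for the target label than the unsteered one, which mirrors the idea of moving activations the way a gradient step on the examples would, without touching the weights.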
Peer Reviews
Decision: ICLR 2026 Poster
- This paper tackles an important and interesting problem.
- The work is grounded in prior literature and does a good job telling a coherent and concise story.
- The proposed approaches are theoretically grounded.
- There are several in-depth experiments that evaluate the proposed approaches in terms of effectiveness, generation quality, behavioral shift quality, and efficiency.
The evidence for the effectiveness of the kernel-based approach is lacking. According to Figure 3, the COLD-Kernel approach doesn’t seem to be very effective on several tasks. Section 4.4 (maintaining pluralistic views) seems to be an afterthought to hide this weakness, but it appears to address a different task than shifting behavior, which is the main claimed goal of the paper.
- Strong theoretical motivation and a unifying perspective that generalizes existing methods. It would be valuable to further elaborate on the connections to other approaches such as [1,2,3].
- Includes computational complexity analysis, but a more explicit comparison with the complexity of existing methods would strengthen the contribution.
- Extensive experimental setup, covering selection and open-generation tasks, distribution shifts, computational efficiency, and qualitative outputs.
- Comp
See Questions
- Interesting idea of approximating learning dynamics to perform activation steering.
- Training-free and efficient compared to fine-tuning or parameter-tuning approaches.
- Works with few examples and across different LLM families.
- Strong empirical results on several behavioral control tasks.
- Both variants are complementary, with COLD-FD providing more consistent results than COLD-Kernel, though at the expense of computational efficiency.
- Theoretical justification of approximations (unit kernel, finite difference) is limited.
- While examples of COLD-steered generations are given and discussed, the paper could benefit from more interpretability analysis of how activations are actually changed.
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
