COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

Kartik Sharma; Rakshit S. Trivedi

arXiv:2603.06495·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

COLD-Steer is a training-free method that efficiently steers large language models at inference time by approximating the effects of gradient-based fine-tuning, requiring fewer samples and enabling flexible control.

Contribution

It introduces a novel inference-time activation steering framework that approximates gradient updates without retraining, improving sample efficiency and flexibility.

Findings

01 Achieves up to 95% steering effectiveness

02 Uses 50 times fewer samples than baselines

03 Enables diverse perspective accommodation

Abstract

Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that steers LLM activations by approximating the representational changes that would result from gradient descent on in-context examples. Our key insight is that the effect of fine-tuning on a small set of examples can be efficiently approximated at inference time without actual parameter updates. We formalize this through two complementary approaches: (i) a unit kernel approximation method that updates the activations directly using gradients with respect to them, normalized across examples, and…
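
The first approach named in the abstract, updating activations with gradients taken with respect to the activations themselves, can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a hypothetical fixed logistic readout `W` in place of an LLM loss, and it reads "normalized across examples" as averaging unit-normalized per-example gradients, which is only one possible interpretation. All names (`grad_wrt_activation`, `steering_vector`, `steer`, `eta`) are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def grad_wrt_activation(h, y, w):
    """Gradient of the logistic loss with respect to the activation h (not the weights)."""
    scale = sigmoid(dot(w, h)) - y
    return [scale * wi for wi in w]

def steering_vector(examples, w, eta=1.0, eps=1e-12):
    """Average the unit-normalized per-example gradients, then take one descent step.

    Assumption: this averaging is our reading of "normalized across examples".
    """
    total = [0.0] * len(w)
    for h, y in examples:
        g = grad_wrt_activation(h, y, w)
        norm = math.sqrt(dot(g, g)) + eps
        for j, gj in enumerate(g):
            total[j] += gj / norm
    n = len(examples)
    return [-eta * t / n for t in total]

def steer(h, delta):
    """Add the steering vector to an activation; no parameters are updated."""
    return [hi + di for hi, di in zip(h, delta)]

# Hypothetical toy data: a fixed readout and two in-context examples
# labeled with the desired behavior (y = 1.0).
W = [0.5, -1.0, 2.0]
EXAMPLES = [([0.1, 0.3, -0.2], 1.0), ([-0.4, 0.2, 0.1], 1.0)]
H_TEST = [0.0, 0.5, -0.1]

DELTA = steering_vector(EXAMPLES, W, eta=0.5)
H_STEERED = steer(H_TEST, DELTA)
```

In this toy setting the steered activation scores higher under the readout than the original one, while `W` itself is never modified, which mirrors the idea of approximating a fine-tuning step without actual parameter updates.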

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- This paper tackles an important and interesting problem.
- The work is grounded in prior literature and does a good job telling a coherent and concise story.
- The proposed approaches are theoretically grounded.
- There are several in-depth experiments that evaluate the proposed approaches in terms of effectiveness, generation quality, behavioral shift quality, and efficiency.

Weaknesses

The evidence for the effectiveness of the kernel-based approach is lacking. According to Figure 3, the COLD-Kernel approach does not seem to be very effective on several tasks. Section 4.4 (maintaining pluralistic views) seems to be an afterthought to hide this weakness, but it seems to address a different task than shifting behavior, which is the main claimed goal of the paper.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Strong theoretical motivation and a unifying perspective that generalizes existing methods. It would be valuable to further elaborate on the connections to other approaches such as [1,2,3].
- Includes computational complexity analysis, but a more explicit comparison with the complexity of existing methods would strengthen the contribution.
- Extensive experimental setup, covering selection and open-generation tasks, distribution shifts, computational efficiency, and qualitative outputs.
- Comp

Weaknesses

See Questions

Reviewer 03 · Rating 6 · Confidence 2

Strengths

- Interesting idea of approximating learning dynamics to perform activation steering.
- Training-free and efficient compared to fine-tuning or parameter-tuning approaches.
- Works with few examples and across different LLM families.
- Strong empirical results on several behavioral control tasks.
- Both variants are complementary, with COLD-FD providing more consistent results than COLD-Kernel, though at the expense of computational efficiency.

Weaknesses

- Theoretical justification of approximations (unit kernel, finite difference) is limited.
- While examples of COLD-steered generations are given and discussed, the paper could benefit from more interpretability analysis of how activations are actually changed.

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning