Context Steering: Controllable Personalization at Inference Time
Jerry Zhi-Yang He, Sashrika Pandey, Mariah L. Schrum, Anca Dragan

TL;DR
This paper introduces Context Steering, a training-free method to control personalization in large language models by adjusting the influence of context during inference, enabling flexible and effective personalized responses.
Contribution
The paper proposes Context Steering, a novel decoding technique that enhances personalization control in LLMs without additional training or fine-tuning.
Findings
CoS effectively amplifies context influence in LLMs.
CoS improves personalized recommendation performance.
CoS can quantify correlations between texts.
Abstract
To deliver high-quality, personalized responses, large language models (LLMs) must effectively incorporate context -- personal, demographic, and cultural information specific to an end-user. For example, asking the model to explain Newton's second law with the context "I am a toddler" should produce a response different from when the context is "I am a physics professor". However, leveraging the context in practice is a nuanced and challenging task, and is often dependent on the specific situation or user base. The model must strike a balance between providing specific, personalized responses and maintaining general applicability. Current solutions, such as prompt-engineering and fine-tuning, require collection of contextually appropriate responses as examples, making them time-consuming and less flexible to use across different contexts. In this work, we introduce Context Steering…
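The abstract is cut off before it describes the mechanism, but the reviews below characterize it as using the difference between the model's outputs with and without the personalizing context, at a cost of roughly two forward passes. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the difference is taken in log-probability space and scaled linearly by a factor `lam`. The function names `log_softmax` and `context_steer` are hypothetical, and the toy logit vectors stand in for the two forward passes of a real LLM.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def context_steer(logits_with_ctx, logits_without_ctx, lam):
    """Return steered next-token log-probabilities.

    Assumed form (an illustration, not confirmed by the source text):
        steered = logp_with + lam * (logp_with - logp_without)
    lam = 0 recovers the ordinary context-conditioned distribution,
    lam > 0 amplifies the context, and lam = -1 removes its influence.
    """
    lp_with = log_softmax(np.asarray(logits_with_ctx, dtype=float))
    lp_without = log_softmax(np.asarray(logits_without_ctx, dtype=float))
    steered = lp_with + lam * (lp_with - lp_without)
    # Renormalize so the result is a valid log-probability distribution.
    return log_softmax(steered)


# Toy vocabulary of three tokens; the context favors token 0.
with_ctx = [2.0, 0.0, 0.0]     # logits from the pass that includes the context
without_ctx = [0.0, 0.0, 0.0]  # logits from the pass without the context

baseline = np.exp(context_steer(with_ctx, without_ctx, lam=0.0))
amplified = np.exp(context_steer(with_ctx, without_ctx, lam=1.0))
```

In this toy case, raising `lam` shifts probability mass toward the token the context already favored. In a real decoder the two logit vectors would be recomputed at every generation step, which is where the doubled compute noted by the reviewers comes from.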
Peer Reviews
Decision: ICLR 2025 Poster
- The paper is well written and easy to read.
- The proposed approach to achieve personalization is simple, novel, and training-free, applicable to various LLMs.
- Extensive experiments demonstrate strong performance in personalized recommendations, identification of implicit intents and quantification of extent of “personalization”.
- The experimental analysis is comprehensive.
- Focused primarily on a single context. The paper primarily focuses on scenarios with a single, dominant context. However, real-world situations often involve multiple, potentially conflicting contexts. For example, in the movie case, the user might be interested in comedy movies, science fiction but also movies with great storytelling.
- Limited Discussion on Computational Complexity: While the authors mention that CoS requires twice the amount of compute compared to a vanilla forward pass, th
1. CoS is a simple method of personalizing LLM outputs to context, without requiring fine-tuning, or prompt tuning. The method saves on the cost and effort needed for training or prompt-tuning, while being effective in the tests carried out by the authors.
2. The framework can be used directly across many personalization contexts. Fine-tuning or prompt-tuning would require re-tuning for each new context.
3. The experiments show promise, and include human evaluations, GPT4 evaluations, and compar
1. Limited contexts: While CoS is effective for single, straightforward contexts (e.g., "I like {genre}"), user preferences are often more complex, involving various (possibly conflicting) likes and dislikes. It would be interesting to see the method's performance under more sophisticated and detailed contexts.
2. The baseline experiments in Figure 4 are unclear to me. How are various values of lambda used in the case of in-context learning and multi-turn QA? Also, could the supposedly worse pe
S1. Controlling the level of personalization by using the difference between LLM outputs with and without personalized context appears reasonable and straightforward, with the entire process completed at inference time.
S2. The approach of inferring implicit intents from a given generation result is interesting.
S3. A variety of experiments are presented.
W1. The experimental evaluation appears insufficiently convincing. It would be beneficial to include more evaluations with objective metrics. For instance, incorporating experiments conducted on established benchmarks for LLM personalization [1] and recommendation [2] would strengthen the analysis.
W2. Some experiments and their results are difficult to follow, such as those related to movie recommendations and hate identification. In the recommendation experiments, it is unclear how the baseli
Taxonomy
Topics: Personal Information Management and User Behavior · Usability and User Interface Design
Methods: Attention Model
