CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen

TL;DR
CoSteer introduces a decoding-time personalization framework that enables real-time, resource-efficient, and privacy-preserving customization of large language models by leveraging local logit differences to steer cloud-based models.
Contribution
It presents a novel, tuning-free approach to real-time personalization that operates during decoding, using logit differences computed by local small models to steer large cloud models.
Findings
Achieves high-quality personalized content generation.
Maintains system efficiency and user privacy.
Demonstrates robustness across various models and tasks.
Abstract
Personalization has become crucial for adapting models to the diverse and evolving needs of users across cultural, temporal, and contextual dimensions. While existing methods often rely on centralized fine-tuning or static preference alignment within a single model, they struggle to achieve both real-time and high-quality personalization under the resource and privacy constraints of personal devices. To address this challenge, we propose CoSteer, a collaborative framework that enables tuning-free, real-time personalization via decoding-time adaptation. By leveraging logit differences between context-aware and context-agnostic local small models, CoSteer steers cloud-based large models, ensuring effective personalization while preserving the large model's capabilities. Personalization is handled locally, with only final tokens sent to the cloud, maintaining both user context and system…
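The collaboration loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions. The cloud LLM is assumed to return next-token logits for the current prefix. The device is assumed to run the same small model twice, once with the user's private context and once without it. The resulting delta is added to the cloud scores with a hypothetical strength `alpha`, and the next token is chosen greedily. The names `co_decode`, `steered_scores`, and `alpha` and the toy models are all illustrative.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: maps a token-id prefix to next-token logits.
Model = Callable[[Sequence[int]], List[float]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def steered_scores(llm_logits: Sequence[float], ctx_logits: Sequence[float],
                   plain_logits: Sequence[float], alpha: float) -> List[float]:
    """Cloud log-probs plus alpha times the local personalization delta.

    The delta is (context-aware SLM log-probs) - (context-agnostic SLM log-probs).
    It is computed and applied on the device, so it never reaches the cloud.
    """
    base = log_softmax(llm_logits)
    delta = [a - b for a, b in zip(log_softmax(ctx_logits), log_softmax(plain_logits))]
    return [b + alpha * d for b, d in zip(base, delta)]


def co_decode(cloud: Model, slm_with_context: Model, slm_without_context: Model,
              prompt: Sequence[int], max_new_tokens: int, alpha: float = 1.0) -> List[int]:
    """Greedy collaborative decoding; only chosen token ids go back to the cloud."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        llm = cloud(tokens)                  # cloud -> device: next-token logits
        ctx = slm_with_context(tokens)       # local SLM that sees the private context
        plain = slm_without_context(tokens)  # same SLM without that context
        scores = steered_scores(llm, ctx, plain, alpha)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)               # device -> cloud: just this token id
    return tokens


if __name__ == "__main__":
    # Toy stand-ins over a 4-token vocabulary; the private context favours token 1.
    cloud = lambda prefix: [2.0, 1.5, 0.0, 0.0]
    slm_ctx = lambda prefix: [0.0, 2.0, 0.0, 0.0]
    slm_plain = lambda prefix: [0.0, 0.0, 0.0, 0.0]
    print(co_decode(cloud, slm_ctx, slm_plain, prompt=[3], max_new_tokens=2))             # [3, 1, 1]
    print(co_decode(cloud, slm_ctx, slm_plain, prompt=[3], max_new_tokens=2, alpha=0.0))  # [3, 0, 0]
```

Working in log-probabilities rather than raw logits shifts every score at a given step by the same constant, so the greedy choice is unchanged. The sketch omits sampling, caching, and batching, which a real deployment would need.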
Peer Reviews
Decision: Submitted to ICLR 2026
+ The problem of enabling cloud LLMs to be aware of local user data without direct data access is well-motivated.
+ CoSteer introduces a unique collaborative decoding-time personalization framework that enables real-time adaptation using local delta steering without requiring fine-tuning or directly exposing sensitive user data.
+ Extensive experiments across multiple datasets and model scales demonstrate that CoSteer improves personalized text generation while maintaining privacy and efficiency…
- The core idea builds on existing context steering methods (He et al., 2025), mainly extending them to cloud-edge collaboration, and offers limited theoretical or algorithmic innovation.
- Experimental results in Table 2 show only slight improvements over strong personalized baselines.
- Most evaluated datasets are mobile-centric, while CoSteer’s target use case emphasizes cloud LLM serving with personalization requirements.
- The paper lacks a formal theoretical assessment of whether sharing SLM log…
- The approach seems like a simple way to leverage the strengths of both personalized SLMs and general LLMs.
- The problem being solved is interesting and relevant for generating good-quality personalized outputs while keeping private information local.
- The paper provides thorough experimental results in a variety of settings, including different SLM-LLM combinations and hyperparameter ablations.
While the experimental section contains many experiments, the paper should further distinguish the approach from other methods.
1. The paper cites Table 3 to explain why their approach is unique, but I am not sure why exactly the constraints from the table are required. In particular, the main reason why Linear Alignment/Context Steering differ from CoSteer is that these models are not weak-to-strong collaborative. However, LA/CS seem to have fairly comparable performance to CoSteer without weak…
- The paper identifies and tackles an interesting, intuitive, and increasingly relevant problem: how to balance the need for high-quality personalization from powerful cloud models with the critical and non-negotiable requirement of user privacy.
- The proposed CoSteer framework is well-motivated and its core mechanism is straightforward to understand. The idea of using a local model to compute a "delta" to steer a remote model is an elegant solution to this problem.
- The experimental evaluation…
- My primary concern is the paper's limited technical novelty. The contribution is almost entirely in the problem setup and framework design.
- The core optimization algorithm, which is central to the method's implementation, appears to be adopted directly from a previous work (Zhang et al., 2025b), specifically the use of FTRL for decoding-time alignment.
- This lack of technical innovation places a very heavy burden on the novelty of the scenario itself. If this collaborative, delta-steering…
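The review above notes that CoSteer adopts Follow-the-Regularized-Leader (FTRL) for decoding-time alignment from Zhang et al. (2025b). A generic sketch can make that term concrete. This summary does not give the reward or regularizer CoSteer actually uses, so the following choices are assumptions. The sketch runs FTRL over the next-token probability simplex. It is regularized by KL divergence to a reference distribution `p0`, which here stands in for the cloud LLM's next-token distribution. The per-round reward vectors are hypothetical, for example the local context-aware minus context-agnostic log-prob delta. The names `ftrl_kl` and `softmax_from_logs` are illustrative.

```python
import math
from typing import List, Sequence


def softmax_from_logs(log_weights: Sequence[float]) -> List[float]:
    """Normalize unnormalized log-weights into a probability vector (stable)."""
    m = max(log_weights)
    w = [math.exp(x - m) for x in log_weights]
    total = sum(w)
    return [x / total for x in w]


def ftrl_kl(p0: Sequence[float], rewards: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Follow-the-Regularized-Leader on the simplex with regularizer KL(p || p0) / eta.

    After observing rewards r_1..r_t, FTRL plays
        argmax_p  <p, r_1 + ... + r_t> - KL(p || p0) / eta,
    whose closed form is p proportional to p0 * exp(eta * (r_1 + ... + r_t)).
    p0 must be strictly positive. Returns the distribution after each round.
    """
    cumulative = [0.0] * len(p0)
    iterates = []
    for r in rewards:
        cumulative = [c + x for c, x in zip(cumulative, r)]
        iterates.append(softmax_from_logs([math.log(p) + eta * c for p, c in zip(p0, cumulative)]))
    return iterates


if __name__ == "__main__":
    # p0 plays the role of the cloud LLM's next-token distribution (toy values).
    p0 = [0.5, 0.5]
    print(ftrl_kl(p0, [[0.0, math.log(2.0)]], eta=1.0)[-1])  # approximately [1/3, 2/3]
```

In this sketch, a constant reward vector applied for T rounds at rate eta yields the same distribution as one round at rate eta·T. That is a single exponential tilt of `p0`, which is why this form of FTRL and one-shot delta steering are closely related.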
Taxonomy
Topics: Multimedia Communication and Technology · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
