SteerX: Disentangled Steering for LLM Personalization
Xiaoyan Zhao, Ming Yan, Yilun Qiu, Haoting Ni, Yang Zhang, Fuli Feng, Hong Cheng, Tat-Seng Chua

TL;DR
SteerX introduces a disentangled activation steering method for LLM personalization. It isolates preference-driven signals so that the steering vector reflects a user's actual preferences rather than their entire history.
Contribution
The paper proposes SteerX, a disentangled steering approach grounded in causal inference that separates the preference-driven components of a user's historical data from the preference-agnostic ones, yielding a cleaner personalization signal.
Findings
SteerX improves steering vector quality across multiple methods.
Enhanced personalization results in more accurate LLM responses.
Experiments show consistent gains on real-world datasets.
Abstract
Large language models (LLMs) have shown remarkable success in recent years, enabling a wide range of applications, including intelligent assistants that support users' daily life and work. A critical factor in building such assistants is personalizing LLMs, as user preferences and needs vary widely. Activation steering, which directly leverages directions representing user preference in the LLM activation space to adjust its behavior, offers a cost-effective way to align the model's outputs with individual users. However, existing methods rely on all historical data to compute the steering vector, ignoring that not all content reflects true user preferences, which undermines the personalization signal. To address this, we propose SteerX, a disentangled steering method that isolates preference-driven components from preference-agnostic components. Grounded in causal inference theory,…
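To make the idea of activation steering concrete, here is a minimal sketch in plain Python. It is not the SteerX algorithm. It assumes a simple difference-of-means steering vector, toy 2-dimensional "activations", and a hypothetical boolean mask `keep` marking which history items are preference-driven. How SteerX actually identifies those components is not described in the excerpt above, so the mask is supplied by hand, and all function names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def steering_vector(user_acts: Sequence[Sequence[float]],
                    baseline_acts: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: direction from baseline toward user-history activations."""
    mu_user = mean_vector(user_acts)
    mu_base = mean_vector(baseline_acts)
    return [u - b for u, b in zip(mu_user, mu_base)]


def select(rows: Sequence[Sequence[float]], keep: Sequence[bool]) -> List[Sequence[float]]:
    """Keep only the rows flagged True (hypothetical 'preference-driven' mask)."""
    return [r for r, k in zip(rows, keep) if k]


def steer(hidden: Sequence[float], vector: Sequence[float], alpha: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction with strength alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations for three history items; the third is assumed to be
# preference-agnostic noise (an assumption made purely for illustration).
user_acts = [[2.0, 0.0], [4.0, 2.0], [0.0, 4.0]]
baseline_acts = [[1.0, 1.0], [1.0, 1.0]]
keep = [True, True, False]

v_all = steering_vector(user_acts, baseline_acts)                  # uses all history
v_pref = steering_vector(select(user_acts, keep), baseline_acts)   # uses flagged items only
steered = steer([0.5, 0.5], v_pref, alpha=0.5)

print(v_all, v_pref, steered)
```

In this toy setup the unfiltered vector is pulled toward the second dimension by the noisy third item, while the filtered vector points only along the first. That mirrors, in a very simplified way, the abstract's concern that using all historical data dilutes the personalization signal.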
