Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Ziwen Xu; Chenyan Wu; Hengyu Sun; Haiwen Hong; Mengru Wang; Yunzhi Yao; Longtao Huang; Hui Xue; Shumin Deng; Zhixuan Chu; Huajun Chen; Ningyu Zhang

arXiv:2602.02343 · cs.CL · April 14, 2026

TL;DR

This paper frames LLM control methods (local weight fine-tuning, LoRA-based adaptation, and activation-based interventions) as dynamic weight updates within a single framework, analyzes their effects on preference and utility, and introduces SPLIT, an approach intended to balance the two.

Contribution

The paper provides a unified conceptual framework for different control techniques and proposes a new method, SPLIT, to improve preference while maintaining utility.

Findings

01. Across methods, stronger control increases preference while predictably reducing utility.

02. Control shifts representations along target directions, which changes model behavior.

03. SPLIT improves preference while better preserving utility.
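
The second finding, together with the abstract's framing of interventions as weight updates, can be illustrated on a toy linear layer. The sketch below is not the paper's formulation, which this page does not give. It assumes a single layer `y = W x + b` and uses hypothetical names (`v`, `alpha`, `A`, `B`) to show two standard identities: adding a steering vector to the output equals a bias update, and a LoRA branch equals a low-rank weight update.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2  # toy sizes, chosen only for illustration

W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)
x = rng.normal(size=d_in)


def linear(W, b, x):
    """Plain linear layer: y = W x + b."""
    return W @ x + b


base = linear(W, b, x)

# Activation-based intervention (assumed form): add alpha * v to the output.
v = rng.normal(size=d_out)
v /= np.linalg.norm(v)  # unit-norm target direction
alpha = 2.0             # control strength (hypothetical value)
steered = base + alpha * v
# The same output is obtained by updating the bias instead.
as_bias_update = linear(W, b + alpha * v, x)

# LoRA-style adaptation: add a low-rank branch B (A x) to the output.
A = rng.normal(size=(rank, d_in))
B = rng.normal(size=(d_out, rank))
lora_out = base + B @ (A @ x)
# The same output is obtained by updating the weight matrix by B A.
as_weight_update = linear(W + B @ A, b, x)

# The steered output moves by exactly alpha along the unit direction v.
shift_along_v = (steered - base) @ v
```

In this toy setting both interventions reduce to an update of the layer's parameters, and the steering strength `alpha` is directly readable as the shift of the representation along `v`.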

Abstract

Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring their connections and making comparison difficult. In this work, we present a unified view that frames these interventions as dynamic weight updates induced by a control signal, placing them within a single conceptual framework. Building on this view, we propose a unified preference-utility analysis that separates control effects into preference, defined as the tendency toward a target concept, and utility, defined as coherent and task-valid generation, and measures both on a shared log-odds scale using polarity-paired contrastive examples. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility. We…
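
As a rough illustration of scoring a polarity pair on a log-odds scale, the sketch below computes a pairwise log-odds from per-token log-probabilities. It is an assumption about what such a score could look like, not the paper's definition: the function names and toy numbers are hypothetical, and the paper's utility measure is not reproduced here because the truncated abstract does not specify it.

```python
import math


def sequence_logprob(token_logprobs):
    """Log-probability of a sequence as the sum of its token log-probs."""
    return sum(token_logprobs)


def pairwise_log_odds(logp_pos, logp_neg):
    """Log-odds of the positive example within a two-way contrastive pair.

    With p = p_pos / (p_pos + p_neg), log(p / (1 - p)) simplifies to
    logp_pos - logp_neg.
    """
    return logp_pos - logp_neg


def sigmoid(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Toy per-token log-probs for one polarity pair (made-up numbers).
pos_tokens = [-0.5, -1.0, -0.25]   # continuation expressing the target concept
neg_tokens = [-1.0, -1.5, -0.75]   # continuation expressing its opposite

score = pairwise_log_odds(
    sequence_logprob(pos_tokens), sequence_logprob(neg_tokens)
)
prob_pos = sigmoid(score)  # probability mass on the positive side of the pair
```

A positive `score` means the model favors the target-concept continuation within the pair. Comparing this score before and after an intervention would give one way to quantify a change in preference under these assumptions.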

Code & Models

Repositories

github.com/zjunlp/EasyEdit/blob/main/examples/SPLIT.md