Controlling Chat Style in Language Models via Single-Direction Editing
Zhenyu Xu, Victor S. Sheng

TL;DR
This paper demonstrates that stylistic attributes in large language models can be controlled through linear directions in activation space, enabling precise, training-free style editing with minimal computational overhead.
Contribution
It provides empirical evidence that style attributes are linearly encoded and introduces a lightweight, training-free method for style control in LLMs.
Findings
Achieves high style adherence across more than a dozen models while preserving core capabilities
Supports linear composition of multiple styles
Enhances safety by ablating undesirable behaviors
Abstract
Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. This paper investigates this challenge through the lens of representation engineering, testing the hypothesis that distinct stylistic attributes - from emotional tone to linguistic structure - are encoded as linear directions in the model's activation space. We provide strong empirical evidence for this hypothesis across a wide range of styles and, based on this finding, present a lightweight, training-free method for precise style control. Our approach supports linear style composition, enhances safety by ablating undesirable behaviors, and, as confirmed by experiments on over a dozen models, achieves high style adherence while preserving core capabilities at minimal computational cost.
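To make the idea of a linear style direction concrete, here is a minimal NumPy sketch of the three operations the abstract mentions: steering along a direction, composing directions, and ablating a direction. It is an illustration under stated assumptions, not the authors' implementation. In particular, estimating the direction as a difference of mean activations is an assumption borrowed from common representation-engineering practice; the abstract does not say how the directions are extracted. All function names are hypothetical, and synthetic vectors stand in for real model activations.

```python
import numpy as np

def style_direction(styled_acts, neutral_acts):
    """Estimate a unit style direction as the difference of mean activations.

    Assumption: difference-of-means extraction; the paper may use another estimator.
    """
    diff = styled_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def add_style(hidden, direction, strength):
    """Steer hidden states by adding a scaled style direction."""
    return hidden + strength * direction

def ablate_style(hidden, direction):
    """Remove the component of each hidden state along a unit direction."""
    return hidden - np.outer(hidden @ direction, direction)

def compose(directions, weights):
    """Linearly combine several style directions into one steering vector."""
    return sum(w * d for w, d in zip(weights, directions))

# Synthetic stand-in for activations: the "style" shifts coordinate 0.
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0
neutral = rng.normal(size=(200, dim))
styled = rng.normal(size=(200, dim)) + 3.0 * true_dir

d = style_direction(styled, neutral)
hidden = rng.normal(size=(5, dim))
steered = add_style(hidden, d, 4.0)   # projection onto d grows by 4.0
ablated = ablate_style(hidden, d)     # projection onto d becomes zero
```

In a real model these operations would be applied to residual-stream activations at one or more layers (for example through forward hooks), and the steering strength would need tuning per style and per model.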
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Authorship Attribution and Profiling
