Steering Language Models Before They Speak: Logit-Level Interventions

Hyeseon An; Shinwoo Park; Hyundong Jin; Yo-Sub Han

arXiv:2601.10960·cs.CL·January 19, 2026

TL;DR

This paper introduces a training-free, inference-time logit intervention method for steering large language models, enabling fine-grained control over output style, formality, and toxicity without internal model modifications.

Contribution

It proposes a novel, statistically grounded logit steering technique that is task-agnostic, effective, and does not require training or internal access to the model.

Findings

01. Achieves up to +47% accuracy in controlling output characteristics.

02. Demonstrates 50x improvement in F1 scores for targeted tasks.

03. Effective across diverse datasets and control objectives.

Abstract

Steering LLMs is essential for specialized applications such as style-sensitive text rewriting, user-adaptive communication, and toxicity mitigation. Current steering methods, such as prompting-based and activation-based approaches, are widely used to guide model behavior. However, activation-based techniques require deep access to internal layers, while prompting-based steering often fails to provide consistent or fine-grained control. To address these limitations, we propose a training-free, inference-time logit intervention for controllable generation. Our approach uses a statistical token score table, derived from z-normalized log-odds of labeled corpora, to shift the decoding distribution. Empirical evaluations across three diverse datasets focusing on writing complexity, formality, and toxicity demonstrate that our method effectively steers output characteristics,…
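The abstract does not give the scoring formula, so the following is only a minimal sketch of what a token score table built from z-normalized log-odds, and a logit shift driven by it, might look like. Everything named here is an assumption for illustration, not the authors' implementation: the functions `token_scores`, `steer_logits`, and `softmax`, the add-one `smoothing`, the whitespace tokenizer, the `strength` multiplier, and the toy corpora are all hypothetical.

```python
import math
from collections import Counter


def token_scores(target_docs, contrast_docs, smoothing=1.0):
    """Hypothetical score table: smoothed log-odds of each token in the
    target corpus versus the contrast corpus, z-normalized over the vocabulary."""
    tgt = Counter(tok for doc in target_docs for tok in doc.split())
    con = Counter(tok for doc in contrast_docs for tok in doc.split())
    vocab = sorted(set(tgt) | set(con))
    # Additive smoothing keeps every probability strictly between 0 and 1
    # (assumes the vocabulary has more than one token).
    n_tgt = sum(tgt.values()) + smoothing * len(vocab)
    n_con = sum(con.values()) + smoothing * len(vocab)
    raw = {}
    for tok in vocab:
        p = (tgt[tok] + smoothing) / n_tgt
        q = (con[tok] + smoothing) / n_con
        raw[tok] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    mean = sum(raw.values()) / len(raw)
    var = sum((v - mean) ** 2 for v in raw.values()) / len(raw)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero if all scores are equal
    return {tok: (v - mean) / std for tok, v in raw.items()}


def steer_logits(logits, scores, strength=2.0):
    """Add a scaled score to each token's logit; unseen tokens are left unchanged."""
    return {tok: logit + strength * scores.get(tok, 0.0)
            for tok, logit in logits.items()}


def softmax(logits):
    """Convert a token-to-logit mapping into a probability distribution."""
    top = max(logits.values())
    exps = {tok: math.exp(v - top) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy labeled corpora (made up for this sketch): formal is the target style.
formal = ["we would appreciate your assistance",
          "please provide the requested documents"]
informal = ["gimme a hand", "send the stuff pls"]

scores = token_scores(formal, informal)
# Pretend the model assigned equal logits to three candidate next tokens.
logits = {"please": 1.0, "pls": 1.0, "the": 1.0}
probs = softmax(steer_logits(logits, scores))
```

In this toy setup the shift is applied only at decoding time, so no weights or hidden activations are touched; a real system would apply the same additive shift to the model's full vocabulary logits at each generation step.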

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Hate Speech and Cyberbullying Detection