Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

Julian Skifstad; Xinyue Annie Yang; Glen Chou

arXiv:2604.19018 · cs.LG · April 22, 2026

TL;DR

This paper introduces an activation steering method for large language models that treats inference as a locally linear dynamical system, enabling feedback control for precise, robust alignment without retraining.

Contribution

It demonstrates that transformer layer dynamics are locally linear, allowing the adaptation of linear quadratic regulators for effective, online activation steering in LLMs.
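To make "locally linear" concrete, the sketch below compares a nonlinear layer against its first-order (Jacobian) approximation for small and large perturbations. The residual tanh layer, its weights, and the helper names are all hypothetical stand-ins chosen for illustration; this is not a transformer block and not the paper's experimental protocol.

```python
import numpy as np

def toy_layer(x, W1, W2):
    # Hypothetical residual block: x + W2 * tanh(W1 * x). Not a real transformer layer.
    return x + W2 @ np.tanh(W1 @ x)

def toy_layer_jacobian(x, W1, W2):
    # Analytic Jacobian of toy_layer: I + W2 * diag(1 - tanh^2(W1 x)) * W1.
    h = np.tanh(W1 @ x)
    return np.eye(x.size) + W2 @ ((1.0 - h**2)[:, None] * W1)

def linearization_error(x, delta, W1, W2):
    # Relative gap between the true output change and the Jacobian prediction.
    base = toy_layer(x, W1, W2)
    exact = toy_layer(x + delta, W1, W2)
    approx = base + toy_layer_jacobian(x, W1, W2) @ delta
    return np.linalg.norm(exact - approx) / np.linalg.norm(exact - base)

rng = np.random.default_rng(0)
d, m = 8, 16  # assumed toy dimensions
W1 = rng.normal(scale=0.3, size=(m, d))
W2 = rng.normal(scale=0.3, size=(d, m))
x = rng.normal(size=d)

direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

err_small = linearization_error(x, 1e-3 * direction, W1, W2)
err_large = linearization_error(x, 1.0 * direction, W1, W2)
print(f"relative error, small step: {err_small:.2e}")
print(f"relative error, large step: {err_large:.2e}")
```

The linear model is accurate near the operating point and degrades as the perturbation grows. That is the property a Jacobian-based controller relies on, and it is why the approximation is described as local.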

Findings

01 Achieves state-of-the-art modulation of toxicity, truthfulness, and refusal in LLMs.

02 Provides theoretical bounds on setpoint tracking error.

03 Outperforms baseline steering methods across models and tasks.

Abstract

Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally-linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed-loop with minimal computational overhead and…
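The feedback-control idea in the abstract can be illustrated with the textbook finite-horizon, discrete-time LQR recursion for a linear time-varying system, where the layer index plays the role of time. The sketch below is an assumption-laden toy: random matrices stand in for layer-wise Jacobians, the steering input is assumed to enter additively (B = I), and the deviation from the setpoint is assumed to follow the linearized dynamics exactly. The paper's actual adaptation may differ in all of these respects.

```python
import numpy as np

def lqr_gains(A_list, B_list, Q, R, Q_final):
    # Backward Riccati recursion for e_{k+1} = A_k e_k + B_k u_k with cost
    # sum_k (e_k' Q e_k + u_k' R u_k) + e_N' Q_final e_N.
    P = Q_final
    gains = [None] * len(A_list)
    for k in reversed(range(len(A_list))):
        A, B = A_list[k], B_list[k]
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        P = 0.5 * (P + P.T)  # keep P symmetric against round-off
        gains[k] = K
    return gains

def rollout(A_list, B_list, e0, gains=None):
    # Propagate the setpoint error through the layers; u_k = -K_k e_k if gains are given.
    e = e0.copy()
    for k, (A, B) in enumerate(zip(A_list, B_list)):
        u = -gains[k] @ e if gains is not None else np.zeros(B.shape[1])
        e = A @ e + B @ u
    return e

rng = np.random.default_rng(0)
d, n_layers = 6, 12  # assumed toy sizes, not taken from the paper
# Random near-identity matrices stand in for layer-wise Jacobians (assumption).
A_list = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(n_layers)]
B_list = [np.eye(d) for _ in range(n_layers)]  # additive steering (assumption)
Q, R, Q_final = np.eye(d), 0.1 * np.eye(d), 10.0 * np.eye(d)

gains = lqr_gains(A_list, B_list, Q, R, Q_final)
e0 = rng.normal(size=d)  # initial deviation from the setpoint
open_loop_error = np.linalg.norm(rollout(A_list, B_list, e0))
closed_loop_error = np.linalg.norm(rollout(A_list, B_list, e0, gains))
print(f"final error without steering: {open_loop_error:.3e}")
print(f"final error with LQR feedback: {closed_loop_error:.3e}")
```

Because each gain is applied to the error observed at that layer, the correction adapts as the state evolves. This is the closed-loop behaviour the abstract contrasts with open-loop, non-anticipative interventions. The weights Q, R, and Q_final trade tracking accuracy against the size of the intervention and are arbitrary here.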

Code & Models

Repositories

trustworthyrobotics/lqr-activation-steering (GitHub)

Models

🤗 SofiTesfay2010/aria-llm (model)
