MidSteer: Optimal Affine Framework for Steering Generative Models
Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

TL;DR
MidSteer introduces a new affine framework for concept manipulation in generative models, providing a theoretical foundation and demonstrating its effectiveness across various models and tasks.
Contribution
The paper formalizes the theory of concept steering, introduces MidSteer as a general affine framework, and analyzes its optimality and practical performance.
Findings
MidSteer performs well across vision and language models.
It generalizes previous affine concept erasure methods.
Theoretical analysis characterizes conditions for optimal affine solutions.
Abstract
Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes…
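For readers unfamiliar with LEACE, the affine erasure baseline named in the abstract can be sketched in NumPy. This is a sketch of the closed form from the original LEACE paper, r(x) = x - W^+ P W (x - E[x]), where W is a whitening matrix for the features and P is the orthogonal projector onto the column space of W Σ_xz. It is not an implementation of MidSteer or LEACE-Switch, whose details are not given in this summary. The function names `fit_leace` and `erase` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_leace(X, Z, tol=1e-10):
    """Fit an affine eraser from features X (n, d) and concept labels Z (n, k).

    Hypothetical helper: returns the feature mean and the matrix M = W^+ P W.
    """
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n          # feature covariance, (d, d)
    sigma_xz = Xc.T @ Zc / n          # feature/concept cross-covariance, (d, k)

    # Whitening matrix W = (sigma_xx^{1/2})^+ and its pseudoinverse,
    # built from the eigendecomposition; near-zero eigenvalues are dropped.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V, lam = evecs[:, keep], evals[keep]
    W = (V * lam ** -0.5) @ V.T
    W_pinv = (V * lam ** 0.5) @ V.T

    # Orthogonal projector onto the column space of W @ sigma_xz.
    A = W @ sigma_xz
    P = A @ np.linalg.pinv(A)

    return mu, W_pinv @ P @ W

def erase(X, mu, M):
    """Apply the affine map x -> x - M (x - mu) to every row of X."""
    return X - (X - mu) @ M.T

# Synthetic example (assumed data): a binary concept correlated with feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
Z = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(float)[:, None]

mu, M = fit_leace(X, Z)
X_erased = erase(X, mu, M)

# After erasure, the cross-covariance with the concept should vanish
# up to floating-point error.
cross_cov = (X_erased - X_erased.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / len(X)
```

The check on `cross_cov` reflects the defining property of linear concept erasure: once the features have zero cross-covariance with the concept labels, no linear probe can do better than a constant predictor on the erased features.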
