Concept Steerers: Leveraging K-Sparse Autoencoders for Test-Time Controllable Generations

Dahye Kim; Deepti Ghadiyaram

arXiv:2501.19066 · cs.CV · October 14, 2025

TL;DR

This paper introduces a k-sparse autoencoder (k-SAE) framework that steers text-to-image diffusion models at test time, removing unsafe concepts or introducing new styles without retraining.

Contribution

It presents a new method using k-sparse autoencoders for interpretable, test-time concept manipulation in diffusion models, avoiding retraining and maintaining quality.

Findings

01. 20.01% improvement in unsafe concept removal

02. Effective style manipulation capabilities

03. Approximately 5x faster than existing methods

Abstract

Despite the remarkable progress in text-to-image generative models, they are prone to adversarial attacks and inadvertently generate unsafe, unethical content. Existing approaches often rely on fine-tuning models to remove specific concepts, which is computationally expensive, lacks scalability, and/or compromises generation quality. In this work, we propose a novel framework leveraging k-sparse autoencoders (k-SAEs) to enable efficient and interpretable concept manipulation in diffusion models. Specifically, we first identify interpretable monosemantic concepts in the latent space of text embeddings and leverage them to precisely steer the generation away or towards a given concept (e.g., nudity) or to introduce a new concept (e.g., photographic style) -- all during test time. Through extensive experiments, we demonstrate that our approach is very simple, requires no retraining of the…
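The core idea in the abstract, encoding a text embedding into a sparse latent code and then nudging the embedding along one concept's direction, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `KSparseAutoencoder`, the helper `steer`, the dimensions, and the `strength` parameter are all hypothetical, the weights are random rather than trained, and steering along a decoder column is one common way to use a sparse autoencoder that may differ from the paper's exact formulation.

```python
import numpy as np


def topk_activation(pre_acts, k):
    """Keep the k largest entries of a 1-D vector and zero out the rest."""
    out = np.zeros_like(pre_acts)
    idx = np.argpartition(pre_acts, -k)[-k:]  # indices of the k largest values
    out[idx] = pre_acts[idx]
    return out


class KSparseAutoencoder:
    """Toy k-SAE with random (untrained) weights, for illustration only."""

    def __init__(self, d_model, n_latents, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.W_enc = rng.normal(scale=0.1, size=(n_latents, d_model))
        self.b_enc = np.zeros(n_latents)
        self.W_dec = rng.normal(scale=0.1, size=(d_model, n_latents))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Sparse code: at most k latents stay active.
        return topk_activation(self.W_enc @ (x - self.b_dec) + self.b_enc, self.k)

    def decode(self, z):
        # Linear reconstruction of the embedding from the sparse code.
        return self.W_dec @ z + self.b_dec


def steer(x, sae, concept_idx, strength):
    """Shift embedding x along the decoder direction of one latent.

    Assumption: a negative strength moves away from the concept,
    a positive strength moves towards it.
    """
    direction = sae.W_dec[:, concept_idx]
    return x + strength * direction


# Hypothetical usage: x stands in for one token's text embedding.
sae = KSparseAutoencoder(d_model=8, n_latents=32, k=4)
x = np.ones(8)
z = sae.encode(x)
x_steered = steer(x, sae, concept_idx=int(np.argmax(np.abs(z))), strength=-2.0)
```

Because the edit touches only the text embedding, the diffusion model's weights stay frozen in a setup like this, which is consistent with the test-time, no-retraining claim above.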

Code & Models

Repositories

kim-dahye/steerers (PyTorch, official)

Taxonomy

Topics: Cancer-related molecular mechanisms research · Water Systems and Optimization · Machine Learning and Data Classification

Methods: Diffusion · Balanced Selection