Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients

J Rosser

arXiv:2603.14665·cs.AI·March 18, 2026

TL;DR

Gradient Atoms is an unsupervised technique that decomposes training gradients into sparse components, revealing shared behaviors and enabling controllable model steering without predefined queries.

Contribution

The paper introduces an unsupervised gradient decomposition method that discovers interpretable model behaviors and steers models by applying the identified gradient atoms.

Findings

01 Discovered 500 gradient atoms capturing diverse behaviors

02 Atoms can be used to steer model outputs significantly

03 Method scales independently of the number of behaviors

Abstract

Training data attribution (TDA) methods ask which training documents are responsible for a model behavior. However, models often learn broad concepts shared across many examples. Moreover, existing TDA methods are supervised -- they require a predefined query behavior, then score every training document against it -- making them both expensive and unable to surface behaviors the user did not think to ask about. We present Gradient Atoms, an unsupervised method that decomposes per-document training gradients into sparse components ("atoms") via dictionary learning in a preconditioned eigenspace. Each atom captures a shared update direction induced by a cluster of functionally similar documents, directly recovering the collective structure that per-document methods do not address. Among 500 discovered atoms, the highest-coherence ones recover interpretable task-type behaviors -- refusal,…
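
The pipeline described in the abstract (per-document gradients, a preconditioned eigenspace, sparse dictionary learning) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The synthetic gradients, the damped whitening in `precondition`, the ISTA-based solver in `learn_atoms`, the hyperparameters, and the final steering step are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import numpy as np

def precondition(grads, k, damping=1e-3):
    """Project gradients onto the top-k eigenvectors of their second-moment
    matrix and rescale each coordinate by 1/sqrt(eigenvalue + damping).
    (Assumed form of the preconditioning; the source gives no details.)"""
    second_moment = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]
    basis = eigvecs[:, top]                           # shape (dim, k)
    scale = 1.0 / np.sqrt(eigvals[top] + damping)
    return (grads @ basis) * scale, basis, scale

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(x, atoms, codes, l1, n_steps):
    """ISTA steps on 0.5 * ||x - codes @ atoms||^2 + l1 * ||codes||_1."""
    lip = np.linalg.norm(atoms @ atoms.T, 2)          # Lipschitz constant
    for _ in range(n_steps):
        residual = codes @ atoms - x
        codes = soft_threshold(codes - residual @ atoms.T / lip, l1 / lip)
    return codes

def learn_atoms(x, n_atoms, l1=0.1, n_outer=30, n_ista=20, seed=0):
    """Alternate sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    atoms = rng.standard_normal((n_atoms, x.shape[1]))
    atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
    codes = np.zeros((x.shape[0], n_atoms))
    for _ in range(n_outer):
        codes = sparse_code(x, atoms, codes, l1, n_ista)
        solution = np.linalg.lstsq(codes, x, rcond=None)[0]
        norms = np.linalg.norm(solution, axis=1)
        used = norms > 1e-8                           # skip atoms no document uses
        atoms[used] = solution[used] / norms[used, None]
    codes = sparse_code(x, atoms, codes, l1, n_ista)  # codes for the final atoms
    return atoms, codes

def to_parameter_space(atom, basis, scale):
    """Undo the rescaling and projection to get a direction in parameter space."""
    return basis @ (atom / scale)

# Synthetic stand-in for per-document gradients: each document follows one of
# three planted directions, plus a little noise.
rng = np.random.default_rng(0)
n_docs, dim, n_behaviors = 300, 20, 3
directions = np.linalg.qr(rng.standard_normal((dim, n_behaviors)))[0].T
labels = rng.integers(0, n_behaviors, size=n_docs)
strength = rng.uniform(1.0, 3.0, size=(n_docs, 1))
grads = strength * directions[labels] + 0.01 * rng.standard_normal((n_docs, dim))

z, basis, scale = precondition(grads, k=n_behaviors)
atoms, codes = learn_atoms(z, n_atoms=n_behaviors)

# Hypothetical steering step: nudge a parameter vector along one atom.
theta = np.zeros(dim)                                 # placeholder parameters
steered = theta + 0.1 * to_parameter_space(atoms[0], basis, scale)
print("fraction of zero codes:", np.mean(codes == 0.0))
```

In this sketch each row of `codes` says which atoms a document loads on, so documents sharing a nonzero column form the kind of cluster the abstract describes. For a real model the gradient dimension would be far too large for a dense second-moment matrix, so the eigenspace would have to be estimated some other way.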

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning