Activation Transport Operators

Andrzej Szablewski; Marek Masiak

arXiv:2508.17540·cs.LG·November 6, 2025

TL;DR

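This paper introduces Activation Transport Operators (ATO): linear maps from the residual stream at one layer to the residual stream several layers later, used to measure how much of a feature is carried forward linearly rather than recomputed. The authors motivate the work with applications to understanding, safety, and debugging of LLMs.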

Contribution

The paper proposes ATO, a linear-operator framework for measuring feature transport in the residual stream of transformer models, and uses it to assess how much of that transport is linear and how efficient it is.

Findings

01

ATO can identify linearly transported features

02

The paper establishes an upper bound on transport efficiency

03

Empirical results show significant linear transport in the residual stream

Abstract

The residual stream mediates communication between transformer decoder layers via linear reads and writes of non-linear computations. While sparse-dictionary learning-based methods locate features in the residual stream, and activation patching methods discover circuits within the model, the mechanism by which features flow through the residual stream remains understudied. Understanding this dynamic can better inform jailbreaking protections and enable early detection and correction of model mistakes. In this work, we propose Activation Transport Operators (ATO), linear maps from upstream to downstream residuals $k$ layers later, evaluated in feature space using downstream SAE decoder projections. We empirically demonstrate that these operators can determine whether a feature has been linearly transported from a previous layer or synthesised from non-linear layer computation. We…
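
To make the idea in the abstract concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation. The ridge-regression fit, the function names `fit_transport_operator` and `feature_r2`, and the use of unit basis vectors as stand-ins for SAE decoder directions are all assumptions made for illustration. The sketch fits a linear map from "upstream" to "downstream" activations, then scores it along one feature direction at a time with a held-out R².

```python
import numpy as np

def fit_transport_operator(x_up, x_down, ridge=1e-3):
    """Fit T minimising ||x_up @ T - x_down||^2 + ridge * ||T||^2.

    Assumption: ridge regression is used here for simplicity; the
    source does not state how the operators are actually fitted.
    """
    d = x_up.shape[1]
    gram = x_up.T @ x_up + ridge * np.eye(d)
    return np.linalg.solve(gram, x_up.T @ x_down)

def feature_r2(x_up, x_down, T, decoder_dir):
    """R^2 of the transported residual along one feature direction."""
    target = x_down @ decoder_dir      # true downstream feature read-out
    pred = (x_up @ T) @ decoder_dir    # read-out of the transported residual
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic "residuals": hypothetical data, not model activations.
rng = np.random.default_rng(0)
n, d = 2000, 16
x_up = rng.normal(size=(n, d))
A = rng.normal(size=(d, d)) / np.sqrt(d)
x_down = x_up @ A + 0.05 * rng.normal(size=(n, d))
# Overwrite one downstream coordinate with a purely non-linear signal.
x_down[:, 1] = x_up[:, 2] ** 2 - 1.0

# Fit on the first 1500 samples, evaluate on the remaining 500.
T = fit_transport_operator(x_up[:1500], x_down[:1500])

# Stand-ins for SAE decoder directions (assumption: unit basis vectors).
dir_transported = np.eye(d)[0]
dir_synthesised = np.eye(d)[1]

r2_transported = feature_r2(x_up[1500:], x_down[1500:], T, dir_transported)
r2_synthesised = feature_r2(x_up[1500:], x_down[1500:], T, dir_synthesised)
print(f"R^2 along linearly transported direction: {r2_transported:.3f}")
print(f"R^2 along non-linearly synthesised direction: {r2_synthesised:.3f}")
```

In this toy setup, a high R² along a direction suggests that the feature was carried forward linearly, while an R² near zero suggests that it was produced by non-linear computation in between. With real models, the activations would come from two layers of the same network and the directions from a trained downstream SAE decoder.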
