Characterizing stable regions in the residual stream of LLMs

Jett Janiak; Jacek Karwowski; Chatrik Singh Mangat; Giorgi Giglemiani; Nora Petrova; Stefan Heimersheim

arXiv:2409.17113 · cs.LG · November 19, 2024

TL;DR

This paper identifies stable regions in the residual stream of Transformers, regions within which the model's output barely changes. These regions emerge during training and align with semantic distinctions, offering insight into neural network interpretability and training dynamics.

Contribution

The paper introduces the concept of stable regions in the residual stream and characterizes their emergence during training, their size, and their alignment with semantic distinctions, advancing the understanding of model behavior and interpretability.

Findings

01

Stable regions appear to be much larger than previously studied polytopes.

02

Regions align with semantic distinctions: similar prompts cluster within the same region.

03

Activations from the same region lead to similar next-token predictions.

Abstract

We identify stable regions in the residual stream of Transformers, where the model's output remains insensitive to small activation changes, but exhibits high sensitivity at region boundaries. These regions emerge during training and become more defined as training progresses or model size increases. The regions appear to be much larger than previously studied polytopes. Our analysis suggests that these stable regions align with semantic distinctions, where similar prompts cluster within regions, and activations from the same region lead to similar next token predictions. This work provides a promising research direction for understanding the complexity of neural networks, shedding light on training dynamics, and advancing interpretability.
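
The abstract's central notion, that the output stays flat inside a region and changes sharply at its boundary, can be illustrated with a small interpolation probe. This is a minimal sketch under stated assumptions, not the authors' code: `toy_model` is a hypothetical stand-in for the layers downstream of the residual stream (a sharpened two-class softmax over a 2-D "activation"), and the helper names, the step count, and the L2 output distance are illustrative choices. The probe walks a straight line between two activations and records how much the output moves per step; a stable region shows up as a run of near-zero changes, a boundary as a spike.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, weights, sharpness=20.0):
    # Hypothetical stand-in for "everything downstream of the residual stream":
    # linear logits scaled by a large factor, so far from a boundary the softmax
    # output is nearly constant inside each argmax region and it changes quickly
    # where two logits tie.
    logits = [sharpness * sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return softmax(logits)


def lerp(a, b, t):
    # Point at fraction t along the straight line from a to b.
    return [(1 - t) * a_i + t * b_i for a_i, b_i in zip(a, b)]


def l2(p, q):
    # Euclidean distance between two output distributions.
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))


def interpolation_profile(model, a, b, steps=100):
    # Output change between consecutive points on the path from a to b.
    outputs = [model(lerp(a, b, k / steps)) for k in range(steps + 1)]
    return [l2(outputs[k], outputs[k + 1]) for k in range(steps)]


def sharpest_transition(profile):
    # Index k of the step (k -> k + 1) with the largest output change.
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical "next-token" directions
    profile = interpolation_profile(lambda x: toy_model(x, weights), [1.0, 0.0], [0.0, 1.0])
    k = sharpest_transition(profile)
    print(f"sharpest change between steps {k} and {k + 1}: {profile[k]:.3f}")
```

In this toy the spike sits at the midpoint of the path, where the two logits tie, and the changes near either endpoint are vanishingly small. Applying the same idea to a real model would mean replacing `toy_model` with the network's forward pass from the chosen layer onward and using residual-stream activations of actual prompts as the endpoints; how the paper itself measures output change may differ from the L2 distance assumed here.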
