Features have life history. And we should care

Philipp Stecher; Sandro Radovanović; Vlasta Sikimić; Reinhard Kahle

arXiv:2605.18789·q-bio.NC·May 20, 2026

TL;DR

This paper investigates the life history of features in language models, revealing a stable, load-bearing core that emerges early and guides subsequent development, with implications for understanding model training dynamics.

Contribution

It identifies a persistent representational backbone in language models, characterizes its properties, and shows its early emergence and influence on training.

Findings

01

A stable scaffold of ~50 sparse features exists in Pythia-160M and -410M.

02

The scaffold is largely fixed within the first 1% of training, a phase in which features emerge, die, and reorganise ~40× faster than afterwards.

03

Scaffold features are highly load-bearing and predictive of future carriers.
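
Findings 01 and 02 rest on tracking when individual features emerge, persist, and die across training checkpoints. The page does not say how features are matched between checkpoints. The following is a minimal sketch under stated assumptions: each checkpoint's features are given as direction vectors (for example, rows of a learned dictionary), and a feature "survives" when some feature at the next checkpoint has cosine similarity at or above a hypothetical threshold of 0.7. The names `match_features` and `turnover_rates` and the threshold are illustrative only. They are not the authors' code.

```python
import numpy as np

# Hypothetical sketch: the matching rule and threshold are assumptions,
# not the method used in the paper.
def match_features(prev_dirs, next_dirs, threshold=0.7):
    """Return (survived, emerged) boolean masks for two checkpoints.

    prev_dirs, next_dirs: arrays of shape (n_features, d_model), one
    feature direction per row. A feature survives if some feature at the
    next checkpoint has cosine similarity >= threshold with it; a feature
    emerges if no feature at the previous checkpoint reaches the threshold.
    """
    prev = prev_dirs / np.linalg.norm(prev_dirs, axis=1, keepdims=True)
    nxt = next_dirs / np.linalg.norm(next_dirs, axis=1, keepdims=True)
    sim = prev @ nxt.T  # cosine similarities, shape (n_prev, n_next)
    survived = sim.max(axis=1) >= threshold
    emerged = sim.max(axis=0) < threshold
    return survived, emerged


def turnover_rates(checkpoints, steps, threshold=0.7):
    """Deaths plus emergences per training step between consecutive checkpoints."""
    rates = []
    for i in range(len(checkpoints) - 1):
        survived, emerged = match_features(checkpoints[i], checkpoints[i + 1], threshold)
        events = int((~survived).sum()) + int(emerged.sum())
        rates.append(events / (steps[i + 1] - steps[i]))
    return rates


# Toy example: 3 features in a 4-d space; feature e2 dies and e3 emerges
# between steps 0 and 10, then nothing changes up to step 1000.
basis = np.eye(4)
ckpts = [basis[[0, 1, 2]], basis[[0, 1, 3]], basis[[0, 1, 3]]]
rates = turnover_rates(ckpts, steps=[0, 10, 1000])
print(rates)  # [0.2, 0.0]
```

A claim such as "~40× faster in the first 1%" could then be phrased as a ratio of early to late rates. However, the threshold and the matching rule (here a simple max-similarity test, not a one-to-one assignment) strongly affect the counts.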

Abstract

Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M and -410M as the carrier scaffold: ~50 sparse features with stable life histories, around which the model's representational structure organises. It has four properties. (i) It assembles early: features emerge, die, and reorganise ~40× faster in the first 1% of training than afterwards, and the scaffold is already largely fixed by then. (ii) It is load-bearing: joint cross-layer ablation identifies the carriers as far more load-bearing than any count-matched non-scaffold population, a gap invisible to per-firing single-feature methods. (iii) Function precedes direction: which…
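
Property (ii) compares a joint ablation of the carriers against count-matched non-scaffold populations. The abstract does not give the protocol. The following sketch shows one generic way to set up such a comparison as a Monte Carlo test. It assumes a hypothetical `loss_fn` that returns the model's loss with a given set of (layer, feature) pairs ablated together. The function name `count_matched_ablation_test`, the number of draws, and the toy additive loss are all illustrative assumptions.

```python
import numpy as np

def count_matched_ablation_test(loss_fn, carriers, non_scaffold, n_draws=200, seed=0):
    """Compare joint ablation of `carriers` with random count-matched sets.

    loss_fn(ablated) -> float is a hypothetical callable returning the loss
    with every feature in the frozenset `ablated` switched off at once.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(frozenset())
    carrier_delta = loss_fn(frozenset(carriers)) - base
    k = len(carriers)
    pool = list(non_scaffold)
    null = np.empty(n_draws)
    for j in range(n_draws):
        # Draw k distinct non-scaffold features (same count as the carriers).
        idx = rng.choice(len(pool), size=k, replace=False)
        null[j] = loss_fn(frozenset(pool[i] for i in idx)) - base
    # Monte Carlo p-value with +1 smoothing: share of random sets at least as damaging.
    p_value = (1 + np.sum(null >= carrier_delta)) / (1 + n_draws)
    return carrier_delta, null, p_value


# Toy usage: features are (layer, index) pairs spread over 5 layers.
carriers = [(layer, 0) for layer in range(5)]
non_scaffold = [(layer, i) for layer in range(5) for i in range(1, 20)]
importance = {f: 1.0 for f in carriers}
importance.update({f: 0.01 for f in non_scaffold})

def toy_loss(ablated):
    # Additive stand-in; a real loss_fn would run the model with the ablation applied.
    return 2.0 + sum(importance[f] for f in ablated)

delta, null, p = count_matched_ablation_test(toy_loss, carriers, non_scaffold)
print(delta, null.max() < delta, p)
```

Because the toy loss is additive, it cannot exhibit joint effects. A real `loss_fn` ablates the whole set in one forward pass, and that is where effects missed by single-feature ablations could appear.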
