Features have life history. And we should care
Philipp Stecher, Sandro Radovanović, Vlasta Sikimić, Reinhard Kahle

TL;DR
This paper investigates the life history of features in language models, revealing a stable, load-bearing core that emerges early and guides subsequent development, with implications for understanding model training dynamics.
Contribution
It identifies a persistent representational backbone in language models, characterizes its properties, and shows its early emergence and influence on training.
Findings
A stable scaffold of ~50 features exists in language models.
The scaffold emerges rapidly within the first 1% of training.
Scaffold features are highly load-bearing and predictive of future carriers.
Abstract
Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M and -410M as the carrier scaffold: sparse features with stable life histories, around which the model's representational structure organises. It has four properties. (i) It assembles early: features emerge, die, and reorganise faster in the first 1% of training than afterwards, and the scaffold is already largely fixed by then. (ii) It is load-bearing: joint cross-layer ablation identifies the carriers as far more load-bearing than any count-matched non-scaffold population, a gap invisible to per-firing single-feature methods. (iii) Function precedes direction: which…
