The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models

Shashata Sawmya; Micah Adler; Nir Shavit

arXiv:2505.19440·cs.CL·May 27, 2025

Open Access

TL;DR

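This paper investigates how interpretable semantic features emerge in large language models across training checkpoints (time), transformer layers (space), and model sizes (scale). It finds emergence thresholds in time and scale, and shows that early-layer features can re-emerge in later layers, which challenges standard assumptions about representational dynamics in transformers.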

Contribution

The paper analyzes feature emergence in LLMs along three dimensions (training time, layer depth, and model scale), using sparse autoencoders as the mechanistic interpretability technique.

Findings

01 Semantic features emerge at specific training stages and model scales

02 Early-layer features can unexpectedly re-emerge in later layers

03 Clear temporal and scale-specific thresholds for feature emergence appear across multiple domains

Abstract

This paper studies the emergence of interpretable categorical features within large language models (LLMs), analyzing their behavior across training checkpoints (time), transformer layers (space), and varying model sizes (scale). Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations. Results indicate clear temporal and scale-specific thresholds for feature emergence across multiple domains. Notably, spatial analysis reveals unexpected semantic reactivation, with early-layer features re-emerging at later layers, challenging standard assumptions about representational dynamics in transformer models.
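
For readers unfamiliar with sparse autoencoders, the following is a minimal, generic sketch of the idea: activations are encoded into a wider set of non-negative features, reconstructed linearly, and trained with an L1 penalty that pushes most features to zero. It is not the authors' implementation. The sizes, the random weights, the `l1_coeff` value, and the helper names (`sae_forward`, `sae_loss`, `feature_firing_rate`) are all assumptions chosen for illustration, and the random matrix `x` merely stands in for real model activations.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU feature activations
    x_hat = f @ W_dec + b_dec                          # linear reconstruction
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity

def feature_firing_rate(f, threshold=0.0):
    """Fraction of inputs on which each feature is active."""
    return np.mean(f > threshold, axis=0)

# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, d_features, n = 16, 64, 32
x = rng.normal(size=(n, d_model))  # stand-in for activations from one layer
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
rates = feature_firing_rate(f)
```

Under these assumptions, comparing `rates` for inputs from one semantic category across checkpoints, layers, or model sizes is one plausible way to ask when and where a feature becomes active; the paper's actual procedure may differ.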

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques