A Review of Developmental Interpretability in Large Language Models

Ihor Kendiukhov

arXiv:2508.15841 · cs.CL · August 25, 2025

TL;DR

This review explores the emerging field of developmental interpretability in large language models, focusing on their training dynamics, the development of their capabilities, and the implications for AI safety.

Contribution

The review provides a comprehensive overview of methodologies, key developmental insights, and future challenges in analyzing how LLMs learn and evolve.

Findings

01 Identification of circuit formation during training

02 Discovery of biphasic knowledge acquisition

03 Insights into emergent abilities as phase transitions

Abstract

This review synthesizes the nascent but critical field of developmental interpretability for Large Language Models. We chart the field's evolution from static, post-hoc analysis of trained models to a dynamic investigation of the training process itself. We begin by surveying the foundational methodologies, including representational probing, causal tracing, and circuit analysis, that enable researchers to deconstruct the learning process. The core of this review examines the developmental arc of LLM capabilities, detailing key findings on the formation and composition of computational circuits, the biphasic nature of knowledge acquisition, the transient dynamics of learning strategies like in-context learning, and the phenomenon of emergent abilities as phase transitions in training. We explore illuminating parallels with human cognitive and linguistic development, which provide…
