TRACE: Training and Inference-Time Interpretability Analysis for Language Models

Nura Aljaafari; Danilo S. Carvalho; André Freitas

arXiv:2507.03668·cs.CL·July 8, 2025

TL;DR

TRACE is a modular toolkit that enables real-time interpretability analysis of transformer language models, revealing developmental linguistic phenomena and structural insights during training.

Contribution

TRACE introduces a lightweight, in-training interpretability toolkit that integrates with ABSynth, a controllable synthetic corpus generator, to support comprehensive analysis of language model development.

Findings

01

Reveals early syntactic emergence in training

02

Detects delayed semantic acquisition

03

Identifies representational compression phenomena

Abstract

Understanding when and how linguistic knowledge emerges during language model training remains a central challenge for interpretability. Most existing tools are post hoc, rely on scalar metrics, or require nontrivial integration effort, making comprehensive interpretability analysis difficult to deploy and maintain. We introduce TRACE, a modular toolkit for training and inference-time interpretability analysis of transformer models. It enables lightweight, in-training analysis of linguistic and representational signals, including feature probing, intrinsic dimensionality, Hessian curvature, and output diagnostics. It integrates with ABSynth, a controllable synthetic corpus generator that provides structured annotations for precise evaluation of linguistic feature acquisition. Experiments with autoregressive transformers demonstrate that TRACE reveals developmental phenomena such as…
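
The abstract names intrinsic dimensionality as one of the representational signals TRACE tracks during training, but it does not say which estimator TRACE uses. The sketch below therefore assumes the maximum-likelihood form of the two-nearest-neighbour (TwoNN) estimator, one common choice for measuring hidden-state geometry. `twonn_intrinsic_dimension` is a hypothetical helper, not part of TRACE's API, and the toy planar point cloud only stands in for hidden-state vectors collected at a checkpoint.

```python
import math
import random
from typing import List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def twonn_intrinsic_dimension(points: List[Sequence[float]]) -> float:
    """Maximum-likelihood TwoNN estimate of intrinsic dimension (hypothetical helper).

    For each point, mu = r2 / r1 is the ratio of its second- to first-nearest-
    neighbour distance. If the data locally fill a d-dimensional manifold with
    roughly uniform density, log(mu) is approximately exponential with rate d,
    so the maximum-likelihood estimate is d = n / sum(log(mu)).
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    log_ratios = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first (O(n^2) overall).
        dists = sorted(_euclidean(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # exact duplicates make the ratio undefined, so skip them
            log_ratios.append(math.log(r2 / r1))
    total = sum(log_ratios)
    if total == 0:
        raise ValueError("degenerate point cloud: no usable neighbour ratios")
    return len(log_ratios) / total


if __name__ == "__main__":
    # Toy stand-in for hidden states: 300 points on a 2-D plane embedded in R^5.
    rng = random.Random(0)
    plane = [[u, v, u + v, 0.0, 0.0]
             for u, v in ((rng.random(), rng.random()) for _ in range(300))]
    print(round(twonn_intrinsic_dimension(plane), 2))  # should be close to 2
```

Computing this estimate at successive checkpoints is one way to look for the representational compression listed among the findings: a falling value suggests the hidden states are concentrating in a lower-dimensional region. The all-pairs distance loop is quadratic in the number of points, so for real models one would subsample hidden states or swap in a vectorized nearest-neighbour search.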

Videos

TRACE: Training and Inference-Time Interpretability Analysis for Language Models (Underline)