Navigating by Old Maps: The Pitfalls of Static Mechanistic Localization in LLM Post-Training

Hang Chen; Jiaying Zhu; Hongyang Chen; Hongxu Liu; Xinyu Yang; Wenya Wang

arXiv:2605.06076·cs.CL·May 8, 2026

TL;DR

This paper examines the limits of static mechanistic localization in LLM post-training. It shows that task circuits change during supervised fine-tuning, so mechanisms located in the current parameters are insufficient for guiding future updates.

Contribution

It introduces three metrics (Circuit Distance, Circuit Stability, and Circuit Conflict) to analyze circuit evolution, and demonstrates the need for predictive, foresight-based approaches in mechanistic interpretability.
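To make the idea of tracking circuit evolution concrete, here is a minimal sketch that measures how far a circuit drifts from its starting point across fine-tuning checkpoints. It is an illustration only: the paper's actual metric definitions are not given on this page, so the sketch substitutes a plain Jaccard distance between sets of component identifiers. The function names, the checkpoint steps, and the component ids are all hypothetical.

```python
def jaccard_distance(circuit_a, circuit_b):
    """Return 1 - |A & B| / |A | B| for two sets of component ids.

    Assumption: a circuit is represented as a set of component names.
    This is a stand-in measure, not the paper's metric.
    """
    a, b = set(circuit_a), set(circuit_b)
    union = a | b
    if not union:
        # Two empty circuits are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def drift_from_start(circuits_by_step):
    """Distance of each checkpoint's circuit from the earliest checkpoint's."""
    steps = sorted(circuits_by_step)
    base = circuits_by_step[steps[0]]
    return {step: jaccard_distance(base, circuits_by_step[step]) for step in steps}


# Hypothetical component ids ("L<layer>.H<head>"); not taken from the paper.
checkpoints = {
    0: {"L1.H3", "L4.H0", "L7.H2", "L9.H5"},
    500: {"L1.H3", "L4.H0", "L7.H2", "L10.H1"},
    1000: {"L1.H3", "L5.H6", "L8.H4", "L10.H1"},
}

drift = drift_from_start(checkpoints)
for step, distance in drift.items():
    print(f"step {step}: distance from initial circuit = {distance:.3f}")
```

In this toy data the distance grows with training steps, which is the situation in which a circuit located at step 0 would become a stale guide for later updates.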

Findings

01. Circuits exhibit "Free Evolution" during parameter updates.

02. Static mechanisms suffer from temporal latency and are inadequate for future guidance.

03. A predictive framework for circuit evolution is proposed.

Abstract

The "Locate-then-Update" paradigm has become a predominant approach in the post-training of large language models (LLMs), identifying critical components via mechanistic interpretability for targeted parameter updates. However, this paradigm rests on a fundamental yet unverified assumption: can mechanisms derived from current static parameters reliably guide future dynamic parameter updates? To investigate this, we systematically track the structural evolution of Transformer circuits throughout the supervised fine-tuning (SFT) process, revealing the underlying dynamics of task mechanisms. We introduce three novel metrics (Circuit Distance, Circuit Stability, and Circuit Conflict) to analyze circuit evolution across three dimensions: neural migration, semantic stability, and cross-task interference. Our empirical results reveal that circuits inherently exhibit "Free Evolution" during…
