Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs

Jingzhou Jiang; Yi Yang; Kar Yan Tam

arXiv:2605.12714 · cs.LG · May 14, 2026

TL;DR

This paper introduces Layer-wise Representation Dynamics (LRD), a framework for analyzing how hidden representations change across the layers of language models. The analysis reveals architectural and task-level differences and guides model selection and layer pruning.

Contribution

The paper proposes LRD, a framework with three layer-wise measurement families (Frenet, NRS, and GFMI), applies it across models and tasks, and demonstrates its utility for label-free model selection and inference-time layer pruning.

Findings

01. LRD reveals differences across models and tasks not seen in final-layer analysis.

02. End-to-end subspace displacement correlates strongly with downstream performance.

03. GFMI measurement-guided pruning outperforms random pruning at certain budgets.
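To make finding 02 concrete, the following is a minimal sketch of how a rank correlation between a model-level score and downstream performance could be checked. The numbers are made up for illustration and are not results from the paper. The helper names (`rankdata`, `spearman`) are hypothetical, and the use of Spearman correlation is an assumption, since this page does not say which correlation statistic the authors report.

```python
import numpy as np

def rankdata(x):
    """Ranks starting at 1. Assumes all values are distinct (no tie handling)."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])

# Illustrative, made-up values for five hypothetical models.
displacement = [0.8, 1.4, 1.1, 2.0, 1.7]    # model-level score, computed without labels
task_score = [52.0, 61.5, 58.0, 66.0, 61.0]  # downstream benchmark score

rho = spearman(displacement, task_score)

# Label-free selection under the assumption that a higher score is better:
# pick the model with the largest displacement, without looking at task_score.
selected = int(np.argmax(displacement))
print(f"Spearman rho = {rho:.2f}, selected model index = {selected}")
```

A rank correlation suits this setting because selection only needs the ordering of models to agree with the ordering of their downstream scores, not a linear relationship.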

Abstract

Hidden states change substantially across the layers of modern language models, but most layer-wise analyses focus on one aspect of that change. We propose Layer-wise Representation Dynamics (LRD), a framework with three layer-wise measurement families: Frenet (Grassmann speed and curvature) for global subspace motion, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer. Applying LRD to 31 models (encoder-based and decoder-based embedders, plus base LLMs) on 30 MTEB tasks reveals architectural and task-level differences that are not apparent from final-layer representations alone. We then use LRD for two applications: label-free model selection and inference-time layer pruning. For selection, all three model-level scores correlate positively with downstream MTEB performance, with…
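Two of the quantities named in the abstract can be illustrated with a short NumPy sketch. It is written under stated assumptions and is not the authors' implementation: the abstract does not give exact definitions, so the code assumes that each layer's subspace is the span of the top-k principal directions of its hidden states, that the Grassmann step between layers is the geodesic distance computed from principal angles, and that neighborhood retention is the mean overlap of Euclidean k-nearest-neighbor sets. All function names are hypothetical, and GFMI is omitted because this page does not describe it in enough detail.

```python
import numpy as np

def top_subspace(H, k):
    """Orthonormal basis (d x k) for the top-k principal directions of H (n x d)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:k].T

def grassmann_distance(U, V):
    """Geodesic distance between the subspaces spanned by orthonormal U and V."""
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

def knn_indices(H, k):
    """Indices of the k nearest neighbors (Euclidean) of each row, excluding itself."""
    sq = np.sum(H ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (H @ H.T)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_retention(H_a, H_b, k):
    """Mean fraction of each point's k nearest neighbors in H_a that are kept in H_b."""
    na, nb = knn_indices(H_a, k), knn_indices(H_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(na, nb)]
    return float(np.mean(overlaps))

# Synthetic stand-in for per-layer hidden states: 4 layers, 50 inputs, 16 dims.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(50, 16)) for _ in range(4)]

bases = [top_subspace(H, k=4) for H in layers]
# Layer-to-layer subspace step, one value per consecutive pair of layers.
speeds = [grassmann_distance(bases[i], bases[i + 1]) for i in range(len(bases) - 1)]
# Local neighborhood retention between consecutive layers.
retention = [neighbor_retention(layers[i], layers[i + 1], k=5) for i in range(len(layers) - 1)]
# End-to-end displacement: distance from the first-layer to the last-layer subspace.
end_to_end = grassmann_distance(bases[0], bases[-1])
```

With real models, `layers` would hold the hidden states of the same set of inputs at each layer, for example pooled token representations. The pooling choice is another assumption that this sketch leaves open.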
