Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
Jingzhou Jiang, Yi Yang, Kar Yan Tam

TL;DR
This paper introduces Layer-wise Representation Dynamics (LRD), a framework for measuring how hidden representations change from layer to layer in language models. It reveals architectural and task-level differences and supports label-free model selection and inference-time layer pruning.
Contribution
The paper proposes LRD, a framework with three layer-wise measurement families (Frenet, NRS, and GFMI), applies it to 31 models on 30 MTEB tasks, and demonstrates its use for model selection and pruning.
Findings
LRD reveals differences across models and tasks not seen in final-layer analysis.
End-to-end subspace displacement correlates strongly with downstream performance.
Pruning guided by the GFMI measurement outperforms random pruning at certain budgets.
Abstract
Hidden states change substantially across the layers of modern language models, but most layer-wise analyses focus on one aspect of that change. We propose Layer-wise Representation Dynamics (LRD), a framework with three layer-wise measurement families: Frenet (Grassmann speed and curvature) for global subspace motion, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer. Applying LRD to 31 models (encoder-based and decoder-based embedders, plus base LLMs) on 30 MTEB tasks reveals architectural and task-level differences that are not apparent from final-layer representations alone. We then use LRD for two applications: label-free model selection and inference-time layer pruning. For selection, all three model-level scores correlate positively with downstream MTEB performance, with…
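To make two of the quantities named in the abstract more concrete, here is a minimal sketch of (a) a nearest-neighbor retention score between two layers and (b) a Grassmann distance between the dominant subspaces of two layers. It is an illustration under stated assumptions, not the paper's definitions: the function names, the choice of Euclidean k-NN, the use of top-r right singular vectors as the layer subspace, and the geodesic (principal-angle) distance are all assumptions made here, and the paper's NRS and Frenet measurements may be defined differently.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of X, excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def neighbor_retention(X_a, X_b, k=5):
    """Mean fraction of each point's k-NN set in layer a that is kept in layer b.

    X_a, X_b: (n_samples, dim) hidden states of the same inputs at two layers.
    Hypothetical stand-in for a neighborhood retention score.
    """
    na, nb = knn_indices(X_a, k), knn_indices(X_b, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(na, nb)]
    return float(np.mean(overlap))

def top_subspace(X, r):
    """Orthonormal basis (dim, r) for the top-r principal directions of centered X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:r].T

def grassmann_distance(U, V):
    """Geodesic distance on the Grassmannian: the 2-norm of the principal angles."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)     # cosines of principal angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.linalg.norm(theta))
```

Given a list of per-layer hidden-state matrices for the same inputs, calling `grassmann_distance(top_subspace(H[l], r), top_subspace(H[l + 1], r))` for consecutive layers gives a rough per-layer "speed" profile, and `neighbor_retention(H[l], H[l + 1])` gives a local counterpart. Both are label-free, which is the property the abstract relies on for model selection.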
