Spectral Insights into Data-Oblivious Critical Layers in Large Language Models
Xuyuan Liu, Lei Hsiung, Yaoqing Yang, Yujun Yan

TL;DR
This paper introduces a data-oblivious spectral method to identify intrinsic critical layers in large language models, revealing their role in semantic shifts and enabling improved domain adaptation and backdoor defense.
Contribution
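It presents a spectral analysis approach based on Centered Kernel Alignment (CKA) that identifies critical layers in a data-oblivious way, before fine-tuning. The identified layers are consistent across tasks for a given model and can be used to improve robustness.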
Findings
Critical layers show significant representation shifts during fine-tuning.
Spectral analysis links shifts to changes in top principal components.
Freezing critical layers improves backdoor defense effectiveness.
Abstract
Understanding how feature representations evolve across layers in large language models (LLMs) is key to improving their interpretability and robustness. While recent studies have identified critical layers linked to specific functions or behaviors, these efforts typically rely on data-dependent analyses of fine-tuned models, limiting their use to post-hoc settings. In contrast, we introduce a data-oblivious approach to identify intrinsic critical layers in pre-fine-tuned LLMs by analyzing representation dynamics via Centered Kernel Alignment (CKA). We show that layers with significant shifts in representation space are also those most affected during fine-tuning, a pattern that holds consistently across tasks for a given model. Our spectral analysis further reveals that these shifts are driven by changes in the top principal components, which encode semantic transitions from rationales…
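To make the CKA-based idea in the abstract concrete, here is a minimal sketch of linear CKA between the representations of adjacent layers. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say which CKA variant or selection rule the paper uses, so the linear kernel, the "lowest adjacent-layer CKA" criterion, and the helper names (center_columns, cross_frobenius_sq, linear_cka, largest_shift_index) are all hypothetical choices made for this example.

```python
import math

def center_columns(X):
    """Subtract the per-feature mean from each column of X (rows = samples)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def cross_frobenius_sq(A, B):
    """Return ||A^T B||_F^2 for two matrices with the same number of rows."""
    total = 0.0
    for col_a in zip(*A):
        for col_b in zip(*B):
            dot = sum(a * b for a, b in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (same samples, any widths).

    Assumption: linear kernel; the paper may use a different CKA variant.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_frobenius_sq(Xc, Yc)
    denominator = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return numerator / denominator

def largest_shift_index(layer_reps):
    """Index i whose transition (layer i -> layer i+1) has the lowest CKA.

    Assumption: low adjacent-layer similarity is used here as a stand-in
    for a "significant shift in representation space".
    """
    scores = [linear_cka(a, b) for a, b in zip(layer_reps, layer_reps[1:])]
    return min(range(len(scores)), key=scores.__getitem__)

# Toy data: 4 samples, 2 features per layer (made up for illustration).
layer0 = [[1, 2], [3, 4], [5, 7], [2, 1]]
layer1 = [[2 * v for v in row] for row in layer0]   # pure rescaling of layer0
layer2 = [[1, 0], [0, 1], [1, 1], [5, -3]]          # genuinely different geometry
print(largest_shift_index([layer0, layer1, layer2]))
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, so a layer that only rescales or rotates its input scores close to 1, while a layer that reorganizes the representation scores lower. In practice each matrix would hold hidden states for the same set of inputs, one row per token or sample.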
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
