Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework
Yukun Zhang, Qi Dong

TL;DR
This paper introduces MSMA, a framework that decomposes large language model representations into multiple scales to analyze and improve their interpretability and information flow, demonstrating consistent hierarchical patterns and architecture-dependent effects.
Contribution
The paper proposes a novel multi-scale manifold alignment framework that captures hierarchical structures in LLM representations and enhances alignment metrics across various models.
Findings
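MSMA improves alignment metrics under multiple estimators, including relative KL divergence reduction and mutual information (MI) gains, across GPT-2, BERT, RoBERTa, and T5.
Controlled interventions at different scales have distinct, architecture-dependent effects on lexical diversity, sentence structure, and discourse coherence.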
Hierarchical patterns are consistent across different LLMs.
Abstract
We present Multi-Scale Manifold Alignment (MSMA), an information-geometric framework that decomposes LLM representations into local, intermediate, and global manifolds and learns cross-scale mappings that preserve geometry and information. Across GPT-2, BERT, RoBERTa, and T5, we observe consistent hierarchical patterns and find that MSMA improves alignment metrics under multiple estimators (e.g., relative KL reduction and MI gains with statistical significance across seeds). Controlled interventions at different scales yield distinct and architecture-dependent effects on lexical diversity, sentence structure, and discourse coherence. While our theoretical analysis relies on idealized assumptions, the empirical results suggest that multi-objective alignment offers a practical lens for analyzing cross-scale information flow and guiding representation-level control.
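To make the idea of scale decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the three scales correspond to contiguous thirds of the layer stack and that cross-scale mismatch can be summarized by a KL divergence between diagonal Gaussian fits. The function names `split_scales` and `gaussian_kl` and the synthetic activations are hypothetical; the abstract does not specify how the paper defines the scales or its estimators.

```python
import numpy as np

def split_scales(hidden_states):
    """Partition per-layer activations of shape (layers, tokens, dim) into
    three contiguous layer groups and mean-pool each group over its layers.
    ASSUMPTION: layer thirds stand in for local/intermediate/global scales."""
    groups = np.array_split(hidden_states, 3, axis=0)
    names = ("local", "intermediate", "global")
    return {name: g.mean(axis=0) for name, g in zip(names, groups)}

def gaussian_kl(x, y, eps=1e-6):
    """KL(P || Q) where P and Q are diagonal Gaussians fitted to the rows
    of x and y. ASSUMPTION: a crude stand-in for the paper's KL estimators."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    var_x, var_y = x.var(axis=0) + eps, y.var(axis=0) + eps
    per_dim = np.log(var_y / var_x) + (var_x + (mu_x - mu_y) ** 2) / var_y - 1.0
    return 0.5 * float(np.sum(per_dim))

# Synthetic stand-in for model activations: 12 layers, 200 tokens, 16 dims.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 200, 16))

scales = split_scales(hidden)
kl_local_global = gaussian_kl(scales["local"], scales["global"])
```

In practice, `hidden` would be replaced by per-layer hidden states extracted from a real model, and the divergence would be compared before and after applying a learned cross-scale mapping.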
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
