StructLens: A Structural Lens for Language Models via Maximum Spanning Trees

Haruki Sakajo; Frederikus Hudi; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe

arXiv:2603.03328·cs.CL·May 19, 2026

TL;DR

StructLens is a novel framework that uses maximum spanning trees to analyze and visualize the internal structural organization of language model representations across layers and training stages.

Contribution

It introduces a holistic structural analysis method for language model representations, revealing how token relationships and organizational units evolve during pre-training.

Findings

01 · Middle layers exhibit the strongest local-span organization.

02 · Smaller local units are detectable earlier in training; larger units emerge later.

03 · StructLens provides new insights into token organization in language models.

Abstract

Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest their own internal structures as well. While interpretability research has investigated how models compute representations mechanistically through attention patterns and Sparse AutoEncoders, the organization of the resulting representations is overlooked. To address this gap, we introduce StructLens, a framework to analyze representations through a holistic structural view. StructLens constructs maximum spanning trees based on the semantic representations in residual streams, inspired by tree representation in dependency parsing, and provides summaries of token relationships in representation space. We analyze how contiguous tokens are also nearby in representation space and find that middle layers show the…
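The tree construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine similarity as the edge weight, builds an undirected maximum spanning tree with Prim's algorithm (the paper's dependency-parsing-inspired trees may be directed and rooted), and uses toy 2-D vectors in place of real residual-stream states. The function names and the `adjacent_edge_fraction` summary are hypothetical, chosen only to show one way "contiguous tokens are also nearby in representation space" could be measured.

```python
import numpy as np

def cosine_similarity_matrix(hidden):
    """Pairwise cosine similarity between token vectors (rows of `hidden`)."""
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    unit = hidden / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def maximum_spanning_tree(sim):
    """Prim's algorithm on a dense similarity matrix.

    Assumption: undirected tree grown from token 0. Returns n-1 edges
    as (i, j) pairs with i < j, in the order they were added.
    """
    n = sim.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_sim = sim[0].astype(float)     # best similarity from the tree to each token
    best_src = np.zeros(n, dtype=int)   # tree token that achieves that similarity
    edges = []
    for _ in range(n - 1):
        # Pick the most similar token that is not yet in the tree.
        candidates = np.where(in_tree, -np.inf, best_sim)
        j = int(np.argmax(candidates))
        i = int(best_src[j])
        edges.append((min(i, j), max(i, j)))
        in_tree[j] = True
        # The new token may offer a better connection to the remaining tokens.
        improved = (sim[j] > best_sim) & ~in_tree
        best_sim[improved] = sim[j][improved]
        best_src[improved] = j
    return edges

def adjacent_edge_fraction(edges):
    """Hypothetical summary: share of tree edges linking neighbouring positions."""
    return sum(1 for i, j in edges if j - i == 1) / len(edges)

def toy_hidden_states(angles_deg):
    """Toy stand-in for one layer's token vectors: unit vectors at given angles."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Tokens in sequence order drift smoothly in representation space.
edges_contiguous = maximum_spanning_tree(
    cosine_similarity_matrix(toy_hidden_states([0, 5, 10, 80, 85])))

# Same vectors, but neighbouring positions are far apart in representation space.
edges_shuffled = maximum_spanning_tree(
    cosine_similarity_matrix(toy_hidden_states([0, 80, 5, 85, 10])))

print(edges_contiguous, adjacent_edge_fraction(edges_contiguous))
print(edges_shuffled, adjacent_edge_fraction(edges_shuffled))
```

In the first toy sequence every tree edge joins neighbouring positions, while in the shuffled one none do. A real analysis would replace `toy_hidden_states` with per-layer hidden states extracted from a model.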

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The idea of using a tree to define a structure summary of a layer's representation is interesting. The formulas and construction are clearly stated. The empirical patterns are visually compelling. The demonstrations of the correlation with confidence degradation and the pruning case study provide practical usefulness.

Weaknesses

Some simple baselines are not compared, and key design choices are insufficiently justified. Without these pieces, claims about superiority and practical utility are not yet established.

### 1. Baseline to be compared

The paper contrasts StructLens primarily with token-aligned cosine (Eq. 6). However, global inter-layer similarity is standardly assessed with Centered Kernel Alignment (CKA) and close relatives SVCCA/PWCCA. For example, for two layer-representation matrices $X \in \mathbb{R}^{N

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- StructLens is an original approach to language model interpretability, offering a global structural perspective that complements existing token-level and attention-based analyses.
- The paper provides clear mathematical formulations for tree construction and for the presented similarity metrics.
- The exploration of structure-aware metrics for layer pruning is practical and connects interpretability with model compression.

Weaknesses

- Only 50 instances per dataset is a small sample to obtain reliable or generalizable insights.
- The results obtained in Section 4.2 are on a single instance of MMLU, which is too limited to extract conclusions (and occupy an entire page).
- The layer pruning results in Table 5 are inconsistent. In some cases, structure-aware metrics underperform base cosine similarity. There is no statistical significance analysis.
- The findings presented in the paper, e.g., the "island" patterns in Edge-Edit

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The paper is clearly structured, and the framework is highly practical. The choice of algorithms and the detailed methodological treatment are commendable, particularly the consideration of a single root node to ensure consistency. The tree-based indicators proposed in the paper are effectively applied and validated in these experiments. Additionally, the case study presented in *Section 4.2: FREQUENT SUBTREES* is a clever choice, effectively illustrating the relationship between language struct

Weaknesses

(1) The feasibility of MST calculation and its related algorithms requires further verification. For very large models or long token sequences, the computational cost of this algorithm can be substantial. Moreover, the **Edge-Edit** and **Tree-Edit** indicators involve multiple operations, which may further reduce computational efficiency. (2) The paper conducts experiments only on **Llama 3.1 and Qwen 2.5**, limiting the number of models studied. The choice of models could be improved, esp

Code & Models

Repositories

naist-nlp/structlens
github

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques