From Atoms to Trees: Building a Structured Feature Forest with Hierarchical Sparse Autoencoders

Yifan Luo; Yang Zhan; Jiedong Jiang; Tianyang Liu; Mingrui Wu; Zhennan Zhou; Bin Dong

arXiv:2602.11881 · cs.AI · February 13, 2026



TL;DR

This paper introduces Hierarchical Sparse Autoencoders (HSAE), a method for uncovering hierarchical, semantically meaningful structure among the features that sparse autoencoders extract from large language models (LLMs), improving interpretability without sacrificing reconstruction quality.

Contribution

HSAE jointly learns a series of SAEs and the parent-child relationships between their features, using a structural constraint loss and a random feature perturbation mechanism to align parent and child features. This advances the analysis of LLM internal representations.

Findings

01. HSAE reliably recovers semantic hierarchies across models and layers.
02. HSAE maintains high reconstruction fidelity and interpretability.
03. Qualitative case studies and quantitative evaluations support HSAE's effectiveness.

Abstract

Sparse autoencoders (SAEs) have proven effective for extracting monosemantic features from large language models (LLMs), yet these features are typically identified in isolation. However, broad evidence suggests that LLMs capture the intrinsic structure of natural language, where the phenomenon of "feature splitting" in particular indicates that such structure is hierarchical. To capture this, we propose the Hierarchical Sparse Autoencoder (HSAE), which jointly learns a series of SAEs and the parent-child relationships between their features. HSAE strengthens the alignment between parent and child features through two novel mechanisms: a structural constraint loss and a random feature perturbation mechanism. Extensive experiments across various LLMs and layers demonstrate that HSAE consistently recovers semantically meaningful hierarchies, supported by both qualitative case studies and…
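To make the abstract's vocabulary concrete, here is a minimal NumPy sketch, under stated assumptions, of two ordinary ReLU SAEs at different granularities trained on the same activations, plus a penalty that ties each child feature to a parent feature. The names `TinySAE`, `sae_loss`, `hierarchy_penalty` and `parent_of`, the fixed parent assignment, and the "orphan activation" penalty are all hypothetical illustrations. The abstract does not specify HSAE's actual structural constraint loss, how the parent-child links are learned, or how the random feature perturbation works, and none of those is reproduced here.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


class TinySAE:
    """A plain ReLU sparse autoencoder (hypothetical helper, not from the paper).

    encode: f = ReLU(x W_enc^T + b_enc); decode: x_hat = f W_dec^T + b_dec
    """

    def __init__(self, d_model: int, n_features: int, rng: np.random.Generator):
        self.W_enc = rng.normal(0.0, 0.1, (n_features, d_model))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0.0, 0.1, (d_model, n_features))
        self.b_dec = np.zeros(d_model)

    def encode(self, x: np.ndarray) -> np.ndarray:
        return relu(x @ self.W_enc.T + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        return f @ self.W_dec.T + self.b_dec


def sae_loss(sae: TinySAE, x: np.ndarray, l1_coeff: float):
    """Standard SAE objective: squared reconstruction error plus an L1 sparsity term."""
    f = sae.encode(x)
    x_hat = sae.decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = l1_coeff * np.mean(np.sum(np.abs(f), axis=-1))
    return recon + sparsity, f


def hierarchy_penalty(f_parent: np.ndarray, f_child: np.ndarray, parent_of: np.ndarray):
    """Hypothetical structural penalty (an assumption, not HSAE's published loss).

    Sums child activation that fires while its assigned parent feature is inactive.
    parent_of[j] is the index of the parent feature of child feature j.
    """
    parent_act = f_parent[:, parent_of]     # (batch, n_child): each child's parent activation
    orphan = f_child * (parent_act <= 0.0)  # keep only child activity under a silent parent
    return np.mean(np.sum(orphan, axis=-1))


# Usage: a coarse 4-feature SAE and a finer 8-feature SAE on the same activations.
rng = np.random.default_rng(0)
d_model = 16
x = rng.normal(size=(32, d_model))        # random stand-in for LLM activations
parent_sae = TinySAE(d_model, 4, rng)
child_sae = TinySAE(d_model, 8, rng)
parent_of = np.repeat(np.arange(4), 2)    # fixed toy tree: children 2k and 2k+1 -> parent k

loss_p, f_p = sae_loss(parent_sae, x, l1_coeff=1e-3)
loss_c, f_c = sae_loss(child_sae, x, l1_coeff=1e-3)
total = loss_p + loss_c + 0.1 * hierarchy_penalty(f_p, f_c, parent_of)
print(f"total loss: {total:.4f}")
```

The fixed `parent_of` array keeps the sketch small; according to the abstract, HSAE instead learns the parent-child relationships jointly with the SAEs, and this snippet only computes a loss value without performing any training step.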


Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning