Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Davide Ghilardi; Federico Belotti; Marco Molinari; Tao Ma; Matteo Palmonari

arXiv:2410.21508 · cs.CL · September 23, 2025

TL;DR

Group-SAE groups contiguous layers of a large language model by the similarity of their residual stream representations and trains one sparse autoencoder per group instead of one per layer, reducing training cost while aiming to preserve performance and interpretability.

Contribution

The paper proposes Group-SAE, a strategy that trains a single SAE for each group of similar contiguous layers, together with AMAD (Average Maximum Angular Distance), an empirical metric that guides the choice of the number of groups.

Findings

01 Significantly accelerates SAE training with minimal quality loss.

02 Maintains comparable downstream task performance.

03 Provides a scalable approach for large models.

Abstract

Sparse autoencoders (SAEs) have recently been employed as a promising unsupervised approach for understanding the representations of layers of Large Language Models (LLMs). However, with the growth in model size and complexity, training SAEs is computationally intensive, as typically one SAE is trained for each model layer. To address this limitation, we propose Group-SAE, a novel strategy to train SAEs. Our method considers the similarity of the residual stream representations between contiguous layers to group similar layers and train a single SAE per group. To balance the trade-off between efficiency and performance, we further introduce AMAD (Average Maximum Angular Distance), an empirical metric that guides the selection of an optimal number of groups based on representational similarity across layers. Experiments on models from the Pythia family show that our approach significantly…
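The grouping idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the normalization of the angle by π, and the greedy complete-linkage merge of adjacent groups are all assumptions made for illustration, and the AMAD metric itself is not reproduced here. The sketch only shows how an angular distance between per-layer residual stream activations could drive a partition of layers into contiguous groups.

```python
import numpy as np


def mean_angular_distance(a, b):
    """Mean angular distance in [0, 1] between matching rows of a and b.

    Assumption: distance = arccos(cosine similarity) / pi, averaged over tokens.
    """
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi))


def layer_distance_matrix(acts):
    """acts: list of (n_tokens, d_model) arrays, one per layer."""
    n = len(acts)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = mean_angular_distance(acts[i], acts[j])
    return dist


def group_contiguous_layers(dist, n_groups):
    """Greedily merge adjacent groups until n_groups remain (hypothetical helper).

    The cost of merging two neighbouring groups is the largest pairwise
    distance inside the merged group (complete linkage).
    """
    groups = [[i] for i in range(dist.shape[0])]
    while len(groups) > n_groups:
        costs = [
            dist[np.ix_(g1 + g2, g1 + g2)].max()
            for g1, g2 in zip(groups, groups[1:])
        ]
        k = int(np.argmin(costs))
        groups[k:k + 2] = [groups[k] + groups[k + 1]]
    return groups
```

Given a list of activation matrices collected from the same tokens at each layer, `group_contiguous_layers(layer_distance_matrix(acts), n_groups)` returns lists of layer indices; under the paper's scheme, one SAE would then be trained per list.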

Videos

Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups · underline

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Pythia