Kronecker Factorization Improves Efficiency and Interpretability of Sparse Autoencoders

Vadim Kurochkin; Yaroslav Aksenov; Daniil Laptev; Daniil Gavrilov; Nikita Balagansky

arXiv:2505.22255 · cs.LG · December 23, 2025

TL;DR

KronSAE introduces a Kronecker-product-based factorization of the latent space and a new activation function, mAND, to improve the efficiency and interpretability of sparse autoencoders, especially at large dictionary sizes.

Contribution

The paper proposes KronSAE, a new SAE architecture that reduces computational cost by factorizing the latent representation with a Kronecker product decomposition, and introduces mAND, a differentiable approximation of binary AND that improves interpretability.

Findings

1. KronSAE significantly reduces memory and computation requirements.
2. mAND improves interpretability and performance of the autoencoder.
3. The approach scales effectively to large dictionary sizes.
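To make the efficiency claim concrete, the arithmetic below compares encoder weight counts. It is a rough sketch that assumes a head-wise layout with h heads, each producing m + n pre-activations that combine into m·n latents (so the dictionary size is F = h·m·n); biases are ignored, and the sizes and the function name `encoder_weight_counts` are hypothetical, not taken from the paper. Per-token encoder FLOPs scale with the same counts.

```python
def encoder_weight_counts(d: int, h: int, m: int, n: int) -> tuple[int, int]:
    """Return (dense, factorized) encoder weight counts, biases ignored.

    Assumes a dictionary of F = h * m * n latents. The dense encoder
    projects d -> F; the assumed factorized encoder projects d -> h * (m + n).
    """
    F = h * m * n
    dense = d * F
    factorized = d * h * (m + n)
    return dense, factorized


# Hypothetical sizes: d = 1024 and 256 heads of 16 x 16 latents, so F = 65,536.
dense, fact = encoder_weight_counts(d=1024, h=256, m=16, n=16)
print(dense, fact, dense // fact)  # 67108864 8388608 8
```

Under these assumptions the saving is a factor of m·n / (m + n), which grows with the per-head block size.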

Abstract

Sparse Autoencoders (SAEs) have demonstrated significant promise in interpreting the hidden states of language models by decomposing them into interpretable latent directions. However, training and interpreting SAEs at scale remains challenging, especially when large dictionary sizes are used. While decoders can leverage sparsity-aware kernels for efficiency, encoders still require computationally intensive linear operations with large output dimensions. To address this, we propose KronSAE, a novel architecture that factorizes the latent representation via Kronecker product decomposition, drastically reducing memory and computational overhead. Furthermore, we introduce mAND, a differentiable activation function approximating the binary AND operation, which improves interpretability and performance in our factorized framework.
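The abstract's central idea can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each head projects the input x into two small vectors p (size m) and q (size n), that mAND(u, v) is the square root of u·v when both inputs are positive and zero otherwise (the square root is mentioned in the reviews, but Eq. 3 is not reproduced on this page), and that a TopK step follows. The names `kron_encode`, `mand`, `P`, and `Q` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mand(u: float, v: float) -> float:
    """Assumed mAND form: sqrt(u * v) if both inputs are positive, else 0."""
    return math.sqrt(u * v) if u > 0 and v > 0 else 0.0


def kron_encode(x: Vector, P: List[Matrix], Q: List[Matrix], k: int) -> Vector:
    """Hypothetical factorized encoder (biases omitted).

    P[h] has shape (m, d) and Q[h] has shape (n, d) for head h, so each head
    costs (m + n) * d weights yet yields m * n latents.
    """
    latents: Vector = []
    for P_h, Q_h in zip(P, Q):
        p = matvec(P_h, x)  # m small pre-activations
        q = matvec(Q_h, x)  # n small pre-activations
        # Latent (i, j) = mAND(p[i], q[j]); the order i * n + j matches the
        # flattened Kronecker product of p and q.
        latents.extend(mand(pi, qj) for pi in p for qj in q)
    # Assumed TopK step: keep the k largest latents, zero the rest.
    top = set(sorted(range(len(latents)), key=latents.__getitem__, reverse=True)[:k])
    return [z if i in top else 0.0 for i, z in enumerate(latents)]


x = [1.0, 2.0]
P = [[[1.0, 0.0], [0.0, -1.0]]]  # one head, m = 2, so p = [1, -2]
Q = [[[0.0, 2.0], [1.0, 1.0]]]   # one head, n = 2, so q = [4, 3]
print(kron_encode(x, P, Q, k=1))  # [2.0, 0.0, 0.0, 0.0]
```

Because every latent is built from two small projections, this sketch never materializes a d × F encoder matrix, which is where its memory and compute savings would come from.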

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating: 6 · Confidence: 2

Strengths

- Applying Kronecker product decomposition to SAEs seems new. Tensor factorisation has been considered before in model compression (e.g. Edalati et al. 2021), but the specific application to SAE latent spaces with head-wise decomposition is, as far as the reviewer can see, novel.
- Aiming to address both computational efficiency (the encoder bottleneck) and interpretability (compositional structure) is a more ambitious task than the usual optimisation-first approaches.
- The authors use a re

Weaknesses

- The paper mentions that Kronecker factorisation induces compositional/hierarchical features, but the mechanism seems underspecified and is not rigorously justified or described.
- The mAND activation (Eq. 3) is one of the important elements of the method, but the paper seems to lack a principled justification beyond empirical performance. Why a square root? How closely does mAND approximate a true binary AND (Fig. 10)? How much does the smoothness introduce unwanted activations (e.g. false positives in
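The reviewer's questions about mAND can be probed with a toy check. The sketch assumes mAND(u, v) = sqrt(u·v) when both inputs are positive and 0 otherwise; this is an assumed form consistent with the square root the reviewer mentions, not a transcription of Eq. 3. Under that assumption it matches AND exactly on binary inputs, the square root keeps the output on the same scale as the inputs, and a weak input paired with a strong one still yields a sizable activation, which is one way smoothness could produce false positives.

```python
import math
from itertools import product


def mand(u: float, v: float) -> float:
    """Assumed mAND form: sqrt(u * v) when u > 0 and v > 0, else 0."""
    return math.sqrt(u * v) if u > 0 and v > 0 else 0.0


# On binary inputs the assumed form reproduces logical AND exactly.
for a, b in product([0.0, 1.0], repeat=2):
    assert mand(a, b) == float(a and b)

# The square root makes the output scale like the inputs:
# mand(c * u, c * v) == c * mand(u, v) for c > 0, so mand(u, u) == u.
print(mand(4.0, 4.0), mand(2.0, 8.0))  # 4.0 4.0

# A weak first input paired with a strong second input still fires.
print(mand(0.25, 16.0))  # 2.0
```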

Reviewer 02 · Rating: 10 · Confidence: 4

Strengths

The work introduces an innovative SAE architecture, KronSAE, that makes substantive improvements over prior SAEs in several dimensions (efficiency, feature absorption, compositionality, and interpretability), and the authors clearly demonstrate these improvements empirically across comprehensive experiments. Each of these improvements presents real value to the community in its own right, and taken together, KronSAE represents a clear and significant contribution to the SAE literature.

Weaknesses

I do not see any particularly substantive weaknesses in terms of the technical contributions and empirical work presented in the paper. My one concern is that the paper fails to cite any works from the very closely related research area of tensor product representation (TPR). TPR, first introduced in 1990 [1], has long studied how to encode compositional representations in dense embedding vectors via tensor products (a generalization of the Kronecker product), with more recent works leveraging
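For readers unfamiliar with tensor product representations, the sketch below illustrates the binding idea the reviewer refers to: fillers are bound to role vectors with an outer (tensor) product, several bindings are superposed by addition, and with orthonormal roles each filler is recovered by multiplying the tensor by its role. It is a toy illustration, not code from the paper or from [1]; the function names and the filler and role vectors are hypothetical.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def outer(f: Vec, r: Vec) -> Mat:
    # Tensor (outer) product f ⊗ r; flattening it row-major gives kron(f, r).
    return [[fi * rj for rj in r] for fi in f]


def add(A: Mat, B: Mat) -> Mat:
    # Superpose two bindings element-wise.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def unbind(T: Mat, r: Vec) -> Vec:
    # T @ r recovers the filler bound to r when the roles are orthonormal.
    return [sum(t * rj for t, rj in zip(row, r)) for row in T]


# Two hypothetical fillers bound to two orthonormal roles in one tensor.
f_cat, f_dog = [1.0, 0.0, 2.0], [0.0, 3.0, 1.0]
r_subj, r_obj = [1.0, 0.0], [0.0, 1.0]
T = add(outer(f_cat, r_subj), outer(f_dog, r_obj))
print(unbind(T, r_subj), unbind(T, r_obj))  # [1.0, 0.0, 2.0] [0.0, 3.0, 1.0]
```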

Reviewer 03 · Rating: 6 · Confidence: 5

Strengths

Introducing structured priors into SAEs is useful. This offers regularization for the training process and also helps the interpretability of latents. I think this paper hence makes meaningful progress in an important domain. Despite my later comments nit-picking sections, the overall presentation is sound. The text is clearly structured and the story makes sense. The presented experiments are thorough, and most of my questions were answered immediately.

Weaknesses

The correlation experiment seems heavily favoured towards your approach, since this is precisely what the Kronecker structure relies on for extraction. It's nice to see KronSAE succeed, but I'm fairly certain there is an equally contrived experiment in which TopK finds ground-truth structure much better than KronSAE. Actually, looking at Figure 9, it seems that this correlation plot is basically the same regardless of the original patterns, which further undermines th
