Decomposing multimodal embedding spaces with group-sparse autoencoders

Chiraag Kaushik; Davis Barch; Andrea Fanelli

arXiv:2601.20028 · cs.LG · January 29, 2026

TL;DR

This paper introduces a group-sparse autoencoder for decomposing multimodal embeddings, aimed at improving their interpretability and their alignment across modalities such as images, text, and audio.

Contribution

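The authors propose a group-sparse autoencoder with cross-modal masking to improve the decomposition and alignment of multimodal embeddings. It targets a known limitation of standard sparse autoencoders (SAEs): on multimodal embedding spaces they tend to learn "split dictionaries" in which most features are active for only one modality.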
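
To illustrate why a group-sparse penalty can encourage shared features, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the pairing of image and text codes, and the choice of an L2,1 norm as the group penalty are all assumptions made for illustration, and the cross-modal masking component is not attempted.

```python
import numpy as np

def group_sparse_penalty(z_image, z_text):
    """Hypothetical L2,1 penalty over paired codes of shape (n, k).

    For each sample and feature, take the L2 norm across the two
    modalities, then sum over features and average over samples.
    """
    stacked = np.stack([z_image, z_text], axis=0)       # (2, n, k)
    group_norms = np.sqrt((stacked ** 2).sum(axis=0))   # (n, k)
    return group_norms.sum(axis=1).mean()

def l1_penalty(z_image, z_text):
    """Plain L1 penalty on both codes, for comparison."""
    return (np.abs(z_image) + np.abs(z_text)).sum(axis=1).mean()

# Case A: image and text codes use the SAME feature (shared support).
shared_img = np.array([[3.0, 0.0]])
shared_txt = np.array([[4.0, 0.0]])

# Case B: image and text codes use DIFFERENT features (split support).
split_img = np.array([[3.0, 0.0]])
split_txt = np.array([[0.0, 4.0]])

print(l1_penalty(shared_img, shared_txt))            # 7.0
print(l1_penalty(split_img, split_txt))              # 7.0
print(group_sparse_penalty(shared_img, shared_txt))  # 5.0
print(group_sparse_penalty(split_img, split_txt))    # 7.0
```

Under these assumptions the L1 penalty cannot distinguish the two cases, while the group penalty is lower when a paired image and text activate the same feature. That is one plausible mechanism by which group sparsity could discourage split dictionaries.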

Findings

01  Learned dictionaries that are more multimodal, with better cross-modal alignment.

02  Fewer dead neurons than standard SAEs.

03  Improved interpretability and control in cross-modal tasks.

Abstract

The Linear Representation Hypothesis asserts that the embeddings learned by neural networks can be understood as linear combinations of features corresponding to high-level concepts. Based on this ansatz, sparse autoencoders (SAEs) have recently become a popular method for decomposing embeddings into a sparse combination of linear directions, which have been shown empirically to often correspond to human-interpretable semantics. However, recent attempts to apply SAEs to multimodal embedding spaces (such as the popular CLIP embeddings for image/text data) have found that SAEs often learn "split dictionaries", where most of the learned sparse features are essentially unimodal, active only for data of a single modality. In this work, we study how to effectively adapt SAEs for the setting of multimodal embeddings while ensuring multimodal alignment. We first argue that the existence of a…
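
As a rough illustration of the ideas in the abstract, the sketch below shows a standard ReLU sparse autoencoder forward pass with an L1 penalty, plus two simple diagnostics. Everything here is an assumption for illustration rather than the paper's setup: the dimensions, the random weights, and the definitions of "unimodal" and "dead" features (a feature counts as active for a modality if it fires on at least one sample of that modality).

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 32  # hypothetical embedding dimension and dictionary size

# Randomly initialised (untrained) weights, for illustration only.
W_enc = rng.normal(scale=0.1, size=(d, k))
b_enc = np.zeros(k)
W_dec = rng.normal(scale=0.1, size=(k, d))

def encode(x):
    """Map embeddings (n, d) to non-negative codes (n, k)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(z):
    """Reconstruct embeddings as linear combinations of dictionary rows."""
    return z @ W_dec

def sae_loss(x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    z = encode(x)
    recon = ((x - decode(z)) ** 2).sum(axis=1).mean()
    sparsity = np.abs(z).sum(axis=1).mean()
    return recon + l1_coeff * sparsity

def unimodal_fraction(z_image, z_text, eps=1e-8):
    """Share of live features that fire for exactly one modality."""
    active_img = (z_image > eps).any(axis=0)
    active_txt = (z_text > eps).any(axis=0)
    alive = active_img | active_txt
    unimodal = active_img ^ active_txt
    return unimodal.sum() / max(alive.sum(), 1)

def dead_fraction(z_image, z_text, eps=1e-8):
    """Share of features that never fire on either modality."""
    alive = ((z_image > eps) | (z_text > eps)).any(axis=0)
    return 1.0 - alive.mean()

# Stand-ins for image and text embeddings from a shared space.
x_image = rng.normal(size=(16, d))
x_text = rng.normal(size=(16, d))
print(sae_loss(np.vstack([x_image, x_text])))
print(unimodal_fraction(encode(x_image), encode(x_text)))
print(dead_fraction(encode(x_image), encode(x_text)))
```

In this framing, a "split dictionary" would show up as a unimodal fraction close to 1, and a dictionary with many unused directions would show up as a high dead fraction. The paper may define and measure both quantities differently.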

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)