Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models

Piotr Kubaty; Patryk Marszałek; Łukasz Struski; Adam Wróbel; Jacek Tabor; Marek Śmieja

arXiv:2605.22679·cs.CV·May 22, 2026

TL;DR

CEDAR is a post-hoc method that reveals the compositional structure of pretrained vision-language embeddings by learning an invertible, sparse transformation, enhancing interpretability without increasing dimensionality.

Contribution

Introduces CEDAR, a novel approach for disentangling vision-language embeddings through adaptive rotation, improving interpretability and alignment with human perception.

Findings

01

CEDAR achieves a good balance between reconstruction quality and sparsity.

02

Coordinates in CEDAR embeddings can be interpreted with textual concepts.

03

The method improves interpretability of vision-language models without expanding dimensions.

Abstract

Vision-language models learn powerful multimodal embeddings, yet their internal semantics remain opaque. While sparse autoencoders (SAEs) can extract interpretable features, they rely on expanding the representation dimension, which compromises the original geometry and introduces redundancy. We introduce CEDAR (Conceptual Embedding Disentanglement via Adaptive Rotation), a post-hoc method that reveals the compositional structure of pretrained embeddings without increasing dimensionality. By learning an invertible transformation with a top-$k$ sparsity bottleneck, CEDAR concentrates semantic information into axis-aligned disentangled coordinates. In CLIP-like architectures, individual coordinates can be interpreted with textual concepts, while for generative models such as BLIP, they can be decoded into natural language descriptions. Experiments demonstrate that CEDAR achieves a…
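To make the idea of an invertible transformation with a top-$k$ bottleneck concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes the transformation is an orthogonal matrix (suggested by "Adaptive Rotation" but not confirmed here), uses a fixed random rotation in place of a learned one, and assumes the top-$k$ selection is by absolute value. The function names `random_rotation`, `top_k`, and `sparse_roundtrip` are hypothetical.

```python
import numpy as np


def random_rotation(dim, seed=0):
    """Return a random orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def top_k(z, k):
    """Keep the k largest-magnitude coordinates in each row; zero the rest."""
    idx = np.argpartition(np.abs(z), -k, axis=-1)[..., -k:]
    out = np.zeros_like(z)
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=-1), axis=-1)
    return out


def sparse_roundtrip(x, rotation, k):
    """Rotate, sparsify with top-k, then map back with the inverse rotation."""
    z = x @ rotation               # same dimensionality as x
    z_sparse = top_k(z, k)         # sparsity bottleneck
    x_hat = z_sparse @ rotation.T  # inverse of an orthogonal matrix is its transpose
    return z_sparse, x_hat


# Toy embeddings: 4 vectors of dimension 16 (random data, for illustration only).
x = np.random.default_rng(1).standard_normal((4, 16))
R = random_rotation(16)
z_sparse, x_hat = sparse_roundtrip(x, R, k=4)
reconstruction_error = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

With a random rotation the reconstruction error at small $k$ is large; under the assumptions above, the point of learning the rotation would be to pick one for which a few coordinates carry most of each embedding, so that the error stays low while the code stays sparse and the dimension stays unchanged.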
