SCoCCA: Multi-modal Sparse Concept Decomposition via Canonical Correlation Analysis

Ehud Gordon; Meir Yossef Levi; Guy Gilboa

arXiv:2603.13884·cs.CV·March 17, 2026

Open Access

TL;DR

SCoCCA is a multi-modal concept decomposition framework that uses Canonical Correlation Analysis (CCA) to improve the interpretability and disentanglement of vision-language model embeddings, and it reports state-of-the-art results in concept discovery.

Contribution

The paper proposes Sparse Concept CCA (SCoCCA), a method that aligns cross-modal embeddings and enforces sparsity to obtain more interpretable, better-disentangled concepts.

Findings

01 Achieves state-of-the-art results on concept discovery tasks.

02 Enhances interpretability through sparse, discriminative concepts.

03 Improves results on concept ablation and semantic manipulation.

Abstract

Interpreting the internal reasoning of vision-language models is essential for deploying AI in safety-critical domains. Concept-based explainability provides a human-aligned lens by representing a model's behavior through semantically meaningful components. However, existing methods are largely restricted to images and overlook the cross-modal interactions. Text-image embeddings, such as those produced by CLIP, suffer from a modality gap, where visual and textual features follow distinct distributions, limiting interpretability. Canonical Correlation Analysis (CCA) offers a principled way to align features from different distributions, but has not been leveraged for multi-modal concept-level analysis. We show that the objectives of CCA and InfoNCE are closely related, such that optimizing CCA implicitly optimizes InfoNCE, providing a simple, training-free mechanism to enhance…
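To make the alignment idea in the abstract concrete, here is a minimal sketch of classical linear CCA on paired embeddings. It is not the SCoCCA method itself: the sparsity and concept-decomposition steps are omitted, and synthetic data with opposite mean offsets stands in for CLIP image and text features. The names `fit_cca`, `inv_sqrt`, and the ridge term `reg` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def fit_cca(x, y, k, reg=1e-3):
    """Classical linear CCA on paired rows of x (n, dx) and y (n, dy).

    Returns the per-modality means, the projection matrices a and b,
    and the top-k canonical correlations. `reg` is a small ridge term
    (an assumption of this sketch) that keeps the covariances invertible.
    """
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mean_x, y - mean_y  # centering removes each modality's offset
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)  # whitening transforms
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    a = wx @ u[:, :k]
    b = wy @ vt.T[:, :k]
    return mean_x, mean_y, a, b, s[:k]


# Synthetic stand-in for paired image/text embeddings: both views share a
# 4-dimensional latent signal but have different mixing matrices and offsets.
rng = np.random.default_rng(0)
n, d, latent = 2000, 16, 4
z = rng.normal(size=(n, latent))
x = z @ rng.normal(size=(latent, d)) + 2.0 + 0.1 * rng.normal(size=(n, d))
y = z @ rng.normal(size=(latent, d)) - 2.0 + 0.1 * rng.normal(size=(n, d))

mean_x, mean_y, a, b, corrs = fit_cca(x, y, k=latent)
px = (x - mean_x) @ a  # first view in the shared canonical space
py = (y - mean_y) @ b  # second view in the shared canonical space
print("canonical correlations:", np.round(corrs, 3))
```

The projections are computed in closed form from covariance matrices, with no gradient-based training, which matches the "training-free" character the abstract describes. How SCoCCA builds sparse concepts on top of such an alignment is not shown here.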

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning