TL;DR
VL-SAE is a sparse autoencoder whose hidden neurons each correlate with a semantic concept. This gives a unified concept set for interpreting the vision-language alignment in VLMs and for strengthening it, improving both interpretability and downstream task performance.
Contribution
The paper proposes VL-SAE, a sparse autoencoder that maps vision-language representations onto a unified concept set, making the alignment in VLMs interpretable and providing a way to enhance it.
Findings
VL-SAE effectively interprets vision-language representations.
VL-SAE improves zero-shot image classification accuracy.
VL-SAE reduces hallucinations in VLM outputs.
Abstract
The alignment of vision-language representations endows current Vision-Language Models (VLMs) with strong multi-modal reasoning capabilities. However, the interpretability of the alignment component remains uninvestigated due to the difficulty in mapping the semantics of multi-modal representations into a unified concept set. To address this problem, we propose VL-SAE, a sparse autoencoder that encodes vision-language representations into its hidden activations. Each neuron in its hidden layer correlates to a concept represented by semantically similar images and texts, thereby interpreting these representations with a unified concept set. To establish the neuron-concept correlation, we encourage semantically similar representations to exhibit consistent neuron activations during self-supervised training. First, to measure the semantic similarity of multi-modal representations, we…
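The core idea in the abstract, encoding image and text representations into one sparse hidden layer and checking that semantically similar inputs activate the same neurons, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, because the abstract is truncated before it describes the similarity measure or the training objective: the top-k ReLU encoder, the cosine-based consistency score, the random untrained weights, and the names `TinySparseAutoencoder` and `activation_consistency` are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class TinySparseAutoencoder:
    """Hypothetical top-k sparse autoencoder shared by image and text inputs.

    Weights are random (untrained); this only illustrates the data flow.
    """

    def __init__(self, dim, n_neurons, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=dim ** -0.5, size=(dim, n_neurons))
        self.W_dec = rng.normal(scale=n_neurons ** -0.5, size=(n_neurons, dim))
        self.k = k  # maximum number of active neurons per input (assumption)

    def encode(self, x):
        # ReLU pre-activations, then keep only the k largest per row.
        pre = np.maximum(l2_normalize(x) @ self.W_enc, 0.0)
        idx = np.argpartition(pre, -self.k, axis=1)[:, -self.k:]
        mask = np.zeros_like(pre, dtype=bool)
        np.put_along_axis(mask, idx, True, axis=1)
        return np.where(mask, pre, 0.0)

    def decode(self, h):
        # Linear reconstruction of the input representation.
        return h @ self.W_dec

def activation_consistency(h_a, h_b):
    """Cosine similarity between two batches of hidden activations, row by row."""
    return np.sum(l2_normalize(h_a) * l2_normalize(h_b), axis=1)

# Toy "paired" data: text embeddings are noisy copies of image embeddings,
# standing in for semantically similar image-text pairs.
rng = np.random.default_rng(1)
img = rng.normal(size=(4, 16))
txt = img + 0.05 * rng.normal(size=(4, 16))

sae = TinySparseAutoencoder(dim=16, n_neurons=64, k=8)
h_img, h_txt = sae.encode(img), sae.encode(txt)
recon = sae.decode(h_img)

active_per_row = (h_img != 0).sum(axis=1)
consistency = activation_consistency(h_img, h_txt)
print("active neurons per image:", active_per_row)
print("image-text activation consistency:", consistency)
```

In a trained model of this kind, a high consistency score for a matched image-text pair would mean both inputs fire the same neurons, which is what lets a neuron be read as one concept across modalities. A training loop would add a reconstruction loss plus some term that rewards this consistency; the form of that term is not given in the text above, so it is left out.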
