Line of Sight: On Linear Representations in VLLMs

Achyuta Rajaram, Sarah Schwettmann, Jacob Andreas, and Arthur Conmy

arXiv:2506.04706·cs.CV·June 6, 2025

TL;DR

This paper investigates how visual concepts are represented in the hidden activations of a vision-language LLM (VLLM). It finds linearly decodable image features, shows through targeted edits that they causally influence model outputs, and introduces multimodal Sparse Autoencoders (SAEs) for interpretability.

Contribution

It demonstrates the presence of linearly decodable image features in VLLMs and introduces multimodal Sparse Autoencoders to enhance interpretability of visual representations.

Findings

01

ImageNet classes are represented via linearly decodable features.

02

Features are causal, as shown by targeted edits affecting outputs.

03

Text and image representations are largely disjoint, but become increasingly shared in deeper layers.
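
The first two findings can be illustrated with a small sketch. It is not the authors' code: it uses synthetic vectors in place of real residual-stream activations, and the names `fit_linear_probe`, `probe_accuracy`, and `steer` are hypothetical helpers introduced here. It fits a logistic-regression probe to test whether a concept is linearly decodable, then adds the probe direction to activations as a stand-in for a targeted edit.

```python
import numpy as np

def fit_linear_probe(acts, labels, lr=0.3, steps=1000):
    """Fit a logistic-regression probe by gradient descent.

    acts: (n, d) activations; labels: (n,) with values in {0, 1}.
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log-loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples the probe classifies correctly."""
    preds = (acts @ w + b > 0).astype(int)
    return float((preds == labels).mean())

def steer(acts, direction, alpha):
    """Shift every activation by alpha along the unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return acts + alpha * unit

# Synthetic stand-in for activations (assumption: a real study would read
# these from a chosen layer of the model instead).
rng = np.random.default_rng(0)
d = 16
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)
labels = rng.integers(0, 2, size=400)
acts = rng.normal(size=(400, d)) + 4.0 * labels[:, None] * concept

w, b = fit_linear_probe(acts, labels)
acc = probe_accuracy(acts, labels, w, b)

# "Edit" the negative examples by pushing them along the probe direction.
negatives = acts[labels == 0]
steered = steer(negatives, w, alpha=8.0)
flipped = float((steered @ w + b > 0).mean())
```

Here `acc` measures linear decodability and `flipped` measures how often the edit changes the probe's decision. In a real model the edit would be judged by its effect on generated text, not on the probe itself.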

Abstract

Language models can be equipped with multimodal capabilities by fine-tuning on embeddings of visual inputs. But how do such multimodal models represent images in their hidden activations? We explore representations of image concepts within LLaVA-NeXT, a popular open-source VLLM. We find a diverse set of ImageNet classes represented via linearly decodable features in the residual stream. We show that the features are causal by performing targeted edits on the model output. To increase the diversity of the studied linear features, we train multimodal Sparse Autoencoders (SAEs), creating a highly interpretable dictionary of text and image features. We find that although model representations across modalities are quite disjoint, they become increasingly shared in deeper layers.
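
For readers unfamiliar with Sparse Autoencoders, the sketch below shows one common formulation: a ReLU encoder, a linear decoder, and a loss combining reconstruction error with an L1 sparsity penalty. The abstract does not specify the architecture or hyperparameters used in the paper, so the class `SparseAutoencoder`, its dimensions, and the `l1_coeff` value are all assumptions made for illustration.

```python
import numpy as np

class SparseAutoencoder:
    """Minimal ReLU sparse autoencoder (illustrative, untrained)."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
        self.b_enc = np.zeros(d_dict)
        self.W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Non-negative feature activations; most should be zero after training.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Reconstruct the activation as a weighted sum of dictionary rows.
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        recon = self.decode(f)
        mse = np.mean(np.sum((x - recon) ** 2, axis=-1))
        sparsity = np.mean(np.sum(np.abs(f), axis=-1))
        return mse + l1_coeff * sparsity

# Random vectors stand in for residual-stream activations from text and
# image tokens (assumption: real inputs would come from the model).
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 32))
sae = SparseAutoencoder(d_model=32, d_dict=128)
features = sae.encode(x)
recon = sae.decode(features)
value = sae.loss(x)
```

Each row of `W_dec` plays the role of one dictionary feature. A multimodal dictionary in this sense is one trained on activations from both text and image tokens, so the same feature can be checked for activity in either modality.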

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling

Methods: Sparse Evolutionary Training