Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Kaichen Zhang; Yifei Shen; Bo Li; Ziwei Liu

arXiv:2411.14982 · cs.CV · September 19, 2025

TL;DR

This paper introduces a framework to interpret the internal features of large multimodal models, enhancing understanding of their decision-making and error patterns by disentangling and analyzing their semantic representations.

Contribution

It presents a novel approach combining a Sparse Autoencoder and an automatic interpretation framework to analyze and understand the semantics within large multimodal models.
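To make the Sparse Autoencoder idea concrete, the following is a minimal sketch in pure Python. It assumes a generic ReLU autoencoder with an L1 sparsity penalty, which is one common formulation and not necessarily the variant used in the paper. The class name `TinySAE`, its parameters, and the `l1_coeff` value are hypothetical and chosen only for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySAE:
    """Illustrative sparse autoencoder (hypothetical, not the paper's code).

    encode: f = ReLU(W_enc x + b_enc)   -> sparse, non-negative features
    decode: x_hat = W_dec f + b_dec     -> reconstruction of the input
    """

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # The feature dictionary is usually wider than the model dimension.
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        self.w_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_features)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        pre = [p + b for p, b in zip(matvec(self.w_enc, x), self.b_enc)]
        return [max(0.0, p) for p in pre]  # ReLU keeps activations >= 0

    def decode(self, features):
        return [r + b for r, b in zip(matvec(self.w_dec, features), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Reconstruction error plus an L1 penalty that encourages sparsity."""
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        l1 = sum(abs(f) for f in features)
        return mse + l1_coeff * l1


# Usage: encode one (made-up) hidden-state vector into 8 candidate features.
sae = TinySAE(d_model=4, d_features=8)
hidden = [1.0, -0.5, 0.25, 2.0]
features = sae.encode(hidden)
reconstruction = sae.decode(features)
```

In a real setting the weights would be trained on hidden states collected from the model, and each learned feature would then be handed to the interpretation step.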

Findings

01. Features can steer model behavior effectively.

02. Insights into why models excel in specific tasks.

03. Understanding model mistakes and potential rectifications.
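The steering finding can be illustrated with a small sketch. It assumes that steering means adding a scaled, unit-normalized feature direction (for example, a column of an SAE decoder) to a hidden state; the paper may use a different intervention, such as clamping a feature's activation. The function names `steer_hidden_state` and `projection` are hypothetical.

```python
import math


def _norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def projection(hidden, feature_direction):
    """Component of the hidden state along the (unit-normalized) direction."""
    norm = _norm(feature_direction)
    if norm == 0.0:
        raise ValueError("feature direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, feature_direction)) / norm


def steer_hidden_state(hidden, feature_direction, strength):
    """Shift the hidden state by `strength` units along the feature direction.

    Assumption: additive steering. This is one common choice, not a
    confirmed description of the paper's procedure.
    """
    norm = _norm(feature_direction)
    if norm == 0.0:
        raise ValueError("feature direction must be non-zero")
    return [h + strength * d / norm for h, d in zip(hidden, feature_direction)]


# Usage with made-up numbers: push the state 5 units along the direction.
hidden = [1.0, 2.0]
direction = [3.0, 4.0]
steered = steer_hidden_state(hidden, direction, strength=5.0)
shift = projection(steered, direction) - projection(hidden, direction)
```

By construction, the projection onto the feature direction grows by exactly `strength`, which is what makes the intervention easy to reason about.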

Abstract

Recent advances in Large Multimodal Models (LMMs) have led to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an initial step towards addressing this question by presenting a versatile framework to identify and interpret the semantics within LMMs. Specifically, 1) we first apply a Sparse Autoencoder (SAE) to disentangle the representations into human-understandable features. 2) We then present an automatic interpretation framework in which the LMMs themselves interpret the open-semantic features learned by the SAE. We employ this framework to analyze the LLaVA-NeXT-8B model using the LLaVA-OV-72B model, demonstrating that these features can effectively steer the model's behavior. Our results contribute to a deeper understanding of why LMMs excel in specific tasks,…

Code & Models

Repositories

EvolvingLMMs-Lab/multimodal-sae
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis