MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models

Zhanliang Wang; Kai Wang

arXiv:2508.00576 · cs.AI · February 18, 2026

TL;DR

MultiSHAP is a novel, model-agnostic framework that uses Shapley interactions to precisely explain how multimodal AI models combine visual and textual information at both individual and dataset levels.

Contribution

MultiSHAP introduces an interpretability method that quantifies cross-modal interactions, applies to both open- and closed-source models, and provides explanations at both the instance and dataset levels.

Findings

01. Faithfully captures cross-modal reasoning mechanisms

02. Provides instance-level explanations of synergistic effects

03. Uncovers generalizable interaction patterns across datasets

Abstract

Multimodal AI models have achieved impressive performance in tasks that require integrating information from multiple modalities, such as vision and language. However, their "black-box" nature poses a major barrier to deployment in high-stakes applications where interpretability and trustworthiness are essential. How to explain cross-modal interactions in multimodal AI models remains a major challenge. While existing model explanation methods, such as attention maps and Grad-CAM, offer coarse insights into cross-modal relationships, they cannot precisely quantify the synergistic effects between modalities and are limited to open-source models with accessible internal weights. Here we introduce MultiSHAP, a model-agnostic interpretability framework that leverages the Shapley Interaction Index to attribute multimodal predictions to pairwise interactions between fine-grained visual and…
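
As a rough illustration of the quantity the abstract names, the sketch below computes the standard pairwise Shapley interaction index exactly for a toy value function. It is not the authors' implementation: the function `shapley_interaction`, the player labels (`p0`, `p1` for image patches, `t0` for a text token), and the scoring rule in `toy_model` are all hypothetical stand-ins. In practice the value function would be a black-box model queried with some patches and tokens masked out, and how that masking is done is an assumption not covered here.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(value_fn, players, i, j):
    """Exact pairwise Shapley interaction index between players i and j.

    value_fn maps a frozenset of present players to a scalar score.
    Cost is exponential in len(players), so this suits only tiny examples.
    """
    others = [p for p in players if p not in (i, j)]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Weight for coalitions of this size: |S|! (n - |S| - 2)! / (n - 1)!
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Discrete second difference: joint effect minus individual effects
            delta = (
                value_fn(s | {i, j})
                - value_fn(s | {i})
                - value_fn(s | {j})
                + value_fn(s)
            )
            total += weight * delta
    return total


def toy_model(coalition):
    """Hypothetical score: p1 helps alone; p0 and t0 help only together."""
    score = 0.0
    if "p1" in coalition:
        score += 0.2
    if "p0" in coalition and "t0" in coalition:
        score += 0.5
    return score


players = ["p0", "p1", "t0"]
synergy = shapley_interaction(toy_model, players, "p0", "t0")
independent = shapley_interaction(toy_model, players, "p1", "t0")
print(f"p0 x t0 interaction: {synergy:.3f}")
print(f"p1 x t0 interaction: {independent:.3f}")
```

In this toy setup the patch-token pair that only contributes jointly receives an interaction of about 0.5, while the pair with purely additive contributions receives about 0, which is the sense in which the index separates synergy from independent effects. Because the method only needs model outputs for masked inputs, a sketch like this requires no access to internal weights.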

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI