Quantifying Modality Contributions via Disentangling Multimodal Representations

Padegal Amit; Omkar Mahesh Kashyap; Namitha Rayasam; Nidhi Shekhar; Surabhi Narayan

arXiv:2511.19470·cs.LG·November 26, 2025

Open Access

TL;DR

This paper introduces a framework using Partial Information Decomposition to accurately quantify the unique, redundant, and synergistic contributions of each modality in multimodal models, addressing limitations of previous accuracy-based methods.

Contribution

The paper proposes a novel, scalable method, based on Partial Information Decomposition (PID) and the iterative proportional fitting procedure (IPFP), for disentangling modality contributions at the representation level without retraining.
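This page does not describe how the paper applies IPFP, so the following is only a minimal sketch of the generic, textbook iterative proportional fitting procedure on a two-way table, not the authors' algorithm. The function name `ipfp`, its parameters, and the example numbers are assumptions made for illustration. It alternately rescales rows and columns of a strictly positive seed table until both sets of marginals match the targets.

```python
import numpy as np

def ipfp(seed, row_target, col_target, n_iters=200, tol=1e-10):
    """Generic IPFP on a 2-D table (illustrative, not the paper's algorithm).

    Assumes `seed` is strictly positive and that both target marginals
    sum to the same total.
    """
    q = np.asarray(seed, dtype=float).copy()
    row_target = np.asarray(row_target, dtype=float)
    col_target = np.asarray(col_target, dtype=float)
    for _ in range(n_iters):
        # Rescale each row so the row sums match the row targets.
        q *= (row_target / q.sum(axis=1))[:, None]
        # Rescale each column so the column sums match the column targets.
        q *= (col_target / q.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.abs(q.sum(axis=1) - row_target).max() < tol:
            break
    return q

# Hypothetical example: fit a 2x2 table to the given marginals.
seed = np.array([[1.0, 2.0], [3.0, 4.0]])
fitted = ipfp(seed, row_target=[0.4, 0.6], col_target=[0.5, 0.5])
print(fitted)
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

Because each step only multiplies whole rows or whole columns, the fitted table keeps the seed's cross-product (odds) ratio while adopting the new marginals, which is the property that makes IPFP useful for projecting a distribution onto marginal constraints.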

Findings

01. Provides a principled way to analyze modality contributions
02. Enables inference-only analysis of multimodal models
03. Offers clearer insights than outcome-based metrics

Abstract

Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate the notion of contribution itself. Prior work relies on accuracy-based approaches, interpreting performance drops after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm…
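To make the unique, redundant, and synergistic split concrete, here is a minimal sketch on small discrete distributions. It is not the paper's method: it swaps in the simple minimum-mutual-information (MMI) definition of redundancy, R = min(I(X1;Y), I(X2;Y)), and then uses the standard PID consistency identities I(X1;Y) = R + U1, I(X2;Y) = R + U2, and I(X1,X2;Y) = R + U1 + U2 + S. The function names and toy distributions are assumptions chosen for illustration.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint probability table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # shape (n_x, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # shape (1, n_y)
    mask = p_xy > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def pid_mmi(p):
    """PID of a joint table p[x1, x2, y] using MMI redundancy (an assumption)."""
    p = np.asarray(p, dtype=float)
    n1, n2, ny = p.shape
    i1 = mutual_information(p.sum(axis=1))             # I(X1; Y)
    i2 = mutual_information(p.sum(axis=0))             # I(X2; Y)
    i12 = mutual_information(p.reshape(n1 * n2, ny))   # I(X1, X2; Y)
    redundant = min(i1, i2)
    return {
        "redundant": redundant,
        "unique_1": i1 - redundant,
        "unique_2": i2 - redundant,
        "synergistic": i12 - i1 - i2 + redundant,
    }

# Toy case 1: Y = X1 XOR X2 with uniform inputs (neither input alone predicts Y).
xor = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        xor[a, b, a ^ b] = 0.25

# Toy case 2: Y = X1, with X2 an independent uniform bit.
only_x1 = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        only_x1[a, b, a] = 0.25

print(pid_mmi(xor))
print(pid_mmi(only_x1))
```

In the XOR case all of the predictive information is synergistic, while in the second case it is entirely unique to the first input. An accuracy-drop metric would flag both inputs as important in the XOR case without revealing that neither is informative on its own, which is the distinction the abstract draws.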

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications