Efficient Quantification of Multimodal Interaction at Sample Level
Zequn Yang, Hongfa Wang, Di Hu

TL;DR
This paper introduces the LSMI estimator, a novel, efficient method for quantifying multimodal interactions at the sample level, enabling detailed analysis and practical applications in multimodal systems.
Contribution
We develop a sample-wise interaction estimation framework based on pointwise information theory, advancing the analysis of multimodal information dynamics.
Findings
LSMI provides precise, efficient sample-level interaction estimates.
The method reveals fine-grained multimodal dynamics at the sample and category levels.
Applications include redundancy-based sample partitioning and interaction-aware model ensembling.
Abstract
Interactions between modalities -- redundancy, uniqueness, and synergy -- collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet their accurate sample-level quantification presents significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on…
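To make the three interaction types concrete, here is a minimal sketch of a pointwise decomposition on a small discrete distribution. It is not the LSMI estimator: the abstract targets continuous distributions and does not specify its redundancy measure here. The function names (`marginal`, `pointwise_interactions`) are hypothetical, and taking redundancy as the minimum of the two single-modality pointwise mutual information values is an assumption made purely for illustration. Given any redundancy value r, the remaining terms follow from the usual consistency identities: i(x1;y) = r + u1, i(x2;y) = r + u2, and i(x1,x2;y) = r + u1 + u2 + s.

```python
import math
from collections import defaultdict


def marginal(joint, keep):
    """Marginalize a joint table {(x1, x2, y): prob} onto the index positions in `keep`."""
    out = defaultdict(float)
    for outcome, prob in joint.items():
        out[tuple(outcome[i] for i in keep)] += prob
    return out


def pointwise_interactions(joint, x1, x2, y):
    """Decompose the pointwise information that (x1, x2) carries about y, in bits.

    The sample (x1, x2, y) must have nonzero probability under `joint`.
    """
    p_y = marginal(joint, (2,))[(y,)]

    def pmi(keep_src, values):
        # Pointwise mutual information: log2 p(src, y) / (p(src) * p(y)).
        p_src = marginal(joint, keep_src)[values]
        p_src_y = marginal(joint, keep_src + (2,))[values + (y,)]
        return math.log2(p_src_y / (p_src * p_y))

    i1 = pmi((0,), (x1,))
    i2 = pmi((1,), (x2,))
    i12 = pmi((0, 1), (x1, x2))

    # ASSUMPTION: min-based redundancy, a stand-in rather than the paper's measure.
    r = min(i1, i2)
    return {
        "redundancy": r,
        "unique_1": i1 - r,
        "unique_2": i2 - r,
        "synergy": i12 - i1 - i2 + r,
    }


# Toy example: y = x1 XOR x2 with uniform, independent binary inputs.
xor_joint = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
result = pointwise_interactions(xor_joint, 0, 1, 1)
print(result)
```

In the XOR example neither input alone says anything about y, so the single-modality terms are zero and the full bit of information shows up as synergy. This is the kind of sample-level distinction the abstract describes.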
