SHAPE: An Unified Approach to Evaluate the Contribution and Cooperation of Individual Modalities
Pengbo Hu, Xingyu Li, Yi Zhou

TL;DR
This paper introduces SHAPE scores, based on Shapley values, to quantify the contribution and cooperation of individual modalities in multi-modal deep learning models, helping to explain model behavior and guide the choice of fusion strategy.
Contribution
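The paper proposes SHAPE scores, a Shapley value-based method for systematically evaluating the marginal contribution of each modality and the degree of cooperation across modalities in multi-modal models. It addresses the lack of attention paid to quantifying how much each modality contributes within proposed fusion models.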
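As a rough illustration of the idea behind a Shapley value-based modality score, the sketch below computes the classical Shapley value of each modality from a value function over modality subsets. This is the textbook Shapley formula, not necessarily the exact SHAPE definition used in the paper. The function name `shapley_values`, the `value` interface, and the accuracy numbers in `toy_accuracy` are all hypothetical and made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(modalities, value):
    """Exact Shapley value of each modality.

    `value` is a hypothetical callable mapping a frozenset of modality names
    to a score, e.g. model accuracy when only those modalities are available.
    """
    n = len(modalities)
    phi = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding modality m to coalition s.
                total += weight * (value(s | {m}) - value(s))
        phi[m] = total
    return phi


# Made-up accuracies for illustration only (not results from the paper).
toy_accuracy = {
    frozenset(): 0.50,
    frozenset({"audio"}): 0.60,
    frozenset({"video"}): 0.70,
    frozenset({"audio", "video"}): 0.90,
}

scores = shapley_values(["audio", "video"], toy_accuracy.__getitem__)
```

With these toy numbers, `scores` comes out to roughly 0.15 for audio and 0.25 for video. The two values sum to the gap between the full-model score and the no-modality baseline (0.90 minus 0.50), which is the efficiency property of Shapley values. The exact enumeration visits every subset, so it is practical only for a small number of modalities.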
Findings
For some tasks where modalities are complementary, multi-modal models still tend to rely on the dominant modality alone.
Models learn to exploit cross-modal cooperation when the different modalities are indispensable for the task.
Early-stage fusion is preferable when modalities significantly cooperate.
Abstract
As deep learning advances, there is an ever-growing demand for models capable of synthesizing information from multi-modal resources to address the complex tasks raised from real-life applications. Recently, many large multi-modal datasets have been collected, on which researchers actively explore different methods of fusing multi-modal information. However, little attention has been paid to quantifying the contribution of different modalities within the proposed models. In this paper, we propose the SHapley vAlue-based PErceptual (SHAPE) scores that measure the marginal contribution of individual modalities and the degree of cooperation across modalities. Using these scores, we systematically evaluate different fusion methods on different multi-modal datasets for different tasks. Our experiments suggest that for some tasks where different modalities are complementary,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Speech and dialogue systems
