Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat, Idan Schwartz, Alexander Schwing

TL;DR
This paper introduces the perceptual score, a metric that measures how much a multi-modal model relies on each of its input modalities. Applied to recent models, it shows that newer, more accurate models rely less on visual data than their predecessors, which raises concerns about bias and shortcut learning.
Contribution
The paper proposes the perceptual score, a metric that quantifies how much a model relies on each modality, and uses it to analyze trends and biases in multi-modal datasets and the models trained on them.
Findings
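Recent, more accurate models for visual question answering and visual dialog rely less on visual data than their predecessors, a trend that holds across four popular datasets.
The perceptual score helps reveal biases and shortcut learning in multi-modal models.
The authors encourage quantifying how perceptive a model is of each modality.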
Abstract
Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find shortcuts. To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities. Using the perceptual score, we find a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art multi-modal models for visual question-answering or visual dialog tend to perceive the visual data less than their predecessors. This trend is concerning…
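To make the idea of a modality-reliance metric concrete, here is a minimal Python sketch. It assumes a permutation-style estimate: shuffle one modality across examples so that it no longer matches its label, then measure how much accuracy drops. The abstract above does not spell out the computation, so the function name `permutation_reliance`, the `predict` callable, the dictionary-of-arrays input layout, and the toy data are all hypothetical and should not be read as the authors' implementation.

```python
import numpy as np


def permutation_reliance(predict, modalities, labels, target,
                         n_permutations=20, seed=0):
    """Accuracy drop when the `target` modality is shuffled across examples.

    Assumption (not taken from the summary above): reliance is estimated as
    the raw difference between normal accuracy and the mean accuracy over
    several random permutations of one modality. No normalization is applied.
    """
    rng = np.random.default_rng(seed)
    base_acc = float(np.mean(predict(modalities) == labels))

    permuted_accs = []
    for _ in range(n_permutations):
        shuffled = dict(modalities)            # shallow copy; other modalities untouched
        order = rng.permutation(len(labels))   # random reassignment of examples
        shuffled[target] = modalities[target][order]
        permuted_accs.append(float(np.mean(predict(shuffled) == labels)))

    return base_acc - float(np.mean(permuted_accs))


# Toy data (hypothetical): labels depend only on the "text" modality.
data_rng = np.random.default_rng(1)
text = data_rng.normal(size=(200, 4))
image = data_rng.normal(size=(200, 8))
labels = (text[:, 0] > 0).astype(int)
inputs = {"text": text, "image": image}


def text_only_model(m):
    """A stand-in 'multi-modal' classifier that ignores the image entirely."""
    return (m["text"][:, 0] > 0).astype(int)


image_score = permutation_reliance(text_only_model, inputs, labels, "image")
text_score = permutation_reliance(text_only_model, inputs, labels, "text")
print("reliance on image:", image_score)
print("reliance on text:", text_score)
```

In this toy setup the classifier never reads the image, so shuffling images leaves every prediction unchanged and the image score is zero, while shuffling text removes most of the accuracy. A score near zero for a modality is the kind of signal the summary describes: the model reaches its accuracy without perceiving that input.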
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
