Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions

Gaurav Verma; Vishwa Vinay; Ryan A. Rossi; Srijan Kumar

arXiv:2211.02646 · cs.LG · November 7, 2022 · 1 citation

Open Access

TL;DR

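This paper investigates the robustness of fusion-based multimodal classifiers to cross-modal content dilutions. It shows that appending text that stays relevant to the image and original text, yet induces misclassification, reduces classifier performance by 23.3% and 22.5% on Crisis Humanitarianism and Sentiment Detection tasks, respectively.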

Contribution

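The authors develop a model that, given a multimodal (image + text) input, generates additional dilution text that stays topically coherent with the input while causing misclassification, and they use it to demonstrate the brittleness of fusion-based multimodal classifiers.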

Findings

01

Classifier performance drops by 23.3% on Crisis Humanitarianism and 22.5% on Sentiment Detection in the presence of dilutions.

02

Dilutions are highly relevant and topically coherent.

03

The generated dilutions expose how brittle task-specific fusion-based multimodal classifiers are to plausible added text.

Abstract

As multimodal learning finds applications in a wide variety of high-stakes societal tasks, investigating its robustness becomes important. Existing work has focused on understanding the robustness of vision-and-language models to imperceptible variations on benchmark tasks. In this work, we investigate the robustness of multimodal classifiers to cross-modal dilutions, a plausible variation. We develop a model that, given a multimodal (image + text) input, generates additional dilution text that (a) maintains relevance and topical coherence with the image and existing text, and (b) when added to the original text, leads to misclassification of the multimodal input. Via experiments on Crisis Humanitarianism and Sentiment Detection tasks, we find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions…
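
The two conditions in the abstract (the added text stays relevant, and it flips the prediction) can be illustrated with a toy sketch. This is not the authors' generation model: it swaps in a simple search over a fixed candidate list, and the keyword-based fusion classifier, the token-overlap relevance check, the image scores, and all names below are hypothetical stand-ins chosen only to make the idea concrete.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword list standing in for a learned text encoder.
URGENT = {"flood", "rescue", "damage", "injured"}


def classify(image_score: float, text: str) -> int:
    """Toy late-fusion classifier: average an image score and a text score."""
    tokens = text.lower().split()
    text_score = sum(t in URGENT for t in tokens) / max(len(tokens), 1)
    fused = 0.5 * image_score + 0.5 * text_score
    return int(fused >= 0.3)  # 1 = "urgent", 0 = "not urgent"


def is_relevant(original: str, candidate: str) -> bool:
    """Assumed relevance proxy: the candidate shares a token with the original."""
    return bool(set(original.lower().split()) & set(candidate.lower().split()))


def find_dilution(image_score: float, text: str,
                  candidates: List[str]) -> Optional[str]:
    """Return the first candidate that is relevant AND flips the prediction."""
    original_label = classify(image_score, text)
    for cand in candidates:
        diluted = text + " " + cand
        if is_relevant(text, cand) and classify(image_score, diluted) != original_label:
            return cand
    return None


def accuracy(examples: List[Tuple[float, str, int]]) -> float:
    """Fraction of (image_score, text, label) examples classified correctly."""
    return sum(classify(s, t) == y for s, t, y in examples) / len(examples)


CANDIDATES = [
    "buy cheap shoes online today",  # irrelevant: no shared token
    "flood rescue teams",            # relevant, but does not flip the label
    "the flood water in the city stayed low this week "
    "and the market was open as usual",  # relevant and dilutes the signal
]

data = [(0.4, "flood rescue needed", 1), (0.1, "sunny day at the park", 0)]

# Append a dilution where one is found; otherwise keep the text unchanged.
diluted = []
for score, text, label in data:
    extra = find_dilution(score, text, CANDIDATES)
    diluted.append((score, text + " " + extra if extra else text, label))

print("accuracy before dilution:", accuracy(data))
print("accuracy after dilution:", accuracy(diluted))
```

In this sketch the dilution works by lowering the proportion of task-relevant tokens in the text channel, so the fused score falls below the decision threshold even though the image and the original text are untouched. The paper's model generates the dilution text rather than selecting it from a list.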

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques