On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann, Matthias Hein

TL;DR
This paper demonstrates that imperceivable adversarial attacks on multi-modal foundation models can mislead honest users, highlighting the need for robust defenses against such attacks in deployed systems.
Contribution
The paper reveals a security vulnerability in multi-modal foundation models: imperceivable perturbations of an input image can change the caption the model outputs, so malicious third-party content can harm honest users.
Findings
Imperceivable attacks on images can alter a model's caption output, misleading honest users.
Malicious content providers can exploit these attacks to guide users to malicious websites or to broadcast fake information.
Countermeasures are necessary for safe deployment of multi-modal models.
Abstract
Multi-modal foundation models combining vision and language models, such as Flamingo or GPT-4, have recently gained enormous interest. Alignment of foundation models is used to prevent models from providing toxic or harmful output. While malicious users have successfully tried to jailbreak foundation models, an equally important question is whether honest users could be harmed by malicious third-party content. In this paper we show that imperceivable attacks on images that change the caption output of a multi-modal foundation model can be used by malicious content providers to harm honest users, e.g., by guiding them to malicious websites or broadcasting fake information. This indicates that countermeasures to adversarial attacks should be used by any deployed multi-modal foundation model.
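To make "imperceivable attack" concrete, here is a minimal sketch of a generic targeted projected-gradient attack under an L-infinity bound. It is an illustration only, not the authors' method: the abstract does not specify the attack algorithm or the perturbation budget. The toy linear classifier `W` stands in for a captioning model, and the function name `targeted_linf_pgd` and the values of `eps`, `step`, and `iters` are all assumptions chosen for the example.

```python
import numpy as np

def target_loss(z, target):
    """Cross-entropy of the logits z with respect to the target class (stable form)."""
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[target]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_linf_pgd(x, W, target, eps, step, iters):
    """Hypothetical helper: lower the target-class loss of a toy linear model
    while keeping every pixel within eps of the original image x."""
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    x_adv = x.copy()
    for _ in range(iters):
        p = softmax(W @ x_adv)
        grad = W.T @ (p - onehot)                 # d(loss)/d(input) for logits = W @ x
        x_adv = x_adv - step * np.sign(grad)      # signed step that lowers the target loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

# Toy setup (all values are assumptions): a random "image" and a random linear model.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=3072)              # flattened 32x32x3 image in [0, 1]
W = 0.05 * rng.normal(size=(10, 3072))            # stand-in for a real model
target = int(np.argmin(W @ x))                    # attacker picks the least likely class
eps = 8 / 255                                     # assumed budget, not taken from the paper

loss_before = target_loss(W @ x, target)
x_adv = targeted_linf_pgd(x, W, target, eps=eps, step=2 / 255, iters=20)
loss_after = target_loss(W @ x_adv, target)
max_change = np.max(np.abs(x_adv - x))
print(f"max pixel change: {max_change:.4f}, target loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The projection step is what keeps the perturbation small: no pixel moves by more than `eps`, which is why such changes can be hard for a person to notice. Attacking a real multi-modal model would replace the closed-form gradient with backpropagation through the model and a loss on the desired caption tokens.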
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections
