Adversarial Robustness for Visual Grounding of Multimodal Large Language Models

Kuofeng Gao; Yang Bai; Jiawang Bai; Yong Yang; Shu-Tao Xia

arXiv:2405.09981·cs.CV·May 17, 2024·2 cites

TL;DR

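This paper studies the adversarial robustness of visual grounding in multimodal large language models (MLLMs), using referring expression comprehension (REC) as the example task. It proposes three attack paradigms (untargeted, exclusive targeted, and permuted targeted) and establishes baselines for evaluating robustness against them.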

Contribution

The paper introduces three novel adversarial attack paradigms for visual grounding in MLLMs and provides extensive experiments demonstrating their effectiveness.

Findings

01. Proposed attack methods successfully fool MLLMs in visual grounding tasks.

02. The attacks reveal vulnerabilities in current MLLMs' visual grounding capabilities.

03. Baseline benchmarks for adversarial robustness in visual grounding are established.

Abstract

Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks, including visual grounding. However, the adversarial robustness of visual grounding remains unexplored in MLLMs. To fill this gap, we use referring expression comprehension (REC) as an example task in visual grounding and propose three adversarial attack paradigms. First, untargeted adversarial attacks induce MLLMs to generate incorrect bounding boxes for each object. Second, exclusive targeted adversarial attacks cause all generated outputs to be the same target bounding box. Third, permuted targeted adversarial attacks aim to permute all bounding boxes among different objects within a single image. Extensive experiments demonstrate that the proposed methods can successfully attack visual grounding capabilities of MLLMs. Our methods…
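
The three paradigms in the abstract differ mainly in what bounding box the attacker wants the model to output for each object. The sketch below illustrates only that target bookkeeping and a simple success check; it is not the authors' implementation and does not perform any image perturbation. The function names, the (x1, y1, x2, y2) box format, the cyclic shift used as the permutation, and the 0.5 IoU threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def exclusive_targets(boxes: List[Box], target: Box) -> List[Box]:
    """Exclusive targeted: every object is assigned the same target box."""
    return [target for _ in boxes]


def permuted_targets(boxes: List[Box]) -> List[Box]:
    """Permuted targeted: each object is assigned another object's box.

    A cyclic shift is used here as one possible permutation (an assumption);
    it guarantees no object keeps its own box when there are at least two.
    """
    if len(boxes) < 2:
        raise ValueError("need at least two objects to permute boxes")
    return boxes[1:] + boxes[:1]


def untargeted_success(pred: Box, truth: Box, thr: float = 0.5) -> bool:
    """Untargeted: the attack succeeds if the prediction misses the true box."""
    return iou(pred, truth) < thr


def targeted_success(pred: Box, target: Box, thr: float = 0.5) -> bool:
    """Targeted: the attack succeeds if the prediction lands on the target box."""
    return iou(pred, target) >= thr


if __name__ == "__main__":
    objects = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
    print(exclusive_targets(objects, (0, 0, 5, 5)))
    print(permuted_targets(objects))
```

In this framing, an untargeted attack needs only the true box for each object, while the two targeted variants first build a per-object target list and then score each prediction against its assigned target.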

Code & Models

Repositories

KuofengGao/MLLM-Grounding-Robustness (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)