PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

Qi Guo; Xiaojun Jia; Shanmin Pang; Simeng Qin; Lin Wang; Ju Jia; Yang Liu; Qing Guo

arXiv:2508.05167·cs.CV·August 8, 2025

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

Qi Guo, Xiaojun Jia, Shanmin Pang, Simeng Qin, Lin Wang, Ju Jia, Yang Liu, Qing Guo

PDF

1 Video

TL;DR

PhysPatch is a novel adversarial patch method designed for multimodal large language models in autonomous driving, optimizing for physical realizability and transferability to effectively manipulate perception and planning outputs.

Contribution

It introduces a comprehensive framework that jointly optimizes patch attributes and employs semantic and alignment strategies for effective, real-world transferable attacks on MLLM-based autonomous driving systems.

Findings

01

Outperforms prior methods in attack success rate

02

Ensures patches are physically feasible in real-world scenes

03

Demonstrates robustness across various MLLMs

Abstract

Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks, particularly adversarial patch attacks, which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models and perform poorly when transferred to MLLM-based systems due to the latter's complex architectures and reasoning abilities. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems· underline