Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models

Ying Yang; Jie Zhang; Xiao Lv; Di Lin; Tao Xiang; Qing Guo

arXiv:2505.24227 · cs.CV · June 2, 2025

TL;DR

LightD is a semantically guided relighting framework that uses ChatGPT and a pretrained relighting model to generate natural adversarial samples for vision-language pre-training (VLP) models, aiming to improve both attack effectiveness and realism.

Contribution

The paper presents LightD, a method that combines ChatGPT with a pretrained relighting model to craft natural adversarial samples for VLP models, expanding the attack space while preserving scene semantics.

Findings

01. LightD outperforms existing attack methods in effectiveness.

02. Generated adversarial samples maintain high visual naturalness.

03. Demonstrated across multiple VLP tasks and models.

Abstract

While adversarial attacks on vision-and-language pretraining (VLP) models have been explored, generating natural adversarial samples crafted through realistic and semantically meaningful perturbations remains an open challenge. Existing methods, primarily designed for classification tasks, struggle when adapted to VLP models due to their restricted optimization spaces, leading to ineffective attacks or unnatural artifacts. To address this, we propose LightD, a novel framework that generates natural adversarial samples for VLP models via semantically guided relighting. Specifically, LightD leverages ChatGPT to propose context-aware initial lighting parameters and integrates a pretrained relighting model (IC-light) to enable diverse lighting adjustments. LightD expands the optimization space while ensuring perturbations align with scene semantics. Additionally, gradient-based…
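
To make the idea of attacking through lighting parameters rather than per-pixel noise concrete, here is a minimal toy sketch. It is not the authors' implementation: `relight` is a hypothetical stand-in for a relighting model (it is not IC-light), `score` is a hypothetical stand-in for a VLP model's image-text similarity, the starting `(gain, angle)` pair is hard-coded where LightD would use a ChatGPT proposal, and the finite-difference update is an assumption chosen only to keep the example dependency-free.

```python
import math

def relight(image, gain, angle):
    """Toy relighting: multiply each pixel by a directional brightness ramp.
    Hypothetical stand-in for a relighting model; it is not IC-light."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            u = 2.0 * x / (w - 1) - 1.0   # horizontal position in [-1, 1]
            v = 2.0 * y / (h - 1) - 1.0   # vertical position in [-1, 1]
            shade = 1.0 + gain * (u * math.cos(angle) + v * math.sin(angle))
            row.append(min(1.0, max(0.0, image[y][x] * shade)))
        out.append(row)
    return out

def score(image, template):
    """Toy victim score: cosine similarity to a fixed template.
    Hypothetical stand-in for a VLP model's image-text similarity."""
    a = [p for row in image for p in row]
    b = [p for row in template for p in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def optimize_lighting(image, template, init, steps=30, lr=0.5, eps=1e-4, max_gain=0.5):
    """Finite-difference descent on (gain, angle); returns the best pair seen."""
    gain, angle = init
    best, best_score = (gain, angle), score(relight(image, gain, angle), template)
    for _ in range(steps):
        base = score(relight(image, gain, angle), template)
        g_gain = (score(relight(image, gain + eps, angle), template) - base) / eps
        g_angle = (score(relight(image, gain, angle + eps), template) - base) / eps
        # Step downhill, keeping the gain bounded so the lighting change stays mild.
        gain = min(max_gain, max(-max_gain, gain - lr * g_gain))
        angle -= lr * g_angle
        current = score(relight(image, gain, angle), template)
        if current < best_score:
            best, best_score = (gain, angle), current
    return best, best_score

# Tiny 4x4 grayscale image with a left-to-right brightness ramp.
image = [[0.2 + 0.15 * x for x in range(4)] for _ in range(4)]
template = [row[:] for row in image]   # the clean image scores 1.0 against itself
init = (0.1, 0.0)  # hypothetical values standing in for a language-model proposal
best, best_score = optimize_lighting(image, template, init)
```

The search space here has only two parameters, so every candidate is a globally coherent lighting change by construction; the bound on `gain` plays the role of a naturalness constraint in this toy setting.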

Taxonomy

Topics: Robotics and Automated Systems

Methods: ALIGN