Adversarial Testing for Visual Grounding via Image-Aware Property Reduction

Zhiyuan Chang; Mingyang Li; Junjie Wang; Cheng Li; Boyu Wu; Fanjiang Xu; Qing Wang

arXiv:2403.01118 · cs.CV · March 5, 2024 · 1 citation

Open Access

TL;DR

This paper introduces PEELING, an adversarial testing method for visual grounding that reduces property information in expressions while maintaining their descriptive power, effectively challenging models by exploiting image-text correlations.

Contribution

PEELING is a novel image-aware property reduction approach for adversarial testing of visual grounding models, addressing limitations of existing methods that ignore cross-modal correlations.

Findings

01

PEELING achieves a 21.4% MultiModal Impact score (MMI).

02

It outperforms state-of-the-art baselines by 8.2% to 15.1%.

03

It demonstrates effective adversarial testing on multiple datasets.

Abstract

Due to the advantages of fusing information from various modalities, multimodal learning is gaining increasing attention. As a fundamental task of multimodal learning, Visual Grounding (VG) aims to locate objects in images through natural language expressions. Ensuring the quality of VG models presents significant challenges due to the complex nature of the task. In the black-box scenario, existing adversarial testing techniques often fail to fully exploit the potential of both modalities of information. They typically apply perturbations based solely on either the image or the text information, disregarding the crucial correlation between the two modalities, which leads to failures in test oracles or an inability to effectively challenge VG models. To this end, we propose PEELING, a text perturbation approach via image-aware property reduction for adversarial testing of the VG…
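The idea of reducing property information while keeping an expression descriptive can be illustrated with a toy sketch. This is not the authors' pipeline: the scene representation (a list of objects with a category and a set of property words), the function names `matches` and `peel`, and the uniqueness check are all assumptions made for illustration. The sketch drops properties from an expression and keeps only those reduced variants that still single out the original target in the scene, which is one simple way to read "image-aware".

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical toy scene format: each object has a category and a set of properties.
# A real system would have to derive this information from the image itself.
Scene = List[Dict[str, object]]


def matches(scene: Scene, category: str, props: Sequence[str]) -> List[int]:
    """Indices of objects with the given category that have all listed properties."""
    return [
        i
        for i, obj in enumerate(scene)
        if obj["category"] == category and set(props) <= obj["properties"]
    ]


def peel(scene: Scene, target: int, category: str, props: Sequence[str]) -> List[List[str]]:
    """Reduced property lists that still identify only the target object.

    Candidates are proper subsets of the original properties, tried from the
    smallest reduction (one property dropped) to the largest (all dropped).
    """
    kept_variants = []
    for keep in range(len(props) - 1, -1, -1):
        for kept in combinations(props, keep):
            # Assumed "image-aware" check: the reduced expression must
            # match the target and nothing else in the scene.
            if matches(scene, category, kept) == [target]:
                kept_variants.append(list(kept))
    return kept_variants


scene = [
    {"category": "dog", "properties": {"white", "small", "left"}},
    {"category": "dog", "properties": {"black", "small", "right"}},
    {"category": "cat", "properties": {"white"}},
]

# Original expression: "the small white dog on the left" (target is object 0).
variants = peel(scene, target=0, category="dog", props=["white", "small", "left"])
for kept in variants:
    print(kept)
```

In this toy scene, dropping everything except "small" is rejected because both dogs are small, so that reduced expression would no longer have a well-defined answer. Filtering out such ambiguous variants is what keeps the expected location of the target usable as a test oracle.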

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Image and Object Detection Techniques