TL;DR
This paper introduces a semantics-guided framework for selecting adversarial targets in vision model attacks using pretrained language models, improving interpretability and effectiveness over traditional methods.
Contribution
It proposes a novel cross-modal knowledge transfer approach leveraging pretrained language and vision-language models for targeted adversarial testing.
Findings
Pretrained models outperform static lexical resources in target selection.
Semantic similarity sources influence attack success and target relevance.
Framework enables scalable, interpretable adversarial benchmarks.
Abstract
In targeted adversarial attacks on vision models, the selection of the target label is a critical yet often overlooked determinant of attack success. The target label is the class that the attacker aims to force the model to predict. Existing strategies typically rely on randomness, model predictions, or static semantic resources, which limits interpretability, reproducibility, or flexibility. This paper proposes a semantics-guided framework for adversarial target selection that uses cross-modal knowledge transfer from pretrained language and vision-language models. We evaluate several state-of-the-art models (BERT, TinyLLAMA, and CLIP) as similarity sources to select the most and least semantically related labels with respect to the ground truth, forming best- and worst-case adversarial scenarios. Our experiments on three vision models and five attack methods reveal…
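The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each class label already has an embedding vector from some similarity source and that similarity is measured with cosine similarity; the abstract does not state the measure, so that choice is an assumption. The function names (`cosine`, `build_target_table`) and the toy three-dimensional vectors are hypothetical and stand in for real BERT, TinyLLAMA, or CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_target_table(embeddings):
    """Map each label to its most similar (MS) and least similar (LS) other label.

    `embeddings` is a dict of label -> vector. The result is a lookup table
    that can be computed once and reused for every attack run.
    """
    table = {}
    for label, vec in embeddings.items():
        scores = {
            other: cosine(vec, other_vec)
            for other, other_vec in embeddings.items()
            if other != label  # the ground-truth label is never its own target
        }
        table[label] = {
            "MS": max(scores, key=scores.get),
            "LS": min(scores, key=scores.get),
        }
    return table


# Toy, hand-made vectors (hypothetical; not produced by any real model).
toy_embeddings = {
    "tabby cat": [0.9, 0.1, 0.0],
    "tiger": [0.8, 0.3, 0.1],
    "sports car": [0.0, 0.2, 0.9],
    "minivan": [0.1, 0.3, 0.8],
}

table = build_target_table(toy_embeddings)
print(table["tabby cat"])  # {'MS': 'tiger', 'LS': 'sports car'}
```

In this sketch the table depends only on the label set and the similarity source, not on the image or the attacked model, so it can be precomputed and shared across experiments.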
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The paper addresses an important topic, i.e., evaluating model robustness under adversarial attacks.
1. The motivation of the work is not convincing. Targeted attacks aim to mislead the victim model into predicting a specific target class. The target class should not be chosen adaptively as the most effective one, as this undermines the purpose of evaluating model vulnerability.
2. The compared baselines are outdated and limited to basic attack methods. Recent approaches should be discussed and compared.
3. Only a few networks are evaluated (MobileNetV2, EfficientNetV2B0, and ResNet50V2). The s
- Well written.
- The proposed method is interesting.
- This paper merely replaces WordNet/model weights with pretrained language/VL models but fails to justify why this is a paradigm shift rather than incremental improvement.
- The NIPS 2017 dataset is outdated (2017) and small-scale, lacking the complexity of modern datasets with more diverse classes and realistic perturbations.
- Only 3 vision models are tested, all of which are relatively shallow. The framework's performance on state-of-the-art VLMs remains unproven.
- A GitHub link (https://g
- The paper standardizes target selection for targeted adversarial attacks via semantics-guided MS/LS choices with precomputed lookup tables, yielding a training-free, interpretable, and easily adoptable protocol.
- The dissimilarity metric (DM) serves as a practical pre-attack predictor and triage tool; observed trends between DM and TSR/FR support its potential value, while clarifying local and global semantics.
- The study spans three ImageNet models and five attacks, providing multi-axis cover
- Evidence for transferability robustness is incomplete, lacking thorough cross-model, cross-attack (under matched budgets), and cross-dataset targeted transfer analyses.
- Budget configurations (ε, steps, step size, etc.) lack detail, potentially inflating or confounding the measured semantic effects and attack performance.
- Well-known baselines such as targeted AutoAttack are missing, potentially weakening the claims.
- Statistical rigor seems insufficient without random-seed confidence intervals, paire
