AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models

Zhiwei Li; Jiacheng Xue; Weining Wang; Ajian Liu; Xingyu Gao; Zhenan Sun; Qi Li

arXiv:2605.15584 · cs.CV · May 18, 2026

TL;DR

This paper introduces AGC, a training-free, geometry-based defense mechanism that significantly enhances the adversarial robustness of vision-language models like CLIP with minimal inference overhead.

Contribution

AGC leverages geometric cues from specific data augmentations to correct adversarial perturbations without retraining or gradient optimization.

Findings

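01. AGC improves robust accuracy by 44.4% on average across multiple datasets.

02. AGC reduces inference latency by a factor of 10 compared to gradient-based defenses.

03. AGC reveals a geometric property of CLIP features relevant to robustness: specific augmentations consistently provide cues that align with the correct class semantics in the hyperspherical feature space.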

Abstract

Vision-language models like CLIP have demonstrated remarkable zero-shot transfer capabilities. However, their susceptibility to imperceptible adversarial perturbations remains a critical security concern. While test-time defenses offer a pragmatic solution for deployed models, existing approaches typically rely on gradient-based optimization during inference, incurring significant computational overhead. In this paper, we revisit the role of data augmentation in CLIP robustness and observe that augmentations are not equally effective: specific augmentations consistently provide robust geometric cues that align with correct class semantics in the hyperspherical feature space. Based on this, we propose Adaptive Geodesic Correction (AGC), a training-free defense mechanism that requires no parameter updates. AGC identifies a reliable augmentation as a geometric anchor and corrects the input…
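
The abstract frames the defense geometrically: CLIP compares L2-normalized embeddings on a unit hypersphere, and AGC uses a reliable augmentation as an anchor for correction. Because the abstract is truncated before it states the actual correction rule (and that rule may act on the input image rather than its embedding), the sketch below is not AGC's algorithm. It assumes, for illustration only, that a "geodesic correction" moves the normalized image embedding a fraction `t` of the way toward the anchor's embedding along the great circle (spherical linear interpolation), then classifies zero-shot by cosine similarity against class-prompt text embeddings. The function names, the step fraction `t`, and the toy 2-D vectors are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a feature vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def geodesic_step(z: Sequence[float], anchor: Sequence[float], t: float) -> Vector:
    """Hypothetical correction (not AGC's published rule): move z a fraction t
    of the way toward anchor along the great circle joining them (slerp)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    z, anchor = normalize(z), normalize(anchor)
    theta = math.acos(max(-1.0, min(1.0, dot(z, anchor))))  # geodesic distance
    if theta < 1e-8:  # z already coincides with the anchor
        return z
    if math.pi - theta < 1e-8:  # antipodal points: the geodesic is not unique
        raise ValueError("z and anchor are antipodal")
    sin_theta = math.sin(theta)
    w_z = math.sin((1.0 - t) * theta) / sin_theta
    w_a = math.sin(t * theta) / sin_theta
    # Slerp stays on the sphere; renormalizing only absorbs floating-point drift.
    return normalize([w_z * a + w_a * b for a, b in zip(z, anchor)])


def zero_shot_predict(image_feat: Sequence[float], text_feats: Sequence[Sequence[float]]) -> int:
    """Return the index of the class prompt with the highest cosine similarity."""
    z = normalize(image_feat)
    sims = [dot(z, normalize(w)) for w in text_feats]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real CLIP features are much higher-dimensional.
    text_feats = [[1.0, 0.0], [0.0, 1.0]]  # class 0, class 1
    adv_feat = [0.6, 0.8]                  # perturbed feature, now closer to class 1
    anchor_feat = [1.0, 0.0]               # hypothetical reliable-augmentation view
    corrected = geodesic_step(adv_feat, anchor_feat, t=0.5)
    print(zero_shot_predict(adv_feat, text_feats), zero_shot_predict(corrected, text_feats))
    # prints: 1 0
```

Interpolating along the great circle keeps the corrected feature on the unit hypersphere and makes `t` a uniform fraction of the angle to the anchor. How AGC actually selects the anchor augmentation and the amount of correction is not stated in the truncated abstract.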
