AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models
Zhiwei Li, Jiacheng Xue, Weining Wang, Ajian Liu, Xingyu Gao, Zhenan Sun, Qi Li

TL;DR
This paper introduces AGC, a training-free, geometry-based defense mechanism that significantly enhances the adversarial robustness of vision-language models like CLIP with minimal inference overhead.
Contribution
AGC leverages geometric cues from specific data augmentations to correct adversarial perturbations without retraining or gradient optimization.
Findings
AGC improves robust accuracy by 44.4% on average across multiple datasets.
AGC reduces inference latency by a factor of 10 compared to gradient-based defenses.
The analysis shows that specific augmentations consistently provide geometric cues in CLIP's hyperspherical feature space that align with the correct class semantics.
Abstract
Vision-language models like CLIP have demonstrated remarkable zero-shot transfer capabilities. However, their susceptibility to imperceptible adversarial perturbations remains a critical security concern. While test-time defenses offer a pragmatic solution for deployed models, existing approaches typically rely on gradient-based optimization during inference, incurring significant computational overhead. In this paper, we revisit the role of data augmentation in CLIP robustness and observe that augmentations are not equally effective: specific augmentations consistently provide robust geometric cues that align with correct class semantics in the hyperspherical feature space. Based on this, we propose Adaptive Geodesic Correction (AGC), a training-free defense mechanism that requires no parameter updates. AGC identifies a reliable augmentation as a geometric anchor and corrects the input…
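The abstract is cut off before the correction rule is stated, so the following is only an illustrative sketch of what a geodesic move toward an anchor on the unit hypersphere can look like, not the authors' procedure. It uses spherical linear interpolation (slerp), a standard technique swapped in here for illustration. The function names `normalize` and `geodesic_step`, the fixed step fraction `t`, and the toy two-dimensional vectors are all assumptions; in particular, how AGC selects the anchor and adapts the step is not described in the text above.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def geodesic_step(feature, anchor, t):
    """Move `feature` a fraction `t` along the great-circle arc toward `anchor`.

    Hypothetical helper: plain slerp with a fixed step `t` in [0, 1].
    t = 0 returns the normalized feature, t = 1 returns the normalized anchor.
    """
    u = normalize(feature)
    a = normalize(anchor)
    # Clamp the cosine to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, sum(x * y for x, y in zip(u, a))))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        # The two points already coincide; nothing to correct.
        return u
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:
        # Antipodal points: the geodesic between them is not unique.
        raise ValueError("feature and anchor are antipodal")
    w_u = math.sin((1.0 - t) * theta) / sin_theta
    w_a = math.sin(t * theta) / sin_theta
    return [w_u * x + w_a * y for x, y in zip(u, a)]


# Toy example: step halfway from one axis toward the other.
corrected = geodesic_step([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a straight-line average of the two features, the slerp result stays on the unit sphere without renormalization, which matters when classification is done by cosine similarity against text embeddings. No gradients are involved, consistent with the training-free, optimization-free framing in the abstract.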
