Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics
Francesco Croce, Christian Schlarmann, Naman Deep Singh, Matthias Hein

TL;DR
This paper shows that adversarially robust CLIP models induce perceptual similarity metrics that outperform existing metrics, especially under adversarial attack, while also improving interpretability and performance on related tasks.
Contribution
The paper introduces adversarially robust CLIP models (R-CLIP), obtained by unsupervised adversarial fine-tuning, whose induced perceptual metrics are more robust and more interpretable, and which carry over to related tasks.
Findings
Robust CLIP models outperform existing perceptual metrics in zero-shot settings.
The robust metrics maintain high accuracy under adversarial attacks.
Feature inversion and text inversion visualizations make the robust metric more interpretable.
Abstract
Measuring perceptual similarity is a key tool in computer vision. In recent years perceptual metrics based on features extracted from neural networks with large and diverse training sets, e.g. CLIP, have become popular. At the same time, the metrics extracted from features of neural networks are not adversarially robust. In this paper we show that adversarially robust CLIP models, called R-CLIP, obtained by unsupervised adversarial fine-tuning induce a better and adversarially robust perceptual metric that outperforms existing metrics in a zero-shot setting, and further matches the performance of state-of-the-art metrics while being robust after fine-tuning. Moreover, our perceptual metric achieves strong performance on related tasks such as robust image-to-image retrieval, which becomes especially relevant when applied to "Not Safe for Work" (NSFW) content detection and…
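To make the idea of a feature-based perceptual metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the metric is the cosine distance between image embeddings and that it is evaluated on a two-alternative forced choice (pick which of two candidates is closer to a reference); neither detail is stated in the text above. The embeddings are hand-written placeholder vectors, not outputs of CLIP or R-CLIP, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def perceptual_distance(emb_a, emb_b):
    """Assumed metric: 1 - cosine similarity of two image embeddings."""
    return 1.0 - cosine_similarity(emb_a, emb_b)

def two_afc_choice(ref, cand0, cand1):
    """Return 0 if cand0 is at least as close to ref as cand1, else 1."""
    d0 = perceptual_distance(ref, cand0)
    d1 = perceptual_distance(ref, cand1)
    return 0 if d0 <= d1 else 1

# Placeholder embeddings standing in for encoder outputs (assumption).
ref = [1.0, 0.0, 0.0]
cand0 = [0.9, 0.1, 0.0]   # nearly parallel to ref
cand1 = [0.0, 1.0, 0.0]   # orthogonal to ref

choice = two_afc_choice(ref, cand0, cand1)
```

In a zero-shot setting the encoder would be used as-is, without training on perceptual similarity data; only the embedding function changes when a robust encoder is swapped in for a non-robust one.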
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Contrastive Language-Image Pre-training
