Understanding the Vulnerability of CLIP to Image Compression

Cangxiong Chen; Vinay P. Namboodiri; Julian Padget

arXiv:2311.14029 · cs.CV · November 27, 2023 · 1 citation

TL;DR

This paper reveals that CLIP, a popular vision-language model, is vulnerable to image compression, affecting its zero-shot recognition accuracy, and provides insights to improve its robustness.

Contribution

The paper demonstrates CLIP's vulnerability to image compression and uses an attribution method, Integrated Gradients, to analyze how compression affects zero-shot recognition, which can inform future robustness improvements.

Findings

01 CLIP's zero-shot recognition accuracy decreases when images are compressed.

02 Attribution analysis with Integrated Gradients reveals how compression affects model decisions.

03 Extensive evaluation on CIFAR-10 and STL-10 supports these findings.
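The first finding implies an evaluation protocol of the following shape: compress every test image at several quality levels and measure accuracy at each level. The sketch below is a minimal illustration of that loop only. The names `accuracy_by_quality`, `toy_compress`, and `toy_predict` are hypothetical, the quantizer is a stand-in for a real codec such as JPEG, and the threshold classifier is a stand-in for CLIP zero-shot prediction; none of this is taken from the paper's code.

```python
from typing import Callable, Dict, List, Sequence

Image = List[float]  # toy "image": a flat list of pixel intensities in [0, 1]


def accuracy_by_quality(
    images: Sequence[Image],
    labels: Sequence[int],
    qualities: Sequence[int],
    compress: Callable[[Image, int], Image],
    predict: Callable[[Image], int],
) -> Dict[int, float]:
    """Return {quality: accuracy} after compressing every image at each quality."""
    results: Dict[int, float] = {}
    for q in qualities:
        correct = sum(
            predict(compress(img, q)) == y for img, y in zip(images, labels)
        )
        results[q] = correct / len(labels)
    return results


def toy_compress(image: Image, quality: int) -> Image:
    """Assumption: coarse quantization as a stand-in for a lossy codec.

    quality=100 leaves the image unchanged; lower quality uses a larger step.
    """
    step = (100 - quality) / 100
    if step == 0:
        return list(image)
    return [round(v / step) * step for v in image]


def toy_predict(image: Image) -> int:
    """Assumption: a threshold classifier standing in for zero-shot prediction."""
    return 1 if sum(image) / len(image) > 0.5 else 0


images = [[0.1, 0.2], [0.8, 0.9], [0.4, 0.45], [0.55, 0.6]]
labels = [0, 1, 0, 1]
acc = accuracy_by_quality(images, labels, [100, 50], toy_compress, toy_predict)
```

In a real experiment, `compress` would encode and decode each image with an actual codec, and `predict` would return the class whose text prompt is most similar to the image embedding.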

Abstract

CLIP is a widely used foundational vision-language model for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to changes in image quality under compression. We analyse this surprising result further using an attribution method, Integrated Gradients. With this method, we are able to better understand, both quantitatively and qualitatively, how compression affects the zero-shot recognition accuracy of the model. We evaluate this extensively on CIFAR-10 and STL-10. Our work provides a basis for understanding this vulnerability of CLIP and can help in developing more effective methods to improve the robustness of CLIP and other vision-language models.
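For readers unfamiliar with Integrated Gradients, the method attributes a model's output to each input feature by accumulating gradients along a straight path from a baseline input to the actual input, then scaling by the difference between input and baseline. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It assumes a toy scalar function `toy_score` in place of a CLIP similarity score, an all-zeros baseline, finite-difference gradients instead of autograd, and a midpoint Riemann sum with 50 steps. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Score = Callable[[Sequence[float]], float]


def numerical_gradient(f: Score, x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient (assumption: stands in for autograd)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(
    f: Score, x: Sequence[float], baseline: Sequence[float], steps: int = 50
) -> List[float]:
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    # Average the path gradients and scale by (input - baseline) per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_score(v: Sequence[float]) -> float:
    """Assumption: a smooth toy function standing in for a model's class score."""
    return math.tanh(2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2])


x = [0.9, 0.2, 0.4]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_score, x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (toy_score(x) - toy_score(baseline)))
```

The completeness check is a useful sanity test in practice: if the attributions do not sum approximately to the difference in scores, the number of integration steps is probably too small.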

Code & Models

Repositories

CangxiongChen/understanding_CLIP_vulnerability
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Advanced Image Processing Techniques · Medical Imaging Techniques and Applications

Methods: Contrastive Language-Image Pre-training