FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection
Yulin Chen, Zeyuan Wang, Tianyuan Yu, Yingmei Wei, Liang Bai

TL;DR
FoCLIP is a framework that manipulates feature-space alignment to fool CLIP-based image quality metrics. It produces adversarial images with high visual fidelity and is paired with a tampering detection method based on color channel sensitivity.
Contribution
The paper presents a novel feature-space misalignment framework for fooling CLIP-based metrics and a color channel sensitivity method for tampering detection.
Findings
Optimized images achieve higher CLIPscore with high visual fidelity.
Grayscale conversion reduces CLIPscore while maintaining statistical similarity.
Color channel sensitivity method attains 91% detection accuracy.
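The grayscale finding can be illustrated with a small sketch. It assumes the commonly used CLIPScore definition, w · max(cos(image, text), 0) with w = 2.5, and BT.601 luma weights for grayscale conversion. The names `toy_embed`, `grayscale_score_drop`, `looks_tampered`, and the threshold are hypothetical: `toy_embed` only stands in for a real CLIP image encoder, and the thresholding rule is an illustration, not the paper's detector, which this summary does not specify.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # RGB values in [0, 1]
Image = List[Pixel]                  # flat list of pixels
Embedder = Callable[[Image], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float],
               w: float = 2.5) -> float:
    """CLIPScore as commonly defined: w * max(cos, 0)."""
    return w * max(cosine(image_emb, text_emb), 0.0)


def to_grayscale(image: Image) -> Image:
    """Replace each pixel by its BT.601 luma, repeated over three channels."""
    gray = []
    for r, g, b in image:
        y = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append((y, y, y))
    return gray


def grayscale_score_drop(image: Image, text_emb: Sequence[float],
                         embed_image: Embedder) -> float:
    """Score before grayscale conversion minus score after it."""
    before = clip_score(embed_image(image), text_emb)
    after = clip_score(embed_image(to_grayscale(image)), text_emb)
    return before - after


def looks_tampered(image: Image, text_emb: Sequence[float],
                   embed_image: Embedder, threshold: float) -> bool:
    """Hypothetical rule: flag images whose score drops by more than threshold."""
    return grayscale_score_drop(image, text_emb, embed_image) > threshold


def toy_embed(image: Image) -> Tuple[float, float, float]:
    """Toy stand-in for a CLIP image encoder: per-channel mean (assumption)."""
    n = len(image)
    return tuple(sum(p[c] for p in image) / n for c in range(3))


# Usage with the toy encoder: a pure-red image scored against a "red" text vector.
red_image = [(1.0, 0.0, 0.0)]
red_text = (1.0, 0.0, 0.0)
drop = grayscale_score_drop(red_image, red_text, toy_embed)
```

With a real CLIP encoder in place of `toy_embed`, the same comparison would measure how strongly an image's score depends on its color channels; a suitable threshold would have to be calibrated on data.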
Abstract
The well-aligned nature of CLIP-based models enables effective applications such as CLIPscore, a widely adopted image quality assessment metric. However, such a CLIP-based metric is vulnerable because of its delicate multimodal alignment. In this work, we propose FoCLIP, a feature-space misalignment framework for fooling CLIP-based image quality metrics. Based on stochastic gradient descent, FoCLIP integrates three key components to construct fooling examples: feature alignment, the core module, which reduces the image-text modality gap; a score distribution balance module; and pixel-guard regularization. Together these balance the multimodal output between CLIPscore performance and image quality. Such a design can be engineered to maximize the CLIPscore predictions across diverse input prompts, despite exhibiting either visual unrecognizability or semantic…
Taxonomy
Topics: Image and Video Quality Assessment · Digital Media Forensic Detection · Advanced Image Processing Techniques
