D4C: Data-Free Quantization for Contrastive Language-Image Pre-training Models
Wenlun Zhang, Yunshan Zhong, Zihao Ding, Xinyu Li, Kentaro Yoshioka

TL;DR
D4C introduces a data-free quantization framework for CLIP models, synthesizing semantically rich and diverse images to enable effective model compression without real data.
Contribution
This work is the first to adapt data-free quantization specifically for CLIP. It addresses two problems with existing synthesized samples, insufficient semantic content and low intra-image diversity, through novel synthesis techniques.
Findings
D4C significantly improves quantization performance on CLIP models.
Synthesized images with D4C are both semantically meaningful and structurally diverse.
Extensive experiments demonstrate D4C's effectiveness across different models and bit-widths.
Abstract
Data-Free Quantization (DFQ) offers a practical solution for model compression without requiring access to real data, making it particularly attractive in privacy-sensitive scenarios. While DFQ has shown promise for unimodal models, its extension to Vision-Language Models such as Contrastive Language-Image Pre-training (CLIP) models remains underexplored. In this work, we reveal that directly applying existing DFQ techniques to CLIP results in substantial performance degradation due to two key limitations: insufficient semantic content and low intra-image diversity in synthesized samples. To tackle these challenges, we propose D4C, the first DFQ framework tailored for CLIP. D4C synthesizes semantically rich and structurally diverse pseudo images through three key components: 1) Prompt-Guided Semantic Injection aligns generated images with real-world semantics using text prompts; 2)…
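In DFQ, synthesized pseudo images stand in for real calibration data. They are passed through the full-precision model, and the observed activation ranges set the quantizer parameters. The sketch below shows only that generic calibration step, using standard uniform min-max quantization at a chosen bit-width. It is not D4C's method. The function names (`calibrate_minmax`, `quant_params`, `fake_quantize`) and the sample activation values are hypothetical, chosen for illustration.

```python
# Generic uniform (asymmetric) min-max quantization, as commonly used to
# calibrate quantizers from a batch of (pseudo) calibration activations.
# Illustrative sketch only; not the D4C pipeline.

def calibrate_minmax(samples):
    """Return the (lo, hi) range observed over calibration activations."""
    return min(samples), max(samples)


def quant_params(lo, hi, bits):
    """Compute scale and zero point for an unsigned `bits`-bit grid."""
    qmax = 2 ** bits - 1
    # Extend the range to include 0.0 so zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quantize(x, scale, zero_point, bits):
    """Quantize x to the integer grid, clip, then map back to float."""
    qmax = 2 ** bits - 1
    q = round(x / scale) + zero_point
    q = max(0, min(qmax, q))  # clip to the representable integer range
    return (q - zero_point) * scale


if __name__ == "__main__":
    # Hypothetical activations collected by running synthesized images
    # through the full-precision model.
    acts = [-1.0, -0.4, 0.1, 0.5, 1.0]
    lo, hi = calibrate_minmax(acts)
    scale, zp = quant_params(lo, hi, bits=8)
    for a in acts:
        print(a, fake_quantize(a, scale, zp, bits=8))
```

The quality of the calibration range depends entirely on how representative the synthetic activations are. This is why the abstract attributes the degradation to the semantic content and diversity of the synthesized samples.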
