Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP

Zeliang Zhang; Zhuo Liu; Mingqian Feng; Chenliang Xu

arXiv:2409.15035 · cs.CV · September 24, 2024

TL;DR

This paper empirically investigates quantity bias in CLIP. It finds that CLIP embeddings are biased with respect to object quantity, which reduces the reliability of downstream image generation and understanding tasks.

Contribution

The study provides a comprehensive evaluation of CLIP's understanding of quantity across text, image, and cross-modal contexts, highlighting a significant bias.

Findings

01. CLIP exhibits a measurable quantity bias in its embeddings.

02. Quantity bias impacts the accuracy of downstream tasks.

03. Experimental results demonstrate the bias's effect on image generation reliability.

Abstract

CLIP has demonstrated great versatility in adapting to various downstream tasks, such as image editing and generation, visual question answering, and video understanding. However, CLIP-based applications often suffer from misunderstandings regarding user intent, leading to discrepancies between the required number of objects and the actual outputs in image generation tasks. In this work, we empirically investigate the quantity bias in CLIP. By carefully designing different experimental settings and datasets, we comprehensively evaluate CLIP's understanding of quantity from text, image, and cross-modal perspectives. Our experimental results reveal a quantity bias in CLIP embeddings, impacting the reliability of downstream tasks.
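As a rough illustration of what a text-side probe of quantity understanding could look like, the sketch below builds prompts that differ only in the number word and compares their embeddings with cosine similarity. It is not the authors' protocol: the prompt template, the function names, and `toy_embed` (a bag-of-letters stand-in for a real CLIP text encoder) are all assumptions made for this example. With a real encoder plugged in as `embed`, uniformly high similarities across different quantities would be one possible symptom of the bias the abstract describes.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def quantity_prompts(noun: str, quantities: Sequence[str]) -> List[str]:
    """Build prompts that differ only in the quantity word.

    The template is an assumption for this sketch, not taken from the paper.
    """
    return [f"a photo of {q} {noun}" for q in quantities]


def pairwise_similarity(
    embed: Callable[[str], Vector], prompts: Sequence[str]
) -> Dict[Tuple[str, str], float]:
    """Cosine similarity for every unordered pair of prompts."""
    vectors = [embed(p) for p in prompts]
    sims: Dict[Tuple[str, str], float] = {}
    for i in range(len(prompts)):
        for j in range(i + 1, len(prompts)):
            sims[(prompts[i], prompts[j])] = cosine(vectors[i], vectors[j])
    return sims


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a CLIP text encoder: letter counts.

    Replace this with a call to a real text encoder to run an actual probe.
    """
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


if __name__ == "__main__":
    prompts = quantity_prompts("dogs", ["two", "three", "four", "five"])
    for (a, b), sim in pairwise_similarity(toy_embed, prompts).items():
        print(f"{a!r} vs {b!r}: {sim:.3f}")
```

Passing the encoder in as a plain callable keeps the comparison logic independent of any particular CLIP implementation, so the same loop could be reused for image embeddings or for cross-modal text-image pairs.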

Taxonomy

Topics: History and Developments in Astronomy · Astronomical Observations and Instrumentation · Astronomy and Astrophysical Research

Methods: Contrastive Language-Image Pre-training