How well does CLIP understand texture?
Chenyun Wu, Subhransu Maji

TL;DR
This paper evaluates CLIP's understanding of texture in natural images through zero-shot classification, compositional property representation, and fine-grained categorization, revealing its strengths and limitations in texture comprehension.
Contribution
It provides a comprehensive analysis of CLIP's ability to understand and utilize texture information in natural language descriptions and image classification tasks.
Findings
CLIP performs well on zero-shot texture classification tasks.
CLIP can represent compositional texture properties like color and pattern.
Texture information can aid fine-grained bird species categorization.
Abstract
We investigate how well CLIP understands texture in natural images described by natural language. To this end, we analyze CLIP's ability to: (1) perform zero-shot learning on various texture and material classification datasets; (2) represent compositional properties of texture, such as red dots or yellow stripes, on the Describable Texture in Detail (DTDD) dataset; and (3) aid fine-grained categorization of birds in photographs described by the color and texture of their body parts.
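As a rough illustration of how zero-shot scoring with compositional texture prompts generally works in CLIP-style models, the sketch below builds color-and-pattern prompts and ranks them by cosine similarity to an image embedding. It is not the authors' evaluation code. The prompt template, the logit scale of 100, and the three-dimensional toy vectors are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders.

```python
import math
from itertools import product


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Turn image/text embeddings into a probability for each text prompt.

    logit_scale is an assumed temperature; CLIP learns this value in training.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)


def compositional_prompts(colors, patterns, template="a photo of {} {}"):
    """Build one prompt per (color, pattern) pair. The template is hypothetical."""
    return [template.format(c, p) for c, p in product(colors, patterns)]


prompts = compositional_prompts(["red", "yellow"], ["dots", "stripes"])

# Toy stand-in embeddings, one per prompt, in the same order as `prompts`.
# Real embeddings would be produced by CLIP's text encoder.
text_embs = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.2],
    [0.7, 0.7, 0.0],
    [0.1, 0.2, 1.0],
]
# Toy stand-in for the embedding of one image.
image_emb = [0.1, 0.9, 0.3]

probs = zero_shot_probs(image_emb, text_embs)
best = prompts[max(range(len(probs)), key=probs.__getitem__)]
print(best)
```

The same scoring applies to plain class names (task 1) and to body-part descriptions of birds (task 3); only the set of prompts changes.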
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Contrastive Language-Image Pre-training
