Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du, Qiang Zhai, Weihang Dai, Xiaomeng Li

TL;DR
This paper introduces NumCLIP, a method that improves CLIP's ordinal regression ability by discretizing numerical concepts and training with a novel ranking loss, yielding gains across several tasks.
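The minimal Python sketch below makes the discretization idea concrete. It splits a numeric target range (for example, the year in historical image dating) into equal-width coarse bins and turns each bin into a text prompt that CLIP's text encoder could embed. The equal-width binning, the helper names make_bins, bin_index and bin_prompts, and the prompt template are all hypothetical. This summary does not say how NumCLIP chooses its bins or words its prompts.

```python
from typing import List, Sequence, Tuple

Bin = Tuple[float, float]


def make_bins(lo: float, hi: float, num_bins: int) -> List[Bin]:
    """Split [lo, hi) into num_bins equal-width intervals (hypothetical scheme)."""
    width = (hi - lo) / num_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(num_bins)]


def bin_index(value: float, bins: Sequence[Bin]) -> int:
    """Return the index of the bin containing value, clamping out-of-range values to the ends."""
    for i, (a, b) in enumerate(bins):
        if a <= value < b:
            return i
    return 0 if value < bins[0][0] else len(bins) - 1


def bin_prompts(
    bins: Sequence[Bin],
    template: str = "a photo taken between {a:.0f} and {b:.0f}",  # hypothetical wording
) -> List[str]:
    """Turn each numeric bin into a text prompt for a CLIP-style text encoder."""
    return [template.format(a=a, b=b) for a, b in bins]


if __name__ == "__main__":
    bins = make_bins(1930, 1980, 5)
    print(bin_index(1955, bins))
    print(bin_prompts(bins))
```

With make_bins(1930, 1980, 5), the year 1955 falls in bin 2 (1950 to 1960), and the first prompt reads "a photo taken between 1930 and 1940". The image is then matched against a handful of coarse text classes rather than against an exact number.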
Contribution
The paper proposes NumCLIP, a simple yet effective approach that strengthens CLIP's understanding of numerical concepts for ordinal regression through discretization and a new ranking loss.
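The summary does not specify NumCLIP's ranking loss. The following is therefore only a generic illustration of the idea behind ordinal ranking losses, not the paper's loss. Given image-to-bin similarity scores, a bin closer to the true bin should score higher than a farther one by at least a margin. The function name ordinal_ranking_loss, the pairwise hinge formulation, and the default margin are assumptions.

```python
from typing import Sequence


def ordinal_ranking_loss(sims: Sequence[float], target: int, margin: float = 0.1) -> float:
    """Generic pairwise hinge loss for ordinal structure (not NumCLIP's exact loss).

    For every pair of bins (j, l) where bin j is strictly closer to the target
    bin than bin l, penalise sims[j] failing to exceed sims[l] by `margin`.
    Returns the mean hinge over all such pairs, or 0.0 if there are none.
    """
    total, count = 0.0, 0
    n = len(sims)
    for j in range(n):
        for l in range(n):
            # Only compare pairs whose ordinal distances to the target differ.
            if abs(j - target) < abs(l - target):
                total += max(0.0, margin - (sims[j] - sims[l]))
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    print(ordinal_ranking_loss([0.1, 0.5, 0.9, 0.5, 0.1], target=2))  # well ordered
    print(ordinal_ranking_loss([0.9, 0.5, 0.1, 0.5, 0.9], target=2))  # inverted
```

Similarities that peak at the target and fall off with distance, such as [0.1, 0.5, 0.9, 0.5, 0.1] with target 2, incur zero loss. A flat or inverted profile is penalised. A pairwise hinge is used here because it depends only on the relative order of scores, which is exactly what ordinal structure constrains.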
Findings
10% accuracy improvement on historical image dating
3.83% accuracy improvement on image aesthetics assessment
Effective generalization to multiple ordinal regression tasks
Abstract
Ordinal regression is a fundamental problem in computer vision, typically addressed with customised models trained for specific tasks. While pre-trained vision-language models (VLMs) have exhibited impressive performance on various vision tasks, their potential for ordinal regression has received less exploration. In this study, we first investigate CLIP's potential for ordinal regression, expecting that the model could generalise to different ordinal regression tasks and scenarios. Unfortunately, vanilla CLIP fails on this task, since current VLMs have a well-documented limitation in encapsulating compositional concepts such as number sense. We propose a simple yet effective method called NumCLIP to improve the quantitative understanding of VLMs. We decompose the exact image-to-number-specific-text matching problem into coarse classification and fine prediction stages. We…
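The coarse-classification-then-fine-prediction decomposition can be sketched as follows. The coarse stage takes a softmax over image-to-bin-prompt similarities and picks the most likely bin. The fine stage then refines that choice with a probability-weighted average of the centres of the chosen bin and its neighbours. The abstract above is cut off before the fine stage is described, so this refinement rule, the temperature of 0.01, the window size, and the function name coarse_to_fine are assumptions, not NumCLIP's actual procedure.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine(
    sims: Sequence[float],
    bins: Sequence[Tuple[float, float]],
    temperature: float = 0.01,  # assumed, similar in spirit to CLIP's logit scaling
    window: int = 1,            # assumed number of neighbouring bins on each side
) -> Tuple[int, float]:
    """Return (coarse bin index, fine numeric estimate). Hypothetical sketch."""
    probs = softmax(sims, temperature)
    # Coarse stage: pick the most probable bin.
    k = max(range(len(probs)), key=probs.__getitem__)
    # Fine stage (assumed rule): weighted average of nearby bin centres,
    # using the coarse probabilities renormalised over the window.
    lo, hi = max(0, k - window), min(len(bins), k + window + 1)
    centres = [(a + b) / 2 for a, b in bins[lo:hi]]
    local = probs[lo:hi]
    z = sum(local)
    value = sum(p * c for p, c in zip(local, centres)) / z
    return k, value


if __name__ == "__main__":
    decades = [(1930, 1940), (1940, 1950), (1950, 1960), (1960, 1970), (1970, 1980)]
    sims = [0.20, 0.22, 0.30, 0.28, 0.18]  # illustrative image-to-prompt similarities
    k, year = coarse_to_fine(sims, decades)
    print(k, round(year, 1))
```

For decade bins covering 1930 to 1980 and similarities [0.20, 0.22, 0.30, 0.28, 0.18], the coarse stage selects bin 2 (1950 to 1960). The fine stage returns a value slightly above that bin's centre of 1955, pulled upward by the relatively high similarity of the 1960s bin.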
Taxonomy
Topics: Numerical Methods and Algorithms · Neural Networks and Applications
Methods: Contrastive Language-Image Pre-training
