Teach CLIP to Develop a Number Sense for Ordinal Regression

Yao Du; Qiang Zhai; Weihang Dai; Xiaomeng Li

arXiv:2408.03574 · cs.CV · August 8, 2024

TL;DR

This paper introduces NumCLIP, a method that improves CLIP's ordinal regression ability by discretizing numerical concepts and applying a novel ranking loss, with accuracy gains on tasks such as historical image dating and image aesthetics assessment.

Contribution

The paper proposes NumCLIP, a simple yet effective approach that improves CLIP's understanding of number concepts for ordinal regression through discretization and a new ranking loss.

Findings

1. 10% accuracy improvement on historical image dating

2. 3.83% accuracy improvement on image aesthetics assessment

3. Effective generalization to multiple ordinal regression tasks

Abstract

Ordinal regression is a fundamental problem within the field of computer vision, with customised well-trained models on specific tasks. While pre-trained vision-language models (VLMs) have exhibited impressive performance on various vision tasks, their potential for ordinal regression has received less exploration. In this study, we first investigate CLIP's potential for ordinal regression, from which we expect the model could generalise to different ordinal regression tasks and scenarios. Unfortunately, vanilla CLIP fails on this task, since current VLMs have a well-documented limitation of encapsulating compositional concepts such as number sense. We propose a simple yet effective method called NumCLIP to improve the quantitative understanding of VLMs. We disassemble the exact image to number-specific text matching problem into coarse classification and fine prediction stages. We…
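The coarse-then-fine split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the equal-width bins, the `coarse_scores` list (a stand-in for image-to-text similarity scores, one per bin), and the `fine_offset` value (a stand-in for whatever the fine prediction stage outputs) are all assumptions made for illustration, and the ranking loss is not shown because this excerpt does not describe it.

```python
import math


def make_bins(lo, hi, num_bins):
    """Split the numeric range [lo, hi] into equal-width bins (an assumption)."""
    width = (hi - lo) / num_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(num_bins)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine(coarse_scores, fine_offset, bins):
    """Pick the most likely bin, then place the prediction inside it.

    coarse_scores: hypothetical per-bin scores (e.g. image-text similarities).
    fine_offset:   hypothetical fine-stage output in [0, 1], the relative
                   position of the prediction within the chosen bin.
    """
    probs = softmax(coarse_scores)
    k = max(range(len(probs)), key=probs.__getitem__)  # coarse classification
    start, end = bins[k]
    value = start + fine_offset * (end - start)        # fine prediction
    return k, value


# Example: an ordinal target in [0, 100] split into 5 coarse bins.
bins = make_bins(0.0, 100.0, 5)
k, value = coarse_to_fine([0.1, 2.0, 0.3, -1.0, 0.0], 0.25, bins)
print(k, value)
```

The point of the split is that the coarse stage only has to match an image against a small set of bin-level text concepts rather than every exact number, while the fine stage resolves the value within the selected bin.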

Code & Models

Repositories

xmed-lab/numclip (PyTorch, official)

Taxonomy

Topics: Numerical Methods and Algorithms · Neural Networks and Applications

Methods: Contrastive Language-Image Pre-training