Improving Contrastive Learning for Referring Expression Counting

Kostas Triaridis; Panagiotis Kaliosis; E-Ro Nguyen; Jingyi Xu; Hieu Le; Dimitris Samaras

arXiv:2505.22850 · cs.CV · May 30, 2025

TL;DR

C-REX is a contrastive learning framework that improves referring expression counting by enhancing discriminative features within images, leading to state-of-the-art accuracy and robustness in counting objects based on fine-grained attributes.

Contribution

The paper introduces C-REX, a novel contrastive learning method that operates in image space, outperforms previous approaches in referring expression counting (REC), and generalizes to class-agnostic counting.

Findings

01

C-REX improves MAE by more than 22% compared with previous methods.

02

The framework demonstrates strong performance in class-agnostic counting tasks.

03

Operating entirely within image space improves stability and robustness.
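
As a reading aid for the first finding, here is a minimal sketch of how mean absolute error (MAE) over predicted counts is computed and how a relative improvement between two MAE values can be expressed. It assumes the 22% figure is a relative reduction in MAE, which the summary does not state explicitly. The function names `mae` and `relative_improvement` and all input numbers are hypothetical and are not taken from the paper.

```python
def mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth object counts."""
    if len(predicted_counts) != len(true_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(true_counts)


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error.

    Assumption: "X% improvement in MAE" is read as a relative reduction.
    """
    return (baseline_error - new_error) / baseline_error


# Illustrative numbers only, not results from the paper.
toy_mae = mae([3, 5], [4, 7])                 # (1 + 2) / 2
toy_gain = relative_improvement(10.0, 7.5)    # (10 - 7.5) / 10
```

Under this reading, lower MAE is better, so a positive relative improvement means the new method's counts are closer to the ground truth on average.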

Abstract

Object counting has progressed from class-specific models, which count only known categories, to class-agnostic models that generalize to unseen categories. The next challenge is Referring Expression Counting (REC), where the goal is to count objects based on fine-grained attributes and contextual differences. Existing methods struggle with distinguishing visually similar objects that belong to the same category but correspond to different referring expressions. To address this, we propose C-REX, a novel contrastive learning framework, based on supervised contrastive learning, designed to enhance discriminative representation learning. Unlike prior works, C-REX operates entirely within the image space, avoiding the misalignment issues of image-text contrastive learning, thus providing a more stable contrastive signal. It also guarantees a significantly larger pool of negative samples,…
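
The abstract says C-REX is based on supervised contrastive learning applied within image space. The sketch below shows the generic supervised contrastive (SupCon) objective on a small batch of feature vectors. It is not the C-REX loss itself, whose exact form is not given in this summary. The function name `supcon_loss`, the temperature value, and the idea that each label marks which referring expression a feature matches are assumptions made for illustration.

```python
import math


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of feature vectors.

    features: list of equal-length, non-zero float vectors (hypothetical
              image-space features).
    labels:   one label per feature; equal labels are treated as positives
              (assumed here to mean "matches the same referring expression").
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        normed.append([x / norm for x in f])

    n = len(normed)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp over positives and negatives.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(feats, [0, 0, 1, 1])     # labels agree with geometry
misaligned = supcon_loss(feats, [0, 1, 0, 1])  # labels cut across geometry
```

In this toy batch the loss is lower when features sharing a label are already close together, which is the behaviour a contrastive objective rewards. Every other sample in the batch appears in the denominator, so a larger batch supplies more negatives per anchor.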

Code & Models

Repositories

cvlab-stonybrook/c-rex (Official)

Taxonomy

Methods: Contrastive Learning · Masked autoencoder