TL;DR
C-REX is a contrastive learning framework that improves referring expression counting by enhancing discriminative features within images, leading to state-of-the-art accuracy and robustness in counting objects based on fine-grained attributes.
Contribution
The paper introduces C-REX, a novel contrastive learning method operating in image space that outperforms previous approaches in REC and generalizes to class-agnostic counting.
Findings
C-REX achieves over 22% improvement in MAE over previous methods.
The framework demonstrates strong performance in class-agnostic counting tasks.
Operating entirely within image space improves stability and robustness.
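For readers unfamiliar with the metric, the sketch below shows how mean absolute error (MAE) over per-image counts is computed, and how a percentage improvement between two models could be derived from it. It assumes the improvement is reported as a relative reduction in MAE, which this summary does not state. All counts are made-up illustration values, not results from the paper, and the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_mae, new_mae):
    """Percentage reduction in MAE relative to the baseline (assumed definition)."""
    return (baseline_mae - new_mae) / baseline_mae * 100.0


# Hypothetical ground-truth counts and predictions for four images.
actual_counts = [3, 5, 2, 8]
baseline_counts = [5, 4, 4, 5]  # illustrative baseline model
improved_counts = [4, 5, 3, 7]  # illustrative improved model

baseline_mae = mean_absolute_error(baseline_counts, actual_counts)
improved_mae = mean_absolute_error(improved_counts, actual_counts)
improvement = relative_improvement(baseline_mae, improved_mae)
```

Lower MAE is better, so a positive relative improvement means the new model's counts sit closer to the ground truth on average.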
Abstract
Object counting has progressed from class-specific models, which count only known categories, to class-agnostic models that generalize to unseen categories. The next challenge is Referring Expression Counting (REC), where the goal is to count objects based on fine-grained attributes and contextual differences. Existing methods struggle with distinguishing visually similar objects that belong to the same category but correspond to different referring expressions. To address this, we propose C-REX, a novel contrastive learning framework, based on supervised contrastive learning, designed to enhance discriminative representation learning. Unlike prior works, C-REX operates entirely within the image space, avoiding the misalignment issues of image-text contrastive learning, thus providing a more stable contrastive signal. It also guarantees a significantly larger pool of negative samples,…
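The abstract says C-REX builds on supervised contrastive learning. As a rough illustration, here is a minimal pure-Python sketch of the generic supervised contrastive (SupCon) objective, not the C-REX loss itself. It assumes each feature vector is an image-space embedding of one candidate object and each label identifies the referring expression that object matches. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss, averaged over anchors with positives.

    Embeddings with the same label are pulled together; all other
    embeddings in the batch act as negatives.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denominator = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: the first two and the last two point in similar directions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = supcon_loss(embeddings, labels=[0, 0, 1, 1])
misaligned_loss = supcon_loss(embeddings, labels=[0, 1, 0, 1])
```

In this toy setup the loss is small when the labels agree with the geometry of the embeddings and large when they do not, which is the signal that pushes same-expression objects together and different-expression objects apart.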
Taxonomy
Methods: Contrastive Learning · Masked autoencoder
