FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension

Junzhuo Liu; Xuzheng Yang; Weiwei Li; Peng Wang

arXiv:2409.14750·cs.CV·January 14, 2025

TL;DR

This paper introduces FineCops-Ref, a challenging new dataset for Referring Expression Comprehension (REC) that requires multi-level reasoning and includes negative samples, intended as a testing ground for the language understanding, image comprehension, and grounding abilities of Multi-modal Large Language Models (MLLMs).

Contribution

The paper presents a novel dataset with controllable difficulty levels and negative samples, specifically designed to evaluate and enhance fine-grained multi-modal reasoning in REC tasks.

Findings

01 Significant performance gap in current models' grounding abilities

02 Dataset enables testing of multi-hop and attribute-based reasoning

03 Negative samples challenge models to reject incorrect references

Abstract

Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding. Consequently, it serves as an ideal testing ground for Multi-modal Large Language Models (MLLMs). In pursuit of this goal, we have established a new REC dataset characterized by two key features. First, it is designed with controllable levels of difficulty, necessitating multi-level fine-grained reasoning across object categories, attributes, and multi-hop relationships. Second, it includes negative text and images created through fine-grained editing and generation based on existing data, thereby testing the model's ability to correctly reject scenarios where the target object is not visible in the image--an essential aspect often overlooked in existing datasets and approaches.…
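To make the role of negative samples concrete, the following is a minimal sketch of how a single REC prediction might be scored when the target can be absent. It is not the paper's evaluation protocol: the box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, the convention that `None` means "no target", and the helper names `iou` and `is_correct` are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Optional[Box], target: Optional[Box],
               iou_threshold: float = 0.5) -> bool:
    """Score one prediction (hypothetical convention, not from the paper).

    target is None for a negative sample: the expression refers to nothing
    visible, so the only correct output is a rejection (pred is None).
    For a positive sample, the predicted box must overlap the target
    with IoU at or above the threshold.
    """
    if target is None:
        return pred is None
    if pred is None:
        return False
    return iou(pred, target) >= iou_threshold
```

Under this convention, a model that always outputs some box can never be correct on a negative sample, which is the failure mode the abstract says existing datasets tend to overlook.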

Code & Models

Repositories

liujunzhuo/FineCops-Ref (PyTorch, official)

Models

linhuixiao/Awesome-Visual-Grounding

Videos

FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
