DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners

Clarence Lee; M Ganesh Kumar; Cheston Tan

arXiv:2309.03483·cs.CV·September 8, 2023

TL;DR

DetermiNet introduces a large-scale dataset with 250,000 synthetic images and captions focused on determiners to evaluate and improve models' understanding of object referencing and quantification in natural language.

Contribution

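The paper presents DetermiNet, a novel dataset that puts determiners at the center of visual grounding. It fills a gap left by existing grounded referencing datasets, which place much less emphasis on determiners than on other word classes such as nouns, verbs and adjectives.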

Findings

01 Current models perform poorly on determiner-based referencing tasks.

02 DetermiNet reveals limitations of existing visual grounding models.

03 The dataset enables targeted research on reference and quantification understanding.

Abstract

State-of-the-art visual grounding models can achieve high detection accuracy, but they are not designed to distinguish between all objects versus only certain objects of interest. In natural language, in order to specify a particular object or set of objects of interest, humans use determiners such as "my", "either" and "those". Determiners, as an important word class, are a type of schema in natural language about the reference or quantity of the noun. Existing grounded referencing datasets place much less emphasis on determiners, compared to other word classes such as nouns, verbs and adjectives. This makes it difficult to develop models that understand the full variety and complexity of object referencing. Thus, we have developed and released the DetermiNet dataset, which comprises 250,000 synthetically generated images and captions based on 25 determiners. The task is to predict…

Code & Models

Repositories

clarence-lee-sheng/determinet (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition