Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension

Luca Parolari; Elena Izzo; Lamberto Ballan

arXiv:2411.14807·cs.CV·November 25, 2024

TL;DR

This paper introduces Harlequin, a synthetic data generation framework for Referring Expression Comprehension that creates large-scale, annotated datasets to improve model training without manual labeling.

Contribution

It presents a novel image synthesis pipeline that generates a large artificial dataset for REC, enhancing training and performance of deep learning models.

Findings

01 Pre-training on Harlequin improves REC model performance.

02 Harlequin dataset contains over 1 million queries.

03 Synthetic data reduces reliance on manual annotations.

Abstract

Referring Expression Comprehension (REC) aims to identify a particular object in a scene by a natural language expression, and is an important topic in visual language understanding. State-of-the-art methods for this task are based on deep learning, which generally requires expensive, manually labeled annotations. Some works tackle the problem with limited-supervision learning or by relying on Large Vision and Language Models. However, the development of techniques to synthesize labeled data has been overlooked. In this paper, we propose a novel framework that generates artificial data for the REC task, taking into account both textual and visual modalities. First, our pipeline processes existing data to create variations in the annotations. Then, it generates an image using the altered annotations as guidance. The result of this pipeline is a new dataset, called Harlequin, made by more than…
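
To make the first pipeline step more concrete, here is a minimal Python sketch of one way annotation variations could be created. It is an illustration under assumptions, not the authors' implementation: the `Annotation` type, the `COLORS` vocabulary, and the `color_variations` helper are all hypothetical, and the idea of swapping a color word is inferred only from the paper's "color-driven" title. The second step, generating an image guided by the altered annotation, is not shown.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical color vocabulary; the paper's actual list is not given here.
COLORS = ("red", "green", "blue", "yellow", "black", "white")

# Matches the first whole-word color term, case-insensitively.
_COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", flags=re.IGNORECASE)


@dataclass(frozen=True)
class Annotation:
    """A hypothetical REC annotation: an expression plus the target's box."""
    expression: str
    bbox: tuple  # (x, y, width, height) of the referred object


def color_variations(ann: Annotation) -> list:
    """Return copies of `ann` with its first color word swapped for each
    other color in COLORS. The bounding box is left unchanged, so each
    variant still points at the same region of the scene."""
    match = _COLOR_RE.search(ann.expression)
    if match is None:
        return []  # no color attribute to vary
    original = match.group(1).lower()
    variants = []
    for color in COLORS:
        if color == original:
            continue
        new_expr = (
            ann.expression[: match.start()] + color + ann.expression[match.end():]
        )
        variants.append(replace(ann, expression=new_expr))
    return variants


if __name__ == "__main__":
    ann = Annotation("the red car on the left", (10.0, 20.0, 50.0, 40.0))
    for variant in color_variations(ann):
        print(variant.expression, variant.bbox)
```

Each variant keeps the original box, so in this sketch the altered expression and the unchanged location together would serve as the guidance for a downstream image generator.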

Taxonomy

Topics: Color perception and design