GREx: Generalized Referring Expression Segmentation, Comprehension, and Generation
Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Yu-Gang Jiang

TL;DR
This paper introduces GREx, a set of three benchmarks (GRES, GREC, and GREG) with a large-scale dataset, gRefCOCO, that extends referring expression tasks from single-target expressions to multi-target and no-target expressions, bringing them closer to real applications.
Contribution
The paper constructs gRefCOCO, the first large-scale GREx dataset, which contains multi-target, no-target, and single-target expressions, and proposes ReLA, a baseline model that achieves state-of-the-art results on GRES and GREC.
Findings
ReLA achieves state-of-the-art results on GRES and GREC tasks.
The gRefCOCO dataset includes multi-target, no-target, and single-target expressions.
GREx benchmarks reveal performance gaps in existing REx methods.
Abstract
Referring Expression Segmentation (RES) and Comprehension (REC) respectively segment and detect the object described by an expression, while Referring Expression Generation (REG) generates an expression for the selected object. Existing datasets and methods commonly support single-target expressions only, i.e., one expression refers to one object, not considering multi-target and no-target expressions. This greatly limits the real applications of REx (RES/REC/REG). This paper introduces three new benchmarks called Generalized Referring Expression Segmentation (GRES), Comprehension (GREC), and Generation (GREG), collectively denoted as GREx, which extend the classic REx to allow expressions to identify an arbitrary number of objects. We construct the first large-scale GREx dataset gRefCOCO that contains multi-target, no-target, and single-target expressions and their corresponding images…
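To make the three expression types concrete, here is a minimal Python sketch of how a generalized referring expression sample might be represented and classified by its number of targets. The `ReferringSample` class, the `target_type` helper, the box format, and the example expressions are all hypothetical illustrations; they are not the gRefCOCO annotation format or any API from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format (x, y, width, height); an assumption for
# illustration, not the actual gRefCOCO annotation schema.
Box = Tuple[float, float, float, float]


@dataclass
class ReferringSample:
    """One expression paired with zero or more target boxes in an image."""
    expression: str
    target_boxes: List[Box] = field(default_factory=list)


def target_type(sample: ReferringSample) -> str:
    """Classify a sample by how many objects its expression refers to."""
    n = len(sample.target_boxes)
    if n == 0:
        return "no-target"
    if n == 1:
        return "single-target"
    return "multi-target"


# Made-up examples covering the three cases a generalized benchmark allows.
samples = [
    ReferringSample("the man on the left", [(10.0, 20.0, 50.0, 120.0)]),
    ReferringSample(
        "the two people on the bench",
        [(10.0, 20.0, 50.0, 120.0), (70.0, 22.0, 48.0, 118.0)],
    ),
    ReferringSample("the dog under the table", []),
]

for s in samples:
    print(f"{s.expression!r}: {target_type(s)}")
```

Classic REx corresponds to the case where every sample has exactly one target; the generalized setting also admits the empty and multi-element cases.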
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
