Towards Unifying Reference Expression Generation and Comprehension

Duo Zheng; Tao Kong; Ya Jing; Jiaan Wang; Xiaojie Wang

arXiv:2210.13076 · cs.CV · October 25, 2022

TL;DR

This paper introduces UniRef, a unified model for reference expression generation (REG) and comprehension (REC). It combines an Image-Region-Text Fusion (IRTF) layer with joint pre-training to improve performance on both tasks.

Contribution

UniRef handles REG and REC in a single model. Its Image-Region-Text Fusion (IRTF) layer fuses image, region, and text through image cross-attention and region cross-attention, and joint pre-training addresses the two tasks' distinct inputs and the need to connect them.

Findings

01. Outperforms previous state-of-the-art on REG and REC tasks

02. Effective fusion of image, region, and text improves task performance

03. Joint pre-training enhances the shared representation quality

Abstract

Reference Expression Generation (REG) and Comprehension (REC) are two highly correlated tasks. Modeling REG and REC simultaneously for utilizing the relation between them is a promising way to improve both. However, the problem of distinct inputs, as well as building connections between them in a single model, brings challenges to the design and training of the joint model. To address the problems, we propose a unified model for REG and REC, named UniRef. It unifies these two tasks with the carefully-designed Image-Region-Text Fusion layer (IRTF), which fuses the image, region and text via the image cross-attention and region cross-attention. Additionally, IRTF could generate pseudo input regions for the REC task to enable a uniform way for sharing the identical representation space across the REC and REG. We further propose Vision-conditioned Masked Language Modeling (VMLM) and…
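To make the fusion idea in the abstract more concrete, here is a minimal, dependency-free sketch of text tokens attending first to image tokens and then to region tokens. It is an illustration under assumptions, not the authors' implementation. The names `cross_attention` and `irtf_like_layer` are hypothetical, and the order of the two cross-attention steps is assumed. Learned projections, multiple heads, layer normalization, the feed-forward sublayer, and the pseudo-region generation used for REC are all omitted. The official repository is written in PyTorch; plain Python lists are used here only to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections.

    Each query vector is replaced by a weighted average of the
    key/value vectors (keys and values are the same vectors here).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(dim)]
        )
    return attended


def add(xs, ys):
    """Element-wise residual sum of two equally shaped token lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def irtf_like_layer(text, image, region):
    """Hypothetical fusion step: image cross-attention, then region cross-attention.

    The ordering is an assumption made for this sketch.
    """
    hidden = add(text, cross_attention(text, image))      # text attends to image
    hidden = add(hidden, cross_attention(hidden, region))  # then to the region
    return hidden


# Toy inputs: 2 text tokens, 3 image tokens, 1 region token, all 4-dimensional.
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image = [[0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]]
region = [[1.0, 2.0, 3.0, 4.0]]
fused = irtf_like_layer(text, image, region)
```

The output keeps one vector per text token, so the same fused representation could feed either a text decoder (REG) or a box predictor (REC). With a single region token, the region step simply adds that token to every text position, which is why the toy example is easy to check by hand.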

Code & Models

Repositories

zd11024/uniref
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling