Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Haojun Jiang; Yuanze Lin; Dongchen Han; Shiji Song; Gao Huang

arXiv:2203.08481 · cs.CV · November 18, 2022 · 1 citation

TL;DR

Pseudo-Q is a novel approach that automatically generates pseudo language queries from unlabeled images to train visual grounding models, significantly reducing annotation costs while maintaining high performance.

Contribution

The paper introduces Pseudo-Q, a method that generates pseudo language queries for visual grounding from unlabeled images, so that a grounding model can be trained without manually annotated queries.

Findings

01

Reduces human annotation costs by 31% on RefCOCO.

02

Achieves superior or comparable performance to state-of-the-art weakly-supervised visual grounding methods.

03

Effective on all five datasets tested.

Abstract

Visual grounding, i.e., localizing objects in images according to natural language queries, is an important topic in visual language understanding. The most effective approaches for this task are based on deep learning, which generally require expensive manually labeled image-query or patch-query pairs. To eliminate the heavy dependence on human annotations, we present a novel method, named Pseudo-Q, to automatically generate pseudo language queries for supervised training. Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module. Then, we design a task-related query prompt module to specifically tailor generated pseudo language queries for visual grounding tasks. Further, in order to fully capture the contextual…
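The pipeline described in the abstract (detect objects, generate a pseudo query for each, then wrap the query in a task prompt) can be illustrated with a small sketch. This is not the authors' implementation: the `Detection` record, the left/middle/right heuristic, the query template, and the prompt wording are all assumptions made for illustration, and the detector output is hard-coded rather than produced by a real model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    """Hypothetical detector output: a class name, an optional attribute, a box."""
    category: str
    attribute: str  # empty string when the detector gives no attribute
    box: Box


# Hypothetical prompt template; the wording is an assumption, not taken from the source.
PROMPT = "which region does the text '{query}' describe?"


def generate_pseudo_queries(detections: List[Detection]) -> List[Tuple[str, Box]]:
    """Build (pseudo query, box) training pairs from detections with simple templates."""
    by_category = defaultdict(list)
    for det in detections:
        by_category[det.category].append(det)

    pairs = []
    for category, dets in by_category.items():
        # Order same-class objects by horizontal box center so that a
        # spatial word can tell them apart.
        dets = sorted(dets, key=lambda d: (d.box[0] + d.box[2]) / 2)
        for i, det in enumerate(dets):
            words = []
            if len(dets) > 1:
                if i == 0:
                    words.append("left")
                elif i == len(dets) - 1:
                    words.append("right")
                else:
                    words.append("middle")
            if det.attribute:
                words.append(det.attribute)
            words.append(category)
            pairs.append((" ".join(words), det.box))
    return pairs


def apply_prompt(query: str) -> str:
    """Wrap a pseudo query in the task prompt before it reaches the text encoder."""
    return PROMPT.format(query=query)


# Hard-coded stand-in for the output of an off-the-shelf detector.
detections = [
    Detection("man", "standing", (10, 0, 50, 100)),
    Detection("man", "", (200, 0, 260, 100)),
    Detection("dog", "brown", (100, 50, 150, 100)),
]
for query, box in generate_pseudo_queries(detections):
    print(apply_prompt(query), box)
```

Each resulting (prompted query, box) pair can then stand in for a human-written annotation when training a grounding model in the usual supervised way.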

Code & Models

Repositories

leaplabthu/pseudo-q
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques