A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

Yue Liao; Si Liu; Guanbin Li; Fei Wang; Yanjie Chen; Chen Qian; Bo Li

arXiv:1909.07072·cs.CV·April 28, 2020·21 cites

TL;DR

This paper introduces RCCF, a real-time cross-modality correlation filtering approach for referring expression comprehension, achieving high speed and improved accuracy by reformulating the task as a correlation filtering process.

Contribution

The paper proposes a novel one-stage correlation filtering method that enables real-time inference without accuracy loss, differing from traditional multi-stage approaches.

Findings

1. Runs at 40 FPS, outperforming existing methods in speed.
2. Almost doubles the state-of-the-art performance on the RefClef dataset.
3. Achieves leading results on multiple benchmarks, including RefCOCO and RefCOCO+.

Abstract

Referring expression comprehension aims to localize the object instance described by a natural language expression. Current referring expression methods have achieved good performance. However, none of them is able to achieve real-time inference without an accuracy drop. The reason for the relatively slow inference speed is that these methods artificially split referring expression comprehension into two sequential stages: proposal generation and proposal ranking. This split does not exactly conform to the habit of human cognition. To this end, we propose a novel Realtime Cross-modality Correlation Filtering method (RCCF). RCCF reformulates referring expression comprehension as a correlation filtering process. The expression is first mapped from the language domain to the visual domain and then treated as a template (kernel) to perform correlation filtering on the image feature…
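The correlation filtering idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear projection `proj`, the 1x1 kernel size, the feature dimensions, and the helper names `project_expression`, `correlation_heatmap`, and `peak_location` are all assumptions made for illustration. It only shows the general pattern of mapping an expression embedding into the visual feature space, using it as a kernel, and reading the strongest response off the resulting heatmap.

```python
import random

def project_expression(text_emb, proj):
    """Map a D-dim expression embedding to a C-dim visual kernel.

    A plain linear map is assumed here; the paper's actual mapping is not
    described in the truncated abstract.
    """
    channels = len(proj[0])
    return [sum(text_emb[d] * proj[d][c] for d in range(len(text_emb)))
            for c in range(channels)]

def correlation_heatmap(image_feat, kernel):
    """Correlate a 1x1 kernel with an H x W x C feature map.

    With a 1x1 kernel, correlation reduces to one dot product per cell.
    """
    return [[sum(f * k for f, k in zip(cell, kernel)) for cell in row]
            for row in image_feat]

def peak_location(heatmap):
    """Return (row, col) of the highest response in the heatmap."""
    best = max((value, r, c)
               for r, row in enumerate(heatmap)
               for c, value in enumerate(row))
    return best[1], best[2]

# Toy data (hypothetical sizes): a 6x6 feature map with 8 channels and a
# 12-dim expression embedding.
random.seed(0)
H, W, C, D = 6, 6, 8, 12
text_emb = [random.gauss(0, 1) for _ in range(D)]
proj = [[random.gauss(0, 1) for _ in range(C)] for _ in range(D)]
kernel = project_expression(text_emb, proj)

# Low-amplitude background features, with one cell set to match the kernel
# so that it stands in for the referred object.
image_feat = [[[0.01 * random.gauss(0, 1) for _ in range(C)]
               for _ in range(W)] for _ in range(H)]
image_feat[4][1] = list(kernel)

heatmap = correlation_heatmap(image_feat, kernel)
print(peak_location(heatmap))
```

In this toy setup the strongest response falls on the planted cell, because its feature vector is aligned with the kernel while the background is near zero. A real system would use learned image and language encoders and would still need to turn the peak into a bounding box, which this sketch does not attempt.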

Code & Models

Models

linhuixiao/Awesome-Visual-Grounding (model)

Videos

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Heatmap