Natural Language Guided Visual Relationship Detection
Wentong Liao, Lin Shuai, Bodo Rosenhahn, Michael Ying Yang

TL;DR
This paper introduces a natural language guided framework for visual relationship detection, leveraging semantic information to improve understanding of object interactions in images, especially for unseen relationships.
Contribution
The paper proposes a bi-directional RNN-based method that incorporates natural language cues, achieving state-of-the-art results on the VRD and Visual Genome datasets and better zero-shot relationship prediction.
Findings
Achieved state-of-the-art performance on VRD and Visual Genome datasets.
Significantly improved zero-shot relationship recall from 76.42% to 89.79%.
Effective use of natural language to generalize relationship detection.
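To make the recall figures above concrete, here is a minimal sketch of a recall computation and of the size of the reported gain. The summary does not say which recall variant is used, so the Recall@K form below is an assumption, and the function name `recall_at_k` and the toy triplets are hypothetical. Only the two percentages come from the summary.

```python
def recall_at_k(ground_truth, ranked_predictions, k):
    """Fraction of ground-truth triplets found among the top-k predictions."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for triplet in ground_truth if triplet in top_k)
    return hits / len(ground_truth)


# Hypothetical toy data: (subject, predicate, object) triplets for one image.
ground_truth = [("person", "ride", "horse"), ("dog", "on", "sofa")]
ranked_predictions = [  # assumed to be sorted by model confidence, best first
    ("person", "ride", "horse"),
    ("person", "on", "horse"),
    ("dog", "next to", "sofa"),
    ("dog", "on", "sofa"),
]

recall_top2 = recall_at_k(ground_truth, ranked_predictions, k=2)  # 0.5
recall_top4 = recall_at_k(ground_truth, ranked_predictions, k=4)  # 1.0

# Size of the reported zero-shot improvement (figures from the summary).
before, after = 76.42, 89.79
absolute_gain = after - before          # in percentage points
relative_gain = absolute_gain / before  # as a fraction of the earlier recall

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1%}")
```

The reported change is a gain of 13.37 percentage points, or roughly 17.5% relative to the earlier figure.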
Abstract
Reasoning about the relationships between object pairs in images is a crucial task for holistic scene understanding. Most of the existing works treat this task as a pure visual classification task: each type of relationship or phrase is classified as a relation category based on the extracted visual features. However, each kind of relationship has a wide variety of object combinations and each pair of objects has diverse interactions. Obtaining sufficient training samples for all possible relationship categories is difficult and expensive. In this work, we propose a natural language guided framework to tackle this problem. We propose to use a generic bi-directional recurrent neural network to predict the semantic connection between the participating objects in the relationship from the aspect of natural language. The proposed simple method achieves the state-of-the-art on the Visual…
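The general idea of a bi-directional recurrent network scoring the connection between two objects can be sketched as follows. This is an illustrative toy only, not the authors' architecture: the abstract does not specify the inputs, layer sizes, or training procedure, so the two-step word sequence, the random untrained weights, the embedding table, and all names (`predict_predicate`, `PREDICATES`, and so on) are assumptions made for the example.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(w_in, w_rec, sequence, hidden_size):
    """Run a plain tanh RNN over the sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
    return h


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
EMBED, HIDDEN = 4, 3
PREDICATES = ["on", "ride", "next to"]  # hypothetical predicate vocabulary

# Hypothetical word embeddings; a real system would load pretrained vectors.
embeddings = {
    word: [rng.uniform(-1.0, 1.0) for _ in range(EMBED)]
    for word in ("person", "horse")
}

# Separate (untrained, random) parameters for each reading direction.
fwd_in, fwd_rec = random_matrix(rng, HIDDEN, EMBED), random_matrix(rng, HIDDEN, HIDDEN)
bwd_in, bwd_rec = random_matrix(rng, HIDDEN, EMBED), random_matrix(rng, HIDDEN, HIDDEN)
w_out = random_matrix(rng, len(PREDICATES), 2 * HIDDEN)


def predict_predicate(subject, obj):
    """Score each predicate for a (subject, object) pair with a bi-directional RNN."""
    sequence = [embeddings[subject], embeddings[obj]]
    h_fwd = run_rnn(fwd_in, fwd_rec, sequence, HIDDEN)        # subject -> object
    h_bwd = run_rnn(bwd_in, bwd_rec, sequence[::-1], HIDDEN)  # object -> subject
    probabilities = softmax(matvec(w_out, h_fwd + h_bwd))     # concatenated states
    return dict(zip(PREDICATES, probabilities))


probs = predict_predicate("person", "horse")
print(probs)
```

Reading the pair in both directions lets the score depend on the order of the two objects, which matters because a relationship such as "person ride horse" is not symmetric. With random weights the printed probabilities are meaningless; the sketch only shows the data flow.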
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
