Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships

Gal Sadeh Kenigsfield; Ran El-Yaniv

arXiv:1910.12324 · cs.CV · October 29, 2019 · 1 citation



Open Access

TL;DR

This paper introduces a deep learning model that leverages auxiliary textual data to improve visual relationship detection, especially for unseen relationships, by integrating shared text-image representations.

Contribution

It is the first model to enable recognition of visual relationships absent in visual training data by utilizing auxiliary textual information from different sources.

Findings

01

Outperforms image-based models on unseen relationship recognition

02

Works better with book-originated text than image-originated text

03

Achieves results comparable to image-based models on seen relationships

Abstract

One of the most difficult tasks in scene understanding is recognizing interactions between objects in an image. This task is often called visual relationship detection (VRD). We consider the question of whether, given auxiliary textual data in addition to the standard visual data used for training VRD models, VRD performance can be improved. We present a new deep model that can leverage additional textual data. Our model relies on a shared text-image representation of subject-verb-object relationships appearing in the text, and object interactions in images. Our method is the first to enable recognition of visual relationships missing in the visual training data and appearing only in the auxiliary text. We test our approach on two different text sources: text originating in images and text originating in books. We test and validate our approach using two large-scale recognition tasks:…
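The abstract gives no architectural details, so the following is only a minimal sketch of the general idea it describes: subject-verb-object triplets mined from text and object interactions in images are scored in one shared space, so a triplet that appears only in the text can still be predicted for an image. It is not the authors' model. The toy word vectors, the triplet list, the concatenation-based `embed_triplet`, and the fixed `image_embedding` (standing in for the output of a trained visual encoder) are all hypothetical.

```python
import math

# Hypothetical 2-d toy word vectors standing in for learned embeddings.
WORD_VECS = {
    "person": [1.0, 0.0],
    "horse": [0.0, 1.0],
    "bike": [0.7, 0.7],
    "ride": [1.0, 0.0],
    "feed": [0.0, 1.0],
}

# Hypothetical subject-verb-object triplets mined from auxiliary text.
TEXT_TRIPLETS = [
    ("person", "ride", "horse"),
    ("person", "feed", "horse"),
    ("person", "ride", "bike"),
]

# Triplets that also occur in the (hypothetical) visual training data.
SEEN_IN_IMAGES = {("person", "ride", "horse")}


def embed_triplet(subj, pred, obj):
    """Map a textual triplet into the shared space by concatenating its word vectors."""
    return WORD_VECS[subj] + WORD_VECS[pred] + WORD_VECS[obj]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict(image_embedding, candidates=TEXT_TRIPLETS):
    """Return the text-derived triplet whose shared-space embedding is closest to the image."""
    return max(candidates, key=lambda t: cosine(image_embedding, embed_triplet(*t)))


# Hypothetical output of a visual encoder trained to land in the same 6-d space.
image_embedding = [0.9, 0.1, 0.1, 0.9, 0.0, 1.0]
best = predict(image_embedding)
print(best, best in SEEN_IN_IMAGES)  # prints: ('person', 'feed', 'horse') False
```

Because the candidate set comes from text rather than from labeled images, the predicted triplet ("person", "feed", "horse") is one that never appears in `SEEN_IN_IMAGES`, which is the kind of unseen-relationship recognition the abstract claims a shared representation makes possible.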


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Test