Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships
Gal Sadeh Kenigsfield, Ran El-Yaniv

TL;DR
This paper introduces a deep learning model that leverages auxiliary textual data to improve visual relationship detection, especially for unseen relationships, by integrating shared text-image representations.
Contribution
It is the first model to enable recognition of visual relationships that are absent from the visual training data, by using auxiliary textual information from different sources.
Findings
Outperforms image-based models on unseen relationship recognition
Works better with book-originated text than image-originated text
Achieves comparable results to image-based models on seen relationships
Abstract
One of the most difficult tasks in scene understanding is recognizing interactions between objects in an image. This task is often called visual relationship detection (VRD). We consider the question of whether, given auxiliary textual data in addition to the standard visual data used for training VRD models, VRD performance can be improved. We present a new deep model that can leverage additional textual data. Our model relies on a shared text-image representation of subject-verb-object relationships appearing in the text, and object interactions in images. Our method is the first to enable recognition of visual relationships missing in the visual training data and appearing only in the auxiliary text. We test our approach on two different text sources: text originating in images and text originating in books. We test and validate our approach using two large-scale recognition tasks:…
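The abstract describes the shared text-image representation only at a high level. As a rough illustration, and not the authors' architecture, the sketch below assumes a shared embedding space in which a subject-verb-object triplet is embedded by summing word vectors, and an image's subject-object pair has already been projected into that same space by some encoder. Recognition then picks the nearest candidate triplet. The word vectors, the additive composition, the visual embedding values, and the names `embed_triplet` and `predict_relationship` are all hypothetical. What the sketch shows is why a shared space matters: a triplet that appears only in text can still be selected for an image.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]

# Toy word vectors standing in for a learned shared embedding (hypothetical values).
WORD_VECS: Dict[str, List[float]] = {
    "person": [1.0, 0.0, 0.0, 0.0],
    "horse": [0.0, 1.0, 0.0, 0.0],
    "bike": [0.0, 0.0, 1.0, 0.0],
    "cup": [0.0, 0.0, 0.0, 1.0],
    "ride": [0.5, 0.5, 0.5, 0.0],
    "hold": [0.5, 0.0, 0.0, 0.5],
}


def embed_triplet(triplet: Triplet) -> List[float]:
    """Embed a subject-verb-object triplet by summing its word vectors (assumed composition)."""
    subj, verb, obj = triplet
    return [s + v + o for s, v, o in zip(WORD_VECS[subj], WORD_VECS[verb], WORD_VECS[obj])]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_relationship(visual_emb: Sequence[float], candidates: List[Triplet]) -> Triplet:
    """Return the candidate triplet closest to the visual embedding in the shared space."""
    return max(candidates, key=lambda t: cosine(visual_emb, embed_triplet(t)))


# Triplets observed in the (hypothetical) visual training data.
SEEN_IN_IMAGES: List[Triplet] = [("person", "ride", "bike"), ("person", "hold", "cup")]
# A triplet that appears only in the auxiliary text.
SEEN_ONLY_IN_TEXT: List[Triplet] = [("person", "ride", "horse")]

# Stand-in for an image encoder's output for one subject-object pair,
# already projected into the shared space (hypothetical values).
visual_emb = [1.4, 1.3, 0.6, 0.1]

print(predict_relationship(visual_emb, SEEN_IN_IMAGES))
print(predict_relationship(visual_emb, SEEN_IN_IMAGES + SEEN_ONLY_IN_TEXT))
```

With only the image-derived candidates, the toy model's best answer is "person ride bike". Once the text-only triplet is added to the candidate set, it outputs "person ride horse" even though no visual example of that relationship exists.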
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Test
