Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Ruichi Yu; Ang Li; Vlad I. Morariu; Larry S. Davis

arXiv:1707.09423 · cs.CV · August 4, 2017 · 57 cites

TL;DR

This paper introduces a method that leverages internal and external linguistic knowledge distillation to improve visual relationship detection, especially for unseen relationships, by regularizing the learning process with linguistic statistics.

Contribution

It proposes a novel approach that distills linguistic knowledge from annotations and external text sources into a visual model to enhance generalization and zero-shot prediction capabilities.

Findings

01. Significant improvement in zero-shot recall on the VRD dataset.
02. Outperforms state-of-the-art methods in visual relationship detection.
03. Effective use of linguistic knowledge from Wikipedia and training annotations.

Abstract

Understanding visual relationships involves identifying the subject, the object, and a predicate relating them. We leverage the strong correlations between the predicate and the (subj,obj) pair (both semantically and spatially) to predict the predicates conditioned on the subjects and the objects. Modeling the three entities jointly more accurately reflects their relationships, but complicates learning since the semantic space of visual relationships is huge and the training data is limited, especially for the long-tail relationships that have few instances. To overcome this, we use knowledge of linguistic statistics to regularize visual model learning. We obtain linguistic knowledge by mining from both training annotations (internal knowledge) and publicly available text, e.g., Wikipedia (external knowledge), computing the conditional probability distribution of a predicate given a…
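
The internal linguistic statistic described in the abstract, a conditional distribution over predicates given a (subj, obj) pair, can be estimated by counting over annotated triplets. The following is a minimal sketch under stated assumptions: the function name `predicate_distribution`, the toy triples, and the plain relative-frequency estimate are all illustrative and not taken from the paper, whose actual estimation procedure may differ (for example in smoothing or in how external text is parsed).

```python
from collections import Counter, defaultdict


def predicate_distribution(triples):
    """Estimate P(pred | subj, obj) by relative frequency.

    `triples` is an iterable of (subject, predicate, object) strings.
    This is a hypothetical helper for illustration, not the authors' code.
    """
    # Count how often each predicate occurs for every (subj, obj) pair.
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    # Normalize the counts of each pair into a probability distribution.
    dist = {}
    for pair, pred_counts in counts.items():
        total = sum(pred_counts.values())
        dist[pair] = {pred: n / total for pred, n in pred_counts.items()}
    return dist


# Hypothetical toy annotations, not drawn from the VRD dataset.
toy_triples = [
    ("person", "ride", "horse"),
    ("person", "ride", "horse"),
    ("person", "next to", "horse"),
    ("person", "wear", "hat"),
]

dist = predicate_distribution(toy_triples)
print(dist[("person", "horse")])
```

A distribution of this kind could then serve as a soft target that regularizes a visual model, which is the role the abstract assigns to linguistic knowledge. Pairs that never appear in the annotations have no entry here, which is one reason a source of external text such as Wikipedia is useful for unseen relationships.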

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning