Visual Relationship Detection with Language Priors

Cewu Lu; Ranjay Krishna; Michael Bernstein; Li Fei-Fei

arXiv:1608.00187 · cs.CV · August 2, 2016 · 112 citations

TL;DR

This paper introduces a scalable visual relationship detection model that leverages language priors from word embeddings to predict numerous relationships in images, improving object localization and content-based image retrieval.

Contribution

The authors propose a novel approach combining object and predicate models with language priors, enabling scalable prediction of thousands of relationships with limited training data.

Findings

01 Outperforms previous models in relationship prediction accuracy

02 Can predict thousands of relationships using few examples

03 Enhances image retrieval through relationship understanding

Abstract

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them together to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to finetune the…
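The core idea in the abstract, scoring objects and predicates separately and then combining them, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `score_relationships`, the confidence values, and the plain product used to combine scores are all assumptions made for illustration, and the language-prior term the paper adds is left out.

```python
from itertools import product


def score_relationships(subject_scores, predicate_scores, object_scores, top_k=3):
    """Rank <subject, predicate, object> triples by the product of three
    independently obtained confidences (an assumed combination rule)."""
    ranked = []
    for (subj, p_subj), (pred, p_pred), (obj, p_obj) in product(
        subject_scores.items(), predicate_scores.items(), object_scores.items()
    ):
        ranked.append(((subj, pred, obj), p_subj * p_pred * p_obj))
    # Highest combined confidence first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical detector outputs for one pair of image regions.
subject_scores = {"man": 0.9, "dog": 0.1}
object_scores = {"bicycle": 0.8, "car": 0.2}
predicate_scores = {"riding": 0.6, "pushing": 0.3, "wearing": 0.1}

top = score_relationships(subject_scores, predicate_scores, object_scores)
for triple, score in top:
    print(" ".join(triple), round(score, 3))
```

In this toy setting, seven per-category scores (two subjects, two objects, three predicates) are enough to rank all twelve candidate triples, which is why learning the parts separately needs far fewer examples than learning one detector per relationship.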

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning