Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions

Stephan Baier; Yunpu Ma; Volker Tresp

arXiv:1809.00204·cs.CL·September 10, 2018

Open Access

TL;DR

This paper improves visual relationship detection by combining a CNN-based object detection model with semantic link prediction models, which raises detection accuracy and helps the system generalize to scene triples not seen during training.

Contribution

The paper integrates semantic link prediction with a visual model, which improves detection of scene triples and allows generalization to triples that were not observed in training.

Findings

01. Semantic modeling improves detection accuracy.

02. Link prediction generalizes to unseen triples.

03. Outperforms previous state-of-the-art methods.

Abstract

Structured scene descriptions of images are useful for the automatic processing and querying of large image databases. We show how the combination of a semantic and a visual statistical model can improve on the task of mapping images to their associated scene description. In this paper we consider scene descriptions which are represented as a set of triples (subject, predicate, object), where each triple consists of a pair of visual objects, which appear in the image, and the relationship between them (e.g. man-riding-elephant, man-wearing-hat). We combine a standard visual model for object detection, based on convolutional neural networks, with a latent variable model for link prediction. We apply multiple state-of-the-art link prediction methods and compare their capability for visual relationship detection. One of the main advantages of link prediction methods is that they can also…
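
As a rough illustration of the idea in the abstract, the following minimal sketch combines per-box visual class probabilities with a semantic plausibility score for each (subject, predicate, object) triple. Several things here are assumptions and not taken from the paper's text above: DistMult is used only as one well-known link prediction model (the abstract does not name the methods it compares), the scores are combined by a simple product, and the embeddings and visual probabilities are hand-set toy values, not learned ones. The function names `distmult_score` and `rank_triples` are hypothetical.

```python
import math


def distmult_score(e_s, r_p, e_o):
    """DistMult trilinear score: sum_k e_s[k] * r_p[k] * e_o[k].

    Note that this score is symmetric in subject and object.
    """
    return sum(a * b * c for a, b, c in zip(e_s, r_p, e_o))


def sigmoid(x):
    """Squash a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_triples(vis_subj, vis_pred, vis_obj, ent_emb, rel_emb):
    """Rank candidate triples by visual confidence times semantic plausibility.

    ASSUMPTION: the product combination is an illustrative choice,
    not necessarily the combination rule used in the paper.
    """
    scored = []
    for s, p_s in vis_subj.items():
        for p, p_p in vis_pred.items():
            for o, p_o in vis_obj.items():
                semantic = sigmoid(
                    distmult_score(ent_emb[s], rel_emb[p], ent_emb[o])
                )
                scored.append(((s, p, o), p_s * p_p * p_o * semantic))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, hand-set embeddings (hypothetical; a real model would learn these
# from the training triples).
ent_emb = {
    "man": [1.0, 1.0, 0.0],
    "elephant": [1.0, 0.0, 0.0],
    "hat": [0.0, 1.0, 0.0],
}
rel_emb = {
    "riding": [4.0, -4.0, 0.0],
    "wearing": [-4.0, 4.0, 0.0],
}

# Toy visual probabilities for one pair of bounding boxes (hypothetical).
vis_subj = {"man": 0.9, "elephant": 0.1}
vis_pred = {"riding": 0.4, "wearing": 0.6}  # visual model prefers "wearing"
vis_obj = {"elephant": 0.8, "hat": 0.2}

ranked = rank_triples(vis_subj, vis_pred, vis_obj, ent_emb, rel_emb)
for triple, score in ranked[:3]:
    print(triple, round(score, 4))
```

With these toy numbers the visual scores alone would favor man-wearing-elephant (0.9 × 0.6 × 0.8), while the combined score ranks man-riding-elephant first, because the semantic term suppresses the implausible triple.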

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks