Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson

arXiv:1802.05451 · stat.ML · November 5, 2018 · 68 citations

TL;DR

This paper introduces a permutation-invariant structured prediction model, built from deep learning components, that maps images to scene graphs and achieves state-of-the-art results on the Visual Genome benchmark.

Contribution

The paper proposes a design principle, derived from permutation invariance, for deep structured prediction models in image understanding tasks.
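One way to write the invariance requirement formally, with notation assumed here rather than copied from the paper: each detected object i has features z_i, each ordered pair of objects has features z_{i,j}, and a labeling function F returns one output per object. Relabeling the objects by a permutation σ should relabel the outputs in exactly the same way:

```latex
(\sigma(z))_i = z_{\sigma(i)}, \qquad
(\sigma(z))_{i,j} = z_{\sigma(i),\sigma(j)}, \qquad
[F(\sigma(z))]_k = [F(z)]_{\sigma(k)} \quad \text{for all } k \text{ and all permutations } \sigma.
```

In words: the order in which object proposals happen to be listed carries no information, so it must not change which label each object receives.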

Findings

01. Achieves new state-of-the-art results on Visual Genome scene graph labeling.

02. Proves a necessary and sufficient condition for permutation-invariant architectures.

03. Outperforms recent approaches to scene graph prediction.
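The necessary and sufficient condition can be sketched as follows. This is a paraphrase with assumed notation, not a quotation of the paper's theorem. With node features z_i and pairwise features z_{i,j}, a labeling function F is permutation invariant (relabeling the input nodes relabels the outputs identically) exactly when, for some functions φ, α and ρ, it can be written as

```latex
[F(z)]_k = \rho\Big(z_k,\ \sum_{i=1}^{n} \alpha\Big(z_i,\ \sum_{j \neq i} \phi(z_i, z_{i,j}, z_j)\Big)\Big),
\qquad k = 1, \dots, n.
```

Sufficiency is direct: both sums range over unordered index sets, so relabeling the nodes only reorders their terms, while ρ reads node k's own features. Necessity, that every invariant F admits such a form, is the substantive part of the result.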

Abstract

Machine understanding of complex images is a key goal of artificial intelligence. One challenge underlying this task is that visual scenes contain multiple inter-related objects, and that global context plays an important role in interpreting the scene. A natural modeling framework for capturing such effects is structured prediction, which optimizes over complex labels, while modeling within-label interactions. However, it is unclear what principles should guide the design of a structured prediction model that utilizes the power of deep learning components. Here we propose a design principle for such architectures that follows from a natural requirement of permutation invariance. We prove a necessary and sufficient characterization for architectures that follow this invariance, and discuss its implication on model design. Finally, we show that the resulting model achieves new state of…
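To make the invariance principle concrete, here is a minimal sketch of a labeling function assembled from a per-pair message phi summed over neighbours, a per-node function alpha summed over all nodes into a global context, and a per-node readout rho. Everything here is an illustrative assumption: features are scalars, and phi, alpha, rho are fixed hypothetical functions standing in for learned networks. This is not the authors' SceneGraph implementation. The helper is_gpi (a hypothetical name) checks, for one input, that permuting the objects permutes the outputs the same way, and a deliberately order-dependent function is included as a counterexample.

```python
import itertools
import math
from typing import Callable, List, Sequence

# Sketch only: scalar features and hand-picked functions, not learned networks.
Nodes = Sequence[float]
Edges = Sequence[Sequence[float]]


def phi(z_i: float, z_ij: float, z_j: float) -> float:
    """Hypothetical message to node i from neighbour j, using pair feature z_ij."""
    return math.tanh(z_i * z_j + z_ij)


def alpha(z_i: float, msg_sum: float) -> float:
    """Hypothetical summary of node i after summing its incoming messages."""
    return math.sin(z_i) + msg_sum ** 2


def rho(z_k: float, context: float) -> float:
    """Hypothetical output score for node k from its feature and the global context."""
    return z_k * context + math.cos(z_k)


def gpi_label(nodes: Nodes, edges: Edges) -> List[float]:
    """One output per node, built only from sums over unordered index sets."""
    n = len(nodes)
    per_node = [
        alpha(nodes[i], sum(phi(nodes[i], edges[i][j], nodes[j])
                            for j in range(n) if j != i))
        for i in range(n)
    ]
    context = sum(per_node)  # global sum over all nodes
    return [rho(nodes[k], context) for k in range(n)]


def order_dependent_label(nodes: Nodes, edges: Edges) -> List[float]:
    """Counterexample: weighting node i by its index makes the context order-dependent."""
    context = sum((i + 1) * z for i, z in enumerate(nodes))
    return [rho(z, context) for z in nodes]


def permute(nodes: Nodes, edges: Edges, perm: Sequence[int]):
    """New node i is old node perm[i]; new edge (i, j) is old edge (perm[i], perm[j])."""
    return ([nodes[p] for p in perm],
            [[edges[p][q] for q in perm] for p in perm])


def is_gpi(f: Callable[[Nodes, Edges], List[float]], nodes: Nodes, edges: Edges,
           tol: float = 1e-9) -> bool:
    """For this input, check [f(sigma(z))]_k == [f(z)]_{sigma(k)} for every sigma."""
    out = f(nodes, edges)
    for perm in itertools.permutations(range(len(nodes))):
        p_out = f(*permute(nodes, edges, perm))
        if not all(math.isclose(p_out[k], out[perm[k]], rel_tol=tol, abs_tol=tol)
                   for k in range(len(nodes))):
            return False
    return True


# Toy scene with three objects; diagonal edge entries are never read.
nodes = [0.3, -1.2, 0.8]
edges = [[0.0, 0.5, -0.7],
         [0.2, 0.0, 1.1],
         [-0.4, 0.9, 0.0]]

print(is_gpi(gpi_label, nodes, edges))              # True
print(is_gpi(order_dependent_label, nodes, edges))  # False
```

The design choice that matters is aggregation by summation: a sum over neighbours and a sum over all nodes cannot see the order of their terms, whereas the counterexample's index-based weights leak the input order into every output.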

Code & Models

Repositories

shikorab/SceneGraph (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization