Segmentation-grounded Scene Graph Generation

Siddhesh Khandelwal; Mohammed Suhail; Leonid Sigal

arXiv:2104.14207·cs.CV·April 30, 2021

TL;DR

This paper introduces a novel framework for pixel-level scene graph generation that integrates segmentation masks and relation grounding, enhancing the granularity and accuracy of scene understanding in images.

Contribution

It presents the first segmentation-grounded scene graph generation framework that is agnostic to underlying methods and leverages transfer and multi-task learning from auxiliary datasets.

Findings

01

Improved relation prediction accuracy.

02

Effective transfer learning from MS COCO.

03

End-to-end trainable multi-task framework.

Abstract

Scene graph generation has emerged as an important problem in computer vision. While scene graphs provide a grounded representation of objects, their locations and relations in an image, they do so only at the granularity of proposal bounding boxes. In this work, we propose the first, to our knowledge, framework for pixel-level segmentation-grounded scene graph generation. Our framework is agnostic to the underlying scene graph generation method and addresses the lack of segmentation annotations in target scene graph datasets (e.g., Visual Genome) through transfer and multi-task learning from, and with, an auxiliary dataset (e.g., MS COCO). Specifically, each target object being detected is endowed with a segmentation mask, which is expressed as a lingual-similarity weighted linear combination over categories that have annotations present in an auxiliary dataset. These inferred masks,…
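
The mask-transfer idea in the abstract, a target-category mask written as a lingual-similarity weighted linear combination of masks for auxiliary categories, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of cosine similarity over word embeddings, the clipping of negative similarities, and the normalization of weights to sum to one are all assumptions made for illustration, and the embeddings and mask values are made-up toy numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine_masks(target_vec, aux_vecs, aux_masks):
    """Return (mask, weights): a similarity-weighted sum of auxiliary masks."""
    # Assumption: negative similarities are clipped to zero and the rest
    # are normalized to sum to one. The paper may weight differently.
    sims = {c: max(cosine(target_vec, aux_vecs[c]), 0.0) for c in aux_masks}
    total = sum(sims.values())
    if total == 0.0:
        raise ValueError("target is not similar to any auxiliary category")
    weights = {c: s / total for c, s in sims.items()}

    first = next(iter(aux_masks.values()))
    height, width = len(first), len(first[0])
    mask = [
        [sum(weights[c] * aux_masks[c][i][j] for c in aux_masks) for j in range(width)]
        for i in range(height)
    ]
    return mask, weights


# Toy 2-D "word embeddings" and 2x2 mask probabilities (made-up values).
aux_vecs = {"person": [1.0, 0.0], "dog": [0.0, 1.0]}
aux_masks = {
    "person": [[0.9, 0.1], [0.8, 0.2]],
    "dog": [[0.1, 0.9], [0.2, 0.8]],
}
man_vec = [1.0, 0.0]  # hypothetical embedding for a target category "man"
mask, weights = combine_masks(man_vec, aux_vecs, aux_masks)
```

In this toy setup the hypothetical target category "man" has the same embedding as the auxiliary category "person" and is orthogonal to "dog", so all of the weight falls on the "person" mask. With real word embeddings the weights would spread across several related auxiliary categories.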
