Segmentation-grounded Scene Graph Generation
Siddhesh Khandelwal, Mohammed Suhail, Leonid Sigal

TL;DR
This paper introduces a novel framework for pixel-level scene graph generation that integrates segmentation masks and relation grounding, enhancing the granularity and accuracy of scene understanding in images.
Contribution
It presents the first segmentation-grounded scene graph generation framework that is agnostic to underlying methods and leverages transfer and multi-task learning from auxiliary datasets.
Findings
Improved relation prediction accuracy.
Effective transfer learning from MS COCO.
End-to-end trainable multi-task framework.
Abstract
Scene graph generation has emerged as an important problem in computer vision. While scene graphs provide a grounded representation of objects, their locations and relations in an image, they do so only at the granularity of proposal bounding boxes. In this work, we propose the first, to our knowledge, framework for pixel-level segmentation-grounded scene graph generation. Our framework is agnostic to the underlying scene graph generation method and addresses the lack of segmentation annotations in target scene graph datasets (e.g., Visual Genome) through transfer and multi-task learning from, and with, an auxiliary dataset (e.g., MS COCO). Specifically, each target object being detected is endowed with a segmentation mask, which is expressed as a lingual-similarity weighted linear combination over categories that have annotations present in an auxiliary dataset. These inferred masks,…
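To illustrate the "lingual-similarity weighted linear combination" mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes that similarity is the cosine between word embeddings and that the weights are normalized with a softmax; the excerpt above confirms neither choice. The function names, the toy 2-D embeddings, and the 2x2 masks are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax, so the weights sum to 1 (an assumption)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_masks(target_vec, aux_vecs, aux_masks):
    """Blend per-category auxiliary masks into one mask for a target category.

    target_vec: embedding of the target category (e.g. a Visual Genome class)
    aux_vecs:   {name: embedding} for categories annotated in the auxiliary set
    aux_masks:  {name: 2-D list of mask probabilities} predicted per category
    """
    names = list(aux_vecs)
    weights = softmax([cosine(target_vec, aux_vecs[n]) for n in names])
    height = len(aux_masks[names[0]])
    width = len(aux_masks[names[0]][0])
    combined = [[0.0] * width for _ in range(height)]
    for name, weight in zip(names, weights):
        for i in range(height):
            for j in range(width):
                combined[i][j] += weight * aux_masks[name][i][j]
    return combined, dict(zip(names, weights))

# Toy data (hypothetical): "man" has no mask annotations in the target dataset,
# while "person" and "car" do in the auxiliary dataset.
aux_vecs = {"person": [0.9, 0.2], "car": [0.1, 1.0]}
aux_masks = {
    "person": [[1.0, 0.0], [1.0, 0.0]],
    "car": [[0.0, 1.0], [0.0, 1.0]],
}
mask, weights = combine_masks([1.0, 0.1], aux_vecs, aux_masks)
```

In this toy setup the "man" embedding lies closer to "person" than to "car", so the "person" mask dominates the blend. A softmax over raw cosine scores gives fairly flat weights; a real system would more likely use learned embeddings and possibly a temperature.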
