Learning to Generate Scene Graph from Natural Language Supervision

Yiwu Zhong; Jing Shi; Jianwei Yang; Chenliang Xu; Yin Li

arXiv:2109.02227 · cs.CV · September 7, 2021

Open Access · 1 Repo

TL;DR

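This paper introduces a method for generating scene graphs from images using only natural language supervision. An off-the-shelf object detector localizes objects, the detected labels are matched to concepts parsed from captions to form pseudo labels, and a Transformer-based model learns to predict those labels. The approach also supports open-set scene graph generation.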

Contribution

It presents one of the first approaches to learn scene graph generation from image-sentence pairs without relying on human-annotated scene graphs, reporting a 30% relative gain over a recent prior method.

Findings

01. 30% relative improvement over previous methods
02. Effective weakly and fully supervised scene graph generation
03. First open-set scene graph generation results

Abstract

Learning from image-text data has demonstrated recent success for many recognition tasks, yet is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph. To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs. Further, we design a Transformer-based model to predict these "pseudo" labels via a masked token prediction task. Learning from only image-sentence pairs, our model achieves a 30% relative gain over a recent method trained with human-annotated…
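
The pseudo-label step described in the abstract (matching labels of detected regions to concepts parsed from captions) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Detection`, `PseudoTriplet`, and `make_pseudo_labels` are hypothetical, the caption is assumed to be already parsed into (subject, predicate, object) triplets, and matching is assumed to be exact, case-insensitive string equality.

```python
from typing import Dict, List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One detected region (hypothetical structure for illustration)."""
    label: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class PseudoTriplet(NamedTuple):
    """A localized relation: indices point into the detection list."""
    subject_idx: int
    predicate: str
    object_idx: int


def make_pseudo_labels(
    detections: List[Detection],
    caption_triplets: List[Tuple[str, str, str]],
) -> List[PseudoTriplet]:
    """Ground caption triplets onto detected regions by label matching.

    Assumption: exact, case-insensitive string equality between the
    detector label and the caption concept.
    """
    # Index detections by normalized label so lookups are O(1).
    by_label: Dict[str, List[int]] = {}
    for idx, det in enumerate(detections):
        by_label.setdefault(det.label.lower(), []).append(idx)

    pseudo: List[PseudoTriplet] = []
    for subj, pred, obj in caption_triplets:
        # Every (subject region, object region) pair whose labels match the
        # caption concepts becomes a pseudo label; unmatched triplets are dropped.
        for s in by_label.get(subj.lower(), []):
            for o in by_label.get(obj.lower(), []):
                if s != o:
                    pseudo.append(PseudoTriplet(s, pred, o))
    return pseudo


detections = [
    Detection("man", (10.0, 20.0, 110.0, 220.0)),
    Detection("horse", (60.0, 90.0, 300.0, 260.0)),
    Detection("hat", (40.0, 15.0, 80.0, 40.0)),
]
caption_triplets = [("man", "riding", "horse"), ("man", "wearing", "helmet")]
pseudo_labels = make_pseudo_labels(detections, caption_triplets)
```

In this toy input, only the first caption triplet is grounded, because no detected region carries the label "helmet". A real pipeline would likely need a softer matching rule (for example, synonyms) than the string equality assumed here.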

Code & Models

Repositories

yiwuzhong/sgg_from_nls (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques