SGTR: End-to-end Scene Graph Generation with Transformer

Rongjie Li; Songyang Zhang; Xuming He

arXiv:2112.12970 · cs.CV · April 1, 2022 · 1 citation

TL;DR

This paper introduces SGTR, a transformer-based end-to-end method for scene graph generation that formulates the task as bipartite graph construction, reporting state-of-the-art or comparable results with higher inference efficiency.

Contribution

The paper proposes a novel transformer-based framework for scene graph generation, including entity-aware predicate representation and a graph assembling module for end-to-end inference.

Findings

1. Achieves state-of-the-art or comparable performance on benchmarks.
2. Surpasses most existing approaches in accuracy.
3. Offers higher inference efficiency.

Abstract

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property. Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from high time complexity or sub-optimal designs. In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. To solve the problem, we develop a transformer-based end-to-end framework that first generates the entity and predicate proposal set, followed by inferring directed edges to form the relation triplets. In particular, we develop a new entity-aware predicate representation based on a structural predicate generator that leverages the compositional property of relationships. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on…
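
The bipartite formulation described in the abstract can be illustrated with a toy example: entity proposals form one node set, predicate proposals form the other, and each predicate is linked to a subject entity and an object entity to produce a relation triplet. The sketch below is only an illustration under stated assumptions, not the paper's graph assembling module. The function names (`cosine`, `assemble_triplets`), the 2-D embeddings, and the use of cosine similarity with a single best match per predicate are all hypothetical choices; the abstract is truncated before it states the criterion SGTR actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assemble_triplets(entities, predicates):
    """Link each predicate proposal to its best-matching subject and object entity.

    entities:   list of (label, embedding)
    predicates: list of (label, subject_indicator, object_indicator), where the
                two indicators are embeddings describing the entities the
                predicate expects (an assumption made for this sketch).
    Returns a list of (subject_label, predicate_label, object_label).
    """
    triplets = []
    for pred_label, subj_ind, obj_ind in predicates:
        # Directed edge: entity -> predicate (subject side)
        subj_idx = max(range(len(entities)),
                       key=lambda i: cosine(subj_ind, entities[i][1]))
        # Directed edge: predicate -> entity (object side)
        obj_idx = max(range(len(entities)),
                      key=lambda i: cosine(obj_ind, entities[i][1]))
        triplets.append((entities[subj_idx][0], pred_label, entities[obj_idx][0]))
    return triplets


# Toy proposals with made-up 2-D embeddings.
entities = [("person", [1.0, 0.0]), ("horse", [0.0, 1.0])]
predicates = [("riding", [0.9, 0.1], [0.2, 0.8])]

triplets = assemble_triplets(entities, predicates)
print(triplets)
```

In this toy setup the "riding" predicate attaches to "person" as subject and "horse" as object, because its two indicator vectors lie closest to those entity embeddings. A real system would operate on learned features and would likely keep several candidate matches per predicate rather than a single best one.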

Code & Models

Repositories

scarecrow0/sgtr (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition