SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation

Changsheng Lv; Mengshi Qi; Xia Li; Zhengyuan Yang; Huadong Ma

arXiv:2303.11048 · cs.CV · December 21, 2023 · 1 cites

TL;DR

SGFormer is a novel Transformer-based model for 3D scene graph generation from point clouds, effectively capturing global structure and integrating linguistic knowledge, significantly outperforming previous GCN-based methods.

Contribution

Introduces SGFormer with graph embedding and semantic injection layers, enabling global information passing and linguistic knowledge integration for improved 3D scene graph generation.

Findings

01. 40.94% improvement in relationship prediction R@50

02. 88.36% boost on the complex-scenes subset

03. Superior performance in long-tail and zero-shot scenarios
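
For readers unfamiliar with the R@50 metric cited above, here is a minimal sketch of Recall@K as it is commonly defined for scene graph generation: the fraction of ground-truth relationship triplets that appear among the K highest-scoring predictions. The function name `recall_at_k`, the triplet format, and the toy data are assumptions made for illustration. The paper's exact evaluation protocol is not described on this page and may differ.

```python
def recall_at_k(scored_triplets, ground_truth, k=50):
    """Fraction of ground-truth triplets found among the k top-scoring predictions.

    scored_triplets: list of (score, (subject, predicate, object)) pairs.
    ground_truth:    set of (subject, predicate, object) triplets.
    Illustrative definition only; not taken from the SGFormer code base.
    """
    if not ground_truth:
        return 0.0
    # Rank predictions from highest to lowest confidence.
    ranked = sorted(scored_triplets, key=lambda item: item[0], reverse=True)
    top_k = {triplet for _, triplet in ranked[:k]}
    hits = sum(1 for triplet in ground_truth if triplet in top_k)
    return hits / len(ground_truth)


# Toy example with made-up labels and scores.
predictions = [
    (0.9, ("chair", "standing on", "floor")),
    (0.7, ("lamp", "attached to", "wall")),
    (0.4, ("chair", "left of", "table")),
]
truth = {
    ("chair", "standing on", "floor"),
    ("chair", "left of", "table"),
}

print(recall_at_k(predictions, truth, k=2))
print(recall_at_k(predictions, truth, k=3))
```

With K=2 only one of the two ground-truth triplets falls inside the top-ranked predictions. Raising K to 3 recovers both.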

Abstract

In this paper, we propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation. The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure. Existing methods based on graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and can only propagate information from limited neighboring nodes. In contrast, SGFormer uses Transformer layers as the base building block to allow global information passing, with two types of newly-designed layers tailored for the 3D scene graph generation task. Specifically, we introduce the graph embedding layer to best utilize the global information in graph edges while maintaining comparable computation costs. Furthermore, we propose the semantic injection layer to leverage linguistic knowledge from…
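
The contrast the abstract draws between neighbor-limited propagation and global information passing can be illustrated with a small sketch. It compares the mixing weights of one round of mean aggregation over graph neighbors with those of one scaled dot-product self-attention step on a four-node chain graph. This is not SGFormer's implementation: the graph embedding and semantic injection layers are not reproduced, and every name here (`neighbor_mixing`, `attention_mixing`, the random features and projections) is an assumption made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def neighbor_mixing(adjacency):
    """Row-normalized adjacency with self-loops: one round of mean
    aggregation, so each node only mixes with its direct neighbors."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def attention_mixing(features, w_q, w_k):
    """Scaled dot-product attention weights over all node pairs."""
    q = features @ w_q
    k = features @ w_k
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1)


rng = np.random.default_rng(0)
num_nodes, dim = 4, 8

# Chain graph 0-1-2-3: node 0 and node 3 are three hops apart.
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

# Random node features and projections (hypothetical stand-ins).
features = rng.normal(size=(num_nodes, dim))
w_q = rng.normal(scale=dim ** -0.5, size=(dim, dim))
w_k = rng.normal(scale=dim ** -0.5, size=(dim, dim))

gcn_weights = neighbor_mixing(adjacency)
attn_weights = attention_mixing(features, w_q, w_k)

print("neighbor weight 0 <- 3:", gcn_weights[0, 3])
print("attention weight 0 <- 3:", attn_weights[0, 3])
```

After one round of neighbor aggregation, node 0 receives nothing from node 3, whereas a single attention step assigns a nonzero weight to every node pair.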

Code & Models

Repositories

andy20178/sgformer
PyTorch · Official

Videos

SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation · Underline

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Softmax · Label Smoothing · Byte Pair Encoding · Residual Connection