SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma

TL;DR
SGFormer is a Transformer-based model for 3D scene graph generation from point clouds. It captures the global structure of a scene, integrates linguistic knowledge, and outperforms previous GCN-based methods.
Contribution
Introduces SGFormer with graph embedding and semantic injection layers, enabling global information passing and linguistic knowledge integration for improved 3D scene graph generation.
Findings
40.94% improvement in relationship prediction R@50
88.36% boost on complex scenes subset
Superior performance in long-tail and zero-shot scenarios
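For readers unfamiliar with the R@50 metric cited above, the sketch below shows how recall at K is commonly computed in scene graph generation: the fraction of ground-truth (subject, predicate, object) triplets that appear among the K highest-scoring predictions. The function name `recall_at_k`, the triplet format, and the toy data are assumptions made for illustration; the paper's exact evaluation protocol is not given here and may differ.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triplets found in the top-k predictions.

    predictions:  list of (score, triplet) pairs (hypothetical format)
    ground_truth: set of triplets, each a (subject, predicate, object) tuple
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    # Rank predictions by confidence, highest first, and keep the top k.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top_k = {triplet for _, triplet in ranked[:k]}
    hits = len(top_k & set(ground_truth))
    return hits / len(ground_truth)


# Toy data, invented purely for illustration.
preds = [
    (0.9, ("chair", "standing on", "floor")),
    (0.8, ("lamp", "attached to", "wall")),
    (0.4, ("table", "left of", "chair")),
    (0.2, ("pillow", "lying on", "sofa")),
]
gt = {
    ("chair", "standing on", "floor"),
    ("pillow", "lying on", "sofa"),
}

r2 = recall_at_k(preds, gt, k=2)  # only one of the two true triplets is in the top 2
r4 = recall_at_k(preds, gt, k=4)  # both true triplets are in the top 4
print(r2, r4)
```

With K = 50, the same computation would keep the 50 most confident triplets per scene before counting hits.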
Abstract
In this paper, we propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation. The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure. Existing methods based on graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and can only propagate information from limited neighboring nodes. In contrast, SGFormer uses Transformer layers as the base building block to allow global information passing, with two types of newly-designed layers tailored for the 3D scene graph generation task. Specifically, we introduce the graph embedding layer to best utilize the global information in graph edges while maintaining comparable computation costs. Furthermore, we propose the semantic injection layer to leverage linguistic knowledge from…
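The contrast the abstract draws between neighbor-limited propagation and global information passing can be illustrated with a toy example. The sketch below is not the SGFormer architecture: it does not model the graph embedding layer or the semantic injection layer, and it uses single-head dot-product attention with identity projections instead of learned weights. The helper names, the four-node chain graph, and the feature values are all assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neighbor_mean_weights(adjacency, i):
    """Mixing weights for node i after one round of mean aggregation
    over itself and its direct neighbors (a simplified GCN-style step)."""
    n = len(adjacency)
    members = [j for j in range(n) if j == i or adjacency[i][j]]
    return [1.0 / len(members) if j in members else 0.0 for j in range(n)]


def attention_weights(features, i):
    """Mixing weights for node i under scaled dot-product self-attention.
    Queries and keys are the raw features (identity projections)."""
    d = len(features[i])
    scores = [dot(features[i], f) / math.sqrt(d) for f in features]
    return softmax(scores)


# A chain graph 0 - 1 - 2 - 3 with 2-dimensional node features (toy values).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]

local_w = neighbor_mean_weights(adjacency, 0)   # node 0 sees only nodes 0 and 1
global_w = attention_weights(features, 0)       # node 0 weights every node

print("one-hop weights:  ", local_w)
print("attention weights:", [round(w, 3) for w in global_w])
```

In the one-hop case, nodes 2 and 3 contribute nothing to node 0 until more layers are stacked, whereas the attention weights are nonzero for every node after a single layer.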
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Softmax · Label Smoothing · Byte Pair Encoding · Residual Connection
