EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park

TL;DR
This paper introduces EGTR, a lightweight one-stage scene graph generation model that extracts relation graphs from the self-attention layers of the DETR decoder, using relation smoothing and an auxiliary connectivity prediction task to improve performance.
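The auxiliary connectivity prediction mentioned above can be pictured as a binary task over object pairs: for every ordered pair (i, j), predict whether any relation holds from i to j. The sketch below assumes exactly that target definition and a plain binary cross-entropy loss. The function names are hypothetical, and the code illustrates the idea rather than reproducing the paper's implementation.

```python
import math


def connectivity_targets(relations, num_objects):
    """Assumed target: 1.0 where at least one predicate links subject i to object j."""
    target = [[0.0] * num_objects for _ in range(num_objects)]
    for subj, obj, _predicate in relations:
        target[subj][obj] = 1.0  # several predicates on one pair still give a single 1
    return target


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all N x N (subject, object) entries."""
    total, count = 0.0, 0
    for prob_row, target_row in zip(probs, targets):
        for p, t in zip(prob_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Usage with 3 matched objects and made-up relation triplets.
relations = [(0, 1, "on"), (0, 1, "near"), (2, 0, "holding")]
target = connectivity_targets(relations, num_objects=3)
predicted = [[0.1, 0.9, 0.1], [0.1, 0.1, 0.1], [0.8, 0.1, 0.1]]
loss = binary_cross_entropy(predicted, target)
```

Because the target ignores which predicate holds, this task only asks whether a pair is connected at all. That is a simpler signal than full predicate classification.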
Contribution
The paper proposes a novel method to extract relation graphs from the self-attention by-products of the DETR decoder, with relation smoothing and auxiliary tasks further improving scene graph generation.
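Relation smoothing is motivated by the dependency of relation extraction on object detection. One way to express that dependency, sketched below purely as an assumption, is to soften each ground-truth relation label according to how well its subject and object were detected. Here `quality` is a hypothetical per-object score in [0, 1] (for example, derived from box overlap with the matched ground truth), and the label is the product of the two endpoint scores. The paper's exact smoothing formula may differ.

```python
def smooth_relation_targets(relations, quality, num_objects, num_predicates):
    """Hypothetical relation smoothing.

    relations: (subject, object, predicate_index) triplets over matched objects.
    quality:   assumed per-object detection-quality scores in [0, 1].
    Returns an N x N x C grid where each positive label is softened to
    quality[subject] * quality[object] instead of a hard 1.0.
    """
    target = [[[0.0] * num_predicates for _ in range(num_objects)]
              for _ in range(num_objects)]
    for subj, obj, pred in relations:
        # Assumption: a product of endpoint scores, so a poorly detected subject
        # or object yields a weaker relation label.
        target[subj][obj][pred] = quality[subj] * quality[obj]
    return target


# Usage: object 1 is poorly localized, so its relation label drops to 0.5.
soft = smooth_relation_targets([(0, 1, 2)], quality=[1.0, 0.5],
                               num_objects=2, num_predicates=3)
```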
Findings
Effective relation graph extraction from self-attention layers.
Improved scene graph generation performance on Visual Genome and Open Images V6.
Efficient and lightweight model suitable for real-world applications.
Abstract
Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing…
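The abstract describes reading a relation graph off the self-attention by-products of the DETR decoder and scoring it with a shallow head. Below is a minimal plain-Python sketch of that idea under stated assumptions:

- Each decoder layer exposes its self-attention queries and keys (N x d, one row per object query).
- The feature for subject i and object j concatenates query i with key j. This is a natural pairing because the attention logit between i and j is, up to scaling, the dot product of those two vectors.
- Per-layer features are concatenated across layers.
- The "shallow head" is a single linear layer with a per-predicate sigmoid.

These choices and all function names are illustrative, not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_features(queries, keys):
    """N x N grid: cell (i, j) concatenates query i (subject) with key j (object)."""
    return [[q + k for k in keys] for q in queries]


def extract_relation_graph(layer_qk, weights, bias):
    """layer_qk: one (queries, keys) pair per decoder self-attention layer, each N x d.
    weights/bias: an assumed single linear layer mapping L*2d features to C predicates.
    Returns an N x N x C grid of predicate probabilities."""
    n = len(layer_qk[0][0])
    feats = [[[] for _ in range(n)] for _ in range(n)]
    for queries, keys in layer_qk:
        grid = pairwise_features(queries, keys)
        for i in range(n):
            for j in range(n):
                feats[i][j].extend(grid[i][j])  # assumption: concatenate across layers
    return [[[sigmoid(sum(w * f for w, f in zip(row, feats[i][j])) + b)
              for row, b in zip(weights, bias)]
             for j in range(n)]
            for i in range(n)]


# Usage with random stand-ins for the decoder's attention queries and keys.
random.seed(0)
N, d, L, C = 3, 2, 2, 4  # object queries, head dim, decoder layers, predicate classes
layer_qk = [([[random.gauss(0, 1) for _ in range(d)] for _ in range(N)],
             [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)])
            for _ in range(L)]
weights = [[random.gauss(0, 0.1) for _ in range(L * 2 * d)] for _ in range(C)]
bias = [0.0] * C
graph = extract_relation_graph(layer_qk, weights, bias)
```

Each entry `graph[i][j][c]` can then be read as a score that predicate c holds with query i as subject and query j as object. Thresholding or top-k selection over these scores gives the edges of the relation graph.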
Taxonomy
Topics: Data Visualization and Analytics · Semantic Web and Ontologies · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Feedforward Network · Absolute Position Encodings · Softmax · Convolution
