EGTR: Extracting Graph from Transformer for Scene Graph Generation

Jinbae Im; JeongYeon Nam; Nokyung Park; Hyungmin Lee; Seunghyun Park

arXiv:2404.02072 · cs.CV · June 25, 2024 · 1 citation

TL;DR

This paper introduces EGTR, a lightweight one-stage scene graph generation model that extracts the relation graph from the self-attention layers of the DETR decoder and is trained with relation smoothing and an auxiliary connectivity prediction task.

Contribution

The paper proposes a method for extracting relation graphs from the self-attention of DETR, and adds relation smoothing and an auxiliary connectivity prediction task to improve scene graph generation.

Findings

01. Effective relation graph extraction from self-attention layers.

02. Improved scene graph generation performance on Visual Genome and Open Images V6.

03. Efficient and lightweight model suitable for real-world applications.

Abstract

Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationship between objects, and the inherent relationship between object queries learned in the multi-head self-attention of the object detector has been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing…
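The extraction idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (the official PyTorch code is in naver-ai/egtr). It assumes that each decoder self-attention layer exposes per-object query and key vectors, that the query of object i concatenated with the key of object j serves as the pairwise feature for the relation from i to j, that layers are combined by a plain mean, and that the shallow head is a single linear layer followed by a sigmoid. The function name `extract_relation_graph` and all shapes are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def extract_relation_graph(queries, keys, weight, bias):
    """Build a relation graph from self-attention by-products.

    queries, keys: [num_layers][num_objects][dim] attention queries and keys.
    Returns graph[i][j][p], a score in (0, 1) that object i relates to
    object j through predicate p.
    """
    num_layers = len(queries)
    num_objects = len(queries[0])
    dim = len(queries[0][0])
    graph = []
    for i in range(num_objects):
        row = []
        for j in range(num_objects):
            # Pairwise feature: query of i concatenated with key of j,
            # averaged over layers (assumption: plain mean, no learned gate).
            pair = [0.0] * (2 * dim)
            for layer in range(num_layers):
                feat = queries[layer][i] + keys[layer][j]  # list concatenation
                pair = [acc + f / num_layers for acc, f in zip(pair, feat)]
            # Shallow head (assumption: one linear layer plus sigmoid).
            row.append([sigmoid(z) for z in linear(pair, weight, bias)])
        graph.append(row)
    return graph


def random_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
NUM_LAYERS, NUM_OBJECTS, DIM, NUM_PREDICATES = 2, 3, 4, 5
queries = [random_matrix(NUM_OBJECTS, DIM) for _ in range(NUM_LAYERS)]
keys = [random_matrix(NUM_OBJECTS, DIM) for _ in range(NUM_LAYERS)]
weight = random_matrix(NUM_PREDICATES, 2 * DIM)
bias = [0.0] * NUM_PREDICATES

graph = extract_relation_graph(queries, keys, weight, bias)
print(len(graph), len(graph[0]), len(graph[0][0]))  # 3 3 5
```

The output is a dense tensor with one score per ordered object pair and predicate, so the graph comes from reusing attention queries and keys rather than from a separate relation decoder. The random inputs stand in for real decoder activations and carry no meaning.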

Code & Models

Repositories

naver-ai/egtr (PyTorch, official)

Taxonomy

Topics: Data Visualization and Analytics · Semantic Web and Ontologies · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Feedforward Network · Absolute Position Encodings · Softmax · Convolution