SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation
Anh Duc Bui, Soyeon Caren Han, Josiah Poon

TL;DR
SG-Shuffle introduces a multi-aspect shuffle transformer pipeline for scene graph generation. It targets two challenges: long-tail bias toward common relationship labels, and relationships that can be described from multiple perspectives. It addresses them with new model components and a weighted loss.
Contribution
It proposes a new pipeline with a Parallel Transformer Encoder, a Shuffle Transformer, and a weighted loss to improve relationship prediction in scene graphs.
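This summary does not say how the weighted loss is defined. As a hedged illustration of one common way to counter long-tail predicate bias, the sketch below uses inverse-frequency class weights inside a cross-entropy loss. It is an assumed stand-in, not necessarily SG-Shuffle's loss. The function names `inverse_frequency_weights` and `weighted_cross_entropy` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes, smoothing=1.0):
    """Hypothetical weighting: w_c = N / (K * (count_c + smoothing)).

    Rare predicate classes receive larger weights, so mistakes on them
    cost more during training. `smoothing` avoids division by zero for
    classes that never appear.
    """
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * (counts.get(c, 0) + smoothing))
            for c in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    Normalizes by the sum of the target weights, the same convention as
    PyTorch's weighted cross-entropy with mean reduction. With all weights
    equal to 1 this reduces to ordinary mean cross-entropy.
    """
    weighted_nll, weight_sum = 0.0, 0.0
    for logits, target in zip(batch_logits, targets):
        w = weights[target]
        weighted_nll += -w * log_softmax(logits)[target]
        weight_sum += w
    return weighted_nll / weight_sum
```

For example, with training labels `[0, 0, 0, 0, 1]` (class 0 standing in for a frequent predicate such as "on", class 1 for a rare one such as "parked on"), `inverse_frequency_weights(labels, num_classes=2)` returns `[0.5, 1.25]`. A misclassified rare predicate therefore contributes 2.5 times as much to the loss as a misclassified frequent one.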
Findings
Improved accuracy in scene graph generation.
Reduced bias toward common relationship labels.
Enhanced multi-perspective relationship modeling.
Abstract
Scene Graph Generation (SGG) provides a comprehensive representation of images for human understanding as well as for visual understanding tasks. Because the object and predicate labels in the available annotated data follow a long-tail distribution, scene graphs generated by current methods can be biased toward common, non-informative relationship labels. Relationships are also not always mutually exclusive. The same relationship can be described from multiple perspectives, such as geometric or semantic relationships, which makes it even more challenging to predict the most suitable relationship label. In this work, we propose the SG-Shuffle pipeline for scene graph generation with 3 components: 1) Parallel Transformer Encoder, which learns to predict object relationships in a more exclusive manner by grouping relationship labels into groups of similar purpose; 2) Shuffle Transformer,…
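The grouping idea can be illustrated with a minimal sketch that assumes one softmax head per predicate group. Predicates compete only with others in the same group, so a single subject-object pair can receive one label per perspective (for example, "on" geometrically and "riding" semantically). This is not the Parallel Transformer Encoder itself, whose internals the abstract does not describe. The group names and memberships in `PREDICATE_GROUPS` and the helper `grouped_predictions` are hypothetical. The logits stand in for whatever per-group encoder outputs the real model produces.

```python
import math

# Hypothetical grouping of predicate labels by purpose. The paper's actual
# groups are not listed in this summary.
PREDICATE_GROUPS = {
    "geometric": ["on", "above", "under", "near"],
    "possessive": ["has", "of", "part of"],
    "semantic": ["wearing", "holding", "riding"],
}


def softmax(xs):
    """Numerically stable softmax over one list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_predictions(group_logits):
    """Apply a separate softmax per group and return the best label in each.

    group_logits maps a group name to logits aligned with
    PREDICATE_GROUPS[group]. Returns {group: (label, probability)}, where the
    probability is relative to that group only.
    """
    results = {}
    for group, logits in group_logits.items():
        labels = PREDICATE_GROUPS[group]
        if len(logits) != len(labels):
            raise ValueError(f"expected {len(labels)} logits for group {group!r}")
        probs = softmax(logits)
        best = max(range(len(labels)), key=lambda i: probs[i])
        results[group] = (labels[best], probs[best])
    return results


# Hypothetical logits for a ("man", "horse") pair.
preds = grouped_predictions({
    "geometric": [2.0, 0.1, -1.0, 0.5],
    "semantic": [0.0, 0.2, 3.0],
})
for group, (label, prob) in preds.items():
    print(f"{group}: {label} ({prob:.2f})")
```

With a single softmax over all predicates, "on" and "riding" would compete directly, and the more frequent label would tend to win. Per-group normalization keeps both perspectives available to later stages of the model.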
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Adam · Absolute Position Encodings · Layer Normalization
