BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation
Naina Dhingra, Florian Ritter, Andreas Kunz

TL;DR
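This paper introduces BGT-Net, a scene graph generation model that combines a bidirectional GRU (BiGRU) layer with transformer encoders to improve object and relationship prediction in images. It also regulates the dataset bias caused by the long-tailed relationship distribution and reports better results than prior methods on Visual Genome, Open-Images, and VRD.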
Contribution
The paper presents BGT-Net, a model that integrates a BiGRU layer and transformer encoders for scene graph generation, together with a bias regulation technique for long-tailed relationship data.
Findings
Outperforms state-of-the-art on Visual Genome, Open-Images, and VRD datasets.
Effective bias regulation for long-tailed relationship distributions.
Improved object and relationship prediction accuracy.
Abstract
Scene graphs are nodes and edges consisting of objects and object-object relationships, respectively. Scene graph generation (SGG) aims to identify the objects and their relationships. We propose a bidirectional GRU (BiGRU) transformer network (BGT-Net) for the scene graph generation for images. This model implements novel object-object communication to enhance the object information using a BiGRU layer. Thus, the information of all objects in the image is available for the other objects, which can be leveraged later in the object prediction step. This object information is used in a transformer encoder to predict the object class as well as to create object-specific edge information via the use of another transformer encoder. To handle the dataset bias induced by the long-tailed relationship distribution, softening with a log-softmax function and adding a bias adaptation term to…
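As a rough illustration of the BiGRU object-object communication described in the abstract, the NumPy sketch below runs a GRU over a sequence of per-object feature vectors in both directions and concatenates the two hidden states for each object. The weights are random and untrained, and the helper names (`init_gru`, `gru_pass`, `bigru`), the dimensions, and the object ordering are assumptions made for illustration. It is not the authors' implementation and it omits the transformer encoders and the bias regulation step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(in_dim, hid_dim, rng):
    """Random, untrained weights for one GRU direction (illustrative only)."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "Wz": w(hid_dim, in_dim), "Uz": w(hid_dim, hid_dim), "bz": np.zeros(hid_dim),
        "Wr": w(hid_dim, in_dim), "Ur": w(hid_dim, hid_dim), "br": np.zeros(hid_dim),
        "Wh": w(hid_dim, in_dim), "Uh": w(hid_dim, hid_dim), "bh": np.zeros(hid_dim),
    }


def gru_pass(x, p):
    """Run a GRU over the rows of x (num_objects, in_dim); return every hidden state."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x_t in x:
        z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h + p["bz"])            # update gate
        r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h + p["br"])            # reset gate
        h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)


def bigru(x, fwd, bwd):
    """Concatenate a forward pass and a re-aligned backward pass per object."""
    h_fwd = gru_pass(x, fwd)
    h_bwd = gru_pass(x[::-1], bwd)[::-1]  # run reversed, then flip back into object order
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
num_objects, feat_dim, hid_dim = 5, 8, 4          # hypothetical sizes
features = rng.normal(size=(num_objects, feat_dim))  # stand-in for detector features
fwd = init_gru(feat_dim, hid_dim, rng)
bwd = init_gru(feat_dim, hid_dim, rng)

context = bigru(features, fwd, bwd)  # shape: (num_objects, 2 * hid_dim)
```

In this sketch, changing the features of the last object alters the backward half of the first object's context vector, which is the sense in which every object can see information from the others. In a full model these context vectors would then be passed to a transformer encoder, as the abstract describes.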
Taxonomy
Methods: Bidirectional GRU · Gated Recurrent Unit
