BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
Peng Hao, Weilong Wang, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Shaowei Cui, Junhang Wei, Xiaoshuai Hao

TL;DR
This paper introduces BCTR, a novel bidirectional transformer model for scene graph generation that enhances interaction between entities and predicates, leading to state-of-the-art results on major benchmarks.
Contribution
The paper proposes a bidirectional conditioning factorization and a new model, BCTR, with modules for mutual feature augmentation and regularization, improving scene graph generation performance.
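To make the idea of a conditioning factorization concrete, here is one way to contrast the two schemes. The notation is introduced only for illustration: I is the image, \mathcal{E} the set of entities, \mathcal{P} the set of predicates, and T the number of interaction stages. The paper's own formulation may differ. A unidirectional model fixes one order, for example entities first and predicates conditioned on them. A bidirectional scheme lets each side repeatedly condition on the other's latest estimate:

```latex
% Unidirectional (entity-first) factorization: predicates see entities, never the reverse
p(\mathcal{E}, \mathcal{P} \mid I) = p(\mathcal{E} \mid I)\, p(\mathcal{P} \mid \mathcal{E}, I)

% Bidirectional conditioning, sketched as T alternating stages
\mathcal{P}^{(t)} \sim p\big(\mathcal{P} \mid \mathcal{E}^{(t-1)}, I\big), \qquad
\mathcal{E}^{(t)} \sim p\big(\mathcal{E} \mid \mathcal{P}^{(t)}, I\big), \qquad t = 1, \dots, T
```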
Findings
BCTR achieves state-of-the-art results on Visual Genome.
Bidirectional conditioning improves the modeling of interactions between entities and predicates.
Regularization with Random Feature Alignment (RFA) improves generalization to unseen relationships.
Abstract
Scene Graph Generation (SGG) remains a challenging task due to its compositional property. Previous approaches improve prediction efficiency through end-to-end learning. However, these methods exhibit limited performance as they assume unidirectional conditioning between entities and predicates, which restricts effective information interaction. To address this limitation, we propose a novel bidirectional conditioning factorization in a semantic-aligned space for SGG, enabling efficient and generalizable interaction between entities and predicates. Specifically, we introduce an end-to-end scene graph generation model, the Bidirectional Conditioning Transformer (BCTR), to implement this factorization. BCTR consists of two key modules. First, the Bidirectional Conditioning Generator (BCG) performs multi-stage interactive feature augmentation between entities and predicates, enabling…
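The following is a minimal sketch of what "multi-stage interactive feature augmentation" could look like. It assumes each stage is a pair of residual cross-attention updates: predicate features attend to entity features, then entity features attend to the updated predicate features. The names `cross_attend`, `bidirectional_stages`, and `num_stages` are hypothetical. Learned projections, layer normalization, and feed-forward layers are deliberately omitted. This is not the authors' BCG implementation, which the abstract above does not specify.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention where the keys also serve as values.

    Simplification (assumption): no learned Q/K/V projections.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Weighted sum of the key vectors, one output dimension at a time.
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)])
    return out


def bidirectional_stages(
    entities: List[Vector], predicates: List[Vector], num_stages: int = 3
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical multi-stage entity <-> predicate feature augmentation."""
    for _ in range(num_stages):
        # Entity -> predicate: predicates are conditioned on the current entities.
        pred_ctx = cross_attend(predicates, entities)
        predicates = [[p + c for p, c in zip(pv, cv)] for pv, cv in zip(predicates, pred_ctx)]
        # Predicate -> entity: entities are conditioned on the updated predicates.
        ent_ctx = cross_attend(entities, predicates)
        entities = [[e + c for e, c in zip(ev, cv)] for ev, cv in zip(entities, ent_ctx)]
    return entities, predicates
```

For example, `bidirectional_stages([[1.0, 0.0]], [[0.0, 1.0]], num_stages=1)` returns `([[2.0, 1.0]], [[1.0, 1.0]])`. The single predicate absorbs the entity's features, and the entity then absorbs the updated predicate. The residual form keeps each stage's input recoverable, so stacking stages refines features rather than overwriting them. A unidirectional model would correspond to running only the first half of one stage.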
Taxonomy
Topics: Graph Theory and Algorithms · Data Visualization and Analytics · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections
