Scene Graph Generation with External Knowledge and Image Reconstruction
Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, and Mingyang Ling

TL;DR
This paper introduces a novel scene graph generation method that leverages external knowledge and image reconstruction to improve accuracy and robustness against dataset biases and noise, achieving state-of-the-art results.
Contribution
The proposed approach integrates external commonsense knowledge and image reconstruction loss to enhance scene graph generation, addressing dataset bias and annotation noise.
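As a rough illustration of how an auxiliary reconstruction term can regularize a scene graph objective, the sketch below adds a weighted pixel reconstruction error to object and relationship classification losses. This is only a minimal sketch under stated assumptions: the function names, the L1 reconstruction error, and the `recon_weight` value are hypothetical and are not taken from the paper, whose actual losses and weighting are not given in this summary.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under predicted probabilities."""
    return -math.log(max(probs[target], 1e-12))


def l1_reconstruction(image, reconstruction):
    """Mean absolute pixel error between an image and its reconstruction."""
    assert len(image) == len(reconstruction)
    return sum(abs(a - b) for a, b in zip(image, reconstruction)) / len(image)


def total_loss(obj_probs, obj_target, rel_probs, rel_target,
               image, reconstruction, recon_weight=0.5):
    """Scene graph classification loss plus a weighted reconstruction penalty.

    recon_weight is a hypothetical hyperparameter chosen for illustration.
    """
    graph_loss = (cross_entropy(obj_probs, obj_target)
                  + cross_entropy(rel_probs, rel_target))
    return graph_loss + recon_weight * l1_reconstruction(image, reconstruction)


if __name__ == "__main__":
    # Toy values: two object classes, two relationship classes, a two-pixel "image".
    loss = total_loss([0.5, 0.5], 0, [0.25, 0.75], 1, [0.0, 1.0], [1.0, 1.0])
    print(f"total loss: {loss:.4f}")
```

With a perfect reconstruction the extra term vanishes and only the classification losses remain, so the reconstruction path acts purely as a regularizer on the shared features.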
Findings
Achieves state-of-the-art performance on the Visual Relationship Detection dataset.
Outperforms existing methods on the Visual Genome dataset.
Improves generalizability through external knowledge integration.
Abstract
Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from the external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks
