Informative Scene Graph Generation via Debiasing
Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

TL;DR
This paper introduces DB-SGG, a debiasing framework for scene graph generation that improves the prediction of informative predicates by addressing data imbalances, significantly enhancing performance across multiple datasets and tasks.
Contribution
The paper proposes a novel debiasing framework with Semantic Debiasing and Balanced Predicate Learning components, which outperforms existing models without relying on distribution fitting.
Findings
Outperforms the Transformer baseline by over 120% (relative) on mR@20, the mean recall at 20, across SGG sub-tasks.
Effective on multiple datasets including SGG-VG and SGG-GQA.
Improves downstream tasks like image captioning and sentence-to-graph retrieval.
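The mR@20 figure above refers to mean recall at K, which averages recall over predicate classes so that frequent predicates such as "on" cannot dominate the score. The sketch below is a simplified illustration, not the paper's evaluation code: it assumes triplets are matched by exact label equality (real SGG evaluation also matches bounding boxes), and the function names `mean_recall_at_k` and `relative_improvement` are hypothetical.

```python
from collections import defaultdict


def mean_recall_at_k(images, k):
    """Mean recall@K over predicate classes.

    images: list of (gt_triplets, ranked_pred_triplets) pairs, where each
    triplet is a (subject, predicate, object) tuple and predictions are
    sorted from most to least confident.
    Assumption: a prediction matches a ground-truth triplet only when all
    three labels are equal (no bounding-box matching in this sketch).
    """
    per_predicate = defaultdict(list)  # predicate -> per-image recall values
    for gt_triplets, ranked_preds in images:
        top_k = set(ranked_preds[:k])
        hits = defaultdict(int)
        totals = defaultdict(int)
        for triplet in gt_triplets:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
        # Record this image's recall for every predicate it contains.
        for predicate, total in totals.items():
            per_predicate[predicate].append(hits[predicate] / total)
    if not per_predicate:
        return 0.0
    # Average over images within each class, then over classes.
    class_recalls = [sum(r) / len(r) for r in per_predicate.values()]
    return sum(class_recalls) / len(class_recalls)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy image: three common "on" triplets and one informative "riding" triplet.
gt = [
    ("cup", "on", "table"),
    ("book", "on", "table"),
    ("lamp", "on", "table"),
    ("man", "riding", "horse"),
]
# A biased model predicts "on" everywhere and misses "riding".
preds = [
    ("cup", "on", "table"),
    ("book", "on", "table"),
    ("lamp", "on", "table"),
    ("man", "on", "horse"),
]
score = mean_recall_at_k([(gt, preds)], k=4)
```

In this toy case plain recall would be 3 of 4 triplets, while the class-averaged score is lower because the missed "riding" class counts as much as the "on" class. That sensitivity to rare predicates is why mean recall is the usual metric for debiasing work.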
Abstract
Scene graph generation aims to detect visual relationship triplets, (subject, predicate, object). Due to biases in data, current models tend to predict common predicates, e.g. "on" and "at", instead of informative ones, e.g. "standing on" and "looking at". This tendency results in the loss of precise information and overall performance. If a model only uses "stone on road" rather than "stone blocking road" to describe an image, it may be a grave misunderstanding. We argue that this phenomenon is caused by two imbalances: semantic space level imbalance and training sample level imbalance. For this problem, we propose DB-SGG, an effective framework based on debiasing but not the conventional distribution fitting. It integrates two components: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), for these imbalances. SD utilizes a confusion matrix and a bipartite graph to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout
