From General to Specific: Informative Scene Graph Generation via Balance Adjustment

Yuyu Guo; Lianli Gao; Xuanhan Wang; Yuxuan Hu; Xing Xu; Xu Lu; Heng Tao Shen; Jingkuan Song

arXiv:2108.13129·cs.CV·August 31, 2021

TL;DR

This paper introduces BA-SGG, a framework that improves scene graph generation by addressing imbalance issues between informative and common predicates, significantly enhancing performance across multiple sub-tasks.

Contribution

The paper proposes a novel balance adjustment framework with semantic and training sample components, applicable to various models for better predicate informativeness.

Findings

01

Achieves up to 14.3% higher mean recall than the Transformer model on Visual Genome

02

Effectively adjusts semantic and sample imbalance

03

Improves state-of-the-art scene graph generation performance
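
The mean recall metric cited in the findings averages recall over predicate classes, so a model that falls back on a common predicate such as "on" is penalized for every informative predicate it misses. The following is a minimal sketch of that difference, not the paper's evaluation code: the function name `recall_and_mean_recall` and the toy triplets are hypothetical, and the sketch ignores the top-K ranking and bounding-box matching that real scene graph benchmarks apply.

```python
from collections import defaultdict


def recall_and_mean_recall(ground_truth, predicted):
    """Compare overall recall with per-predicate mean recall.

    ground_truth: list of (subject, predicate, object) triplets.
    predicted: set of triplets emitted by a (hypothetical) model.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for triplet in ground_truth:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in predicted:
            hits[predicate] += 1

    # Overall recall: fraction of all ground-truth triplets recovered.
    recall = sum(hits.values()) / sum(totals.values())
    # Mean recall: recall per predicate class, then averaged over classes.
    per_predicate = {p: hits[p] / totals[p] for p in totals}
    mean_recall = sum(per_predicate.values()) / len(per_predicate)
    return recall, mean_recall, per_predicate


# Toy data, invented for illustration only.
ground_truth = [
    ("cup", "on", "table"),
    ("book", "on", "shelf"),
    ("lamp", "on", "desk"),
    ("man", "standing on", "beach"),
]
# A model biased toward the common predicate "on".
predicted = {
    ("cup", "on", "table"),
    ("book", "on", "shelf"),
    ("lamp", "on", "desk"),
    ("man", "on", "beach"),
}

recall, mean_recall, per_predicate = recall_and_mean_recall(ground_truth, predicted)
print(recall)       # 0.75
print(mean_recall)  # 0.5
```

In this toy case the biased model recovers three of four triplets, yet its mean recall is only 0.5 because it never produces "standing on".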

Abstract

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding. However, current models are stuck in common predicates, e.g., "on" and "at", rather than informative ones, e.g., "standing on" and "looking at", resulting in the loss of precise information and overall performance. If a model only uses "stone on road" rather than "blocking" to describe an image, it is easy to misunderstand the scene. We argue that this phenomenon is caused by two key imbalances between informative predicates and common ones, i.e., semantic space level imbalance and training sample level imbalance. To tackle this problem, we propose BA-SGG, a simple yet effective SGG framework based on balance adjustment but not the conventional distribution fitting. It integrates two components:…

Code & Models

Repositories

zhugekongkong/sgg-g2s
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Layer Normalization · Dense Connections · Byte Pair Encoding · Softmax