Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Xingning Dong; Tian Gan; Xuemeng Song; Jianlong Wu; Yuan Cheng; Liqiang Nie

arXiv:2203.09811·cs.CV·April 5, 2022·6 cites

TL;DR

This paper introduces a novel hybrid-attention encoder and a group collaborative learning decoder to improve scene graph generation, effectively reducing bias and enhancing predicate prediction accuracy.

Contribution

It proposes a stacked hybrid-attention network for better modality fusion and a group collaborative learning strategy to address class imbalance in scene graph generation.
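
The encoder idea can be illustrated with a minimal sketch, assuming that "intra-modal refinement" corresponds to self-attention within one modality and "inter-modal interaction" to cross-attention between the visual and textual modalities. Everything below is hypothetical and is not taken from the authors' repository: the function names, the absence of learned projections, and the choice to sum the two attention outputs are illustrative assumptions only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention.

    Assumption: keys double as values and there are no learned
    projection matrices, to keep the sketch short.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values))
             for j in range(d)]
        )
    return out


def add(x, y):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def hybrid_layer(visual, textual):
    """One hypothetical hybrid-attention layer.

    Self-attention refines each modality on its own (intra-modal);
    cross-attention lets each modality query the other (inter-modal).
    Summing the two outputs is an assumption of this sketch.
    """
    v_new = add(attend(visual, visual), attend(visual, textual))
    t_new = add(attend(textual, textual), attend(textual, visual))
    return v_new, t_new


def stacked_hybrid_attention(visual, textual, num_layers=2):
    """Apply the hybrid layer repeatedly; num_layers is illustrative."""
    for _ in range(num_layers):
        visual, textual = hybrid_layer(visual, textual)
    return visual, textual
```

Calling `stacked_hybrid_attention(visual, textual)` with two lists of equal-width feature vectors returns refined features with the same shapes as the inputs, so the layers can be stacked to any depth.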

Findings

01. Achieved state-of-the-art performance on unbiased metrics on the Visual Genome (VG) and GQA datasets.

02. Nearly doubled performance on the unbiased metric compared with a typical baseline.

03. Effectively reduced bias in predicate prediction.

Abstract

Scene Graph Generation, which generally follows a regular encoder-decoder pipeline, aims to first encode the visual contents within the given image and then parse them into a compact summary graph. Existing SGG approaches generally not only neglect the insufficient modality fusion between vision and language, but also fail to provide informative predicates due to the biased relationship predictions, leading SGG far from practical. Towards this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. Particularly, based upon the observation that the recognition capability of one classifier is limited towards an extremely unbalanced dataset, we first deploy a group of…

Code & Models

Repositories

dongxingning/sha-gcl-for-sgg
PyTorch · Official

Datasets

maelic/GQA200-coco-format
dataset · 33 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning