Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation
Runfeng Qu, Ole Hall, Pia K Bideau, Julie Ouerfelli-Ethier, Martin Rolfs, Klaus Obermayer, Olaf Hellwich

TL;DR
Salience-SGG introduces an iterative salience decoder that improves unbiased scene graph generation by focusing on salient spatial structures, enhancing spatial understanding and achieving state-of-the-art results.
Contribution
The paper presents a novel iterative salience decoder and semantic-agnostic salience labels to improve unbiased scene graph generation, especially in spatial understanding.
Findings
Achieves state-of-the-art performance on Visual Genome, Open Images V6, and GQA-200.
Improves spatial understanding in unbiased SGG methods.
Enhances pairwise localization average precision.
Abstract
Scene Graph Generation (SGG) suffers from a long-tailed distribution, where a few predicate classes dominate while many others are underrepresented, leading to biased models that underperform on rare relations. Unbiased-SGG methods address this issue by implementing debiasing strategies, but often at the cost of spatial understanding, resulting in an over-reliance on semantic priors. We introduce Salience-SGG, a novel framework featuring an Iterative Salience Decoder (ISD) that emphasizes triplets with salient spatial structures. To support this, we propose semantic-agnostic salience labels guiding ISD. Evaluations on Visual Genome, Open Images V6, and GQA-200 show that Salience-SGG achieves state-of-the-art performance and improves existing Unbiased-SGG methods in their spatial understanding as demonstrated by the Pairwise Localization Average Precision.
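To make the long-tail problem described in the abstract concrete, the following is a minimal sketch on made-up data. It is not the paper's evaluation protocol, and the function names `overall_recall` and `mean_recall` are hypothetical helpers introduced only for illustration. It shows how a model that always predicts the dominant predicate can look strong when all samples are pooled, yet weak when each predicate class is weighted equally.

```python
from collections import Counter


def overall_recall(labels, preds):
    """Fraction of all ground-truth relations predicted correctly (pooled)."""
    hits = sum(1 for y, p in zip(labels, preds) if y == p)
    return hits / len(labels)


def mean_recall(labels, preds):
    """Average of per-predicate recall, so every class counts equally."""
    totals = Counter(labels)
    hits = Counter(y for y, p in zip(labels, preds) if y == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, invented predicate labels with a long tail: "on" dominates.
labels = ["on"] * 90 + ["holding"] * 6 + ["parked on"] * 4

# A fully biased model that always predicts the head predicate.
preds = ["on"] * len(labels)

print("overall recall:", overall_recall(labels, preds))
print("mean recall:   ", mean_recall(labels, preds))
```

On this toy data the pooled recall is 0.9 while the class-averaged recall is only one third, because the two rare predicates are never predicted. This gap is the kind of bias that debiasing strategies target.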
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection
