Fine-Grained Predicates Learning for Scene Graph Generation
Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song

TL;DR
This paper introduces Fine-Grained Predicates Learning (FGPL), an approach to scene graph generation (SGG) that targets hard-to-distinguish predicates and substantially improves the mean recall of benchmark SGG models.
Contribution
The paper proposes a Predicate Lattice together with dedicated loss functions for differentiating fine-grained predicates. The method improves existing SGG models and outperforms state-of-the-art methods.
Findings
Boosts three benchmark models' mean recall by over 21%.
Outperforms state-of-the-art methods by up to 6.1% in mean recall.
Effectively distinguishes hard-to-separate predicates in scene graphs.
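As a rough illustration of the metric behind these numbers, here is a minimal Python sketch of mean recall@K (mR@K): recall computed separately for each predicate class, then averaged over classes. It assumes exact matching of (subject, predicate, object) label triplets (real SGG benchmarks also match bounding boxes, typically by IoU) and pools hits across images, whereas common evaluation toolkits average per image first. The function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Set, Tuple

# Hypothetical triplet format: (subject, predicate, object) labels.
Triplet = Tuple[str, str, str]


def mean_recall_at_k(
    gt_triplets: Sequence[Set[Triplet]],
    pred_triplets: Sequence[List[Triplet]],
    k: int,
) -> float:
    """Average of per-predicate recall@k (simplified, pooled over images).

    gt_triplets: one set of ground-truth triplets per image.
    pred_triplets: one ranked list of predicted triplets per image.
    """
    hits = defaultdict(int)    # correctly recovered triplets per predicate
    totals = defaultdict(int)  # ground-truth triplets per predicate
    for gts, preds in zip(gt_triplets, pred_triplets):
        top_k = set(preds[:k])  # only the k highest-ranked predictions count
        for triplet in gts:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    if not totals:
        return 0.0
    # Every predicate class gets equal weight, regardless of its frequency.
    per_class = [hits[p] / totals[p] for p in totals]
    return sum(per_class) / len(per_class)


gt = [{("man", "on", "horse")}, {("cup", "on", "table")}, {("woman", "near", "child")}]
preds = [[("man", "on", "horse")], [("cup", "on", "table")], [("woman", "looking at", "child")]]
print(mean_recall_at_k(gt, preds, k=1))
```

On this toy data plain recall would be 2/3 (two of three triplets recovered), but mean recall is 0.5, because the missed "near" class weighs as much as the frequent "on" class. This is why mean recall is the usual yardstick for methods that target rare or fine-grained predicates.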
Abstract
The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child". While general SGG models are prone to predict head predicates and existing re-balancing strategies prefer tail categories, none of them can appropriately handle these hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating among hard-to-distinguish object classes, we propose a method named Fine-Grained Predicates Learning (FGPL) which aims at differentiating among hard-to-distinguish predicates for Scene Graph Generation task. Specifically, we first introduce a Predicate Lattice that helps SGG models to figure out fine-grained predicate pairs. Then, utilizing the Predicate Lattice, we propose a Category…
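The abstract does not say how the Predicate Lattice is built. Purely as a hypothetical illustration of what "fine-grained predicate pairs" look like in practice (not the paper's construction), the sketch below counts how often a trained model predicts one predicate where another is annotated and ranks the unordered pairs by that count. The helper `confusable_pairs` and its inputs are invented for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confusable_pairs(
    gt_labels: Sequence[str],
    pred_labels: Sequence[str],
    top_n: int = 3,
) -> List[Tuple[Tuple[str, str], int]]:
    """Hypothetical helper: rank predicate pairs by mutual confusion.

    gt_labels[i] is the annotated predicate for relation i and
    pred_labels[i] is the model's predicted predicate for it.
    """
    counts = Counter()
    for gt, pred in zip(gt_labels, pred_labels):
        if gt != pred:
            # Treat "on" -> "standing on" and "standing on" -> "on" as one pair.
            counts[tuple(sorted((gt, pred)))] += 1
    return counts.most_common(top_n)


gt = ["standing on", "on", "walking on", "near", "looking at", "on"]
pred = ["on", "standing on", "on", "looking at", "near", "on"]
for pair, count in confusable_pairs(gt, pred):
    print(pair, count)
```

Pairs such as ("on", "standing on") or ("looking at", "near") rise to the top, which mirrors the abstract's examples of hard-to-distinguish predicates.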
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
