Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score
He Huang, Shunta Saito, Yuta Kikuchi, Eiichi Matsumoto, Wei Tang, Philip S. Yu

TL;DR
This paper introduces a combined classification and ranking framework with a new loss and scoring module to improve detection of rare relations in scene graph parsing, addressing class imbalance issues.
Contribution
It proposes a novel Contrasting Cross-Entropy loss and a Scorer module to enhance rare relation detection in scene graph models, improving state-of-the-art performance.
Findings
Improved detection of rare relations in scene graphs.
Enhanced recall and overall accuracy on benchmarks.
Effective integration into existing models.
Abstract
Scene graph parsing aims to detect objects in an image scene and recognize their relations. Recent approaches have achieved high average scores on some popular benchmarks, but fail in detecting rare relations, as the highly long-tailed distribution of data biases the learning towards frequent labels. Motivated by the fact that detecting these rare relations can be critical in real-world applications, this paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem in scene graph parsing. Specifically, we design a new Contrasting Cross-Entropy loss, which promotes the detection of rare relations by suppressing incorrect frequent ones. Furthermore, we propose a novel scoring module, termed as Scorer, which learns to rank the relations based on the image features and relation features to improve the recall of predictions. Our framework…
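The abstract describes the Contrasting Cross-Entropy loss only in words: it promotes rare relations by suppressing incorrect frequent ones. The exact formula is not given in this summary, so the following is a minimal sketch of one plausible reading, not the authors' definition. It assumes a hinge-style contrast between the cross-entropy of the true relation and that of the highest-scoring incorrect relation. The function name `contrastive_ce_sketch` and the `margin` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_ce_sketch(logits, target, margin=1.0):
    """Hypothetical hinge-style contrast (an assumption, not the paper's formula).

    Returns (loss, hard_negative_index).
    """
    probs = softmax(logits)
    # Cross-entropy of the correct relation class.
    ce_true = -math.log(probs[target])
    # Hard negative: the incorrect class with the highest predicted probability.
    # Under a long-tailed distribution this tends to be a frequent relation.
    neg = max((i for i in range(len(probs)) if i != target),
              key=lambda i: probs[i])
    ce_neg = -math.log(probs[neg])
    # The loss is zero once the true class beats the hard negative by the margin
    # (measured in log-probability); otherwise it pushes the two apart.
    return max(0.0, ce_true - ce_neg + margin), neg


# Class 0 plays the role of a frequent relation, class 1 a rare one.
loss_rare, neg_rare = contrastive_ce_sketch([3.0, 0.5, 0.2], target=1)
loss_freq, neg_freq = contrastive_ce_sketch([3.0, 0.5, 0.2], target=0)
```

In this toy case the rare target is outscored by the frequent class, so the loss is positive and its gradient would lower the frequent class's score. When the frequent class is the correct target and already wins by more than the margin, the loss is zero.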
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
