Towards Overcoming False Positives in Visual Relationship Detection

Daisheng Jin; Xiao Ma; Chongzhi Zhang; Yizhuo Zhou; Jiashu Tao; Mingyuan Zhang; Haiyu Zhao; Shuai Yi; Zhoujun Li; Xianglong Liu; Hongsheng Li

arXiv:2012.12510 · cs.CV · December 25, 2020 · 5 cites

Open Access

TL;DR

This paper introduces SABRA, a visual relationship detection (VRD) framework that reduces false positives through balanced negative proposal sampling and enhanced spatial modeling, significantly improving performance on multiple datasets.

Contribution

SABRA employs a novel balanced negative proposal sampling strategy and advanced spatial modeling techniques to address false positives in VRD.

Findings

01. SABRA outperforms state-of-the-art methods on HOI and VRD datasets.

02. Balanced negative sampling reduces false positives effectively.

03. Enhanced spatial modeling improves detection accuracy.

Abstract

In this paper, we investigate the cause of the high false positive rate in Visual Relationship Detection (VRD). We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify, e.g., those arising from inaccurate object detection, which leads to under-fitting of the low-frequency difficult proposals. This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA), a robust VRD framework that alleviates the influence of false positives. To effectively optimize the model under the imbalanced distribution, SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch sampling. BNPS divides proposals into five well-defined sub-classes and generates a balanced training distribution according to the inverse frequency. BNPS gives an easier optimization landscape and significantly reduces…
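The inverse-frequency idea behind BNPS can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the five sub-classes or give the exact sampling rule, so the integer sub-class labels, the function names, and the choice to sample with replacement are all assumptions made here for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each proposal by 1 / (number of proposals in its sub-class).

    `labels` holds one hypothetical sub-class id per proposal; the real
    BNPS sub-class definitions are not given in the abstract.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_batch(labels, batch_size, seed=None):
    """Draw proposal indices (with replacement) using inverse-frequency weights.

    Every sub-class that appears in `labels` receives the same total
    weight, so each is equally likely to be drawn in expectation.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Hypothetical imbalanced pool: sub-class 0 dominates, 3 and 4 are rare.
proposal_labels = [0] * 90 + [1] * 6 + [2] * 2 + [3] + [4]
batch = sample_balanced_batch(proposal_labels, batch_size=32, seed=0)
batch_histogram = Counter(proposal_labels[i] for i in batch)
```

Under these assumptions, `batch_histogram` is roughly uniform over the five sub-classes even though sub-class 0 makes up 90% of the pool, which is the balancing effect the abstract attributes to sampling by inverse frequency.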

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition