TL;DR
This paper introduces a visual relationship detection framework that uses the relative location of each object pair at every stage of the pipeline, improving predicate recognition and overall accuracy.
Contribution
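The work proposes a framework that encodes the relative location of each object pair as an auxiliary feature for both object-pair proposal and predicate recognition, and adds a Gated Graph Neural Network (GGNN) that measures the relevance between predicates from relative location, improving detection performance.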
Findings
Significant performance improvement on VRD and VG datasets.
Enhanced clustering of spatially similar predicates.
Higher top-n recall.
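For readers unfamiliar with the metric, the sketch below shows one simplified way to compute top-n recall for a single image: the fraction of ground-truth relationship triplets that appear among the n highest-scoring predictions. The function name `recall_at_n` and the triplet format are hypothetical, and the sketch matches triplets by label only; standard benchmark evaluation also requires the predicted boxes to overlap the ground-truth boxes, which is omitted here.

```python
def recall_at_n(scored_predictions, ground_truth, n):
    """Simplified top-n recall for one image (label matching only).

    scored_predictions: list of (score, (subject, predicate, object)) tuples.
    ground_truth: set of (subject, predicate, object) tuples.
    Assumption: box-overlap matching is skipped to keep the sketch short.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    # Keep the n predictions with the highest scores.
    top = sorted(scored_predictions, key=lambda p: p[0], reverse=True)[:n]
    # Count each ground-truth triplet at most once.
    hits = {triplet for _, triplet in top if triplet in ground_truth}
    return len(hits) / len(ground_truth)


predictions = [
    (0.9, ("person", "on", "bike")),
    (0.8, ("person", "near", "bike")),
    (0.3, ("person", "wears", "hat")),
]
truth = {("person", "on", "bike"), ("person", "wears", "hat")}
print(recall_at_n(predictions, truth, 2))
print(recall_at_n(predictions, truth, 3))
```

In this toy case the plausible but unannotated predicate "near" takes a top-2 slot, which illustrates why the ranking of non-exclusive predicates affects top-n recall.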
Abstract
Visual relationship detection, a challenging task that aims to find and distinguish the interactions between object pairs in an image, has received much attention recently. In this work, we propose a novel visual relationship detection framework that deeply mines and utilizes the relative location of each object pair at every stage of the procedure. In both stages, the relative location information of each object pair is abstracted and encoded as an auxiliary feature to improve the distinguishing capability of object-pair proposal and predicate recognition, respectively. Moreover, a Gated Graph Neural Network (GGNN) is introduced to mine and measure the relevance of predicates using relative location. With the location-based GGNN, non-exclusive predicates with similar spatial positions can first be clustered and then smoothed to have close classification scores, thus the accuracy of top…
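The abstract does not specify how relative location is encoded. As a minimal sketch, the code below uses a common box-delta parameterization (normalized center offset plus log size ratio) to turn a subject box and an object box into a 4-dimensional feature. The function name `relative_location` and the choice of encoding are assumptions for illustration, not the paper's formulation.

```python
import math


def relative_location(subject_box, object_box):
    """Encode where object_box lies relative to subject_box.

    Boxes are (x1, y1, x2, y2) in pixels.
    Assumption: a generic box-delta encoding, not the paper's own feature.
    """
    sx1, sy1, sx2, sy2 = subject_box
    ox1, oy1, ox2, oy2 = object_box
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(sw, sh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")
    # Box centers.
    scx, scy = sx1 + sw / 2, sy1 + sh / 2
    ocx, ocy = ox1 + ow / 2, oy1 + oh / 2
    return [
        (ocx - scx) / sw,   # horizontal offset, in subject widths
        (ocy - scy) / sh,   # vertical offset, in subject heights
        math.log(ow / sw),  # log width ratio
        math.log(oh / sh),  # log height ratio
    ]


# Object directly to the right of the subject and twice as wide.
feature = relative_location((0, 0, 10, 10), (10, 0, 30, 10))
print(feature)
```

Normalizing by the subject's size makes the feature independent of image scale, so pairs with the same spatial layout map to similar vectors regardless of resolution.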
Taxonomy
Methods: Gated Graph Sequence Neural Networks
