Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition
Mohammed Haroon Dupty, Zhen Zhang, Wee Sun Lee

TL;DR
This paper introduces a novel low-rank non-negative tensor decomposition approach to model the joint distribution of object relationships in images, capturing multimodality and improving visual relationship detection accuracy.
Contribution
It proposes a new tensor decomposition method for learning multimodal joint distributions of triplets, enhancing VRD performance over existing models.
Findings
Outperforms state-of-the-art methods on the Visual Genome and VRD datasets.
Effectively captures multimodal relationships with tensor decomposition.
Improves accuracy by modeling joint distributions and priors.
Abstract
We address the problem of Visual Relationship Detection (VRD), which aims to describe the relationships between pairs of objects in the form of (subject, predicate, object) triplets. We observe that, given a pair of bounding box proposals, objects often participate in multiple relations, implying that the distribution of triplets is multimodal. We leverage the strong correlations within triplets to learn the joint distribution of triplet variables conditioned on the image and the bounding box proposals, doing away with the hitherto used independent distribution of triplets. To make learning the triplet joint distribution feasible, we introduce a novel technique of learning conditional triplet distributions in the form of their normalized low-rank non-negative tensor decompositions. Normalized tensor decompositions take the form of mixture distributions of discrete variables and thus are able to…
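The key idea in the abstract, that a normalized non-negative low-rank tensor is a mixture of discrete distributions, can be illustrated with a minimal NumPy sketch. It assumes a CP-style factorization (a sum of R rank-one terms) with softmax-normalized factors; the function names `normalized_cp_joint` and `softmax`, the class indices, and the random factors standing in for network outputs are all hypothetical. The paper's actual parameterization and how the factors are conditioned on the image and box proposals are not shown. The toy case compares a rank-2 joint with two modes against the independent model built from its marginals.

```python
import numpy as np


def normalized_cp_joint(weights, subj, pred, obj):
    """Joint P(s, p, o) as a rank-R mixture of product distributions.

    weights: (R,) non-negative mixture weights summing to 1
    subj:    (R, S) row r is a distribution over subject classes
    pred:    (R, P) row r is a distribution over predicate classes
    obj:     (R, O) row r is a distribution over object classes
    """
    # T[s, p, o] = sum_r w[r] * subj[r, s] * pred[r, p] * obj[r, o]
    return np.einsum("r,rs,rp,ro->spo", weights, subj, pred, obj)


def softmax(x, axis=-1):
    # Makes entries non-negative and normalized along `axis`.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Toy bimodal target: two triplets, (0, 0, 0) and (1, 1, 1), each with
# probability 0.5 (class indices are hypothetical placeholders).
w = np.array([0.5, 0.5])
one_hot = np.eye(2)  # component r puts all its mass on class r
joint = normalized_cp_joint(w, one_hot, one_hot, one_hot)

# Independent model: product of the subject, predicate and object marginals.
ps, pp, po = joint.sum(axis=(1, 2)), joint.sum(axis=(0, 2)), joint.sum(axis=(0, 1))
independent = np.einsum("s,p,o->spo", ps, pp, po)

print(joint[0, 0, 0], joint[1, 1, 1])              # 0.5 0.5
print(independent[0, 0, 0], independent[0, 1, 0])  # 0.125 0.125

# Random logits standing in for network outputs (assumption):
rng = np.random.default_rng(0)
R, S, P, O = 4, 5, 6, 5  # hypothetical rank and class counts
T = normalized_cp_joint(softmax(rng.normal(size=R)),
                        softmax(rng.normal(size=(R, S))),
                        softmax(rng.normal(size=(R, P))),
                        softmax(rng.normal(size=(R, O))))
print(T.shape, round(float(T.sum()), 6))  # (5, 6, 5) 1.0
```

The rank-2 mixture keeps all mass on the two observed triplets, while the independent product spreads it evenly over all eight combinations, including triplets that never co-occur. Because every factor row and the mixture weights each sum to one, the full tensor sums to one for any rank R, so it is a valid joint distribution without an explicit normalization over all S x P x O entries.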
