TL;DR
This paper introduces a novel few-shot visual relationship co-localization method that learns to localize subject-object pairs connected by a common predicate across a small set of images with minimal supervision, using a meta-learning-based optimization framework.
Contribution
It proposes a new optimization framework for visual relationship co-localization (VRC) that employs relationship embedding and meta-learning to handle few-shot scenarios and unseen predicates.
Findings
Reports strong co-localization performance on the VrR-VG and VG-150 datasets.
Effectively learns relationship embeddings as translation vectors in a shared space.
Utilizes a greedy approximation algorithm for efficient solution inference.
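The last two findings can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each candidate subject-object pair already has subject and object feature vectors in a shared space, models the relationship as the translation vector from subject to object, and uses a simple greedy pass with cosine similarity to pick one candidate per image. The function names `relation_embeddings` and `greedy_colocalize`, the seeding strategy, and the similarity measure are all hypothetical choices made for illustration.

```python
import numpy as np


def relation_embeddings(subj, obj):
    """Translation-style relation vectors (assumption: r = obj - subj).

    subj, obj: (n, d) arrays of subject and object features in a shared space.
    Returns (n, d) L2-normalised relation vectors.
    """
    r = obj - subj
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    return r / np.clip(norms, 1e-12, None)


def greedy_colocalize(bag):
    """Greedily pick one candidate relation per image (illustrative only).

    bag: list of (n_i, d) arrays of normalised relation vectors, one per image.
    Returns a list holding the selected candidate index for each image.
    """
    if len(bag) < 2:
        raise ValueError("need at least two images in the bag")

    # Seed with the most similar candidate pair across the first two images.
    sims = bag[0] @ bag[1].T
    i, j = np.unravel_index(np.argmax(sims), sims.shape)
    chosen = [int(i), int(j)]
    selected = [bag[0][i], bag[1][j]]

    # For every remaining image, keep the candidate whose summed cosine
    # similarity to the already selected relations is highest.
    for cands in bag[2:]:
        scores = cands @ np.sum(selected, axis=0)
        k = int(np.argmax(scores))
        chosen.append(k)
        selected.append(cands[k])
    return chosen
```

Calling `greedy_colocalize` on a bag of per-image candidate arrays returns one index per image. The greedy pass evaluates each image's candidates once instead of scoring every joint combination across the bag, which is the kind of saving a greedy approximation is meant to provide, at the cost of possibly missing the jointly best selection.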
Abstract
In this paper, given a small bag of images, each containing a common but latent predicate, we are interested in localizing visual subject-object pairs connected via the common predicate in each of the images. We refer to this novel problem as visual relationship co-localization, or VRC for short. VRC is a challenging task, even more so than the well-studied object co-localization task. It becomes more challenging still when, using just a few images, the model has to learn to co-localize visual subject-object pairs connected via unseen predicates. To solve VRC, we propose an optimization framework to select a common visual relationship in each image of the bag. The goal of the optimization framework is to find the optimal solution by learning visual relationship similarity across images in a few-shot setting. To obtain robust visual relationship representation, we utilize a simple…
