Weakly-supervised learning of visual relations
Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic

TL;DR
This paper presents a weakly-supervised approach to learning visual relations between objects. It combines discriminative clustering, new visual features, and a new dataset, and reports state-of-the-art results, especially in zero-shot relation recognition.
Contribution
The paper introduces a weakly-supervised discriminative clustering model, new visual features for relation modeling, and a challenging dataset for evaluating visual relation retrieval.
Findings
State-of-the-art performance on visual relation dataset
Significant improvement in zero-shot relation recognition
Effective relation modeling with weak supervision
Abstract
This paper introduces a novel approach for modeling visual relations between pairs of objects. We define a relation as a triplet of the form (subject, predicate, object), where the predicate is typically a preposition (e.g. 'under', 'in front of') or a verb ('hold', 'ride') that links a pair of objects (subject, object). Learning such relations is challenging because the objects have different spatial configurations and appearances depending on the relation in which they occur. Another major challenge is the difficulty of obtaining annotations, especially at box level, for all possible triplets, which makes both learning and evaluation difficult. The contributions of this paper are threefold. First, we design strong yet flexible visual features that encode the appearance and spatial configuration for pairs of objects. Second, we propose a weakly-supervised discriminative clustering model to learn…
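To make the idea of encoding the "spatial configuration for pairs of objects" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the feature definition from the paper: the `Box` type, the `spatial_features` function, the (x1, y1, x2, y2) corner convention, and the particular six quantities (normalized center offset, relative size, overlap, aspect ratios) are all hypothetical choices made for this example.

```python
import math
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in (x1, y1, x2, y2) corner format (assumed convention)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center(self) -> tuple:
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def spatial_features(subject: Box, obj: Box) -> List[float]:
    """Illustrative spatial descriptor for a (subject, object) box pair.

    Offsets are normalized by the subject's scale so the descriptor does not
    depend on where the pair sits in the image or on absolute image size.
    Assumes both boxes have positive width and height.
    """
    scale = math.sqrt(subject.area)
    (sx, sy), (ox, oy) = subject.center, obj.center
    return [
        (ox - sx) / scale,                  # horizontal offset of object from subject
        (oy - sy) / scale,                  # vertical offset of object from subject
        math.sqrt(obj.area / subject.area), # relative size of object vs. subject
        iou(subject, obj),                  # overlap between the two boxes
        subject.width / subject.height,     # subject aspect ratio
        obj.width / obj.height,             # object aspect ratio
    ]


# Hypothetical triplet (person, ride, horse): the object box sits below and
# overlaps the subject box, which the offsets and overlap terms would capture.
person = Box(0.0, 0.0, 2.0, 2.0)
horse = Box(1.0, 1.0, 3.0, 3.0)
features = spatial_features(person, horse)
```

A descriptor of this kind would typically be concatenated with appearance features of the two boxes before being passed to a predicate classifier; that pairing is likewise an assumption of this sketch.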
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
