Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features
Xu Yang, Hanwang Zhang, Jianfei Cai

TL;DR
This paper introduces a novel pre-training strategy called Shuffle-Then-Assemble to learn object-agnostic visual features, improving the generalization of visual relationship models to rare or unseen object pairs.
Contribution
It proposes a new pre-training method that reduces object bias in visual relationship models by recovering object pairs from unpaired object domains.
Findings
Pre-trained features improve relationship model performance.
Models built on these features are reported to outperform state-of-the-art relationship models.
The features also generalize better to rare or unseen object pairs.
Abstract
Due to the fact that it is prohibitively expensive to completely annotate visual relationships, i.e., the (obj1, rel, obj2) triplets, relationship models are inevitably biased to object classes of limited pairwise patterns, leading to poor generalization to rare or unseen object combinations. Therefore, we are interested in learning object-agnostic visual features for more generalizable relationship models. By "agnostic", we mean that the feature is less likely biased to the classes of paired objects. To alleviate the bias, we propose a novel Shuffle-Then-Assemble pre-training strategy. First, we discard all the triplet relationship annotations in an image, leaving two unpaired object domains without obj1-obj2 alignment. Then, our feature learning is to recover possible obj1-obj2 pairs. In particular, we design a cycle of residual transformations between the two domains, to…
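The abstract is cut off before it states the training objective, so the following is only a rough sketch of the two ideas it does name: discarding obj1-obj2 alignment ("shuffle") and mapping features between the two object domains with a cycle of residual transformations. Everything concrete here is an assumption made for illustration: the function names `shuffle_domains`, `residual_map`, and `cycle_loss` are hypothetical, the linear map stands in for whatever learned network the paper uses, and the squared round-trip error is a generic cycle-consistency penalty, not necessarily the authors' loss.

```python
import random


def shuffle_domains(pairs, seed=0):
    """Drop the obj1-obj2 alignment: split pairs into two sets and shuffle one.

    `pairs` is a list of (obj1_feature, obj2_feature) tuples. The returned
    lists hold the same items, but position i in one list no longer
    corresponds to position i in the other.
    """
    rng = random.Random(seed)
    domain1 = [a for a, _ in pairs]
    domain2 = [b for _, b in pairs]
    rng.shuffle(domain2)
    return domain1, domain2


def residual_map(x, weights):
    """Residual transformation x + W x (linear stand-in for a learned network)."""
    return [xi + sum(w * xj for w, xj in zip(row, x))
            for xi, row in zip(x, weights)]


def cycle_loss(x, w_forward, w_backward):
    """Squared error between x and its round trip domain1 -> domain2 -> domain1.

    Assumed cycle-consistency penalty; the source does not state the loss.
    """
    y_hat = residual_map(x, w_forward)       # domain 1 -> domain 2
    x_rec = residual_map(y_hat, w_backward)  # domain 2 -> domain 1
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))
```

With all-zero weights both residual maps reduce to the identity, so the round-trip error is zero; a residual form like this lets a transformation start near the identity and learn only the change between domains, which is one plausible reason to prefer it in such a cycle.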
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
