Tensor Composition Net for Visual Relationship Prediction
Yuting Qiang, Yongxin Yang, Xueting Zhang, Yanwen Guo, Timothy M. Hospedales

TL;DR
The paper introduces a Tensor Composition Net that leverages low-rank tensor properties to improve visual relationship prediction, enabling the prediction of unseen relationships and enhancing image-retrieval tasks.
Contribution
A novel Tensor Composition Net (TCN) that uses tensor decomposition to make a structured prediction of all visual relationships in an image, including relationships not seen during training, and outperforms existing methods.
Findings
Outperforms Multi-Label Image Classification (MLIC) and eXtreme Multi-label Classification (XMC) methods.
Can predict unseen visual relationships.
Provides efficient relation-based image retrieval.
Abstract
We present a novel Tensor Composition Net (TCN) to predict visual relationships in images. Visual Relationship Prediction (VRP) provides a more challenging test of image understanding than conventional image tagging and is difficult to learn due to a large label-space and incomplete annotation. The key idea of our TCN is to exploit the low-rank property of the visual relationship tensor, so as to leverage correlations within and across objects and relations and make a structured prediction of all visual relationships in an image. To show the effectiveness of our model, we first empirically compare our model with Multi-Label Image Classification (MLIC) methods, eXtreme Multi-label Classification (XMC) methods, and VRD methods. We then show that thanks to our tensor (de)composition layer, our model can predict visual relationships which have not been seen in the training dataset. We…
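The low-rank idea in the abstract can be illustrated with a minimal sketch. It assumes a CP-style (sum of rank-one terms) factorization of a subject × relation × object score tensor; the abstract does not state which decomposition TCN actually uses, so the factor matrices, the function name `compose_relationship_tensor`, and the sizes below are hypothetical and chosen only for illustration. In a real model the factors would presumably be predicted from image features rather than sampled at random.

```python
import numpy as np


def compose_relationship_tensor(subj_factors, rel_factors, obj_factors):
    """Compose a (S, R, O) score tensor from rank-K factor matrices.

    Assumed CP form: scores[s, r, o] = sum_k subj[s, k] * rel[r, k] * obj[o, k].
    This is an illustrative stand-in, not the paper's confirmed layer.
    """
    return np.einsum("sk,rk,ok->sro", subj_factors, rel_factors, obj_factors)


# Hypothetical sizes: 5 object classes, 3 relation classes, rank 2.
num_objects, num_relations, rank = 5, 3, 2
rng = np.random.default_rng(0)

# Random factors stand in for what a network would output per image.
subj = rng.normal(size=(num_objects, rank))
rel = rng.normal(size=(num_relations, rank))
obj = rng.normal(size=(num_objects, rank))

scores = compose_relationship_tensor(subj, rel, obj)

# Independent per-triple probabilities, as in a multi-label setting.
probs = 1.0 / (1.0 + np.exp(-scores))

# Every (subject, relation, object) triple receives a score, including
# combinations that never co-occurred in training, because each score is
# built from shared per-subject, per-relation, and per-object factors.
```

With these sizes the full tensor has 5 × 3 × 5 = 75 entries but is determined by (5 + 3 + 5) × 2 = 26 factor values, which is the sense in which a low-rank structure shares information within and across objects and relations.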
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
