RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

TL;DR
RelTransformer introduces a message-passing attention mechanism with a learnable memory to effectively recognize long-tail visual relationships in images, significantly improving performance on large-scale VRR benchmarks.
Contribution
It proposes a novel attention-based scene graph model with a learnable memory to address long-tail distribution challenges in visual relationship recognition.
Findings
Outperforms the state of the art on VG8K-LT by +2.0% accuracy
Improves accuracy on GQA-LT by +26.0%
Shows strong results on VG200 relation detection
Abstract
The visual relationship recognition (VRR) task aims at understanding the pairwise visual relationships between interacting objects in an image. These relationships typically have a long-tail distribution due to their compositional nature. This problem gets more severe when the vocabulary becomes large, rendering this task very challenging. This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. The method, called RelTransformer, represents each image as a fully-connected scene graph and restructures the whole scene into the relation-triplet and global-scene contexts. It directly passes the message from each element in the relation-triplet and global-scene contexts to the target relation via self-attention. We also design a learnable memory to augment the long-tail…
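The message-passing idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, and it treats the learnable memory as extra attention slots; the abstract is truncated before it says how the memory is used, so that part is purely an assumption. All names (`attend`, `update_relation`, `memory`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    message = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return message, weights


def update_relation(relation, subject, obj, global_context, memory):
    """Pass messages to the target relation from every context element.

    Relation-triplet context: the relation itself, its subject, its object.
    Global-scene context: features of the other elements in the scene graph.
    Memory: ASSUMPTION, modeled here as extra slots the relation can attend to.
    """
    tokens = [relation, subject, obj] + global_context + memory
    message, weights = attend(relation, tokens, tokens)
    # Residual connection, as in a standard Transformer layer.
    updated = [r + m for r, m in zip(relation, message)]
    return updated, weights


# Toy usage with random 8-dimensional features (illustrative values only).
rng = random.Random(0)
dim = 8


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


relation, subject, obj = rand_vec(), rand_vec(), rand_vec()
global_context = [rand_vec() for _ in range(4)]
memory = [rand_vec() for _ in range(2)]

updated, weights = update_relation(relation, subject, obj, global_context, memory)
```

Here `weights` holds one attention weight per context element (3 triplet tokens, 4 global tokens, 2 memory slots), so the target relation receives a message directly from each of them in a single step rather than through multi-hop propagation.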
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
