Detecting Visual Relationships Using Box Attention
Alexander Kolesnikov, Alina Kuznetsova, Christoph H. Lampert, and Vittorio Ferrari

TL;DR
This paper introduces a Box Attention mechanism for detecting visual relationships in images, enhancing structured understanding by modeling object interactions without adding complexity to existing detection models.
Contribution
The paper presents a novel Box Attention approach that models pairwise object interactions efficiently within standard detection pipelines, avoiding extra complex components.
Findings
Strong results on V-COCO, Visual Relationships, and Open Images datasets.
Model achieves high accuracy in detecting diverse visual relationships.
Method outperforms previous approaches in qualitative and quantitative evaluations.
Abstract
We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table". This task is an important step towards comprehensive structured image understanding, going beyond detecting individual objects. Our main novelty is a Box Attention mechanism that allows us to model pairwise interactions between objects using standard object detection pipelines. The resulting model is conceptually clean, expressive and relies on well-justified training and prediction procedures. Moreover, unlike previously proposed approaches, our model does not introduce any additional complex components or hyperparameters on top of those already required by the underlying detection model. We conduct an experimental evaluation on three challenging datasets, V-COCO, Visual Relationships and Open Images, demonstrating strong quantitative and qualitative results.
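As a rough illustration of the idea in the abstract, one way to condition a standard detector on a particular subject is to encode the subject's bounding box as a binary map the same size as the image. The summary above does not specify the encoding, so the function name `box_attention_mask`, the corner-coordinate convention, and the single-channel layout are all assumptions made for this sketch, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical convention: (x_min, y_min, x_max, y_max) in pixels,
# with x_max and y_max exclusive.
Box = Tuple[int, int, int, int]


def box_attention_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Return a height x width binary mask: 1 inside `box`, 0 elsewhere."""
    x_min, y_min, x_max, y_max = box
    # Clip the box to the image so out-of-range coordinates are ignored.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [
        [1 if (y_min <= y < y_max and x_min <= x < x_max) else 0
         for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # A 4x5 image with a subject box covering a 2x2 pixel region.
    for row in box_attention_mask(4, 5, (1, 1, 3, 3)):
        print(row)
```

A detector that receives such a mask as an additional input could then be asked to predict the objects, and their relationship labels, for that one subject. How the mask is consumed by the network is outside the scope of this sketch.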
