Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

Ji Zhang; Kevin Shih; Andrew Tao; Bryan Catanzaro; Ahmed Elgammal

arXiv:1811.00662 · cs.CV · November 9, 2018 · 6 cites

Open Access

TL;DR

This paper presents a top-performing model for visual relationship detection that combines language bias, spatial features, and feature fusion techniques, achieving first place in a Kaggle challenge.

Contribution

It combines a language-bias baseline, spatial features, and logit-level feature fusion; an ablation study shows that each of the three improves relationship detection performance to a non-trivial extent.

Findings

01 Language bias baseline is highly effective.

02 Spatial features are crucial for spatial relationships.

03 Feature fusion enhances overall model accuracy.
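
The fusion finding corresponds to building one module per feature type and adding the modules' output logits before a single softmax. A minimal sketch of that step, assuming three hypothetical modules (visual, spatial, language) that each emit one logit per predicate; the predicate order and all numbers are made up for illustration and are not taken from the paper:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(*module_logits):
    """Add per-module logit vectors element-wise, then apply one softmax."""
    lengths = {len(v) for v in module_logits}
    if len(lengths) != 1:
        raise ValueError("all modules must score the same predicate set")
    fused = [sum(vals) for vals in zip(*module_logits)]
    return softmax(fused)


# Hypothetical logits over the predicates ["on", "under", "holds"].
visual_logits = [1.0, 0.5, 0.2]
spatial_logits = [0.1, 2.0, -1.0]
language_logits = [0.3, 0.4, 0.0]

probs = fuse_logits(visual_logits, spatial_logits, language_logits)
best = probs.index(max(probs))  # index of the highest-probability predicate
```

Summing logits rather than averaging probabilities lets a module that is confident about a predicate (here the spatial module on "under") dominate the fused score, while the other modules still shift it.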

Abstract

This article describes the model we built that achieved 1st place in the OpenImages Visual Relationship Detection Challenge on Kaggle. Three key factors contributed most to our success: 1) Language bias is a powerful baseline for this task. We build the empirical distribution $P(\text{predicate} \mid \text{subject}, \text{object})$ from the training set and use it directly at test time. This baseline achieved 2nd place when submitted. 2) Spatial features are as important as visual features, especially for spatial relationships such as "under" and "inside of". 3) An effective way to fuse different features is to build a separate module for each of them, then add their output logits before the final softmax layer. We show in an ablation study that each factor improves performance to a non-trivial extent, and that the model performs best when all of them are combined.
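
The language-bias baseline in the abstract amounts to counting how often each predicate occurs for a given (subject, object) class pair in the training set and normalizing the counts. A minimal sketch under stated assumptions: the function names, the triple format, and the toy triples are hypothetical, and the handling of unseen pairs (returning an empty ranking) is an illustrative choice that the abstract does not specify:

```python
from collections import Counter, defaultdict


def build_language_prior(triples):
    """Estimate P(predicate | subject, object) from (subject, predicate, object) triples."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    prior = {}
    for pair, pred_counts in counts.items():
        total = sum(pred_counts.values())
        prior[pair] = {pred: n / total for pred, n in pred_counts.items()}
    return prior


def predict_predicates(prior, subj, obj):
    """Rank predicates by empirical probability; empty list for unseen pairs."""
    dist = prior.get((subj, obj), {})
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up training triples (not taken from the dataset).
train = [
    ("man", "holds", "guitar"),
    ("man", "holds", "guitar"),
    ("man", "plays", "guitar"),
    ("chair", "at", "table"),
]

prior = build_language_prior(train)
ranked = predict_predicates(prior, "man", "guitar")
```

At test time, a detector would supply the subject and object classes for each candidate box pair, and the ranked list gives predicate scores without looking at any pixels.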

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Topic Modeling

Methods: Softmax