Fixed-size Objects Encoding for Visual Relationship Detection

Hengyue Pan; Xin Niu; Rongchun Li; Siqi Shen; Yong Dou

arXiv:2005.14600·cs.CV·June 1, 2020


Open Access

TL;DR

This paper introduces FOE-VRD, a fixed-size object encoding method that improves visual relationship detection by encoding all objects, including background ones, into a single fixed-size vector for better predicate classification.

Contribution

The paper proposes a novel fixed-size encoding technique for all objects in an image, enhancing relationship detection performance over previous variable-sized methods.

Findings

01

Effective on VRD dataset for predicate classification

02

Improves zero-shot relationship detection

03

Outperforms previous methods in accuracy

Abstract

In this paper, we propose a fixed-size object encoding method (FOE-VRD) to improve the performance of visual relationship detection tasks. Compared with previous methods, FOE-VRD has an important feature: it uses one fixed-size vector to encode all objects in each input image to assist the process of relationship detection. First, we use a regular convolutional neural network as a feature extractor to generate high-level features of input images. Then, for each relationship triplet in an input image, i.e., <subject-predicate-object>, we apply ROI-pooling to get feature vectors of the two regions on the feature maps that correspond to the bounding boxes of the subject and object. Besides the subject and object, our analysis implies that the results of predicate classification may also be related to the remaining objects in the input image (we call them background objects). Due to the variable…
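
The ROI-pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `roi_max_pool`, the 2x2 output grid, the toy feature map, and the box coordinates are all assumptions made for illustration, and the fixed-size encoding of background objects is not shown because the abstract is truncated before describing it.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool one box of a (C, H, W) feature map into (C, out_size, out_size).

    `box` is (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    Hypothetical helper for illustration only.
    """
    x0, y0, x1, y1 = box
    region = feature_map[:, y0:y1, x0:x1]
    c, h, w = region.shape
    # Bin edges that split the region into out_size x out_size cells.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    pooled = np.empty((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Make sure every bin covers at least one cell.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            pooled[:, i, j] = region[:, ys[i]:y_end, xs[j]:x_end].max(axis=(1, 2))
    return pooled

# Toy stand-in for CNN output: 2 channels on a 4x4 grid (assumed values).
feature_map = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)

# Assumed subject and object boxes, already mapped to feature-map coordinates.
subject_vec = roi_max_pool(feature_map, (0, 0, 2, 2)).ravel()
object_vec = roi_max_pool(feature_map, (2, 2, 4, 4)).ravel()

# Boxes of any size yield vectors of the same length, so the pair can be
# concatenated into one input for a predicate classifier.
pair_feature = np.concatenate([subject_vec, object_vec])
print(pair_feature.shape)
```

Because the pooled grid has a fixed size, the subject and object vectors have the same length regardless of the original bounding-box dimensions.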


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Convolution