Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding

Zhu Zhang; Zhou Zhao; Zhijie Lin; Baoxing Huai; Nicholas Jing Yuan

arXiv:2008.06941 · cs.CV · August 25, 2020 · 1 citation

Open Access

TL;DR

This paper introduces an object-aware multi-branch relation network that improves spatio-temporal video grounding by effectively modeling object relations in unaligned data and multi-form sentences.

Contribution

It proposes a novel multi-branch relation network, trained with a diversity loss, that improves object relation modeling for video grounding on unaligned data.

Findings

1. Outperforms existing methods on benchmark datasets.
2. Effectively distinguishes notable objects in complex scenes.
3. Enhances relation reasoning between key objects.

Abstract

Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to a given sentence. Most existing grounding methods are restricted to well-aligned segment-sentence pairs. In this paper, we explore spatio-temporal video grounding on unaligned data and multi-form sentences. This challenging task requires capturing critical object relations to identify the queried target. However, existing approaches cannot distinguish notable objects and so model relations ineffectively among unnecessary objects. Thus, we propose a novel object-aware multi-branch relation network for object-aware relation discovery. Concretely, we first devise multiple branches to develop object-aware region modeling, where each branch focuses on a crucial object mentioned in the sentence. We then propose multi-branch relation reasoning to capture critical…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques