Video Relation Detection with Trajectory-aware Multi-modal Features

Wentao Xie; Guanghui Ren; Si Liu

arXiv:2101.08165 · cs.CV · January 21, 2021

TL;DR

This paper introduces a trajectory-aware multi-modal feature approach to video relation detection. It decomposes the task into object detection, trajectory proposal, and relation prediction, and won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP.

Contribution

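The paper presents a three-stage pipeline for video relation detection: a state-of-the-art object detector for accurate object trajectories, followed by trajectory proposal and relation prediction based on trajectory-aware multi-modal features.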

Findings

01

Achieved 11.74% mAP and first place in the video relation detection task of the Video Relation Understanding Grand Challenge (ACM Multimedia 2020)

02

Outperformed the other methods in the challenge by a large margin

03

Validated effectiveness of trajectory-aware multi-modal features

Abstract

The video relation detection problem refers to detecting the relationships between different objects in videos, such as spatial relationships and action relationships. In this paper, we present video relation detection with trajectory-aware multi-modal features to solve this task. Considering the complexity of visual relation detection in videos, we decompose the task into three sub-tasks: object detection, trajectory proposal, and relation prediction. We use a state-of-the-art object detection method to ensure the accuracy of object trajectory detection, and a multi-modal feature representation to help predict the relations between objects. Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP, surpassing other methods by a large margin.
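
To make the three-stage decomposition concrete, here is a minimal Python sketch of how data might flow from per-frame detections to trajectories to candidate subject-object pairs. It is not the authors' implementation: the abstract does not say how trajectories are proposed or which features are used, so the greedy IoU linking, the 0.5 threshold, and the names `Trajectory`, `link_detections`, and `candidate_pairs` are all assumptions made for illustration. Stage 1 (the object detector) is replaced by hard-coded toy detections, and stage 3 stops at enumerating the pairs a relation classifier would score.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Trajectory:
    """One object tracked over time: a class label plus a box per frame."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Tuple[str, Box]]],
                    thresh: float = 0.5) -> List[Trajectory]:
    """Stage 2 (assumed technique): greedy IoU linking of per-frame detections.

    Each detection extends the same-label trajectory whose box in the previous
    frame overlaps it most (IoU >= thresh); otherwise it starts a new trajectory.
    """
    trajectories: List[Trajectory] = []
    for t, detections in enumerate(frames):
        used = set()  # indices of trajectories already extended in frame t
        for label, box in detections:
            best_idx, best_iou = None, thresh
            for idx, traj in enumerate(trajectories):
                if idx in used or traj.label != label or (t - 1) not in traj.boxes:
                    continue
                score = iou(traj.boxes[t - 1], box)
                if score >= best_iou:
                    best_idx, best_iou = idx, score
            if best_idx is None:
                trajectories.append(Trajectory(label))
                best_idx = len(trajectories) - 1
            trajectories[best_idx].boxes[t] = box
            used.add(best_idx)
    return trajectories


def candidate_pairs(trajs: List[Trajectory]):
    """Input to stage 3: ordered (subject, object) pairs that co-occur in time."""
    pairs = []
    for subj, obj in permutations(trajs, 2):
        shared = sorted(set(subj.boxes) & set(obj.boxes))
        if shared:  # a relation can only hold on frames where both are visible
            pairs.append((subj, obj, shared))
    return pairs


# Stage 1 stand-in: toy detections for three frames (hypothetical values).
frames = [
    [("person", (0, 0, 10, 10)), ("dog", (20, 0, 30, 10))],
    [("person", (1, 0, 11, 10)), ("dog", (21, 0, 31, 10))],
    [("person", (2, 0, 12, 10))],
]
trajs = link_detections(frames)
pairs = candidate_pairs(trajs)
```

In a full system, each `(subject, object, shared_frames)` entry would be turned into a feature vector and scored by a relation classifier to produce triplets such as subject-predicate-object with a temporal extent. Pairs are ordered because relations are directional: "person chases dog" and "dog chases person" are different predictions.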
