VrdONE: One-stage Video Visual Relation Detection

Xinjie Jiang; Chenxi Zheng; Xuemiao Xu; Bangzhen Liu; Weiying Zheng; Huaidong Zhang; Shengfeng He

arXiv:2408.09408 · cs.CV · October 17, 2024



TL;DR

VrdONE is a novel one-stage video visual relation detection model that efficiently captures spatiotemporal interactions between entities in videos, achieving state-of-the-art results without complex multi-step processes.

Contribution

It introduces VrdONE, a streamlined one-stage model with a Subject-Object Synergy module for improved relation detection across various temporal scales.

Findings

01. Achieves state-of-the-art performance on the VidOR and ImageNet-VidVRD benchmarks.

02. Effectively captures both short-lived and long-lasting relations in videos.

03. Eliminates the need for proposal generation and post-processing steps.

Abstract

Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos, a key step for gaining deeper insights into video scenes beyond basic visual tasks. Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying what relation categories are present and another for determining their temporal boundaries. This split overlooks the inherent connection between these elements. Addressing the need to recognize entity pairs' spatiotemporal interactions across a range of durations, we propose VrdONE, a streamlined yet efficacious one-stage model. VrdONE combines the features of subjects and objects, turning predicate detection into 1D instance segmentation on their combined representations. This setup allows for both relation category identification and binary mask generation in one…
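The abstract's central idea, treating predicate detection as 1D instance segmentation over fused subject-object features so that a relation's category and its temporal extent come out of the same pass, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: fusion by plain concatenation, one dot-product "query" vector per predicate, a sigmoid threshold to produce the binary mask, and the helper names (`combine_pair_features`, `detect_relations`, and so on) are all hypothetical. This is not VrdONE's Subject-Object Synergy module or code from the official repository.

```python
import math
from typing import Dict, List, Tuple

Frames = List[List[float]]  # one feature vector per frame


def combine_pair_features(subj: Frames, obj: Frames) -> Frames:
    """Fuse a subject track and an object track frame by frame.

    Hypothetical fusion: plain concatenation, standing in for
    whatever combination the real model learns.
    """
    if len(subj) != len(obj):
        raise ValueError("subject and object tracks must span the same frames")
    return [s + o for s, o in zip(subj, obj)]


def mask_logits(pair_feats: Frames, query: List[float]) -> List[float]:
    """Score every frame against one (hypothetical) predicate query vector."""
    return [sum(f * q for f, q in zip(frame, query)) for frame in pair_feats]


def binary_mask(logits: List[float], prob_threshold: float = 0.5) -> List[int]:
    """Apply a sigmoid to each frame logit and keep frames above the threshold."""
    return [1 if 1.0 / (1.0 + math.exp(-x)) > prob_threshold else 0 for x in logits]


def mask_to_segments(mask: List[int]) -> List[Tuple[int, int]]:
    """Turn a 1D binary mask into inclusive (start, end) frame ranges."""
    segments, start = [], None
    for t, m in enumerate(mask):
        if m and start is None:
            start = t  # a relation instance begins on this frame
        elif not m and start is not None:
            segments.append((start, t - 1))  # it ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(mask) - 1))  # still active at the last frame
    return segments


def detect_relations(pair_feats: Frames,
                     predicate_queries: Dict[str, List[float]],
                     prob_threshold: float = 0.5) -> Dict[str, List[Tuple[int, int]]]:
    """The category comes from the query, the temporal extent from its mask;
    there is no separate proposal or merging stage in this toy version."""
    results = {}
    for name, query in predicate_queries.items():
        segs = mask_to_segments(binary_mask(mask_logits(pair_feats, query), prob_threshold))
        if segs:
            results[name] = segs
    return results


if __name__ == "__main__":
    # Six frames of toy 2-D features for one subject-object pair.
    subj = [[1, 0], [1, 0], [0, 0], [0, 0], [1, 0], [1, 0]]
    obj = [[0, 1], [0, 1], [0, 0], [0, 0], [0, 1], [0, 0]]
    fused = combine_pair_features(subj, obj)
    queries = {"ride": [0.5, 0.0, 0.0, 0.5], "next_to": [0.0, 0.0, 0.0, 0.0]}
    print(detect_relations(fused, queries, prob_threshold=0.7))
    # → {'ride': [(0, 1), (4, 4)]}
```

In this toy run, one predicate yields two separate segments for the same pair (frames 0 to 1 and frame 4), which shows how a mask formulation can represent relations of different durations without first generating temporal proposals.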


Code & Models

Repositories

lucaspk512/vrdone (PyTorch, official)
