What and When to Look?: Temporal Span Proposal Network for Video Relation Detection

Sangmin Woo; Junhyug Noh; Kangil Kim

arXiv:2107.07154 · cs.CV · October 28, 2024 · 1 citation

TL;DR

This paper introduces the Temporal Span Proposal Network (TSPN), a novel method for video relation detection that predicts relation categories and temporal spans, improving efficiency and performance over existing approaches.

Contribution

The paper proposes TSPN, which sparsifies relation search and predicts temporal spans simultaneously, addressing limitations of previous segment- and window-based methods in VidVRD.

Findings

01 TSPN accelerates training by over 2 times compared to existing methods.

02 TSPN achieves competitive results on VidVRD benchmarks.

03 Ablative experiments validate the effectiveness of TSPN.

Abstract

Identifying relations between objects is central to understanding the scene. While several works have been proposed for relation modeling in the image domain, the video domain has imposed many constraints due to the challenging dynamics of spatio-temporal interactions (e.g., between which objects is there an interaction? when do relations start and end?). To date, two representative methods have been proposed to tackle Video Visual Relation Detection (VidVRD): segment-based and window-based. We first point out the limitations of these methods and propose a novel approach named Temporal Span Proposal Network (TSPN). TSPN tells what to look at: it sparsifies the relation search space by scoring the relationness of each object pair, i.e., measuring how probable it is that a relation exists. TSPN tells when to look: it simultaneously predicts start-end timestamps (i.e., temporal spans) and categories of the all…
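
The two ideas in the abstract (pruning object pairs by a relationness score, then attaching a start-end span to each surviving pair) can be illustrated with a small sketch. This is not the authors' implementation: the function names `sparsify_pairs` and `span_to_frames`, the hand-written scores, the top-k selection rule, and the choice of spans normalized to [0, 1] are all assumptions made for illustration. In TSPN itself the scores and spans would come from learned networks.

```python
from itertools import permutations


def sparsify_pairs(relationness, k):
    """Keep the k ordered (subject, object) pairs with the highest score.

    `relationness` maps each pair to a score in [0, 1]. In a real model the
    scores would be predicted; here they are supplied by hand (assumption).
    """
    ranked = sorted(relationness.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:k]]


def span_to_frames(start, end, num_frames):
    """Map a normalized span to inclusive frame indices.

    Assumes spans are expressed as fractions of the video length, which is
    an illustrative convention and not confirmed by the source.
    """
    clipped = [min(max(value, 0.0), 1.0) for value in (start, end)]
    start, end = sorted(clipped)  # tolerate a reversed prediction
    last = num_frames - 1
    return round(start * last), round(end * last)


# Toy example: three detected objects give six ordered pairs.
objects = ["person", "dog", "ball"]
all_pairs = list(permutations(objects, 2))

# Hypothetical relationness scores, one per ordered pair.
scores = {
    ("person", "dog"): 0.9,
    ("dog", "ball"): 0.7,
    ("dog", "person"): 0.4,
    ("person", "ball"): 0.2,
    ("ball", "dog"): 0.1,
    ("ball", "person"): 0.05,
}

kept = sparsify_pairs(scores, k=2)

# Hypothetical normalized spans for the surviving pairs in a 101-frame video.
predicted_spans = {("person", "dog"): (0.0, 1.0), ("dog", "ball"): (0.5, 0.25)}
frame_spans = {pair: span_to_frames(*predicted_spans[pair], 101) for pair in kept}
```

Only the pairs in `kept` would be passed on to relation classification, which is where the reduced search space comes from in this toy setting.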

Code & Models

Repositories

sangminwoo/Temporal-Span-Proposal-Network-VidVRD
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning