TVNet: Temporal Voting Network for Action Localization

Hanyuan Wang; Dima Damen; Majid Mirmehdi; Toby Perrett

arXiv:2201.00434·cs.CV·January 4, 2022

TL;DR

TVNet introduces a temporal voting mechanism that detects action boundaries in untrimmed videos more accurately, improving localization performance on ActivityNet-1.3 and THUMOS14.

Contribution

The paper presents a new Voting Evidence Module within TVNet that enhances temporal boundary detection and integrates action-independent evidence for improved confidence scoring.

Findings

01

Achieves an average mAP of 34.6% on ActivityNet-1.3, with a particular advantage over previous methods at the highest IoU threshold of 0.95

02

Reaches 56.0% mAP with PGCN and 59.1% mAP with MUSES at 0.5 IoU on THUMOS14, outperforming prior work at all thresholds

03

Demonstrates strong boundary localization accuracy
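
The IoU thresholds quoted for these benchmarks (for example 0.5 and 0.95) refer to temporal IoU: the overlap between a predicted segment and a ground-truth segment divided by their combined extent. A prediction counts as correct at a given threshold only if its temporal IoU meets that threshold. The sketch below illustrates the standard definition; the function names and the example segments are hypothetical and are not taken from the TVNet code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Length of the overlapping interval (zero if the segments are disjoint).
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Combined extent: sum of both lengths minus the double-counted overlap.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True if the prediction is accepted at the given IoU threshold."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical example: a 10 s action whose predicted start is 1 s late.
ground_truth = (10.0, 20.0)
prediction = (11.0, 20.0)
accepted_at_050 = is_match(prediction, ground_truth, 0.5)
accepted_at_095 = is_match(prediction, ground_truth, 0.95)
```

In this example the temporal IoU is 0.9, so the prediction is accepted at 0.5 but rejected at 0.95. This is why results at high IoU thresholds are read as a measure of boundary precision.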

Abstract

We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. This incorporates a novel Voting Evidence Module to locate temporal boundaries more accurately, where temporal contextual evidence is accumulated to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. TVNet also achieves mAP of 56.0% when combined with PGCN and 59.1% with MUSES at 0.5 IoU on THUMOS14 and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
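
To make the idea of accumulating votes from temporal context concrete, the following is a generic toy sketch of vote accumulation. It is not the paper's Voting Evidence Module. It assumes, purely for illustration, that each frame supplies a signed offset (in frames) to where it believes the nearest boundary lies, together with a vote weight. The function name `accumulate_votes` and both inputs are hypothetical; how TVNet actually produces and combines its evidence is described in the paper and the official repository.

```python
def accumulate_votes(offsets, weights):
    """Toy vote accumulation for one boundary type (start or end).

    offsets[i]: assumed signed distance, in frames, from frame i to the
                boundary that frame i votes for (hypothetical input).
    weights[i]: assumed non-negative strength of that vote (hypothetical).
    Returns per-frame scores scaled so the strongest frame scores 1.0.
    """
    num_frames = len(offsets)
    scores = [0.0] * num_frames
    for i in range(num_frames):
        target = i + offsets[i]
        # Ignore votes that point outside the video.
        if 0 <= target < num_frames:
            scores[target] += weights[i]
    peak = max(scores) if scores else 0.0
    if peak > 0:
        scores = [s / peak for s in scores]
    return scores


# Hypothetical example: six frames that all point at frame 3.
example_scores = accumulate_votes([3, 2, 1, 0, -1, -2], [1.0] * 6)
```

Here every frame votes for frame 3, so that frame receives the maximum score and all others receive zero. The point of the sketch is only that agreement among many neighbouring frames sharpens the estimate at a single location.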

Code & Models

Repositories

hanielwang/tvnet (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications