TVNet: Temporal Voting Network for Action Localization
Hanyuan Wang, Dima Damen, Majid Mirmehdi, Toby Perrett

TL;DR
TVNet introduces a temporal voting mechanism that locates action boundaries in untrimmed videos more accurately, improving localization performance on ActivityNet-1.3 and THUMOS14.
Contribution
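The paper presents a Voting Evidence Module that accumulates temporal contextual evidence to predict frame-level probabilities of action start and end boundaries. This action-independent evidence is then incorporated into a pipeline that calculates confidence scores and action classes.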
Findings
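Achieves an average mAP of 34.6% on ActivityNet-1.3, outperforming previous methods particularly at the highest IoU threshold of 0.95
Reaches 56.0% mAP with PGCN and 59.1% with MUSES at 0.5 IoU on THUMOS14, and outperforms prior work at all thresholds
Locates temporal action boundaries more accurately by predicting frame-level start and end probabilities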
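The IoU thresholds in these results refer to temporal intersection-over-union (tIoU) between a predicted segment and a ground-truth segment. The Python sketch below computes tIoU and lists the ten ActivityNet-1.3 thresholds (0.50 to 0.95 in steps of 0.05); the "average mAP" is the mean of the per-threshold mAPs. The helper names `temporal_iou` and `thresholds_passed` are illustrative and are not taken from the TVNet code. The example shows why 0.95 is strict: a prediction shifted by 2 s on a 10 s action has a tIoU of about 0.667 and only counts as a match at the four lowest thresholds.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments, each given as a (start, end) pair in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# ActivityNet-1.3 convention: average mAP over tIoU thresholds 0.50, 0.55, ..., 0.95
ACTIVITYNET_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt, thresholds=ACTIVITYNET_THRESHOLDS):
    """Return the thresholds at which `pred` would count as a match for `gt`."""
    iou = temporal_iou(pred, gt)
    return [t for t in thresholds if iou >= t]


if __name__ == "__main__":
    # Prediction is 2 s late relative to a 10 s ground-truth action.
    print(round(temporal_iou((10.0, 20.0), (12.0, 22.0)), 3))  # 0.667
    print(thresholds_passed((10.0, 20.0), (12.0, 22.0)))  # [0.5, 0.55, 0.6, 0.65]
```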
Abstract
We propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. It incorporates a novel Voting Evidence Module to locate temporal boundaries more accurately, in which temporal contextual evidence is accumulated to predict frame-level probabilities of action start and end boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, particularly outperforming previous methods at the highest IoU threshold of 0.95. On THUMOS14, TVNet achieves an mAP of 56.0% when combined with PGCN and 59.1% when combined with MUSES at 0.5 IoU, and outperforms prior work at all thresholds. Our code is available at https://github.com/hanielwang/TVNet.
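The abstract describes contextual evidence being accumulated into frame-level boundary probabilities. As a rough illustration only, the sketch below shows a generic Hough-style offset voting scheme: every frame scores each relative offset within a window of ±W frames, and each frame's boundary probability is the sigmoid of the mean vote it receives. The vote layout, the function name `accumulate_votes`, the window size, the averaging and the sigmoid are all hypothetical choices made for illustration; this is not TVNet's actual formulation, which is in the official repository.

```python
import numpy as np


def accumulate_votes(votes: np.ndarray) -> np.ndarray:
    """Hypothetical offset voting: turn per-frame votes into frame-level probabilities.

    votes[i, k] is an assumed score from frame i that a boundary lies at
    frame i + (k - W), for k in 0..2W. Returns a length-T array in (0, 1).
    """
    T, K = votes.shape
    W = (K - 1) // 2
    evidence = np.zeros(T)
    counts = np.zeros(T)
    for i in range(T):
        for k in range(K):
            t = i + k - W  # frame that frame i is voting for
            if 0 <= t < T:
                evidence[t] += votes[i, k]
                counts[t] += 1
    # Average the votes each frame received, then squash to (0, 1) with a sigmoid.
    mean_evidence = evidence / np.maximum(counts, 1)
    return 1.0 / (1.0 + np.exp(-mean_evidence))


# Toy example: 8 frames, window W=2; every frame that can "see" frame 5 votes for it.
T, W = 8, 2
votes = np.full((T, 2 * W + 1), -2.0)
for i in range(T):
    k = 5 - i + W  # offset index pointing at frame 5
    if 0 <= k <= 2 * W:
        votes[i, k] = 4.0
probs = accumulate_votes(votes)
print(int(np.argmax(probs)))  # 5
```

Averaging rather than summing keeps frames near the video edges, which receive fewer votes, on the same scale as frames in the middle.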
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
