Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

TL;DR
This paper introduces Temporal Zoom Networks, which combine Boundary Distance Regression and Adaptive Temporal Refinement to localize actions accurately while concentrating computation on difficult boundaries, reducing FLOPs and latency.
Contribution
It presents signed-distance regression for boundary detection and a mechanism that allocates transformer depth continuously, improving both efficiency and accuracy in temporal action localization.
Findings
Achieves 56.5% single-threshold mAP on THUMOS14 with 151G FLOPs
Reduces FLOPs by 36% compared to ActionFormer++
Improves short action detection by 4.2% mAP
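As a quick arithmetic check on the 36% figure, the abstract quotes 151G FLOPs for this method and 235G for ActionFormer++. A minimal sketch of that calculation follows; the function name `relative_reduction` is illustrative and not taken from the paper.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Fractional reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


# FLOP counts (in GFLOPs) as quoted in the abstract.
flops_reduction = relative_reduction(151.0, 235.0)

# 84 / 235 is about 0.357, which rounds to the quoted 36%.
print(f"{flops_reduction:.1%}")
```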
Abstract
Temporal action localization requires both precise boundary detection and computational efficiency. Current methods apply uniform computation across all temporal positions, wasting resources on easy boundaries while struggling with ambiguous ones. We address this through two complementary innovations: Boundary Distance Regression (BDR), which replaces classification-based boundary detection with signed-distance regression achieving 3.3–16.7× lower variance; and Adaptive Temporal Refinement (ATR), which allocates transformer depth continuously to concentrate computation near difficult boundaries. On THUMOS14, our method achieves 56.5% single-threshold mAP and 58.2% average mAP@[0.3:0.7] with 151G FLOPs, using 36% fewer FLOPs than ActionFormer++ (55.7% single-threshold mAP at 235G). Compared to uniform baselines, we achieve +2.9% single-threshold mAP (+1.8% avg mAP, 5.4% relative) with 24%…
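The signed-distance idea behind BDR can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the sign convention (negative before the nearest boundary, positive after it), the nearest-boundary definition, the zero-crossing decoding step, and the function names `signed_distance_targets` and `zero_crossings`. The sketch only builds regression targets and decodes them; it does not train a model.

```python
import numpy as np


def signed_distance_targets(num_frames, boundaries):
    """Signed distance from each frame index to its nearest boundary.

    Assumed convention: negative before the boundary, positive after it.
    Ties between two equidistant boundaries go to the earlier boundary.
    """
    t = np.arange(num_frames, dtype=np.float64)       # shape (T,)
    b = np.asarray(boundaries, dtype=np.float64)      # shape (B,)
    diff = t[:, None] - b[None, :]                    # shape (T, B)
    nearest = np.abs(diff).argmin(axis=1)             # index of nearest boundary
    return diff[np.arange(num_frames), nearest]


def zero_crossings(signal):
    """Indices where the signal goes from negative to non-negative."""
    s = np.asarray(signal)
    return np.flatnonzero((s[:-1] < 0) & (s[1:] >= 0)) + 1


# Hypothetical toy clip: 10 frames with boundaries at frames 3 and 7.
targets = signed_distance_targets(10, [3, 7])
recovered = zero_crossings(targets)
print(targets)
print(recovered)
```

Under these assumptions, a boundary corresponds to a sign change in a smooth target rather than a single positive label among many negatives, which is one way to read the abstract's contrast with classification-based boundary detection.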
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Action Observation and Synchronization
