Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization

Ibne Farabi Shihab; Sanjeda Akter; Anuj Sharma

arXiv:2511.03943·cs.CV·November 14, 2025

TL;DR

This paper introduces Temporal Zoom Networks, which combine Boundary Distance Regression (BDR) and Adaptive Temporal Refinement (ATR) to concentrate computation on difficult action boundaries, improving localization accuracy while reducing FLOPs and latency.

Contribution

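It presents two components: signed-distance regression for boundary detection, which replaces classification-based detection, and a mechanism that allocates transformer depth continuously ($τ \in [0, 1]$). Together they improve efficiency and accuracy in temporal action localization.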

Findings

01 Achieves 56.5% mAP@… on THUMOS14 with 151G FLOPs

02 Reduces FLOPs by 36% compared to ActionFormer++ (151G vs. 235G)

03 Improves short action detection by 4.2% mAP
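The 36% figure can be checked from the FLOP counts quoted in the abstract (151G for this method, 235G for ActionFormer++). The snippet below is only an arithmetic check; the variable names are illustrative and not taken from the paper.

```python
# FLOP counts as quoted in the abstract (in GFLOPs).
ours_gflops = 151.0
baseline_gflops = 235.0  # ActionFormer++

# Relative reduction = 1 - ours / baseline.
reduction = 1.0 - ours_gflops / baseline_gflops
print(f"{reduction:.1%}")  # prints 35.7%, which rounds to the reported 36%
```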

Abstract

Temporal action localization requires both precise boundary detection and computational efficiency. Current methods apply uniform computation across all temporal positions, wasting resources on easy boundaries while struggling with ambiguous ones. We address this through two complementary innovations: Boundary Distance Regression (BDR), which replaces classification-based boundary detection with signed-distance regression achieving 3.3–16.7× lower variance; and Adaptive Temporal Refinement (ATR), which allocates transformer depth continuously ($τ \in [0, 1]$) to concentrate computation near difficult boundaries. On THUMOS14, our method achieves 56.5% mAP@… and 58.2% average mAP@[0.3:0.7] with 151G FLOPs, using 36% fewer FLOPs than ActionFormer++ (55.7% mAP@… at 235G). Compared to uniform baselines, we achieve +2.9% mAP@… (+1.8% avg mAP, 5.4% relative) with 24%…
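To make the two ideas in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. The function names (`signed_distance_targets`, `decode_boundaries`, `blend_depth`), the sign convention (negative before a boundary, non-negative at or after it), and the linear blend used for continuous depth are all assumptions made for illustration. The abstract only states that boundaries are regressed as signed distances and that depth is allocated continuously with $τ \in [0, 1]$.

```python
def signed_distance_targets(num_frames, boundaries):
    """Per-frame signed distance to the nearest boundary.

    Assumed convention: negative before the boundary, positive after it.
    """
    targets = []
    for t in range(num_frames):
        nearest = min(boundaries, key=lambda b: abs(t - b))
        targets.append(float(t - nearest))
    return targets


def decode_boundaries(distances):
    """Recover boundaries as negative-to-non-negative zero crossings.

    Linear interpolation gives sub-frame positions. A boundary at
    index 0 has no preceding frame and is not detected by this sketch.
    """
    found = []
    for t in range(1, len(distances)):
        prev, cur = distances[t - 1], distances[t]
        if prev < 0.0 <= cur:
            found.append((t - 1) + (-prev) / (cur - prev))
    return found


def blend_depth(shallow, deep, tau):
    """Hypothetical continuous depth: per-position blend of a shallow and
    a deep feature, weighted by tau in [0, 1] (tau = 1 means full depth)."""
    out = []
    for s, d, w in zip(shallow, deep, tau):
        if not 0.0 <= w <= 1.0:
            raise ValueError("tau must lie in [0, 1]")
        out.append((1.0 - w) * s + w * d)
    return out


targets = signed_distance_targets(12, [4, 10])
recovered = decode_boundaries(targets)
blended = blend_depth([1.0, 2.0], [3.0, 6.0], [0.0, 0.5])
```

With boundaries at frames 4 and 10, the targets cross zero from below exactly at those frames, so `decode_boundaries` returns them unchanged. A regression target of this kind varies smoothly around each boundary, unlike a per-frame boundary/non-boundary label, which is one way to read the abstract's contrast with classification-based detection.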


Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Action Observation and Synchronization