Deep Point-wise Prediction for Action Temporal Proposal
Luxuan Li, Tao Kong, Fuchun Sun, Huaping Liu

TL;DR
This paper introduces Deep Point-wise Prediction (DPP), a fast, end-to-end method for generating temporal action proposals in videos without relying on sliding windows or grouping, achieving real-time performance.
Contribution
The paper proposes an end-to-end approach to temporal action proposal generation that predicts action likelihoods and temporal locations simultaneously, eliminating the need for handcrafted sliding-window or grouping strategies.
Findings
DPP processes more than 1000 frames per second.
Experiments on the standard THUMOS14 dataset verify DPP's effectiveness, generality, and robustness.
The method outperforms previous approaches in speed and accuracy.
Abstract
Detecting actions in videos is an important yet challenging task. Previous works usually rely on (a) sliding-window paradigms, or (b) per-frame action scoring and grouping to enumerate the possible temporal locations. Their performance is therefore limited by the design of the sliding windows or grouping strategies. In this paper, we present a simple and effective method for temporal action proposal generation, named Deep Point-wise Prediction (DPP). DPP simultaneously predicts the likelihood that an action is present and the corresponding temporal locations, without using any handcrafted sliding window or grouping. The whole system is trained end-to-end with a joint loss over temporal action proposal classification and location prediction. We conduct extensive experiments to verify its effectiveness, generality and robustness on the standard THUMOS14 dataset. DPP runs more than 1000 frames per…
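To make the point-wise idea concrete, the following is a minimal sketch of how per-point predictions could be decoded into proposals. It is not the authors' implementation. It assumes that each temporal point outputs an action score plus the distances to the start and end of the action, a parameterization the abstract does not specify. The function name `decode_pointwise_proposals` and the parameters `stride` and `score_thresh` are hypothetical.

```python
import numpy as np


def decode_pointwise_proposals(scores, left_dist, right_dist,
                               stride=1.0, score_thresh=0.5):
    """Turn per-point predictions into (start, end, score) proposals.

    ASSUMPTION: each temporal point predicts an action score and the
    distances from that point to the action's start and end. The
    abstract only says that likelihood and location are predicted
    jointly; this parameterization is illustrative.
    """
    scores = np.asarray(scores, dtype=float)
    left_dist = np.asarray(left_dist, dtype=float)
    right_dist = np.asarray(right_dist, dtype=float)

    # Centre of each temporal point on the input time axis.
    centers = (np.arange(scores.shape[0]) + 0.5) * stride

    # Each point directly regresses its own segment: no sliding
    # windows and no grouping of per-frame scores.
    starts = centers - left_dist
    ends = centers + right_dist

    # Keep confident points and rank them by score, highest first.
    keep = scores >= score_thresh
    proposals = np.stack([starts[keep], ends[keep], scores[keep]], axis=1)
    order = np.argsort(-proposals[:, 2])
    return proposals[order]


props = decode_pointwise_proposals(
    scores=[0.1, 0.9, 0.6],
    left_dist=[1.0, 1.0, 2.0],
    right_dist=[1.0, 2.0, 0.5],
    stride=2.0,
)
print(props)
```

Every point yields one candidate segment in a single pass, so the number of proposals scales with the length of the feature sequence rather than with a hand-chosen set of window sizes.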
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
