TadML: A fast temporal action detection with Mechanics-MLP
Bowen Deng, Dongchang Liu

TL;DR
TadML introduces a fast, one-stage, anchor-free temporal action detection method using Mechanics-MLP that achieves high accuracy and significantly faster inference speeds on untrimmed videos without optical flow computation.
Contribution
The paper proposes a novel Mechanics-MLP architecture for real-time, one-stage temporal action detection using only RGB data, improving inference speed over existing methods.
Findings
Achieves 4.44 videos per second inference speed on THUMOS14
Maintains comparable accuracy with state-of-the-art models
Eliminates the need for optical flow computation
Abstract
Temporal Action Detection (TAD) is a crucial but challenging task in video understanding. It aims to detect both the type and the start and end frames of each action instance in a long, untrimmed video. Most current models adopt both RGB and optical-flow streams for the TAD task. Thus, the original RGB frames must be converted manually into optical-flow frames at additional computation and time cost, which is an obstacle to achieving real-time processing. At present, many models adopt two-stage strategies, which slow down inference and require complicated tuning of proposal generation. By comparison, we propose a one-stage, anchor-free temporal localization method using the RGB stream only, in which a novel Newtonian Mechanics-MLP architecture is established. It has accuracy comparable to all existing state-of-the-art models, while surpassing the inference speed of these methods by a large…
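The abstract does not say how the one-stage, anchor-free head turns its outputs into action segments. As a rough illustration only, the sketch below assumes a common anchor-free formulation in which every time step predicts an action score plus its distances to the segment start and end. The function name `decode_segments` and the parameters `stride` and `threshold` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def decode_segments(
    scores: List[float],
    offsets: List[Tuple[float, float]],
    stride: float = 1.0,
    threshold: float = 0.5,
) -> List[Tuple[float, float, float]]:
    """Turn per-time-step predictions into (start, end, score) segments.

    Assumed formulation, not the paper's: each time step t predicts a
    confidence score and two non-negative distances, one back to the
    segment start and one forward to the segment end.
    """
    segments = []
    for t, (score, (d_start, d_end)) in enumerate(zip(scores, offsets)):
        if score < threshold:
            continue  # skip low-confidence time steps
        center = t * stride  # position of this time step on the video timeline
        start = max(0.0, center - d_start)  # clamp to the beginning of the video
        end = center + d_end
        segments.append((start, end, score))
    return segments


# Toy input: only the middle time step is confident enough to emit a segment.
example = decode_segments(
    scores=[0.1, 0.9, 0.2],
    offsets=[(0.0, 0.0), (1.0, 2.0), (0.0, 0.0)],
)
```

No anchors or proposals are involved: segments come straight from the per-step predictions in a single pass, which is the property that distinguishes a one-stage, anchor-free design from the two-stage strategies the abstract describes. A full detector would normally follow this with non-maximum suppression over the overlapping segments.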
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Average Pooling · Layer Normalization · Residual Connection · Dropout · Global Average Pooling · Dense Connections · MLP-Mixer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
