Post-Processing Temporal Action Detection
Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

TL;DR
This paper introduces Gaussian Approximated Post-processing (GAP), a model-agnostic post-processing method that refines the temporal boundaries predicted by action detection models, improves performance at lower inference resolutions, and requires no retraining.
Contribution
The paper proposes a novel post-processing approach that models action start and end points with Gaussian distributions, enabling sub-snippet boundary inference without retraining existing models.
Findings
GAP improves average mAP on ActivityNet and THUMOS benchmarks.
GAP can be integrated with existing models for performance enhancement.
GAP enables lower-resolution, more efficient inference.
Abstract
Existing Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence, before temporal boundary estimation and action classification. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. In essence, this is due to a temporal quantization error introduced during the resolution downsampling and recovery. This could negatively impact the TAD performance, but is largely ignored by existing methods. To address this problem, in this work we introduce a novel model-agnostic post-processing method without model redesign and retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution for enabling temporal boundary…
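The temporal quantization error described in the abstract can be made concrete with a small round trip: map a boundary time to a snippet index, then map the index back to seconds. The sketch below assumes uniform snippets and recovery at the snippet centre; the function name `quantize_roundtrip`, the 120-second duration, and the 100-snippet grid are hypothetical values chosen only for illustration.

```python
def quantize_roundtrip(t_seconds, duration, num_snippets):
    """Map a time to a snippet index and back to seconds (snippet centre).

    Assumes uniform snippets; the recovered time can differ from the
    original by up to half a snippet length.
    """
    snippet_len = duration / num_snippets
    # Downsample: which snippet does the boundary fall into?
    idx = min(int(t_seconds / snippet_len), num_snippets - 1)
    # Recover: report the centre of that snippet in seconds.
    return (idx + 0.5) * snippet_len


if __name__ == "__main__":
    duration, num_snippets = 120.0, 100   # hypothetical video and grid
    true_boundary = 37.9                  # hypothetical action start, in seconds
    recovered = quantize_roundtrip(true_boundary, duration, num_snippets)
    print(f"recovered={recovered:.3f}s, error={abs(recovered - true_boundary):.3f}s")
```

With these assumptions the error is bounded by half a snippet length, so coarser grids (fewer snippets for the same video) allow larger boundary errors.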
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
