Faster Diffusion Action Segmentation
Shuaibing Wang, Shunli Wang, Mingcheng Li, Dingkang Yang, Haopeng Kuang, Ziyun Qian, Lihua Zhang

TL;DR
EffiDiffAct is an efficient diffusion-based approach to temporal action segmentation that reduces computational cost and improves accuracy by using a lightweight encoder and an adaptive skip strategy.
Contribution
The paper introduces EffiDiffAct, which combines a lightweight encoder with an adaptive skip strategy to improve diffusion-based temporal action segmentation (TAS) at lower computational cost.
Findings
Outperforms existing methods on 50Salads, Breakfast, and GTEA datasets.
Reduces computational overhead significantly.
Improves segmentation accuracy and efficiency.
Abstract
Temporal Action Segmentation (TAS) is an essential task in video analysis, aiming to segment and classify continuous frames into distinct action segments. However, the ambiguous boundaries between actions pose a significant challenge for high-precision segmentation. Recent advances in diffusion models have demonstrated substantial success in TAS tasks due to their stable training process and high-quality generation capabilities. However, the heavy sampling steps required by diffusion models pose a substantial computational burden, limiting their practicality in real-time applications. Additionally, most related works utilize Transformer-based encoder architectures. Although these architectures excel at capturing long-range dependencies, they incur high computational costs and face feature-smoothing issues when processing long video sequences. To address these challenges, we propose…
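To make the task definition concrete, here is a minimal sketch of how per-frame class predictions can be collapsed into contiguous action segments. It is not taken from the paper: the function name `labels_to_segments` and the action labels are hypothetical, and the sketch only illustrates the output format of TAS in general, not EffiDiffAct itself.

```python
from itertools import groupby


def labels_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical frame-wise predictions for a six-frame clip.
frames = ["cut", "cut", "cut", "mix", "mix", "cut"]
segments = labels_to_segments(frames)
```

Each boundary between two tuples is a predicted action transition. These transitions are the ambiguous boundaries the abstract describes as the main difficulty for high-precision segmentation.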
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Diffusion
