DiffAnt: Diffusion Models for Action Anticipation
Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer

TL;DR
DiffAnt introduces a generative diffusion model approach for action anticipation, effectively capturing multiple plausible future actions from observed videos and outperforming or matching state-of-the-art methods on several benchmarks.
Contribution
This work pioneers the use of diffusion models for action anticipation, addressing the uncertainty in future action prediction with a generative framework.
Findings
Achieves results superior or comparable to state-of-the-art methods on four benchmark datasets: Breakfast, 50Salads, EpicKitchens, and EGTEA Gaze+.
Effectively models multiple future actions from a single observed video.
Demonstrates the effectiveness of generative diffusion models in action anticipation.
Abstract
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow. This uncertainty becomes even larger when predicting far into the future. However, the majority of existing action anticipation models adhere to a deterministic approach, neglecting to account for future uncertainties. In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions. In this framework, future actions are iteratively generated from standard Gaussian noise in the latent space, conditioned on the observed video, and subsequently transitioned into the action space. Extensive experiments on four benchmark datasets, i.e., Breakfast, 50Salads, EpicKitchens, and EGTEA Gaze+, are performed and the proposed method achieves superior or comparable…
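The generation procedure described in the abstract (start from standard Gaussian noise in a latent space, denoise iteratively while conditioning on the observed video, then map the result into the action space) can be illustrated with a minimal sketch. This is not the authors' implementation: the noise schedule, the `toy_denoiser` function, the random weights, and all dimensions below are assumptions chosen only to make the loop runnable. It uses the standard DDPM ancestral sampling update with an untrained stand-in for the learned denoising network.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def init_weights(obs_dim, latent_dim, num_classes, rng):
    """Random weights standing in for trained parameters (hypothetical)."""
    return {
        "z": rng.standard_normal((latent_dim, latent_dim)) * 0.1,
        "c": rng.standard_normal((obs_dim, latent_dim)) * 0.1,
        "t": rng.standard_normal(latent_dim) * 0.1,
        "out": rng.standard_normal((latent_dim, num_classes)),
    }


def toy_denoiser(z_t, t_frac, cond, w):
    """Hypothetical noise predictor: takes the noisy latents, the normalized
    timestep, and the observed-video feature vector used for conditioning."""
    return np.tanh(z_t @ w["z"] + cond @ w["c"] + t_frac * w["t"])


def sample_future_actions(cond, w, num_future=4, latent_dim=8,
                          num_steps=50, rng=None):
    """Generate one plausible sequence of future action labels."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from standard Gaussian noise in the latent space.
    z = rng.standard_normal((num_future, latent_dim))

    # Reverse diffusion: iteratively denoise, conditioned on the observation.
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(z, t / num_steps, cond, w)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
            / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise

    # Transition from the latent space to the action space.
    logits = z @ w["out"]
    return logits.argmax(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = init_weights(obs_dim=16, latent_dim=8, num_classes=10, rng=rng)
    observed = rng.standard_normal(16)  # placeholder for video features
    # Different seeds give different samples for the same observation.
    for seed in (1, 2, 3):
        print(sample_future_actions(observed, weights,
                                    rng=np.random.default_rng(seed)))
```

Because each call draws fresh noise, repeated sampling with the same observed features can yield different action sequences, which is the property the generative view relies on to represent multiple plausible futures. With untrained weights, as here, the sampled labels carry no meaning.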
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
Methods: Diffusion
