MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall

TL;DR
MANTA is a diffusion-based Mamba model for stochastic long-term dense action anticipation. It models long-range temporal context with linear complexity and reports state-of-the-art results on three datasets.
Contribution
MANTA is a new diffusion model that enhances long-term action anticipation with efficient long-range temporal modeling and linear complexity.
Findings
Achieves state-of-the-art results on three datasets.
Significantly improves computational and memory efficiency.
Enables effective long-term temporal modeling for very long sequences.
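To make the linear-complexity claim concrete, the following is a minimal sketch of a generic diagonal state-space recurrence, the kind of building block that Mamba-style models are based on. It is not the MANTA architecture: the function name `ssm_scan` and the parameters `decay`, `in_gain`, and `out_gain` are hypothetical and chosen only for illustration. The point is that a single pass over the sequence carries information from every earlier step, whereas full self-attention compares every pair of time steps.

```python
# Illustrative only: a generic scalar linear state-space recurrence,
# NOT the MANTA architecture. All names and values are hypothetical.

def ssm_scan(inputs, decay, in_gain, out_gain):
    """Compute h_t = decay * h_{t-1} + in_gain * x_t and y_t = out_gain * h_t.

    The loop visits each time step once, so the cost grows linearly with
    len(inputs), while the state h summarizes all earlier inputs.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = decay * state + in_gain * x
        outputs.append(out_gain * state)
    return outputs


# An impulse at the first step still influences later outputs,
# with an influence that shrinks by the decay factor at every step.
impulse = [1.0] + [0.0] * 9
response = ssm_scan(impulse, decay=0.5, in_gain=1.0, out_gain=1.0)
```

In this toy setting the memory is a single number and the decay is fixed; selective state-space models such as Mamba instead make these parameters depend on the input, which this sketch does not attempt to reproduce.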
Abstract
Long-term dense action anticipation is very challenging since it requires predicting actions and their durations several minutes into the future based on provided video observations. To model the uncertainty of future outcomes, stochastic models predict several potential future action sequences for the same observation. Recent work has further proposed to incorporate uncertainty modelling for observed frames by simultaneously predicting per-frame past and future actions in a unified manner. While such joint modelling of actions is beneficial, it requires long-range temporal capabilities to connect events across distant past and future time points. However, previous work struggles to achieve such long-range understanding due to its limited and/or sparse receptive field. To alleviate this issue, we propose a novel MANTA (MAmba for ANTicipation) network. Our model enables effective…
Taxonomy
Topics: Advanced Wireless Communication Technologies
