Learning to Anticipate Egocentric Actions by Imagination
Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu

TL;DR
This paper introduces a novel approach for egocentric action anticipation by imagining future visual features and predicting action labels, significantly improving performance on large-scale datasets.
Contribution
It proposes ImagineRNN, a contrastive learning-based model that predicts future features through a residual anticipation mechanism, enhancing action anticipation accuracy.
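The summary says ImagineRNN predicts future features through a residual anticipation mechanism. Below is a minimal sketch of one way such a mechanism can work. It assumes each imagined feature is the previous feature plus a predicted change (a residual), and that predictions are fed back in to roll out several future steps. The names `residual_step`, `imagine_future` and `toy_delta` are hypothetical. `toy_delta` is a fixed stand-in for the learned recurrent network, not the paper's architecture.

```python
import math
from typing import Callable, List

Vector = List[float]


def residual_step(feature: Vector, predict_delta: Callable[[Vector], Vector]) -> Vector:
    """Imagine the next feature as the current feature plus a predicted change."""
    delta = predict_delta(feature)
    return [f + d for f, d in zip(feature, delta)]


def imagine_future(observed: Vector,
                   predict_delta: Callable[[Vector], Vector],
                   steps: int) -> List[Vector]:
    """Roll out `steps` imagined features, feeding each prediction back as input."""
    imagined: List[Vector] = []
    current = observed
    for _ in range(steps):
        current = residual_step(current, predict_delta)
        imagined.append(current)
    return imagined


# Hypothetical stand-in for a learned network: a small, bounded drift per step.
def toy_delta(feature: Vector) -> Vector:
    return [0.1 * math.tanh(x) for x in feature]
```

For example, `imagine_future([1.0, -1.0, 0.0], toy_delta, steps=3)` returns three imagined feature vectors. A predictor that outputs zeros simply copies the last observed feature forward. That is the usual motivation for a residual formulation: the network only needs to model how the features change, not reproduce them from scratch.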
Findings
Outperforms previous methods on the EPIC-Kitchens datasets
Effective in both seen and unseen test sets
Validates the benefit of future feature imagination for anticipation
Abstract
Anticipating actions before they are executed is crucial for a wide range of practical applications, including autonomous driving and robotics. In this paper, we study the egocentric action anticipation task, which predicts a future action seconds before it is performed in egocentric videos. Previous approaches focus on summarizing the observed content and directly predicting the future action from past observations. We believe action anticipation would benefit if we could mine cues that compensate for the missing information in the unobserved frames. We therefore propose to decompose action anticipation into a series of future feature predictions: we imagine how the visual features change in the near future and then predict future action labels based on these imagined representations. Unlike previous approaches, our ImagineRNN is optimized through contrastive learning instead of feature…
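The abstract states that ImagineRNN is optimized through contrastive learning, but it is cut off before the details. The following is a minimal sketch of a generic InfoNCE-style contrastive loss, not the paper's exact objective. Under this loss, an imagined feature is scored on how well it picks out the true future feature from a set of distractors. The following choices are all assumptions: cosine similarity, a temperature of 0.1, negatives supplied by the caller, and the names `cosine` and `info_nce`.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, with a small floor on the norm product to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, 1e-12)


def info_nce(predicted: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Cross-entropy of selecting the true future feature (index 0) among all candidates."""
    logits = [cosine(predicted, positive) / temperature]
    logits += [cosine(predicted, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]
```

The loss is never negative. It approaches zero when the imagined feature is much closer to the true future feature than to any negative. It grows when the imagined feature resembles a distractor instead. Unlike a plain distance penalty, this objective only asks the imagined feature to be distinguishable from alternatives, not to match the target exactly.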
Taxonomy
Methods: Contrastive Learning
