TL;DR
This paper introduces a Self-Regulated Learning (SRL) framework for egocentric video activity anticipation. It improves prediction accuracy by emphasizing the novel information in the current frame and by correlating the current frame with past observations, and it is reported to outperform existing methods.
Contribution
The proposed SRL framework regulates intermediate representations with a contrastive loss and a dynamic reweighing mechanism, improving activity anticipation in egocentric videos.
Findings
SRL outperforms state-of-the-art methods on multiple datasets.
The framework accurately identifies supporting action and object concepts.
Multi-task learning further improves activity representation quality.
Abstract
Future activity anticipation is a challenging problem in egocentric vision. As a standard future activity anticipation paradigm, recursive sequence prediction suffers from the accumulation of errors. To address this problem, we propose a simple and effective Self-Regulated Learning framework, which aims to regulate the intermediate representation consecutively to produce a representation that (a) emphasizes the novel information in the frame of the current time-stamp in contrast to previously observed content, and (b) reflects its correlation with previously observed frames. The former is achieved by minimizing a contrastive loss, and the latter by a dynamic reweighing mechanism that attends to informative frames in the observed content through a similarity comparison between the features of the current frame and those of the observed frames. The learned final video representation can be further…
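The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the exact form of the contrastive loss, or how positive and negative pairs are chosen. The sketch assumes cosine similarity, softmax-normalized weights, and an InfoNCE-style loss, and all function names (`reweigh_observed`, `contrastive_loss`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweigh_observed(current, observed, temperature=1.0):
    """Assumed reweighing: weight each observed frame by its similarity to the
    current frame, then return the weights and the weighted sum of frames."""
    weights = softmax([cosine(current, f) for f in observed], temperature)
    context = [
        sum(w * f[i] for w, f in zip(weights, observed))
        for i in range(len(current))
    ]
    return weights, context


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style loss: small when the anchor is closer to the
    positive than to every negative. The pairing scheme is hypothetical."""
    logits = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    return -math.log(softmax(logits, temperature)[0])


# Toy 2-D frame features, chosen only for illustration.
observed = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
current = [1.0, 0.1]
weights, context = reweigh_observed(current, observed)

loss_aligned = contrastive_loss([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0]])
loss_misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.1]])
```

In this toy example the observed frame most similar to the current frame receives the largest weight, and the loss is lower when the anchor matches its positive than when it matches a negative. In a trained model the features would be learned and the loss minimized by gradient descent, which this sketch does not cover.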
