Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022
Jiachen Lei, Shuang Ma, Zhongjie Ba, Sai Vemprala, Ashish Kapoor, and Kui Ren

TL;DR
This report applies masked autoencoders to two egocentric video understanding tasks in the Ego4D Challenge 2022, Object State Change Classification and PNR Temporal Localization, and ranks 2nd in both.
Contribution
The report applies masked autoencoders to egocentric video tasks and supports their effectiveness with 2nd-place results in both challenge tasks the team entered.
Findings
Ranked 2nd in Object State Change Classification
Ranked 2nd in PNR Temporal Localization
The authors state that their code will be made available
Abstract
In this report, we present our approach and empirical results of applying masked autoencoders to two egocentric video understanding tasks, namely Object State Change Classification and PNR Temporal Localization, of the Ego4D Challenge 2022. As team TheSSVL, we ranked 2nd in both tasks. Our code will be made available.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Machine Learning in Healthcare
