My View is the Best View: Procedure Learning from Egocentric Videos
Siddhant Bansal, Chetan Arora, C.V. Jawahar

TL;DR
This paper introduces Correspond and Cut (CnC), a self-supervised framework for learning procedures from egocentric videos. It addresses two challenges of such videos, namely view changes caused by head motion and the presence of unrelated frames, and reports improved performance on benchmark datasets.
Contribution
The paper proposes a novel self-supervised method leveraging temporal correspondences for procedure learning from egocentric videos, and introduces the EgoProceL dataset.
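To make the idea of a temporal correspondence concrete, here is a minimal sketch, assuming each video has already been reduced to a list of per-frame embedding vectors. It uses a generic cycle-consistency check (a frame in one video is matched only if its nearest neighbour in the other video maps back to it). This is not the paper's implementation: the function names, the toy embeddings, and the `tolerance` parameter are all hypothetical and chosen for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(query, frames):
    """Index of the frame embedding in `frames` closest to `query`."""
    return min(range(len(frames)), key=lambda k: sq_dist(query, frames[k]))


def cycle_consistent_matches(video_a, video_b, tolerance=0):
    """Return (i, j) pairs where frame i of A maps to frame j of B and back.

    A frame is kept only if the round trip A -> B -> A lands within
    `tolerance` frames of where it started (hypothetical parameter).
    """
    matches = []
    for i, frame in enumerate(video_a):
        j = nearest(frame, video_b)            # A -> B
        i_back = nearest(video_b[j], video_a)  # B -> A
        if abs(i_back - i) <= tolerance:
            matches.append((i, j))
    return matches


# Toy 2-D embeddings (made up). Video B shows the same three steps as
# video A, shifted by one frame, and starts with an unrelated frame.
# Video A ends with an unrelated frame.
video_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [9.0, 9.0]]
video_b = [[5.0, -7.0], [0.1, 0.0], [1.1, 0.1], [2.1, 0.0]]

matches = cycle_consistent_matches(video_a, video_b)
print(matches)  # [(0, 1), (1, 2), (2, 3)]
```

In this toy case the matched frames sit at different time indices in the two videos, and the unrelated frame in each video is left unmatched, which is the kind of signal that does not depend on actions occurring at the same time or lasting the same duration.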
Findings
CnC outperforms the state of the art on the ProceL and CrossTask benchmark datasets.
The EgoProceL dataset contains 62 hours of video from 130 subjects.
The approach effectively handles extreme camera view changes and unrelated frames.
Abstract
Procedure learning involves identifying the key-steps and determining their logical order to perform a task. Existing approaches commonly use third-person videos for learning the procedure, making the manipulated object small in appearance and often occluded by the actor, leading to significant errors. In contrast, we observe that videos obtained from first-person (egocentric) wearable cameras provide an unobstructed and clear view of the action. However, procedure learning from egocentric videos is challenging because (a) the camera view undergoes extreme changes due to the wearer's head motion, and (b) unrelated frames are present due to the unconstrained nature of the videos. Due to this, the assumptions of current state-of-the-art methods, that the actions occur at approximately the same time and are of the same duration, do not hold. Instead, we propose to use the signal provided by…
Taxonomy
Topics: Human Pose and Action Recognition · Stroke Rehabilitation and Recovery · Virtual Reality Applications and Impacts
