How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors
Satoshi Tsutsui, Ruta Desai, Karl Ridgeway

TL;DR
This paper introduces a self-supervised learning method that leverages head-motion data from IMU sensors to learn video representations for recognizing activities in egocentric videos, reducing reliance on labeled data.
Contribution
The work presents a novel self-supervised learning (SSL) approach that uses head-motion data from IMU sensors to learn egocentric video representations, improving activity recognition without extensive manual annotation.
Findings
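The learned representations are effective for recognizing egocentric activities of both humans and dogs.
Self-supervised learning from head motion improves the quality of the learned video representations.
The approach reduces the need for human-labeled data.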
Abstract
Understanding users' activities from head-mounted cameras is a fundamental task for Augmented and Virtual Reality (AR/VR) applications. A typical approach is to train a classifier in a supervised manner using data labeled by humans. This approach has limitations due to the expensive annotation cost and the closed coverage of activity labels. A potential way to address these limitations is to use self-supervised learning (SSL). Instead of relying on human annotations, SSL leverages intrinsic properties of data to learn representations. We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices. Towards this goal, we propose a simple but effective approach to learn video representation by learning to tell the corresponding pairs of…
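The abstract is cut off before it names what is being paired. As a rough illustration only, the sketch below assumes the pairs are a video clip and the head-motion (IMU) signal recorded at the same time, and that a model is trained to tell such matching pairs from mismatched ones. The names make_pairs, match_score, and bce_loss are hypothetical, the feature vectors are toy values, and the dot-product scorer stands in for learned encoders; none of this is taken from the paper's implementation.

```python
import math
import random


def make_pairs(video_feats, imu_feats, rng):
    """Build (video, imu, label) triples from time-aligned features.

    Label 1: video and IMU features come from the same clip.
    Label 0: the IMU features come from a different clip.
    Assumes at least two clips so a mismatch always exists.
    """
    n = len(video_feats)
    pairs = []
    for i in range(n):
        pairs.append((video_feats[i], imu_feats[i], 1))
        j = rng.choice([k for k in range(n) if k != i])
        pairs.append((video_feats[i], imu_feats[j], 0))
    return pairs


def match_score(v, m):
    """Toy correspondence score: sigmoid of the dot product (an assumption)."""
    z = sum(a * b for a, b in zip(v, m))
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(pairs):
    """Mean binary cross-entropy over matched and mismatched pairs."""
    eps = 1e-12
    total = 0.0
    for v, m, y in pairs:
        p = match_score(v, m)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(pairs)


# Toy usage: three clips with 2-D features (illustrative values only).
rng = random.Random(0)
video_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
imu_feats = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
pairs = make_pairs(video_feats, imu_feats, rng)
loss = bce_loss(pairs)
```

In a real system the two feature lists would come from trainable video and IMU encoders, and minimizing this loss would push the video encoder toward features that reflect head motion, with no activity labels involved.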
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
