SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez

TL;DR
This paper introduces SOS, a self-supervised learning method that pre-trains an object representation model from egocentric videos, improving action recognition by leveraging object set relationships without requiring detailed object annotations.
Contribution
The paper proposes SOS, a novel self-supervised approach that pre-trains object representations from video object sets, reducing annotation needs and decoupling object and action models for better flexibility.
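To make the idea of learning from per-video object sets concrete, here is a minimal sketch of a generic set-level contrastive objective. It is not the loss used in the paper: the function name `set_contrastive_loss`, the `temperature` parameter, and the choice of an InfoNCE-style formulation are all assumptions made for illustration. Object embeddings drawn from the same video are treated as positives and embeddings from other videos as negatives, so no object class labels are needed.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def set_contrastive_loss(video_sets, temperature=0.1):
    """Hypothetical InfoNCE-style loss over per-video object sets.

    video_sets: list of videos, each a list of object embedding vectors.
    Objects from the same video are positives; all others are negatives.
    """
    # Flatten to (video index, unit-norm embedding) pairs.
    items = [
        (vid, _normalize(emb))
        for vid, embs in enumerate(video_sets)
        for emb in embs
    ]
    total, count = 0.0, 0
    for i, (vid_i, e_i) in enumerate(items):
        # Scaled cosine similarity to every other object, flagged by set membership.
        logits = [
            (_dot(e_i, e_j) / temperature, vid_j == vid_i)
            for j, (vid_j, e_j) in enumerate(items)
            if j != i
        ]
        positives = [s for s, same in logits if same]
        if not positives:
            continue  # an object alone in its set has no positive pair
        log_denominator = math.log(sum(math.exp(s) for s, _ in logits))
        # Average negative log-probability of picking each positive.
        total += sum(log_denominator - s for s in positives) / len(positives)
        count += 1
    return total / count if count else 0.0
```

Under these assumptions, the loss is small when objects from the same video sit close together in embedding space and large when they are scattered among objects from other videos. Because the object model is trained on its own, an action model could consume the resulting embeddings without being retrained jointly, which is the decoupling the contribution describes.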
Findings
OIC significantly improves the accuracy of state-of-the-art action recognition models on EPIC-KITCHENS-100.
Because OIC is pre-trained with self-supervision, it reduces reliance on object class annotations for the target dataset.
Abstract
Learning an egocentric action recognition model from video data is challenging due to distractors (e.g., irrelevant objects) in the background. Further integrating object information into an action model is hence beneficial. Existing methods often leverage a generic object detector to identify and represent the objects in the scene. However, several important issues remain. Object class annotations of good quality for the target domain (dataset) are still required for learning a good object representation. Besides, previous methods are deeply coupled with existing action models and need to retrain them jointly with the object representation, leading to costly and inflexible integration. To overcome both limitations, we introduce Self-Supervised Learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected by an…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Hand Gesture Recognition Systems
