STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos

Anshul Shah; Benjamin Lundell; Harpreet Sawhney; Rama Chellappa

arXiv:2301.00794 · cs.CV · September 12, 2023

TL;DR

This paper introduces STEPs, a self-supervised method for extracting and localizing key steps in unlabeled procedural videos, leveraging multi-cue features and a novel contrastive loss to improve AR-based training applications.

Contribution

It presents a new self-supervised learning framework with BMC2 loss and techniques for training lightweight temporal modules using multiple cues, enhancing key step extraction without labels.
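As a rough illustration of what a contrastive objective across cues looks like, the sketch below computes a generic symmetric InfoNCE loss between two cues (for example, RGB and optical-flow embeddings of the same clips). It is not the paper's BMC2 loss, whose details are not given on this page; the function names, the temperature value, and the assumption that row t of each array describes the same clip are all illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def cross_cue_info_nce(cue_a, cue_b, temperature=0.1):
    """Generic symmetric InfoNCE between two cues (NOT the paper's BMC2 loss).

    cue_a, cue_b: (T, D) arrays of per-clip embeddings. Row t of both arrays
    is assumed to come from the same clip, so (a_t, b_t) is the positive pair
    and every other row acts as a negative.
    """
    a = l2_normalize(np.asarray(cue_a, dtype=float))
    b = l2_normalize(np.asarray(cue_b, dtype=float))
    logits = a @ b.T / temperature  # (T, T) similarity matrix

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The matching clip sits on the diagonal.
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

Under these assumptions, embeddings that agree across cues for the same clip yield a lower loss than embeddings whose clip order has been scrambled.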

Findings

01. Significant improvements in key step localization accuracy

02. Effective use of multi-cue information like optical flow, depth, and gaze

03. Qualitative results show meaningful and succinct key step representations

Abstract

We address the problem of extracting key steps from unlabeled procedural videos, motivated by the potential of Augmented Reality (AR) headsets to revolutionize job training and performance. We decompose the problem into two steps: representation learning and key step extraction. We propose a training objective, the Bootstrapped Multi-Cue Contrastive (BMC2) loss, to learn discriminative representations for various steps without any labels. Unlike prior work, we develop techniques to train a lightweight temporal module that uses off-the-shelf features for self-supervision. Our approach can seamlessly leverage information from multiple cues such as optical flow, depth, or gaze to learn discriminative features for key steps, making it amenable to AR applications. Finally, we extract key steps via a tunable algorithm that clusters the representations and samples. We show significant…
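The cluster-then-sample idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes plain k-means with farthest-point initialization and picks, for each cluster, the frame closest to the centroid. The function names and the parameter `k` (standing in for the tunable number of key steps) are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    features = np.asarray(features, dtype=float)
    # Start from the first frame, then repeatedly add the farthest frame.
    centroids = [features[0]]
    for _ in range(1, k):
        dists = np.linalg.norm(
            features[:, None, :] - np.array(centroids)[None, :, :], axis=2
        )
        centroids.append(features[dists.min(axis=1).argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)

    # Final assignment against the final centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def extract_key_steps(features, k):
    """Return one representative frame index per cluster, in temporal order."""
    features = np.asarray(features, dtype=float)
    labels, centroids = kmeans(features, k)
    key_frames = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        # Sample the frame whose representation is nearest the centroid.
        d = np.linalg.norm(features[idx] - centroids[j], axis=1)
        key_frames.append(int(idx[d.argmin()]))
    return sorted(key_frames)
```

Here `features` would be a (num_frames, dim) array of learned per-frame representations; raising or lowering `k` trades a more detailed summary against a more succinct one.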

Code & Models

Repositories

anshulbshah/steps
PyTorch · Official

Videos

STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos · YouTube

Taxonomy

Topics: Advanced Vision and Imaging