Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training

Karan Samel; Nitish Sontakke; Irfan Essa

arXiv:2502.17352 · cs.CV · February 25, 2025

TL;DR

This paper introduces Pivot, a pre-training approach for instructional videos that leverages task hierarchies and procedural knowledge to improve task and step recognition, especially when data and compute are limited.

Contribution

The paper presents a novel pre-training method that explicitly incorporates task hierarchies and procedural steps for instructional video understanding, enhancing efficiency and generalization.

Findings

1. Outperforms baselines in limited-data scenarios
2. Effective in task and step recognition
3. Improves step prediction accuracy

Abstract

Instructional videos provide a convenient modality for learning new tasks (e.g., cooking a recipe or assembling furniture). A viewer wants to find a video that both reflects the overall task they are interested in and contains the relevant steps they need to carry it out. To support this, an instructional video model should be capable of inferring both the tasks and the steps that occur in an input video. Doing so efficiently and in a generalizable fashion is key when the compute or the relevant video topics available to train the model are limited. To address these requirements, we explicitly mine task hierarchies and the procedural steps associated with instructional videos. We use this prior knowledge to pre-train our model, Pivot, for step and task prediction. During pre-training, we also provide video augmentation and early stopping strategies to optimally…
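The abstract does not specify how Pivot combines step and task predictions. As a rough illustration of why a mined task hierarchy can help, the sketch below assumes a hypothetical hierarchy mapping each task to its procedural steps and scores each task by averaging a step classifier's probabilities over that task's steps. The task names, step names, `task_scores`, `predict_task`, and the mean-aggregation rule are all illustrative assumptions, not the paper's method.

```python
from typing import Dict, List

# Hypothetical toy task hierarchy (NOT from the paper): task -> procedural steps.
TaskHierarchy = Dict[str, List[str]]

TOY_HIERARCHY: TaskHierarchy = {
    "make pancakes": ["whisk batter", "heat pan", "pour batter", "flip pancake"],
    "assemble shelf": ["unpack parts", "attach sides", "insert shelves", "tighten screws"],
    "make omelette": ["crack eggs", "whisk eggs", "heat pan", "fold omelette"],
}


def task_scores(step_probs: Dict[str, float], hierarchy: TaskHierarchy) -> Dict[str, float]:
    """Score each task by the mean predicted probability of its steps.

    Steps missing from step_probs are treated as probability 0.0.
    """
    return {
        task: sum(step_probs.get(step, 0.0) for step in steps) / len(steps)
        for task, steps in hierarchy.items()
    }


def predict_task(step_probs: Dict[str, float], hierarchy: TaskHierarchy) -> str:
    """Return the task whose steps are, on average, most strongly predicted."""
    scores = task_scores(step_probs, hierarchy)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical step probabilities produced by a step classifier for one video.
    probs = {"heat pan": 0.9, "crack eggs": 0.7, "whisk eggs": 0.6}
    print(predict_task(probs, TOY_HIERARCHY))  # prints "make omelette"
```

Here "heat pan" is shared by two tasks, so on its own it is ambiguous; the hierarchy lets the other detected steps ("crack eggs", "whisk eggs") tip the decision toward "make omelette".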

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Early Stopping