Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou, Chenliang Xu, Jason J. Corso

TL;DR
This paper introduces an approach for segmenting instructional videos into procedure steps from visual evidence alone, without relying on action labels or video subtitles, and demonstrates its effectiveness on a large-scale cooking video dataset.
Contribution
The paper introduces the problem of procedure segmentation in unconstrained videos, builds the YouCook2 dataset, and proposes a segment-level recurrent network that outperforms baselines.
Findings
Proposed model outperforms baselines in procedure segmentation
Created the large-scale YouCook2 dataset for this task
Segments can be used for dense captioning and event parsing
Abstract
The potential for agents, whether embodied or software, to learn by observing other agents performing procedures involving objects and actions is rich. Current research on automatic procedure learning heavily relies on action labels or video subtitles, even during the evaluation phase, which makes them infeasible in real-world scenarios. This leads to our question: can the human-consensus structure of a procedure be learned from a large set of long, unconstrained videos (e.g., instructional videos from YouTube) with only visual evidence? To answer this question, we introduce the problem of procedure segmentation--to segment a video procedure into category-independent procedure segments. Given that no large-scale dataset is available for this problem, we collect a large-scale procedure segmentation dataset with procedure segments temporally localized and described; we use cooking videos…
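To make the task's output concrete, here is a minimal sketch of how temporally localized, described procedure segments might be represented and compared. The `Segment` class, the example timestamps and descriptions, and the use of temporal intersection-over-union are illustrative assumptions; the text above does not state which data format or metric the authors use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one procedure segment (times in seconds)."""
    start: float
    end: float
    description: str = ""  # annotated segments carry a description; predictions may not


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two time intervals divided by the length of their union."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Invented example data, not taken from YouCook2.
annotated = [
    Segment(10.0, 30.0, "chop the onions"),
    Segment(45.0, 80.0, "fry the onions"),
]
predicted = [Segment(12.0, 28.0), Segment(50.0, 90.0)]

# For each predicted segment, the best overlap with any annotated segment.
best = [max(temporal_iou(p, g) for g in annotated) for p in predicted]
```

Note that the segments carry no action category, matching the category-independent framing of the problem: a prediction is judged only by where it falls in time.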
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
