GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

TL;DR
GUIDE introduces a comprehensive dataset with task-level guidelines and annotations for instructional videos, enabling improved model understanding and generation of task descriptions, summaries, and guided captions.
Contribution
The paper presents a new dataset with task-level guidelines and annotations, along with benchmarks for instructional video comprehension tasks, addressing limitations of existing datasets.
Findings
Models can generate step captions and guidelines effectively.
Guideline-guided captioning improves task understanding.
GUIDE serves as a new benchmark for instructional video comprehension.
Abstract
A substantial number of instructional videos on the Internet provide tutorials for completing various tasks. Existing instructional video datasets focus only on specific steps at the video level and lack experiential guidelines at the task level, so beginners without relevant experience can struggle to learn new tasks. Moreover, specific steps without guidelines are trivial and unsystematic, which makes it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to our daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step…
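The two annotation levels described in the abstract (a task-level guideline shared by all videos of a task, and video-level specific steps that each point to a guideline step) can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, and the example task are hypothetical and are not taken from the GUIDE release, whose actual file format is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class SpecificStep:
    text: str             # hypothetical field: the step as performed in one video
    guideline_index: int  # position of the associated step in the task guideline


@dataclass
class Task:
    name: str
    domain: str
    guideline: list[str]  # task-level guideline steps shared by all videos of the task


@dataclass
class Video:
    video_id: str
    task: Task
    steps: list[SpecificStep] = field(default_factory=list)


def steps_by_guideline(video: Video) -> dict[str, list[str]]:
    """Group a video's specific steps under the guideline step each one maps to."""
    guideline = video.task.guideline
    grouped: dict[str, list[str]] = {g: [] for g in guideline}
    for step in video.steps:
        if not 0 <= step.guideline_index < len(guideline):
            raise ValueError(f"step {step.text!r} points outside the guideline")
        grouped[guideline[step.guideline_index]].append(step.text)
    return grouped


# Invented example, not an entry from the dataset.
example_task = Task("brew tea", "food", ["boil water", "steep tea"])
example_video = Video(
    "v1",
    example_task,
    [
        SpecificStep("fill the kettle and heat it", 0),
        SpecificStep("add a tea bag for three minutes", 1),
    ],
)
```

Calling `steps_by_guideline(example_video)` returns a mapping from each guideline step to the specific steps of that video that realize it, which is one way to view the relation between the two annotation levels.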
Taxonomy
Topics: Technology-Enhanced Education Studies · Online Learning and Analytics · Multimodal Machine Learning Applications
