Unsupervised Task Graph Generation from Instructional Video Transcripts
Lajanugen Logeswaran, Sungryull Sohn, Yunseok Jang, Moontae Lee, Honglak Lee

TL;DR
This paper presents an unsupervised method for generating task graphs from instructional video transcripts, effectively identifying key steps and their dependencies using language models combined with clustering and ranking.
Contribution
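It introduces an unsupervised approach that combines instruction-tuned language models with clustering and ranking components to generate task graphs from text transcripts, producing more accurate task graphs than a supervised learning approach on tasks from ProceL and CrossTask.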
Findings
More accurate task graphs than a supervised learning approach
Targets real-world activities (e.g., making coffee)
Evaluated on tasks from the ProceL and CrossTask datasets
Abstract
This work explores the problem of generating task graphs of real-world activities. Different from prior formulations, we consider a setting where text transcripts of instructional videos performing a real-world activity (e.g., making coffee) are provided and the goal is to identify the key steps relevant to the task as well as the dependency relationship between these key steps. We propose a novel task graph generation approach that combines the reasoning capabilities of instruction-tuned language models along with clustering and ranking components to generate accurate task graphs in a completely unsupervised manner. We show that the proposed approach generates more accurate task graphs compared to a supervised learning approach on tasks from the ProceL and CrossTask datasets.
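To make the output format concrete, the following is a minimal sketch of what a task graph of key steps and dependencies might look like for the "making coffee" example. It is not the authors' method: the step names, the `task_graph` dictionary, and the `is_valid_order` helper are hypothetical illustrations, and the graph is written by hand rather than generated from transcripts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical key steps for "making coffee"; not taken from the paper.
# Each step maps to the set of steps that must be completed before it.
task_graph = {
    "boil water": set(),
    "grind beans": set(),
    "add grounds to filter": {"grind beans"},
    "pour water over grounds": {"boil water", "add grounds to filter"},
    "serve coffee": {"pour water over grounds"},
}


def is_valid_order(graph, order):
    """Return True if every step appears after all of its prerequisites."""
    done = set()
    for step in order:
        if not graph[step] <= done:  # some prerequisite is still missing
            return False
        done.add(step)
    return len(done) == len(graph)  # every step was performed


# One execution order that respects all dependency edges.
one_valid_order = list(TopologicalSorter(task_graph).static_order())
```

A dependency graph of this kind admits several valid step orderings (for example, boiling water and grinding beans can happen in either order), which a single linear list of steps cannot express.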
Taxonomy
Topics: Multimodal Machine Learning Applications · Online Learning and Analytics · Topic Modeling
