Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization
Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson

TL;DR
This paper introduces Graph2Vid, a method that uses procedure flow graphs to localize steps in instructional videos without requiring explicit step order annotations, improving efficiency and accuracy.
Contribution
The paper proposes a novel flow graph to video grounding approach that reduces annotation needs and infers step order directly from videos, advancing weakly-supervised multi-step localization.
Findings
Graph2Vid outperforms baselines in step localization accuracy.
The method reduces annotation requirements at both training and test time.
Extending the CrossTask dataset with flow graph information improves evaluation.
Abstract
In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. However, in reality, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works require the actual order of procedure steps in the video, to be provided by human annotators at both training and test times. Instead, here, we only rely on generic procedural text that is not tied to a specific video. We represent the various ways to complete the procedure by transforming the list of instructions into a procedure flow graph which captures the partial order of steps. Using the flow graphs reduces both training and test time annotation requirements. To this end, we introduce the…
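To make the idea of a partial order concrete, here is a minimal sketch of a flow graph as a set of prerequisite relations, together with a brute-force enumeration of every step ordering the graph allows. It is only an illustration of the representation, not the Graph2Vid grounding algorithm; the recipe, the step names, and the `valid_orders` helper are all hypothetical.

```python
from typing import Dict, List, Set


def valid_orders(steps: List[str], prereqs: Dict[str, Set[str]]) -> List[List[str]]:
    """Enumerate every ordering of `steps` that respects the partial order.

    `prereqs[s]` is the set of steps that must be completed before `s`.
    This is a plain backtracking enumeration of topological orderings,
    used here for illustration only.
    """
    orders: List[List[str]] = []

    def extend(prefix: List[str], done: Set[str]) -> None:
        if len(prefix) == len(steps):
            orders.append(list(prefix))
            return
        for step in steps:
            # A step is available once all of its prerequisites are done.
            if step not in done and prereqs.get(step, set()) <= done:
                prefix.append(step)
                done.add(step)
                extend(prefix, done)
                done.remove(step)
                prefix.pop()

    extend([], set())
    return orders


# Hypothetical procedure: heating the pan is independent of preparing the eggs.
steps = ["crack eggs", "whisk eggs", "heat pan", "pour eggs"]
prereqs = {
    "whisk eggs": {"crack eggs"},
    "pour eggs": {"whisk eggs", "heat pan"},
}

orders = valid_orders(steps, prereqs)
for order in orders:
    print(" -> ".join(order))
```

A single ordered list of steps would match only one of these orderings, whereas the graph admits all of them. That is why a video-independent flow graph can stand in for per-video order annotations.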
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
