A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara

TL;DR
This paper improves procedure segmentation and summarization (PSS) in instructional videos by introducing an order-aware segmentation metric, a differentiable order-constrained matching algorithm, and multi-modal feature training, reporting gains of roughly 7% in segmentation on YouCook2 and roughly 2.5% in summarization.
Contribution
It proposes an order-aware segmentation metric, a differentiable order-constrained matching algorithm, and multi-modal feature training for better instructional video analysis.
Findings
Improved segmentation accuracy by ~7% on YouCook2.
Enhanced summarization performance by ~2.5%.
Demonstrated effectiveness on two instructional video datasets.
Abstract
Understanding the steps required to perform a task is an important skill for AI systems. Learning these steps from instructional videos involves two subproblems: (i) identifying the temporal boundary of sequentially occurring segments and (ii) summarizing these steps in natural language. We refer to this task as Procedure Segmentation and Summarization (PSS). In this paper, we take a closer look at PSS and propose three fundamental improvements over current methods. The segmentation task is critical, as generating a correct summary requires each step of the procedure to be correctly identified. However, current segmentation metrics often overestimate the segmentation quality because they do not consider the temporal order of segments. In our first contribution, we propose a new segmentation metric that takes into account the order of segments, giving a more reliable measure of the…
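To illustrate why ignoring temporal order can overestimate segmentation quality, here is a minimal sketch. It is not the metric proposed in the paper: the function names, the use of mean temporal IoU, and the dynamic-programming matching are all assumptions chosen for illustration. It compares an order-agnostic score, where each ground-truth segment takes its best-overlapping prediction regardless of position, with an order-preserving score, where matched pairs must appear in the same sequence order in both lists.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def order_agnostic_score(pred: List[Segment], gt: List[Segment]) -> float:
    """Mean, over ground-truth segments, of the best tIoU with ANY prediction.

    The position of the prediction in the output sequence is ignored.
    """
    if not gt:
        return 0.0
    return sum(max((tiou(p, g) for p in pred), default=0.0) for g in gt) / len(gt)


def order_aware_score(pred: List[Segment], gt: List[Segment]) -> float:
    """Best total tIoU of a one-to-one matching that preserves sequence order,
    divided by the number of ground-truth segments.

    Illustrative assumption: an LCS-style dynamic program, where dp[i][j] is
    the best total over the first i predictions and the first j ground-truth
    segments.
    """
    n, m = len(pred), len(gt)
    if m == 0:
        return 0.0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],                                   # skip prediction i
                dp[i][j - 1],                                   # skip ground truth j
                dp[i - 1][j - 1] + tiou(pred[i - 1], gt[j - 1]),  # match the pair
            )
    return dp[n][m] / m


# Ground-truth steps in order, and predictions emitted in reversed order.
gt_segments = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
reversed_pred = [(20.0, 30.0), (10.0, 20.0), (0.0, 10.0)]

agnostic = order_agnostic_score(reversed_pred, gt_segments)
aware = order_aware_score(reversed_pred, gt_segments)
print(f"order-agnostic: {agnostic:.3f}, order-aware: {aware:.3f}")
```

In this toy case every predicted segment overlaps a ground-truth segment perfectly, so the order-agnostic score is at its maximum even though the steps come out in the wrong sequence. The order-preserving variant can keep only one matched pair, so its score drops to one third.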
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
