TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

Medhini Narasimhan; Arsha Nagrani; Chen Sun; Michael Rubinstein; Trevor Darrell; Anna Rohrbach; Cordelia Schmid

arXiv:2208.06773·cs.CV·August 16, 2022

Open Access

TL;DR

This paper introduces a novel method for summarizing instructional videos by leveraging task relevance and cross-modal saliency, using pseudo summaries for training and a new benchmark for evaluation.

Contribution

It proposes an automatic pseudo summary generation approach and a new summarization network tailored for instructional videos, improving over existing methods.

Findings

01

Outperforms baseline models on the WikiHow Summaries dataset.

02

Uses pseudo summaries for weak supervision effectively.

03

Demonstrates the importance of task relevance and cross-modal cues in summarization.

Abstract

YouTube users looking for instructions for a specific task may spend a long time browsing content trying to find the right video that matches their needs. Creating a visual summary (abridged version of a video) provides viewers with a quick overview and massively reduces search time. In this work, we focus on summarizing instructional videos, an under-explored area of video summarization. In comparison to generic videos, instructional videos can be parsed into semantically meaningful segments that correspond to important steps of the demonstrated task. Existing video summarization datasets rely on manual frame-level annotations, making them subjective and limited in size. To overcome this, we first automatically generate pseudo summaries for a corpus of instructional videos by exploiting two key assumptions: (i) relevant steps are likely to appear in multiple videos of the same task…
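As a rough illustration of assumption (i) only, the sketch below scores each segment of a video by how closely it matches segments in other videos of the same task, then keeps the top-scoring segments as a pseudo summary. This is not the authors' procedure: the use of cosine similarity over precomputed segment embeddings, the function names `task_relevance_scores` and `pseudo_summary`, and the `keep_ratio` parameter are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def task_relevance_scores(videos):
    """Score every segment by its recurrence across other videos of the same task.

    `videos` is a list of videos; each video is a list of segment embeddings.
    A segment's score is the mean, over the other videos, of its best cosine
    match in that video (a hypothetical stand-in for "appears in multiple videos").
    """
    all_scores = []
    for i, video in enumerate(videos):
        # Compare only against other, non-empty videos of the same task.
        others = [v for j, v in enumerate(videos) if j != i and v]
        video_scores = []
        for segment in video:
            if not others:
                video_scores.append(0.0)
                continue
            best_matches = [max(cosine(segment, o) for o in other) for other in others]
            video_scores.append(sum(best_matches) / len(best_matches))
        all_scores.append(video_scores)
    return all_scores


def pseudo_summary(video_scores, keep_ratio=0.5):
    """Return indices of the highest-scoring segments, in temporal order."""
    k = max(1, round(len(video_scores) * keep_ratio))
    ranked = sorted(range(len(video_scores)), key=lambda idx: video_scores[idx], reverse=True)
    return sorted(ranked[:k])


# Toy data: the first segment of each video points in a shared direction
# (a common step); the remaining segments are unique to their video.
videos = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.9, 0.0, 0.1], [0.0, 0.0, 1.0]],
    [[1.0, 0.1, 0.0]],
]
scores = task_relevance_scores(videos)
summary_for_first_video = pseudo_summary(scores[0])
```

In the toy data, the segment shared across videos receives a much higher score than the segment that occurs in only one video, so it is the one retained. The second assumption in the abstract is cut off above and is deliberately not modeled here.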

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Test