VILT: Video Instructions Linking for Complex Tasks

Sophie Fischer; Carlos Gemmell; Iain Mackie; Jeffrey Dalton

arXiv:2208.10858·cs.IR·August 24, 2022

TL;DR

This paper introduces VILT, the task of linking instructional videos to the steps of complex tasks, and shows through a new benchmark, retrieval experiments, and a user study that linked videos improve interactive cooking assistance.

Contribution

It presents the VILT task and a new benchmark dataset, and it evaluates both retrieval methods and user experience for linking instructional videos to complex tasks.

Findings

01. Dense retrieval with ANCE achieves the best retrieval performance (NDCG@3 of 0.566, P@1 of 0.644).

02. Users learn more effectively with manually linked videos.

03. Automatically linked videos still significantly aid task learning.

Abstract

This work addresses challenges in developing conversational assistants that support rich multimodal video interactions to accomplish real-world tasks interactively. We introduce the task of automatically linking instructional videos to task steps as "Video Instructions Linking for Complex Tasks" (VILT). Specifically, we focus on the domain of cooking and empowering users to cook meals interactively with a video-enabled Alexa skill. We create a reusable benchmark with 61 queries from recipe tasks and curate a collection of 2,133 instructional "How-To" cooking videos. Studying VILT with state-of-the-art retrieval methods, we find that dense retrieval with ANCE is the most effective, achieving an NDCG@3 of 0.566 and P@1 of 0.644. We also conduct a user study that measures the effect of incorporating videos in a real-world task setting, where 10 participants perform several cooking tasks…
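
The abstract reports NDCG@3 and P@1 for ranked video lists. As a minimal sketch of how such per-query metrics are commonly computed, the Python below assumes graded relevance labels (0 = not relevant, higher = more relevant), a linear gain with a log2 rank discount, and a relevance threshold of 1 for precision. The grading scale, the exact NDCG variant, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math


def precision_at_k(ranked_grades, k, threshold=1):
    """Fraction of the top-k results whose grade is at least `threshold`.

    `ranked_grades` holds the relevance grade of each retrieved video,
    in ranked order. The threshold of 1 is an assumption.
    """
    top = ranked_grades[:k]
    return sum(1 for grade in top if grade >= threshold) / k


def dcg_at_k(grades, k):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount."""
    return sum(
        grade / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(ranked_grades, k, judged_grades=None):
    """DCG@k divided by the DCG@k of an ideal ordering.

    `judged_grades` should contain the grades of all judged videos for
    the query. If omitted, the ideal ordering is built from the
    retrieved list only, which can overestimate the score.
    """
    pool = ranked_grades if judged_grades is None else judged_grades
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant video exists for this query
    return dcg_at_k(ranked_grades, k) / ideal


def mean_over_queries(per_query_scores):
    """Average a metric over all benchmark queries."""
    return sum(per_query_scores) / len(per_query_scores)


if __name__ == "__main__":
    # Hypothetical grades for the top-3 videos of two recipe-step queries.
    runs = [[2, 0, 1], [0, 1, 0]]
    print("P@1   ", mean_over_queries([precision_at_k(r, 1) for r in runs]))
    print("NDCG@3", mean_over_queries([ndcg_at_k(r, 3) for r in runs]))
```

A benchmark-level score such as the reported NDCG@3 would then be the mean of the per-query values over all queries.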
