A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks

Angela S. Lin; Sudha Rao; Asli Celikyilmaz; Elnaz Nouri; Chris Brockett; Debadeepta Dey; Bill Dolan

arXiv:2005.09606·cs.CL·May 20, 2020

TL;DR

This paper introduces a method for aligning recipe instructions across text and video sources, producing a large aligned dataset that offers commonsense insight into how procedural tasks are structured.

Contribution

It presents an unsupervised alignment algorithm and a graph-based approach for aligning multiple text and video recipes for the same dish, and releases a large annotated dataset.
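
To make the idea of turning pairwise alignments into a joint alignment concrete, here is a minimal sketch. It is not the paper's graph algorithm, which this summary does not describe. As a stand-in it uses connected components (union-find): instructions linked by a sufficiently confident pairwise alignment end up in the same group. The function name `joint_alignment`, the `(recipe_id, step_index)` node format, the scores, and the `min_score` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def joint_alignment(pairwise, min_score=0.5):
    """Group instructions linked by pairwise alignments into joint clusters.

    pairwise: iterable of (node_a, node_b, score), where each node is a
    hypothetical (recipe_id, step_index) tuple. Connected components are a
    stand-in here, not the method used in the paper.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairwise:
        root_a, root_b = find(a), find(b)
        # Only merge groups when the pairwise alignment is confident enough.
        if score >= min_score and root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in clusters.values())


if __name__ == "__main__":
    # Made-up alignments between two text recipes and one video recipe.
    pairs = [
        (("text_1", 0), ("video_1", 2), 0.9),
        (("video_1", 2), ("text_2", 1), 0.8),
        (("text_1", 1), ("video_1", 3), 0.2),  # below threshold, not merged
    ]
    for group in joint_alignment(pairs):
        print(group)
```

In this toy input, the first step of `text_1`, the third segment of `video_1`, and the second step of `text_2` fall into one group because they are chained by two confident pairwise links, while the low-scoring pair stays as two singletons. A real pipeline would need a more careful graph procedure, since plain connected components can chain together instructions that are only loosely related.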

Findings

1. Produced 150K pairwise alignments between recipes across text and video
2. Created a dataset with rich commonsense information for 4,262 dishes
3. Demonstrated the effectiveness of the alignment method

Abstract

Many high-level procedural tasks can be decomposed into sequences of instructions that vary in their order and choice of tools. In the cooking domain, the web offers many partially-overlapping text and video recipes (i.e. procedures) that describe how to make the same dish (i.e. high-level task). Aligning instructions for the same dish across different sources can yield descriptive visual explanations that are far richer semantically than conventional textual instructions, providing commonsense insight into how real-world procedures are structured. Learning to align these different instruction sets is challenging because: a) different recipes vary in their order of instructions and use of ingredients; and b) video instructions can be noisy and tend to contain far more information than text instructions. To address these challenges, we first use an unsupervised alignment algorithm that…

Code & Models

Repositories

microsoft/multimodal-aligned-recipe-corpus (Official)
