StoryBench: A Multifaceted Benchmark for Continuous Story Visualization

Emanuele Bugliarello; Hernan Moraldo; Ruben Villegas; Mohammad Babaeizadeh; Mohammad Taghi Saffar; Han Zhang; Dumitru Erhan; Vittorio Ferrari; Pieter-Jan Kindermans; Paul Voigtlaender

arXiv:2308.11606 · cs.CV · October 13, 2023 · 1 citation

TL;DR

StoryBench is a multi-task benchmark for evaluating text-to-video generation models on visual realism, consistency across frames, and adherence to a sequence of text prompts, with an emphasis on human assessment.

Contribution

The paper introduces a multi-task benchmark built on human-annotated datasets for evaluating text-to-video models, provides guidelines for human evaluation, and offers insights into automatic metrics.

Findings

01. Training on story-like data improves model performance.

02. Current models struggle with complex story generation tasks.

03. Guidelines help standardize human evaluation of video stories.

Abstract

Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark for video generation requires data annotated over time, which contrasts with the single caption used often in video datasets. To fill this gap, we collect comprehensive human annotations on three existing datasets, and introduce StoryBench: a new, challenging multi-task benchmark to reliably evaluate forthcoming text-to-video models. Our benchmark includes three video generation tasks of increasing difficulty: action execution, where the next action must be generated starting from a conditioning video; story continuation, where a sequence of actions must be executed starting from a conditioning video; and story generation, where a video must be…

Code & Models

Repositories

google/storybench · PyTorch · Official

Datasets

ingoziegler/StoryFrames
dataset · 924 downloads

Videos

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition