Plot and Rework: Modeling Storylines for Visual Storytelling

Chi-Yang Hsu; Yun-Wei Chu; Ting-Hao 'Kenneth' Huang; Lun-Wei Ku

arXiv:2105.06950 · cs.CL · July 8, 2021

TL;DR

This paper presents PR-VIST, a novel framework for visual storytelling that models storylines as graphs and iteratively refines stories, resulting in more diverse, coherent, and human-like narratives.

Contribution

It introduces a story graph approach and an iterative training process to improve visual storytelling quality over existing methods.

Findings

01 Stories generated are more diverse and coherent.

02 Human evaluations favor PR-VIST over baselines.

03 Ablation shows plotting and reworking are crucial.

Abstract

Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.
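
The "best path through a story graph" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how paths are scored or searched, so the sketch assumes a layered graph (one group of candidate terms per image), hand-picked edge scores, and a simple dynamic-programming search. The names `best_storyline`, `layers`, and `edge_score` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def best_storyline(
    layers: List[List[str]], edge_score: Dict[Edge, float]
) -> Tuple[float, List[str]]:
    """Pick one node per layer so that the summed edge scores are maximal.

    Assumption: the story graph is layered (one layer per image) and every
    edge links consecutive layers. Missing edges count as impossible.
    """
    # best[node] = (score, path) of the best partial storyline ending at node
    best = {node: (0.0, [node]) for node in layers[0]}
    for layer in layers[1:]:
        extended = {}
        for node in layer:
            candidates = [
                (score + edge_score.get((prev, node), float("-inf")), path + [node])
                for prev, (score, path) in best.items()
            ]
            extended[node] = max(candidates, key=lambda c: c[0])
        best = extended
    return max(best.values(), key=lambda c: c[0])


# Toy graph with made-up candidate terms and scores (illustrative only).
layers = [["dog", "park"], ["ball", "frisbee"], ["sleep", "home"]]
edge_score = {
    ("dog", "ball"): 0.9, ("dog", "frisbee"): 0.6,
    ("park", "ball"): 0.2, ("park", "frisbee"): 0.7,
    ("ball", "sleep"): 0.3, ("ball", "home"): 0.5,
    ("frisbee", "sleep"): 0.8, ("frisbee", "home"): 0.1,
}

score, path = best_storyline(layers, edge_score)
print(path)
```

On this toy graph the selected storyline is park, frisbee, sleep: the search keeps the best partial path per node, so a strong first edge (dog to ball) can still lose to a path with a better total. In the framework described above, the selected path would then be handed to the story generator; that stage is not sketched here.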

Code & Models

Repositories

ethan5437/PR-VIST
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Video Analysis and Summarization