Improving Generation and Evaluation of Visual Stories via Semantic Consistency

Adyasha Maharana; Darryl Hannan; Mohit Bansal

arXiv:2105.10026·cs.CL·May 24, 2021

TL;DR

This paper improves story visualization, the task of generating a sequence of images from a multi-sentence story, by adding a dual learning framework based on video captioning, a copy-transform mechanism, and MART-based transformers. The aim is more coherent and relevant image sequences, together with better evaluation methods.

Contribution

It proposes novel methods for improving visual story generation, focusing on semantic alignment, sequential consistency, and complex frame interactions, along with new evaluation metrics.

Findings

01 Improved visual coherence and relevance in generated stories.

02 Enhanced evaluation metrics that correlate with human judgment.

03 Ablations that isolate the impact of each proposed technique.

Abstract

Story visualization is an under-explored task that falls at the intersection of many important research directions in both computer vision and natural language processing. In this task, given a series of natural language captions which compose a story, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task. However, there is room for improvement of generated images in terms of visual quality, coherence and relevance. We present a number of improvements to prior modeling approaches, including (1) the addition of a dual learning framework that utilizes video captioning to reinforce the semantic alignment between the story and generated images, (2) a copy-transform mechanism for sequentially-consistent story visualization, and (3) MART-based transformers…
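
The dual learning idea in item (1) can be illustrated with a small sketch. It assumes the captioning term is a cross-entropy over the reference caption tokens, added to the image generator's own loss with a scalar weight. The function names, the `weight` parameter, and the averaging scheme are hypothetical choices made for illustration, and the paper's actual formulation may differ.

```python
import math


def caption_cross_entropy(token_probs):
    """Mean negative log-likelihood of one reference caption.

    token_probs holds the probability a (hypothetical) captioning model
    assigns to each reference token, given the generated frame.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def dual_learning_loss(generator_loss, per_frame_token_probs, weight=1.0):
    """Combine the generator loss with a captioning feedback term.

    per_frame_token_probs: one list of token probabilities per story frame.
    weight: assumed scalar that balances the two terms (not from the paper).
    """
    caption_losses = [caption_cross_entropy(p) for p in per_frame_token_probs]
    dual_term = sum(caption_losses) / len(caption_losses)
    return generator_loss + weight * dual_term


# Two frames: the captioner recovers the first caption more confidently.
probs = [[0.9, 0.8, 0.95], [0.4, 0.5, 0.3]]
total = dual_learning_loss(generator_loss=1.2, per_frame_token_probs=probs, weight=0.5)
print(f"combined loss: {total:.4f}")
```

The intuition is that when the generated frames make the original captions easy to recover, the captioning term is small, so the generator is rewarded for images that stay semantically aligned with the story.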

Code & Models

Repositories

adymaharana/StoryViz
PyTorch (official)
