StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

Adyasha Maharana, Darryl Hannan, and Mohit Bansal

arXiv:2209.06192·cs.CV·September 14, 2022·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces StoryDALL-E, a method to adapt pretrained text-to-image transformers for story continuation tasks, enabling better generalization to new narratives and characters through task-specific modules and fine-tuning strategies.

Contribution

The paper proposes a novel approach to adapt pretrained text-to-image models for story continuation, including task-specific modules and evaluation on multiple datasets, outperforming GAN-based models.

Findings

01. Retro-fitting improves story continuity and element copying.
02. Pretrained transformers struggle with multi-character narratives.
03. Fine-tuning enhances model performance on story datasets.

Abstract

Recent advances in text-to-image synthesis have led to large pretrained transformers with excellent capabilities to generate visualizations from a given text. However, these models are ill-suited for specialized tasks like story visualization, which requires an agent to produce a sequence of images given a corresponding sequence of captions, forming a narrative. Moreover, we find that the story visualization task fails to accommodate generalization to unseen plots and characters in new narratives. Hence, we first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters. Then, we enhance or 'retro-fit' the pretrained text-to-image synthesis models with task-specific modules for (a) sequential image generation and (b) copying relevant elements from an initial frame. Then,…
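
To make the task definition in the abstract concrete, here is a minimal sketch of the story continuation interface: a source frame plus a sequence of captions goes in, and one generated frame per caption comes out. All names (`StoryContinuationSample`, `continue_story`, `stub_generator`) are hypothetical and are not taken from the paper or its repository. The generator is a string-producing stub standing in for a retro-fitted text-to-image model, and the choice to pass the full caption list as story context is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Stand-in for an image; a real system would use a tensor or an image object.
Frame = str


@dataclass
class StoryContinuationSample:
    """One story continuation example (hypothetical structure)."""
    source_frame: Frame   # initial frame the generated story is conditioned on
    captions: List[str]   # one caption per frame to be generated


def continue_story(
    sample: StoryContinuationSample,
    generate_frame: Callable[[str, Frame, Sequence[str]], Frame],
) -> List[Frame]:
    """Generate one frame per caption, always conditioning on the source frame."""
    frames: List[Frame] = []
    for caption in sample.captions:
        # Assumption: each frame sees its own caption, the source frame
        # (so elements can be copied from it), and the whole caption sequence.
        frames.append(generate_frame(caption, sample.source_frame, sample.captions))
    return frames


def stub_generator(caption: str, source_frame: Frame, story_context: Sequence[str]) -> Frame:
    """Placeholder for a text-to-image model; returns a descriptive string."""
    return f"frame({caption!r} | source={source_frame} | story_len={len(story_context)})"


sample = StoryContinuationSample(
    source_frame="img0",
    captions=["A character enters the room.", "The character sits down."],
)
generated = continue_story(sample, stub_generator)
```

Plain story visualization would correspond to dropping the `source_frame` argument, which is the distinction the abstract draws between the two tasks.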

Code & Models

Repositories

adymaharana/storydalle
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation