Make-A-Story: Visual Memory Conditioned Consistent Story Generation

Tanzila Rahman; Hsin-Ying Lee; Jian Ren; Sergey Tulyakov; Shweta Mahajan; Leonid Sigal

arXiv:2211.13319 · cs.CV · May 9, 2023 · 1 citation

TL;DR

This paper introduces Make-A-Story, a diffusion-based model with visual memory for coherent multi-frame story visualization, effectively handling references and maintaining scene consistency across complex storylines.

Contribution

It presents a novel autoregressive diffusion framework with a visual memory module and reference-aware attention for improved story visualization.

Findings

01. Outperforms prior methods in visual quality and story consistency.

02. Effectively models actor and background references across frames.

03. Validated on extended MUGEN, PororoSV, and FlintstonesSV datasets.

Abstract

There has been a recent explosion of impressive generative models that can produce high-quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditional sentences that contain unambiguous descriptions of scenes and the main actors in them. Employing such models for the more complex task of story visualization therefore remains a challenge: references and co-references arise naturally, and one must reason, based on story progression, about when to maintain consistency of actors and backgrounds across frames/scenes and when not to. In this work, we address these challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference…
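
The idea of sentence-conditioned soft attention over a visual memory can be illustrated with a small sketch. This is not the authors' implementation: it assumes generic scaled dot-product attention, with the current sentence embedding as the query, embeddings of earlier sentences as keys, and features of earlier generated frames as values. The function name `soft_attention` and all toy vectors are hypothetical and chosen only for illustration.

```python
import math


def soft_attention(query, keys, values):
    """Scaled dot-product soft attention of one query over a memory.

    query  : embedding of the current sentence (assumed role)
    keys   : embeddings of earlier sentences (assumed role)
    values : features of earlier generated frames (assumed role)
    Returns the attention weights and the weighted sum of the values.
    """
    if not keys:
        raise ValueError("memory is empty")
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy memory holding two earlier frames (all numbers are made up).
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
memory_values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]

# A query aligned with the first key, e.g. a sentence that refers back
# to the actor introduced in the first frame.
query = [4.0, 0.0]
weights, context = soft_attention(query, memory_keys, memory_values)
```

Because the query aligns with the first key, most of the attention weight falls on the first memory entry, so the returned context is dominated by the first frame's features. In an autoregressive setup, each newly generated frame would be appended to the memory before the next sentence is processed.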

Code & Models

Repositories

ubc-vision/make-a-story
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Digital Storytelling and Education