Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection

Yuhang Ma; Wenting Xu; Chaoyi Zhao; Keqiang Sun; Qinfeng Jin; Zeng; Zhao; Changjie Fan; Zhipeng Hu

arXiv:2409.19624·cs.CV·October 1, 2024

TL;DR

Storynizor is a novel model that generates coherent stories with consistent characters, diverse poses, and vivid backgrounds by employing inter-frame synchronization and ID injection techniques, supported by a new large-scale dataset.

Contribution

The paper introduces Storynizor, a new model with ID-Synchronizer and ID-Injector modules, and a large dataset, enabling high-quality, consistent story image generation.

Findings

01. Superior character consistency and background fidelity.

02. Effective pose variation and diversity.

03. Outperforms existing methods in coherence and quality.

Abstract

Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation, vividly representing their postures and backgrounds. The ID-Injector uses a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called…
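
The abstract names the Shuffling Reference Strategy (SRS) but does not say how references are chosen. The sketch below shows one plausible reading, offered as an illustration only and not as the authors' implementation: each target frame of a character is paired with a randomly chosen different frame of the same character, so the ID reference never shares the target's pose. The function name `shuffled_reference_pairs`, the file names, and the "never use the target as its own reference" rule are all assumptions.

```python
import random


def shuffled_reference_pairs(frames_by_id, seed=0):
    """Pair every target frame with a reference frame of the same identity.

    Assumption (not taken from the paper): the reference is drawn at random
    from the *other* frames of that identity, so a target frame is never
    used as its own reference.
    """
    rng = random.Random(seed)  # local RNG keeps the pairing reproducible
    pairs = []
    for char_id, frames in frames_by_id.items():
        if len(frames) < 2:
            continue  # no alternative frame available to act as reference
        for target in frames:
            candidates = [f for f in frames if f != target]
            pairs.append((char_id, target, rng.choice(candidates)))
    return pairs


# Hypothetical toy data: identity name -> frames showing that identity.
frames_by_id = {
    "alice": ["a_0.png", "a_1.png", "a_2.png"],
    "bob": ["b_0.png", "b_1.png"],
    "solo": ["s_0.png"],
}
pairs = shuffled_reference_pairs(frames_by_id, seed=42)
for char_id, target, reference in pairs:
    print(f"{char_id}: target={target} reference={reference}")
```

Identities with a single frame are skipped here because no distinct reference exists for them; a real training pipeline might handle that case differently.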

Taxonomy

Topics: Video Analysis and Summarization · Topic Modeling · Music and Audio Processing

Methods: Diffusion