Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
Yuhang Ma, Wenting Xu, Chaoyi Zhao, Keqiang Sun, Qinfeng Jin, Zeng Zhao, Changjie Fan, Zhipeng Hu

TL;DR
Storynizor is a novel model that generates coherent stories with consistent characters, diverse poses, and vivid backgrounds by employing inter-frame synchronization and ID injection techniques, supported by a new large-scale dataset.
Contribution
The paper introduces Storynizor, a new model with ID-Synchronizer and ID-Injector modules, and a large dataset, enabling high-quality, consistent story image generation.
Findings
Superior character consistency and background fidelity.
Effective pose variation and diversity.
Outperforms existing methods in coherence and quality.
Abstract
Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation, vividly representing their postures and backgrounds. The ID-Injector utilizes a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called…
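The abstract does not spell out how the auto-mask self-attention works, so the following is only a minimal NumPy sketch of one plausible reading: each frame's tokens attend to all tokens of their own frame, plus only the foreground (character) tokens of the other frames. The function name, the hand-supplied `fg_mask`, and the omission of learned query/key/value projections are all assumptions made for illustration; in the paper the mask is described as being obtained automatically, and the actual module may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf get weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_frame_masked_attention(tokens, fg_mask):
    """Hypothetical sketch, not the authors' implementation.

    tokens:  (F, N, D) array, N tokens of dimension D for each of F frames.
    fg_mask: (F, N) boolean array, True where a token belongs to the character.
    Returns the attended tokens (F, N, D) and the attention weights (F, N, F*N).
    """
    F, N, D = tokens.shape
    keys = tokens.reshape(F * N, D)            # keys/values pooled over all frames
    scores = tokens @ keys.T / np.sqrt(D)      # (F, N, F*N)

    # allowed[f, k] is True if queries in frame f may attend to pooled key k:
    # any key from the same frame, or a foreground key from any frame.
    same_frame = np.repeat(np.eye(F, dtype=bool), N, axis=1)   # (F, F*N)
    allowed = same_frame | fg_mask.reshape(1, F * N)           # (F, F*N)

    scores = np.where(allowed[:, None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ keys, weights
```

Under this assumption, background tokens never leak between frames, while character tokens are shared across them, which is one way to encourage consistent characters with independent backgrounds.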
Taxonomy
Topics: Video Analysis and Summarization · Topic Modeling · Music and Audio Processing
Methods: Diffusion
