TL;DR
RealDiffusion is a novel diffusion-based framework that enhances multi-character storybook generation by maintaining character coherence and narrative dynamism through physics-informed attention and a dissipative prior.
Contribution
It introduces a training-free, physics-informed attention mechanism and a heat diffusion prior to improve coherence and dynamism in sequential image generation.
Findings
Achieves better character identity preservation across frames.
Maintains scene and pose evolution without sacrificing coherence.
Outperforms existing methods in narrative consistency and visual quality.
Abstract
While modern diffusion models excel at generating diverse single images, extending this to sequential generation reveals a fundamental challenge: balancing narrative dynamism with multi-character coherence. Existing methods often falter at this trade-off, leading to artifacts where characters lose their identity or the story stagnates. To resolve this critical tension, we introduce RealDiffusion, a unified framework designed to reconcile robust coherence with narrative dynamism. Heat diffusion serves as a dissipative prior that averages neighboring features along the sequence and removes high-frequency noise within the subject region. This suppresses attribute drift and stabilizes identity across frames. A region-aware stochastic process then introduces small perturbations that explore nearby modes and prevent collapse so the story maintains pose change and scene evolution. We thus…
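To make the two mechanisms named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes per-frame features stored as a `(T, N, D)` array (frames, tokens, channels) and a soft region mask of shape `(T, N)`. The function names, the coefficients `alpha` and `sigma`, the explicit finite-difference scheme, and the choice of which region receives noise are all hypothetical choices made for illustration.

```python
import numpy as np


def heat_diffusion_step(features, mask, alpha=0.25):
    """One explicit heat-equation step along the frame axis (illustrative only).

    features: (T, N, D) array of per-frame token features (assumed layout).
    mask:     (T, N) array in [0, 1]; 1 marks the subject region (assumed).
    alpha:    hypothetical diffusion coefficient; this explicit scheme is
              stable for alpha <= 0.5.
    """
    # Replicate the first and last frames so no feature mass leaks out
    # at the ends of the sequence (zero-flux boundary).
    padded = np.concatenate([features[:1], features, features[-1:]], axis=0)
    # Discrete Laplacian over neighboring frames: f[t-1] - 2 f[t] + f[t+1].
    laplacian = padded[:-2] - 2.0 * features + padded[2:]
    smoothed = features + alpha * laplacian
    # Apply the smoothing only where the mask is on; leave the rest untouched.
    m = mask[..., None]
    return m * smoothed + (1.0 - m) * features


def region_aware_noise(features, region_weight, sigma=0.01, rng=None):
    """Add small Gaussian perturbations scaled by a per-token region weight.

    region_weight: (T, N) array; which region should be perturbed is an
                   assumption left to the caller.
    sigma:         hypothetical noise scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(features.shape)
    return features + sigma * region_weight[..., None] * noise
```

Under these assumptions, the diffusion step pulls each frame's subject features toward its neighbors, which damps frame-to-frame attribute drift while preserving the sequence mean. The noise step then reintroduces small, localized variation so that successive frames do not collapse to the same content.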
