RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

Qi Zhao; Jun Chen; Ivor Tsang; Guang Dai

arXiv:2605.11927·cs.CV·May 13, 2026


TL;DR

RealDiffusion is a diffusion-based framework for multi-character storybook generation that keeps characters coherent across frames while preserving narrative dynamism, using physics-informed attention and a dissipative heat-diffusion prior.

Contribution

It introduces a training-free, physics-informed attention mechanism and a heat diffusion prior to improve coherence and dynamism in sequential image generation.

Findings

01. Achieves better character identity preservation across frames.

02. Maintains scene and pose evolution without sacrificing coherence.

03. Outperforms existing methods in narrative consistency and visual quality.

Abstract

While modern diffusion models excel at generating diverse single images, extending this to sequential generation reveals a fundamental challenge: balancing narrative dynamism with multi-character coherence. Existing methods often falter at this trade-off, leading to artifacts where characters lose their identity or the story stagnates. To resolve this critical tension, we introduce RealDiffusion, a unified framework designed to reconcile robust coherence with narrative dynamism. Heat diffusion serves as a dissipative prior that averages neighboring features along the sequence and removes high-frequency noise within the subject region. This suppresses attribute drift and stabilizes identity across frames. A region-aware stochastic process then introduces small perturbations that explore nearby modes and prevent collapse so the story maintains pose change and scene evolution. We thus…
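
The two ingredients named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the summary gives no equations, so the explicit finite-difference step, the names `heat_step` and `region_noise`, the parameters `alpha` and `sigma`, the `(T, N, D)` feature layout, and the choice to apply the noise inside the same mask are all assumptions made for illustration.

```python
import numpy as np

def heat_step(features, mask, alpha=0.2):
    """One explicit heat-diffusion step along the frame axis.

    features: (T, N, D) array, T frames of N tokens with D channels.
    mask:     (T, N) array, 1.0 inside the subject region, 0.0 elsewhere.
    alpha:    step size (hypothetical); keep it <= 0.5 for stability.
    """
    # Replicate the first and last frames so the sequence ends have no flux.
    padded = np.concatenate([features[:1], features, features[-1:]], axis=0)
    # Discrete Laplacian over neighboring frames.
    laplacian = padded[:-2] - 2.0 * features + padded[2:]
    # Smooth only where the mask marks the subject region.
    return features + alpha * mask[..., None] * laplacian

def region_noise(features, mask, sigma, rng):
    """Add small Gaussian perturbations restricted to the masked region."""
    noise = rng.standard_normal(features.shape)
    return features + sigma * mask[..., None] * noise

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4, 8))   # 6 frames, 4 tokens, 8 channels
subject = np.ones((6, 4))                # toy mask: every token is "subject"
smoothed = heat_step(feats, subject)
perturbed = region_noise(smoothed, subject, sigma=0.01, rng=rng)
```

In this toy setting the heat step pulls each frame's features toward its neighbors, which lowers variance across frames while leaving the per-token mean over the sequence unchanged; the noise step then reintroduces a small, controllable amount of variation. A different mask can be passed to `region_noise` if the perturbation is meant to act on another region.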


Code & Models

Repositories

ShmilyQi-CN/RealDiffusion (GitHub)
