DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling
Ayushman Sarkar, Zhenyu Yu, Mohd Yamani Idna Idris

TL;DR
DeCorStory introduces a training-free, inference-time method that decorrelates prompt embeddings to improve visual and semantic consistency in text-to-image storytelling, outperforming existing approaches.
Contribution
DeCorStory combines Gram-Schmidt prompt embedding decorrelation with singular value reweighting and identity-preserving cross-attention to improve storytelling consistency without model training or fine-tuning.
Findings
Improves prompt-image alignment
Enhances identity consistency
Increases visual diversity
Abstract
Maintaining visual and semantic consistency across frames is a key challenge in text-to-image storytelling. Existing training-free methods, such as One-Prompt-One-Story, concatenate all prompts into a single sequence, which often induces strong embedding correlation and leads to color leakage, background blending, and identity drift. We propose DeCorStory, a training-free inference-time framework that explicitly reduces inter-frame semantic interference. DeCorStory applies Gram-Schmidt prompt embedding decorrelation to orthogonalize frame-level semantics, followed by singular value reweighting to strengthen prompt-specific information and identity-preserving cross-attention to stabilize character identity during diffusion. The method requires no model modification or fine-tuning and can be seamlessly integrated into existing diffusion pipelines. Experiments demonstrate consistent…
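To make the decorrelation step concrete, the following is a minimal NumPy sketch of Gram-Schmidt orthogonalization applied to one embedding vector per frame prompt. It is an illustration under stated assumptions, not the authors' implementation: the function name `gram_schmidt_decorrelate`, the use of a single pooled vector per frame, and the choice to rescale each result to its original norm are all hypothetical, and the abstract does not specify how DeCorStory handles token-level embeddings.

```python
import numpy as np


def gram_schmidt_decorrelate(frame_embeddings, eps=1e-8):
    """Orthogonalize per-frame embedding vectors with modified Gram-Schmidt.

    frame_embeddings: array of shape (num_frames, dim), one vector per frame
    prompt (an assumption made for this sketch).
    Returns an array of the same shape whose rows are mutually orthogonal.
    Each row is rescaled to the norm of the matching input row, which is a
    hypothetical design choice and not something stated in the abstract.
    """
    E = np.asarray(frame_embeddings, dtype=np.float64)
    out = np.zeros_like(E)
    for i in range(E.shape[0]):
        v = E[i].copy()
        # Remove the components that lie along every earlier frame direction.
        for j in range(i):
            denom = out[j] @ out[j]
            if denom > eps:
                v -= (v @ out[j]) / denom * out[j]
        # Restore the original magnitude unless the residual is ~zero
        # (which happens when a frame is a linear combination of earlier ones).
        norm = np.linalg.norm(v)
        if norm > eps:
            v *= np.linalg.norm(E[i]) / norm
        out[i] = v
    return out


# Toy data: three frame vectors that share a strong common component,
# mimicking the correlation induced by concatenating prompts.
rng = np.random.default_rng(0)
shared = rng.normal(size=16)
frames = shared + 0.5 * rng.normal(size=(3, 16))

decorrelated = gram_schmidt_decorrelate(frames)
gram = decorrelated @ decorrelated.T
off_diagonal = gram - np.diag(np.diag(gram))
print("largest off-diagonal inner product:", np.abs(off_diagonal).max())
```

Because Gram-Schmidt is order-dependent, the first frame's vector is left unchanged and later frames lose whatever they share with earlier ones. Whether DeCorStory uses this ordering or a different one is not stated in the abstract.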
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
