DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling

Ayushman Sarkar; Zhenyu Yu; Mohd Yamani Idna Idris

arXiv:2602.01306 · cs.CV · February 3, 2026

Open Access

TL;DR

DeCorStory introduces a training-free, inference-time method that decorrelates prompt embeddings to improve visual and semantic consistency in text-to-image storytelling, outperforming existing approaches.

Contribution

The paper proposes Gram-Schmidt prompt embedding decorrelation, combined with singular value reweighting and identity-preserving cross-attention, to improve storytelling consistency without model training or fine-tuning.

Findings

01. Improves prompt-image alignment

02. Enhances identity consistency

03. Increases visual diversity

Abstract

Maintaining visual and semantic consistency across frames is a key challenge in text-to-image storytelling. Existing training-free methods, such as One-Prompt-One-Story, concatenate all prompts into a single sequence, which often induces strong embedding correlation and leads to color leakage, background blending, and identity drift. We propose DeCorStory, a training-free inference-time framework that explicitly reduces inter-frame semantic interference. DeCorStory applies Gram-Schmidt prompt embedding decorrelation to orthogonalize frame-level semantics, followed by singular value reweighting to strengthen prompt-specific information and identity-preserving cross-attention to stabilize character identity during diffusion. The method requires no model modification or fine-tuning and can be seamlessly integrated into existing diffusion pipelines. Experiments demonstrate consistent…
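
The decorrelation step named in the abstract can be illustrated with a minimal sketch of Gram-Schmidt orthogonalization. This is not the authors' implementation: the abstract does not say which tensors are orthogonalized, in what order, or whether norms are rescaled afterwards. The sketch assumes one pooled embedding vector per frame prompt, stacked as rows of a NumPy array, and the function name `gram_schmidt_decorrelate` is hypothetical.

```python
import numpy as np


def gram_schmidt_decorrelate(frame_embeddings, eps=1e-8):
    """Orthogonalize per-frame prompt vectors with modified Gram-Schmidt.

    frame_embeddings: array of shape (num_frames, dim), one vector per
    frame prompt (an assumption made for this sketch).
    Returns an array of the same shape whose rows are mutually orthogonal.
    The first row is left unchanged; norms are not rescaled.
    """
    E = np.asarray(frame_embeddings, dtype=np.float64)
    out = np.zeros_like(E)
    for i in range(E.shape[0]):
        v = E[i].copy()
        # Remove the component of v along every previously processed frame.
        for j in range(i):
            u = out[j]
            denom = u @ u
            if denom > eps:  # skip directions that have collapsed to zero
                v -= (v @ u) / denom * u
        out[i] = v
    return out


# Toy input: a shared offset makes the frame vectors strongly correlated.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 16)) + 2.0
decorrelated = gram_schmidt_decorrelate(frames)

# Off-diagonal entries of the Gram matrix measure remaining overlap
# between frames; after orthogonalization they are near zero.
gram = decorrelated @ decorrelated.T
off_diag = gram - np.diag(np.diag(gram))
print("max residual overlap:", np.abs(off_diag).max())
```

The result depends on frame order, since each vector is projected only against the frames before it. How DeCorStory handles ordering, and how the orthogonalized embeddings feed into singular value reweighting and cross-attention, is not described in the abstract and is left out of this sketch.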

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning