StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization

Gopalji Gaur; Mohammadreza Zolfaghari; Thomas Brox

arXiv:2508.03735 · cs.CV · August 7, 2025


TL;DR

StorySync is a training-free method for keeping subjects consistent across text-to-image generations. It combines attention sharing with region harmonization to produce coherent visual stories from a pre-trained diffusion model.

Contribution

The paper introduces a training-free approach that combines masked cross-image attention sharing with Regional Feature Harmonization to keep subjects consistent in diffusion models.

Findings

01. Achieves consistent subjects across story scenes

02. Maintains creative diversity of generated images

03. Operates efficiently without model fine-tuning

Abstract

Generating a coherent sequence of images that tells a visual story, using text-to-image diffusion models, often faces the critical challenge of maintaining subject consistency across all story scenes. Existing approaches, which typically rely on fine-tuning or retraining models, are computationally expensive, time-consuming, and often interfere with the model's pre-existing capabilities. In this paper, we follow a training-free approach and propose an efficient consistent-subject-generation method. This approach works seamlessly with pre-trained diffusion models by introducing masked cross-image attention sharing to dynamically align subject features across a batch of images, and Regional Feature Harmonization to refine visually similar details for improved subject consistency. Experimental results demonstrate that our approach successfully generates visually consistent subjects across…
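
To make the idea of masked cross-image attention sharing more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the mechanism, so the sketch assumes that each image's queries attend to that image's own tokens plus the subject-masked tokens of every other image in the batch. The function name `masked_shared_attention`, the tensor shapes, and the boolean `subject_mask` input are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf scores become zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_shared_attention(q, k, v, subject_mask):
    """Illustrative sketch (assumed mechanism, not the paper's code).

    q, k, v:       arrays of shape (B, N, D), one row of tokens per image.
    subject_mask:  bool array of shape (B, N), True where a token is
                   assumed to belong to the subject.
    Returns an array of shape (B, N, D).
    """
    B, N, D = q.shape
    # Pool keys and values from the whole batch: (B*N, D).
    k_all = k.reshape(B * N, D)
    v_all = v.reshape(B * N, D)
    # Scores of every query against every pooled key: (B, N, B*N).
    scores = np.einsum("bnd,md->bnm", q, k_all) / np.sqrt(D)
    # Image b may see its own tokens and subject tokens of any image.
    own = np.repeat(np.eye(B, dtype=bool), N, axis=1)      # (B, B*N)
    allowed = own | subject_mask.reshape(1, B * N)          # (B, B*N)
    scores = np.where(allowed[:, None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return np.einsum("bnm,md->bnd", weights, v_all)
```

With an all-False mask the function reduces to ordinary per-image self-attention, which makes the effect of the sharing easy to isolate. In this sketch, marking a token as subject in one image changes the outputs of the other images while leaving that image's own output unchanged.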
