Infinite-Story: A Training-Free Consistent Text-to-Image Generation

Jihun Park; Kyoungmin Lee; Jongmin Gim; Hyeonseo Jo; Minseok Oh; Wonhyeok Choi; Kyumin Hwang; Jaeyeul Kim; Minwoo Choi; Sunghoon Im

arXiv:2511.13002 · cs.CV · November 18, 2025

TL;DR

Infinite-Story is a training-free framework for consistent text-to-image generation in multi-prompt storytelling. It addresses identity and style inconsistency with Identity Prompt Replacement and a unified attention guidance mechanism, and reports state-of-the-art consistency with over 6X faster inference than existing methods.

Contribution

The paper introduces a training-free, test-time method for consistent T2I generation that effectively maintains identity and style across multiple prompts, outperforming diffusion-based approaches.

Findings

01. Achieves state-of-the-art consistency in T2I generation.

02. Operates entirely at test time without training.

03. Over 6X faster inference than existing methods.

Abstract

We present Infinite-Story, a training-free framework for consistent text-to-image (T2I) generation tailored for multi-prompt storytelling scenarios. Built upon a scale-wise autoregressive model, our method addresses two key challenges in consistent T2I generation: identity inconsistency and style inconsistency. To overcome these issues, we introduce three complementary techniques: Identity Prompt Replacement, which mitigates context bias in text encoders to align identity attributes across prompts; and a unified attention guidance mechanism comprising Adaptive Style Injection and Synchronized Guidance Adaptation, which jointly enforce global style and identity appearance consistency while preserving prompt fidelity. Unlike prior diffusion-based approaches that require fine-tuning or suffer from slow inference, Infinite-Story operates entirely at test time, delivering high identity and…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis