RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment

Liyao Jiang; Ruichen Chen; Chao Gao; Di Niu

arXiv:2603.00483 · cs.CV · March 3, 2026

TL;DR

RAISE introduces a training-free, adaptive evolutionary framework that improves text-to-image alignment by dynamically refining images at inference time based on requirement satisfaction, reducing computation and enhancing fidelity.

Contribution

The paper presents a requirement-driven evolutionary method for inference-time image refinement that adapts to prompt complexity without additional training or fine-tuning.

Findings

01

Achieves state-of-the-art alignment scores on GenEval with fewer samples.

02

Reduces generated samples by 30-40% and VLM calls by 80% compared to prior methods.

03

Demonstrates effective, model-agnostic self-improvement across benchmarks.

Abstract

Recent text-to-image (T2I) diffusion models achieve remarkable realism, yet faithful prompt-image alignment remains challenging, particularly for complex prompts with multiple objects, relations, and fine-grained attributes. Existing training-free inference-time scaling methods rely on fixed iteration budgets that cannot adapt to prompt difficulty, while reflection-tuned models require carefully curated reflection datasets and extensive joint fine-tuning of diffusion and vision-language models, often overfitting to reflection-path data and lacking transferability across models. We introduce RAISE (Requirement-Adaptive Self-Improving Evolution), a training-free, requirement-driven evolutionary framework for adaptive T2I generation. RAISE formulates image generation as a requirement-driven adaptive scaling process, evolving a population of candidates at inference time through a diverse…
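
The idea of requirement-driven adaptive scaling can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract above is truncated and does not specify RAISE's operators, so every name here (Candidate, toy_generate, toy_refine, toy_verify, adaptive_refine) is hypothetical. An "image" is modelled as the set of requirements it satisfies, the verifier is a plain set comparison standing in for a VLM check, and the loop stops as soon as one candidate meets every requirement instead of spending a fixed iteration budget.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Toy stand-in for a generated image: the requirements it satisfies.
    satisfied: frozenset


def toy_generate(requirements, rng):
    # Hypothetical generator: each requirement is met with probability 0.5.
    return Candidate(frozenset(r for r in requirements if rng.random() < 0.5))


def toy_verify(candidate, requirements):
    # Hypothetical verifier (stand-in for a VLM check): returns unmet requirements.
    return [r for r in requirements if r not in candidate.satisfied]


def toy_refine(parent, requirements, rng):
    # Hypothetical refinement step: repair one unmet requirement of the parent.
    unmet = toy_verify(parent, requirements)
    if not unmet:
        return parent
    return Candidate(parent.satisfied | {rng.choice(unmet)})


def adaptive_refine(requirements, population_size=4, max_rounds=10, seed=0):
    """Evolve candidates until all requirements are met or the budget runs out.

    Returns (best_candidate, refinement_rounds_used, total_samples_generated).
    """
    rng = random.Random(seed)
    population = [toy_generate(requirements, rng) for _ in range(population_size)]
    samples = population_size
    for round_idx in range(max_rounds):
        # Rank candidates by how many requirements remain unmet.
        population.sort(key=lambda c: len(toy_verify(c, requirements)))
        best = population[0]
        if not toy_verify(best, requirements):
            # Adaptive stop: easy prompts exit early and cost fewer samples.
            return best, round_idx, samples
        survivors = population[: max(1, population_size // 2)]
        n_children = population_size - len(survivors)
        # Refine survivors in order, so the current best is always refined.
        children = [
            toy_refine(survivors[i % len(survivors)], requirements, rng)
            for i in range(n_children)
        ]
        samples += len(children)
        population = survivors + children
    population.sort(key=lambda c: len(toy_verify(c, requirements)))
    return population[0], max_rounds, samples


if __name__ == "__main__":
    reqs = ["two cats", "red ball", "ball left of cats"]
    best, rounds, samples = adaptive_refine(reqs)
    print(sorted(best.satisfied), rounds, samples)
```

In this toy setting the number of samples grows with the number of unmet requirements, which is the behaviour a fixed iteration budget cannot provide. A real system would replace the set-based stand-ins with a T2I model and a VLM-based verifier.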

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning