DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models

Juntong Wang; Jiarui Wang; Huiyu Duan; Lewei Li; Guangtao Zhai; Xiongkuo Min

arXiv:2605.06170 · cs.CV · May 8, 2026

TL;DR

DynT2I-Eval is a dynamic, fully automated evaluation framework for text-to-image (T2I) models. It continually generates fresh prompts, which makes assessment more robust and less vulnerable to overfitting on a fixed benchmark.

Contribution

The paper proposes a dynamic evaluation framework that constructs controllable prompt spaces and maintains a stable online leaderboard of T2I models.
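As an illustration of what a controllable prompt space with difficulty-aware sampling might look like, the following sketch composes prompts from the four dimensions named in the abstract (subject, logical constraint, environment, composition). It is not the authors' implementation: the toy vocabularies, the names `PROMPT_SPACE`, `sample_prompt`, and `sample_batch`, and the use of "number of optional constraints" as a difficulty proxy are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical toy vocabularies for the four dimensions named in the abstract.
# DynT2I-Eval derives its space from long-form descriptions; these are placeholders.
PROMPT_SPACE = {
    "subject": ["a red fox", "two astronauts", "an old lighthouse"],
    "logical_constraint": ["with exactly three windows", "not touching the ground"],
    "environment": ["in a snowy forest", "on a neon-lit street", "under a stormy sky"],
    "composition": ["viewed from above", "placed in the left third of the frame"],
}
DIMENSION_ORDER = ("subject", "logical_constraint", "environment", "composition")
# Assumption: subject and environment are always present; the other two are optional.
OPTIONAL_DIMS = ("logical_constraint", "composition")


@dataclass
class SampledPrompt:
    text: str
    difficulty: int  # hypothetical proxy: number of optional dimensions filled in


def sample_prompt(space, difficulty, rng):
    """Compose one prompt with exactly `difficulty` optional dimensions active."""
    if not 0 <= difficulty <= len(OPTIONAL_DIMS):
        raise ValueError(f"difficulty must be in [0, {len(OPTIONAL_DIMS)}]")
    active = set(rng.sample(OPTIONAL_DIMS, k=difficulty))
    parts = [
        rng.choice(space[dim])
        for dim in DIMENSION_ORDER
        if dim not in OPTIONAL_DIMS or dim in active
    ]
    return SampledPrompt(" ".join(parts), difficulty)


def sample_batch(space, n, difficulty_weights, seed=0):
    """Draw n prompts; difficulty_weights[k] is the relative mass on difficulty level k."""
    rng = random.Random(seed)
    levels = range(len(difficulty_weights))
    targets = rng.choices(levels, weights=difficulty_weights, k=n)
    return [sample_prompt(space, d, rng) for d in targets]
```

Each round can use a new seed, so repeated evaluations draw from a combinatorial space instead of reusing a fixed prompt list. Shifting `difficulty_weights` toward higher levels is one assumed way to make the sampling difficulty-aware, for example once models saturate on easy prompts.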

Findings

1. Continually refreshed prompts improve evaluation robustness.
2. The ranking framework balances convergence, discovery, and fidelity.
3. Experiments show reduced overfitting and more reliable benchmarking.

Abstract

Existing text-to-image (T2I) benchmarks largely rely on fixed prompt sets, leaving them vulnerable to overfitting and benchmark contamination once publicly released and repeatedly reused. In this work, we propose DynT2I-Eval, a fully automated dynamic evaluation framework for T2I models. It constructs a structured visual semantic space from long-form descriptions, decomposing prompts into controllable dimensions (e.g., subject, logical constraint, environment, and composition). This enables the continuous generation of fresh prompts via task-specific spaces and difficulty-aware sampling. DynT2I-Eval evaluates model performance across text alignment, perceptual quality, and aesthetics. Heterogeneous outputs are unified into prompt-conditioned pairwise comparisons, allowing a dynamic scheduler, micro-batch aggregation, and weighted Bayesian updates to maintain a stable online leaderboard…
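A minimal sketch of the ranking side, under stated assumptions. Each prompt's heterogeneous scores are turned into pairwise outcomes by comparing models only within the same dimension (so a 0-1 alignment score never mixes with a 1-10 aesthetics score). A micro-batch of such outcomes then updates a Gaussian belief over each model's strength. The Bayesian step is a swapped-in choice, a Bradley-Terry likelihood with a diagonal Laplace (one Newton step) update similar in spirit to Glicko, and not necessarily the paper's weighted update. `DIM_WEIGHTS`, `to_pairwise`, and `BayesianLeaderboard` are hypothetical names, and the dynamic scheduler that chooses which pairs to compare is not modeled.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical per-dimension weights; the paper's actual weighting is not given here.
DIM_WEIGHTS = {"alignment": 1.0, "quality": 0.5, "aesthetics": 0.5}


def to_pairwise(prompt_scores, weights=DIM_WEIGHTS, tol=1e-6):
    """Convert one prompt's scores {model: {dimension: score}} into
    (model_a, model_b, outcome_for_a, weight) tuples; ties count as 0.5."""
    comparisons = []
    for a, b in combinations(sorted(prompt_scores), 2):
        for dim, w in weights.items():
            diff = prompt_scores[a][dim] - prompt_scores[b][dim]
            outcome = 0.5 if abs(diff) <= tol else (1.0 if diff > 0 else 0.0)
            comparisons.append((a, b, outcome, w))
    return comparisons


class BayesianLeaderboard:
    """Gaussian belief N(mu, var) over each model's Bradley-Terry strength."""

    def __init__(self, prior_var=1.0):
        self.prior_var = prior_var
        self.mu = {}
        self.var = {}

    def _ensure(self, model):
        if model not in self.mu:
            self.mu[model] = 0.0
            self.var[model] = self.prior_var

    def update(self, micro_batch):
        """Apply one weighted Laplace step per model for a whole micro-batch."""
        grad = defaultdict(float)
        hess = defaultdict(float)
        for a, b, y, w in micro_batch:
            self._ensure(a)
            self._ensure(b)
            # Win probability of a over b under the current (frozen) means.
            p = 1.0 / (1.0 + math.exp(-(self.mu[a] - self.mu[b])))
            grad[a] += w * (y - p)
            grad[b] -= w * (y - p)
            hess[a] += w * p * (1.0 - p)
            hess[b] += w * p * (1.0 - p)
        # Posterior precision = prior precision + accumulated curvature.
        for m in grad:
            precision = 1.0 / self.var[m] + hess[m]
            self.mu[m] += grad[m] / precision
            self.var[m] = 1.0 / precision

    def ranking(self):
        return sorted(self.mu, key=self.mu.get, reverse=True)


# Usage with made-up scores on three heterogeneous scales.
scores = {
    "model_x": {"alignment": 0.91, "quality": 4.2, "aesthetics": 6.8},
    "model_y": {"alignment": 0.74, "quality": 4.5, "aesthetics": 5.9},
    "model_z": {"alignment": 0.60, "quality": 3.1, "aesthetics": 5.0},
}
board = BayesianLeaderboard()
board.update(to_pairwise(scores))
print(board.ranking())  # → ['model_x', 'model_y', 'model_z']
```

Because gradients and curvature are accumulated with means frozen at the start of the micro-batch and applied together, the result does not depend on the order of comparisons within a batch, and each model's variance shrinks as evidence accumulates.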
