DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models
Juntong Wang, Jiarui Wang, Huiyu Duan, Lewei Li, Guangtao Zhai, Xiongkuo Min

TL;DR
DynT2I-Eval is a dynamic, fully automated evaluation framework for text-to-image (T2I) models. It continuously generates fresh prompts, which makes evaluation more robust and harder to overfit than fixed prompt sets.
Contribution
The paper proposes a dynamic evaluation framework that builds controllable prompt spaces and maintains a stable online leaderboard for T2I models.
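To make the idea of a controllable prompt space concrete, here is a minimal sketch, not the authors' implementation. The dimension names follow the abstract (subject, logical constraint, environment, composition); the example values, the `PROMPT_SPACE` dictionary, and the `sample_prompt` helper are all hypothetical.

```python
import random

# Hypothetical prompt space. The dimension names come from the abstract;
# the values under each dimension are invented for illustration only.
PROMPT_SPACE = {
    "subject": ["a red fox", "two ceramic teapots", "an astronaut"],
    "logical_constraint": ["with exactly three objects visible", "with nothing blue in view"],
    "environment": ["in a foggy forest", "on a rooftop at dusk"],
    "composition": ["shot from above", "framed as a close-up"],
}


def sample_prompt(space, rng):
    """Pick one value per dimension and join the picks into a prompt string."""
    parts = {dim: rng.choice(values) for dim, values in space.items()}
    return parts, ", ".join(parts.values())


# A seeded generator keeps the draw reproducible; a new seed yields a fresh prompt.
parts, prompt = sample_prompt(PROMPT_SPACE, random.Random(0))
print(prompt)
```

Because each dimension is sampled separately, a prompt can be refreshed along one axis (for example, a new environment) while the others are held fixed, which is what makes the space controllable.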
Findings
Continually refreshed prompts improve evaluation robustness.
The ranking framework balances convergence, discovery, and fidelity.
Experiments show reduced overfitting and more reliable benchmarking.
Abstract
Existing text-to-image (T2I) benchmarks largely rely on fixed prompt sets, leaving them vulnerable to overfitting and benchmark contamination once publicly released and repeatedly reused. In this work, we propose DynT2I-Eval, a fully automated dynamic evaluation framework for T2I models. It constructs a structured visual semantic space from long-form descriptions, decomposing prompts into controllable dimensions (e.g., subject, logical constraint, environment, and composition). This enables the continuous generation of fresh prompts via task-specific spaces and difficulty-aware sampling. DynT2I-Eval evaluates model performance across text alignment, perceptual quality, and aesthetics. Heterogeneous outputs are unified into prompt-conditioned pairwise comparisons, allowing a dynamic scheduler, micro-batch aggregation, and weighted Bayesian updates to maintain a stable online leaderboard…
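As a rough illustration of how pairwise comparisons could feed an online leaderboard through micro-batch aggregation and weighted Bayesian updates, the sketch below keeps a Beta posterior over each model's win rate. This is an assumption chosen for simplicity, not the update rule from the paper: the `BetaLeaderboard` class, the uniform prior, and the per-comparison weights are all hypothetical.

```python
from collections import defaultdict


class BetaLeaderboard:
    """Toy leaderboard: one Beta(alpha, beta) win-rate posterior per model.

    Assumption: this is a simple stand-in, not the paper's actual update rule.
    """

    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        self.alpha = defaultdict(lambda: prior_alpha)
        self.beta = defaultdict(lambda: prior_beta)

    def update_batch(self, comparisons):
        """Apply one micro-batch of (winner, loser, weight) comparisons."""
        wins = defaultdict(float)
        losses = defaultdict(float)
        # Aggregate the batch first so each posterior changes once per batch.
        for winner, loser, weight in comparisons:
            wins[winner] += weight
            losses[loser] += weight
        # Weighted update: each comparison adds a fractional pseudo-count.
        for model, w in wins.items():
            self.alpha[model] += w
        for model, w in losses.items():
            self.beta[model] += w

    def mean(self, model):
        """Posterior mean win rate of a model."""
        a, b = self.alpha[model], self.beta[model]
        return a / (a + b)

    def ranking(self):
        """Models sorted by posterior mean win rate, best first."""
        models = set(self.alpha) | set(self.beta)
        return sorted(models, key=self.mean, reverse=True)


board = BetaLeaderboard()
# Hypothetical micro-batch; a lower weight marks a less trusted comparison.
board.update_batch([("A", "B", 1.0), ("A", "C", 0.5), ("B", "C", 1.0)])
print(board.ranking())
```

The weights let a low-confidence judgment move the posterior less than a high-confidence one, and applying the update once per micro-batch rather than per comparison is one simple way to keep the ranking from jittering.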
