FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models

Youngsun Lim; Cusuh Ham; Pin-Yu Chen; Deepti Ghadiyaram

arXiv:2605.19111·cs.CV·May 20, 2026

TL;DR

FAGER is a framework that evaluates and improves text-to-image models on factual correctness grounded in or implied by the prompt. It outperforms prior metrics in factuality preference tests across multiple datasets.

Contribution

FAGER introduces a structured factual evaluation method that combines LLM-based fact proposal with reference-guided visual fact extraction and verification. It also enables training-free refinement of generated images to improve factual accuracy.

Findings

01

FAGER outperforms prior metrics in factuality preference tests across multiple datasets.

02

FAGER can refine T2I outputs without additional training, improving factual correctness.

03

The framework evaluates factual grounding for prompts involving scientific knowledge, historical facts, products, and culture-specific concepts.

Abstract

Existing text-to-image (T2I) evaluation metrics mainly assess whether generated images align with information explicitly stated in the prompt, but often fail to capture factual requirements that are implicit, externally grounded, or identity-defining. As a result, they are not well suited for evaluating factual correctness in prompts involving scientific knowledge, historical facts, products, or culture-specific concepts. We propose FActually Grounded Evaluation and Refinement (FAGER), an agentic framework that evaluates whether generated images correctly reflect visually verifiable facts grounded in or implied by the prompt, while also providing actionable feedback for improvement. FAGER first constructs a structured factual rubric by combining LLM-based fact proposal with reference-guided visual fact extraction and verification, then converts the rubric into question-answer pairs for…
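
The rubric-to-questions step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Fact` and `QAPair` types, the `source` labels, the deduplication rule, and the yes/no question template are all assumptions made for illustration, and the fact lists stand in for the outputs of the LLM proposal and reference-guided extraction stages. Because the abstract is truncated, the sketch stops at producing question-answer pairs and does not guess how they are used afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A visually verifiable claim (hypothetical structure, not from the paper)."""
    statement: str
    source: str      # assumed labels: "llm" or "reference"
    verified: bool   # assumed flag set by the verification stage


@dataclass(frozen=True)
class QAPair:
    question: str
    expected_answer: str


def build_rubric(llm_facts, reference_facts):
    """Merge both fact sources, keeping verified facts and dropping duplicates."""
    seen = set()
    rubric = []
    for fact in list(llm_facts) + list(reference_facts):
        key = fact.statement.strip().lower()
        if fact.verified and key not in seen:
            seen.add(key)
            rubric.append(fact)
    return rubric


def rubric_to_qa(rubric):
    """Turn each rubric fact into a yes/no question (assumed template)."""
    return [
        QAPair(
            question=f"Is the following true of the image? {fact.statement}",
            expected_answer="yes",
        )
        for fact in rubric
    ]


# Illustrative inputs for a prompt such as "the Eiffel Tower".
llm_facts = [
    Fact("The tower has an open lattice structure", "llm", True),
    Fact("The tower is painted bright red", "llm", False),  # rejected by verification
]
reference_facts = [
    Fact("the tower has an open lattice structure", "reference", True),  # duplicate
    Fact("The tower stands on four legs", "reference", True),
]

rubric = build_rubric(llm_facts, reference_facts)
qa_pairs = rubric_to_qa(rubric)
for pair in qa_pairs:
    print(pair.question, "->", pair.expected_answer)
```

Keeping each fact as a separate record means a failed question points back to one specific statement, which is one plausible way to obtain the actionable feedback the abstract mentions.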
