TL;DR
This paper introduces KVBench, a benchmark for evaluating knowledge-intensive text-to-image (T2I) models across six senior high-school subjects, and proposes KE-Check, a framework for improving scientific accuracy and reducing hallucinations in generated images.
Contribution
The paper presents KVBench, a benchmark of 1,800 expert-curated prompts for knowledge-intensive T2I tasks, and KE-Check, a two-stage framework for improving scientific fidelity in image generation.
Findings
Open-source models underperform proprietary ones in knowledge-intensive tasks.
KE-Check reduces scientific hallucinations and improves model fidelity.
The benchmark reveals model deficiencies in logical reasoning and multilingual robustness.
Abstract
Recent text-to-image (T2I) models have demonstrated impressive capabilities in photorealistic synthesis and instruction following. However, their reliability in knowledge-intensive settings remains largely unexplored. Unlike natural image generation, knowledge visualization requires not only semantic alignment but also strict adherence to domain knowledge, structural constraints, and symbolic conventions, exposing a critical gap between visual plausibility and scientific correctness. To systematically study this problem, we introduce KVBench, a curriculum-grounded benchmark for evaluating knowledge-intensive T2I generation. KVBench covers six senior high-school subjects: Biology, Chemistry, Geography, History, Mathematics, and Physics. The benchmark consists of 1,800 expert-curated prompts derived from over 30 authoritative textbooks. Using this benchmark, we evaluate 14…
