AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

Zhaohe Liao; Kaixun Jiang; Zhihang Liu; Yujie Wei; Junqiu Yu; Quanhao Li; Hong-Tao Yu; Pandeng Li; Yuzheng Wang; Zhen Xing; Shiwei Zhang; Chen-Wei Xie; Yun Zheng; Xihui Liu

arXiv:2603.28068·cs.CV·April 1, 2026

TL;DR

AIBench is a benchmark for AI-generated academic illustrations that uses visual question answering (VQA) to evaluate logical correctness and vision-language models (VLMs) to assess aesthetics.

Contribution

It introduces the first benchmark for assessing visual-logical consistency in academic illustrations, highlighting the challenges and potential improvements in AI-generated scientific images.

Findings

01 VQA-based evaluation assesses logical correctness more accurately than directly evaluating the illustration with a VLM.

02 Models show a significant performance gap on this complex reasoning task.

03 Test-time scaling improves both logical and aesthetic performance.

Abstract

Although image generation has evolved rapidly and boosted a range of applications, whether state-of-the-art models can produce ready-to-use academic illustrations for papers is still largely unexplored. Directly comparing or evaluating an illustration with a VLM is straightforward but requires oracle-level multi-modal understanding, which is unreliable for long and complex texts and illustrations. To address this, we propose AIBench, the first benchmark that uses VQA to evaluate the logical correctness of academic illustrations and VLMs to assess their aesthetics. Specifically, we design four levels of questions derived from a logic diagram summarized from the method section of the paper, which query whether the generated illustration aligns with the paper at different scales. Our VQA-based approach yields more accurate and detailed evaluations of visual-logical consistency while relying…
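To make the VQA-based scoring idea concrete, the following is a minimal sketch of how answers to leveled questions could be aggregated into per-level and overall accuracy. It is not the AIBench implementation: the `VQAItem` record, the integer level numbering, exact-match answer comparison, and the unweighted mean across levels are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VQAItem:
    """One question about a generated illustration (hypothetical record)."""
    level: int      # assumed numbering 1..4 for the four question levels
    question: str
    expected: str   # reference answer derived from the paper's logic diagram
    predicted: str  # answer given by a VQA model shown the generated illustration


def _normalize(answer: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences match.
    return " ".join(answer.strip().lower().split())


def per_level_accuracy(items):
    """Return {level: fraction of questions answered as expected}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.level] += 1
        if _normalize(item.expected) == _normalize(item.predicted):
            correct[item.level] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}


def overall_score(level_acc):
    """Unweighted mean over levels (an assumption, not the paper's formula)."""
    if not level_acc:
        return 0.0
    return sum(level_acc.values()) / len(level_acc)


if __name__ == "__main__":
    demo = [
        VQAItem(1, "Is there an encoder block?", "Yes", "yes"),
        VQAItem(1, "Is there a decoder block?", "Yes", "No"),
        VQAItem(2, "Which module feeds the fusion layer?", "encoder", "Encoder"),
    ]
    acc = per_level_accuracy(demo)
    print(acc, overall_score(acc))
```

Reporting accuracy per level rather than a single number keeps coarse and fine-grained mismatches visible separately, which fits the abstract's point about querying alignment at different scales.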

Code & Models

Repositories

ali-vilab/AIBench (GitHub)
