BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

Yan Li; Zezi Zeng; Ziwei Zhou; Xin Gao; Muzhao Tian; Yifan Yang; Mingxi Cheng; Qi Dai; Yuqing Yang; Lili Qiu; Zhendong Wang; Zhengyuan Yang; Xue Yang; Lijuan Wang; Ji Li; Chong Luo

arXiv:2603.25732·cs.CV·March 27, 2026

TL;DR

BizGenEval is a benchmark that evaluates image generation models on complex, real-world commercial visual content creation across five document types (slides, charts, webpages, posters, and scientific figures) and four capability dimensions (text rendering, layout control, attribute binding, and knowledge-based reasoning).

Contribution

This work introduces BizGenEval, the first systematic benchmark specifically targeting commercial visual content generation, with 20 evaluation tasks, 400 curated prompts, and 8000 human-verified checklist questions.

Findings

01. Significant performance gaps between current models and professional content creation needs.

02. Benchmark covers five document types and four key capability dimensions.

03. Evaluation of 26 popular image generation systems reveals areas for improvement.

Abstract

Recent advances in image generation models have expanded their applications beyond aesthetic imagery toward practical visual content creation. However, existing benchmarks mainly focus on natural image synthesis and fail to systematically evaluate models under the structured and multi-constraint requirements of real-world commercial design tasks. In this work, we introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types: slides, charts, webpages, posters, and scientific figures, and evaluates four key capability dimensions: text rendering, layout control, attribute binding, and knowledge-based reasoning, forming 20 diverse evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8000 human-verified checklist questions to rigorously assess whether generated images satisfy complex visual and…
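
The task grid described in the abstract can be sketched in a few lines of Python. This is only an illustration of the arithmetic (5 document types × 4 capability dimensions = 20 tasks); the variable names are hypothetical, and the per-task and per-prompt figures are simple averages that assume nothing about how the prompts and questions are actually distributed.

```python
from itertools import product

# Document types and capability dimensions as listed in the abstract.
DOCUMENT_TYPES = ["slides", "charts", "webpages", "posters", "scientific figures"]
CAPABILITIES = [
    "text rendering",
    "layout control",
    "attribute binding",
    "knowledge-based reasoning",
]

# Each (document type, capability) pair is one evaluation task.
TASKS = list(product(DOCUMENT_TYPES, CAPABILITIES))

# Totals reported in the abstract.
TOTAL_PROMPTS = 400
TOTAL_QUESTIONS = 8000

# Averages only: the abstract does not state that the split is uniform.
avg_prompts_per_task = TOTAL_PROMPTS / len(TASKS)
avg_questions_per_prompt = TOTAL_QUESTIONS / TOTAL_PROMPTS

print(f"tasks: {len(TASKS)}")
print(f"average prompts per task: {avg_prompts_per_task}")
print(f"average checklist questions per prompt: {avg_questions_per_prompt}")
```

On average this works out to 20 prompts per task and 20 checklist questions per prompt.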

Code & Models

Datasets

microsoft/BizGenEval
dataset · 33 downloads

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Aesthetic Perception and Analysis