TL;DR
ServImage introduces a comprehensive benchmark with datasets, scoring, and a payment prediction model to evaluate the commercial viability of image generation and editing models in real-world design projects.
Contribution
It provides the first large-scale, economically grounded benchmark for assessing image models' performance in commercial design contexts.
Findings
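The dataset, ServImageBench, includes 1.07k paid commercial design tasks and 2.05k designer deliverables totaling over $295k, along with 33k candidate images and 33k human annotations.
The scoring system, ServImageScore, combines three quality dimensions: baseline requirements fulfilment, visual execution quality, and commercial necessity satisfaction.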
The payment prediction model achieves 82% accuracy in predicting human payment decisions.
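To make the accuracy figure concrete, here is a minimal sketch of how agreement between predicted and human payment decisions could be measured. It assumes each decision is a binary pay / no-pay label, which this summary does not confirm. The function name `payment_accuracy` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def payment_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Return the fraction of tasks where the predicted pay/no-pay
    decision matches the human decision (assumed binary)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one decision")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical toy data, for illustration only.
predicted = [True, False, True, True, False]
actual = [True, False, False, True, False]
print(payment_accuracy(predicted, actual))  # 4 of 5 decisions agree
```

Under this reading, 82% accuracy would mean the model's decision matches the human decision on roughly 82 of every 100 evaluated items.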
Abstract
Recent image generation and editing models demonstrate robust adherence to instructions and high visual quality on academic benchmarks. However, their performance on paid, real-world design projects remains uncertain. We introduce ServImage, a benchmark that explicitly correlates model outputs with economic value in commercial design projects. ServImage consists of (i) ServImageBench: a dataset of 1.07k paid commercial design tasks and 2.05k designer deliverables totaling over $295k, covering portrait, product, and digital content, along with 33k candidate images and 33k human annotations; (ii) ServImageScore: an integrated scoring system that combines three quality dimensions: baseline requirements fulfilment, visual execution quality, and commercial necessity satisfaction. These three dimensions are designed to characterize the factors…
