# UI-Bench: A Benchmark for Evaluating Design Capabilities of AI Text-to-App Tools

**Authors:** Sam Jung, Agustin Garcinuno, Spencer Mateega

arXiv: 2508.20410 · 2025-09-05

## TL;DR

UI-Bench is a benchmark that ranks the visual quality of sites produced by AI text-to-app tools through expert pairwise comparisons, with the aim of establishing a reproducible standard for evaluating AI-driven web design.

## Contribution

UI-Bench introduces what the authors describe as the first large-scale benchmark of visual quality for AI text-to-app tools. The authors release the complete prompt set, an open-source evaluation framework, and a public leaderboard.

## Key findings

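- UI-Bench covers 10 tools and 30 prompts, yielding 300 generated sites and 4,000+ expert pairwise judgments.
- Tools are ranked with a TrueSkill-derived model that yields calibrated confidence intervals.
- The released prompt set, evaluation framework, and leaderboard provide a reproducible standard and resources for future AI web design research.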
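To make the ranking idea concrete, here is a minimal sketch of a standard two-player TrueSkill update applied to pairwise judgments. It is not the authors' TrueSkill-derived model: it assumes no draws, no dynamics term, and the conventional default parameters (mu = 25, sigma = 25/3, beta = sigma/2). The tool names and judgments are hypothetical, and the `mu ± 1.96·sigma` interval is only an illustrative stand-in for the paper's calibrated confidence intervals.

```python
import math
from dataclasses import dataclass

# Conventional TrueSkill defaults (assumed here, not taken from the paper).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0

    def interval(self, z: float = 1.96) -> tuple[float, float]:
        """Rough interval around the skill estimate (illustrative only)."""
        return (self.mu - z * self.sigma, self.mu + z * self.sigma)


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    # erfc keeps precision in the lower tail, where 1 + erf(x) would round to 0.
    return 0.5 * math.erfc(-x / math.sqrt(2))


def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Return updated (winner, loser) ratings after one judgment, no draws."""
    c2 = 2 * BETA**2 + winner.sigma**2 + loser.sigma**2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift
    w = v * (v + t)         # fraction of variance removed, in (0, 1)
    new_winner = Rating(
        winner.mu + winner.sigma**2 / c * v,
        winner.sigma * math.sqrt(1 - winner.sigma**2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma**2 / c * v,
        loser.sigma * math.sqrt(1 - loser.sigma**2 / c2 * w),
    )
    return new_winner, new_loser


# Hypothetical judgments: each tuple is (preferred tool, other tool).
judgments = [
    ("tool_a", "tool_b"),
    ("tool_a", "tool_c"),
    ("tool_b", "tool_c"),
    ("tool_a", "tool_b"),
]

ratings: dict[str, Rating] = {}
for win, lose in judgments:
    ratings[win], ratings[lose] = update(
        ratings.get(win, Rating()), ratings.get(lose, Rating())
    )

# Print a small leaderboard, highest estimated skill first.
for name, r in sorted(ratings.items(), key=lambda kv: kv[1].mu, reverse=True):
    low, high = r.interval()
    print(f"{name}: mu={r.mu:.2f}, interval=({low:.2f}, {high:.2f})")
```

Each judgment moves the preferred tool's mean up and the other's down, and shrinks both uncertainties, so intervals narrow as more expert comparisons accumulate.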

## Abstract

AI text-to-app tools promise high-quality applications and websites in minutes, yet no public benchmark rigorously verifies those claims. We introduce UI-Bench, the first large-scale benchmark that evaluates visual excellence across competing AI text-to-app tools through expert pairwise comparison. Spanning 10 tools, 30 prompts, 300 generated sites, and 4,000+ expert judgments, UI-Bench ranks systems with a TrueSkill-derived model that yields calibrated confidence intervals. UI-Bench establishes a reproducible standard for advancing AI-driven web design. We release (i) the complete prompt set, (ii) an open-source evaluation framework, and (iii) a public leaderboard. The generated sites rated by participants will be released soon. View the UI-Bench leaderboard at https://uibench.ai/leaderboard.
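The headline numbers in the abstract can be related with some simple arithmetic. The sketch below assumes that comparisons are made only between sites generated from the same prompt and that every tool pair is eligible; the abstract states neither, so the pair counts and the per-pair average are illustrative rather than reported figures. Because the abstract gives "4,000+" judgments, the average is a lower bound under those assumptions.

```python
from math import comb

TOOLS = 10          # from the abstract
PROMPTS = 30        # from the abstract
JUDGMENTS = 4000    # lower bound; the abstract reports "4,000+"

# One generated site per tool per prompt.
sites = TOOLS * PROMPTS

# Assumption: judges compare two sites built from the same prompt.
pairs_per_prompt = comb(TOOLS, 2)
total_pairs = pairs_per_prompt * PROMPTS

# Average number of judgments per distinct pair, under the assumption above.
avg_judgments_per_pair = JUDGMENTS / total_pairs

print(f"sites: {sites}")
print(f"distinct same-prompt pairs: {total_pairs}")
print(f"judgments per pair (at least): {avg_judgments_per_pair:.2f}")
```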

---
Source: https://tomesphere.com/paper/2508.20410