Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings

Luca Parolari; Nicla Faccioli; Lamberto Ballan

arXiv:2604.25358 · cs.CV · April 29, 2026


TL;DR

This paper introduces comprehensive benchmarks and a unified evaluation protocol for assessing layout-guided text-to-image diffusion models, focusing on semantic and spatial alignment in both controlled and real-world scenarios.

Contribution

It presents two new benchmarks, C-Bench and O-Bench, and a unified evaluation method to systematically compare layout-guided diffusion models.

Findings

01. A large-scale evaluation of six state-of-the-art layout-guided diffusion models.

02. A ranking of the models by overall performance, supported by a detailed analysis of semantic and spatial alignment.

03. Fine-grained insights into the strengths and limitations of current models.

Abstract

Evaluating layout-guided text-to-image generative models requires assessing both semantic alignment with textual prompts and spatial fidelity to prescribed layouts. Assessing layout alignment requires collecting fine-grained annotations, which is costly and labor-intensive. Consequently, current benchmarks rarely provide comprehensive layout evaluation and often remain limited in scale or coverage, making model comparison, ranking, and interpretation difficult. In this work, we introduce a closed-set benchmark (C-Bench) designed to isolate key generative capabilities while providing varying levels of complexity in both prompt structure and layout. To complement this controlled setting, we propose an open-set benchmark (O-Bench) that evaluates models using real-world prompts and layouts, offering a measure of semantic and spatial alignment in the wild. We further develop a unified…
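To make the notion of spatial fidelity to a prescribed layout concrete, the sketch below scores a generated image by the mean intersection-over-union (IoU) between each prescribed box and the box detected for the same object. This is only an illustration of one common way to quantify layout alignment; the abstract excerpt does not state the paper's actual metric. The function names `iou` and `layout_alignment`, the `(x_min, y_min, x_max, y_max)` box format, and the index-based pairing of boxes are all assumptions introduced here.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in any consistent unit.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_alignment(prescribed: Sequence[Box], detected: Sequence[Box]) -> float:
    """Mean IoU over box pairs matched by index (an illustrative assumption,
    not the paper's protocol). Returns 0.0 for an empty layout."""
    if len(prescribed) != len(detected):
        raise ValueError("prescribed and detected must have the same length")
    if not prescribed:
        return 0.0
    return sum(iou(p, d) for p, d in zip(prescribed, detected)) / len(prescribed)


if __name__ == "__main__":
    layout = [(0, 0, 2, 2), (0, 0, 1, 1)]
    found = [(0, 0, 2, 2), (2, 2, 3, 3)]  # first box exact, second box missed
    print(layout_alignment(layout, found))
```

In practice the detected boxes would come from an object detector run on the generated image, and a semantic score would be computed separately; both steps are outside the scope of this sketch.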


Code & Models

Repositories

lparolari/cobench (GitHub)
