7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models

Elena Izzo; Luca Parolari; Davide Vezzaro; Lamberto Ballan

arXiv:2508.12919 · cs.CV · August 19, 2025

TL;DR

7Bench is a new comprehensive benchmark designed to evaluate both semantic and spatial alignment in layout-guided text-to-image models, addressing a critical gap in current assessment tools for spatial fidelity.

Contribution

This work introduces 7Bench, the first benchmark to jointly evaluate semantic and spatial alignment in layout-guided text-to-image generation models.

Findings

01. State-of-the-art models show varied strengths across spatial and semantic tasks.

02. The benchmark reveals limitations in the spatial fidelity of current models.

03. The evaluation protocol measures both semantic and spatial accuracy.
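
To make "spatial accuracy" concrete, the following is a minimal sketch of one common way to compare a requested layout box with a box detected in the generated image: intersection over union (IoU). This is an illustrative assumption, not the scoring protocol defined by 7Bench; the `Box` alias, the `iou` function, and the `(x_min, y_min, x_max, y_max)` coordinate convention are all hypothetical choices made for this example.

```python
from typing import Tuple

# Hypothetical box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes.

    Illustrative only: not the metric defined by the benchmark.
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield 0.0 rather than a division error.
    return inter / union if union > 0 else 0.0


# Example: a requested layout box versus a box found by some detector.
layout_box: Box = (0.0, 0.0, 2.0, 2.0)
detected_box: Box = (1.0, 1.0, 3.0, 3.0)
score = iou(layout_box, detected_box)
```

A score of 1.0 would mean the detected object fills exactly the requested region, and 0.0 would mean no overlap. A full evaluation would also need an object detector and a rule for matching detections to layout entries, neither of which is sketched here.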

Abstract

Layout-guided text-to-image models offer greater control over the generation process by explicitly conditioning image synthesis on the spatial arrangement of elements. As a result, their adoption has increased in many computer vision applications, ranging from content creation to synthetic data generation. A critical challenge is achieving precise alignment between the image, textual prompt, and layout, ensuring semantic fidelity and spatial accuracy. Although recent benchmarks assess text alignment, layout alignment remains overlooked, and no existing benchmark jointly evaluates both. This gap limits the ability to evaluate a model's spatial fidelity, which is crucial when using layout-guided generation for synthetic data, as errors can introduce noise and degrade data quality. In this work, we introduce 7Bench, the first benchmark to assess both semantic and spatial alignment in…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · 3D Modeling in Geospatial Applications · Image Processing and 3D Reconstruction