SpatialBench-UC: Uncertainty-Aware Evaluation of Spatial Prompt Following in Text-to-Image Generation

Amine Rostane

arXiv:2601.13462·cs.AI·January 21, 2026

TL;DR

This paper introduces SpatialBench-UC, a benchmark for evaluating spatial prompt following in text-to-image models using an uncertainty-aware approach, enabling reproducible and calibrated assessments.

Contribution

The paper presents a new benchmark and evaluation framework for spatial prompt following, incorporating uncertainty and abstention to improve assessment accuracy.
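Abstention is usually assessed with a risk-coverage curve from selective prediction. Samples are accepted in order of decreasing confidence, and for each prefix one records the fraction of samples covered and the error rate among the accepted ones. The following is a minimal sketch. It assumes per-sample checker confidences and a reference label that says whether each checker verdict is correct. The function name and input format are hypothetical, and whether SpatialBench-UC defines risk exactly this way is an assumption.

```python
from typing import List, Tuple


def risk_coverage_curve(
    confidences: List[float], correct: List[bool]
) -> List[Tuple[float, float]]:
    """Return (coverage, risk) points, accepting samples by decreasing confidence.

    Hypothetical inputs: `confidences` holds one checker confidence per sample,
    and `correct` says whether the checker's verdict matches a reference label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    # Most confident samples first; ties keep their original order (stable sort).
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    points = []
    errors = 0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Accepting the k most confident samples: coverage k/n, risk = error rate among them.
        points.append((k / n, errors / k))
    return points
```

For example, `risk_coverage_curve([0.9, 0.8, 0.6, 0.3], [True, True, False, True])` gives zero risk up to 50% coverage and a risk of 1/3 at 75% coverage. This shows how a single overall score would hide the fact that the confident decisions are reliable.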

Findings

01. Grounding methods improve pass rate and coverage.

02. Abstention remains significant due to missing detections.

03. The benchmark enables reproducible and calibrated evaluation.
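The findings refer to pass rate, coverage, and abstentions caused by missing detections. The sketch below shows one way to compute such summaries from per-sample checker verdicts. Several conventions here are hypothetical and are not taken from the released report tables: the verdict labels (`PASS`, `FAIL`, `UNDECIDABLE`), the reason strings, and the choice to count abstentions as non-passing in `pass_rate`.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize(verdicts: List[str], reasons: List[Optional[str]]) -> Dict[str, float]:
    """Summarize hypothetical per-sample verdicts ("PASS", "FAIL", "UNDECIDABLE")."""
    n = len(verdicts)
    if n == 0 or n != len(reasons):
        raise ValueError("need non-empty, equally sized inputs")
    counts = Counter(verdicts)
    decided = counts["PASS"] + counts["FAIL"]
    summary = {
        # Assumption: abstentions count as "not passed" in the headline pass rate.
        "pass_rate": counts["PASS"] / n,
        # Coverage: fraction of samples on which the checker did not abstain.
        "coverage": decided / n,
        # Pass rate restricted to samples the checker actually decided.
        "pass_rate_on_covered": counts["PASS"] / decided if decided else float("nan"),
    }
    # Break abstentions down by reason, e.g. a missing detection vs. ambiguous geometry.
    abstain_reasons = Counter(r for v, r in zip(verdicts, reasons) if v == "UNDECIDABLE")
    for reason, c in abstain_reasons.items():
        summary[f"abstain_{reason}"] = c / n
    return summary
```

Reporting `pass_rate` and `pass_rate_on_covered` side by side shows how much of a model's score depends on the samples the checker declined to judge.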

Abstract

Evaluating whether text-to-image models follow explicit spatial instructions is difficult to automate. Object detectors may miss targets or return multiple plausible detections, and simple geometric tests can become ambiguous in borderline cases. Spatial evaluation is naturally a selective prediction problem: the checker may abstain when evidence is weak and report a confidence, so that results can be interpreted as a risk-coverage tradeoff rather than a single score. We introduce SpatialBench-UC, a small, reproducible benchmark for pairwise spatial relations. The benchmark contains 200 prompts (50 object pairs × 4 relations) grouped into 100 counterfactual pairs obtained by swapping object roles. We release a benchmark package, versioned prompts, pinned configs, per-sample checker outputs, and report tables, enabling reproducible and auditable comparisons across models. We also…
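The abstract names three failure modes for an automatic spatial checker: missed detections, multiple plausible detections, and borderline geometry. The sketch below shows one way a checker could abstain on each of them instead of guessing. It is not the SpatialBench-UC checker. The following are all illustrative assumptions:

- the centroid-based test
- the `min_score` and `min_margin` thresholds
- the verdict labels
- the confidence heuristic

Detections are assumed to come from some external object detector.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels, y grows downward


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence for this box


@dataclass
class Verdict:
    label: str                    # hypothetical labels: "PASS", "FAIL", "UNDECIDABLE"
    confidence: float             # heuristic value in [0, 1]
    reason: Optional[str] = None  # why the checker abstained, if it did


def _center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def _signed_margin(a: Box, b: Box, relation: str, width: float, height: float) -> float:
    """Positive when object A satisfies `relation` w.r.t. object B, normalized by image size."""
    (ax, ay), (bx, by) = _center(a), _center(b)
    if relation == "left of":
        return (bx - ax) / width
    if relation == "right of":
        return (ax - bx) / width
    if relation == "above":
        return (by - ay) / height
    if relation == "below":
        return (ay - by) / height
    raise ValueError(f"unknown relation: {relation}")


def check_relation(
    dets_a: List[Detection],
    dets_b: List[Detection],
    relation: str,
    width: float,
    height: float,
    min_score: float = 0.3,   # hypothetical detector-score threshold
    min_margin: float = 0.05,  # hypothetical borderline zone, as a fraction of image size
) -> Verdict:
    plausible_a = [d for d in dets_a if d.score >= min_score]
    plausible_b = [d for d in dets_b if d.score >= min_score]
    # Failure mode 1: a target object was not detected at all.
    if not plausible_a or not plausible_b:
        return Verdict("UNDECIDABLE", 0.0, "missing_detection")
    # Evaluate the relation for every combination of plausible instances.
    margins = [
        _signed_margin(a.box, b.box, relation, width, height)
        for a, b in product(plausible_a, plausible_b)
    ]
    if all(m >= min_margin for m in margins):
        label = "PASS"
    elif all(m <= -min_margin for m in margins):
        label = "FAIL"
    else:
        # Failure modes 2 and 3: instances disagree, or the geometry is borderline.
        return Verdict("UNDECIDABLE", 0.0, "ambiguous")
    # Heuristic confidence: grows with the weakest margin, saturating at 4x the threshold.
    smallest = min(abs(m) for m in margins)
    return Verdict(label, min(1.0, smallest / (4 * min_margin)))
```

For example, a dog box centered at x = 35 and a cat box centered at x = 175 in a 256-pixel-wide image give a `PASS` for "left of" and a `FAIL` for "right of". Adding a second plausible cat on the dog's left turns both checks into `UNDECIDABLE` with reason `ambiguous`. Requiring every instance combination to agree is a deliberately conservative choice. It trades coverage for fewer wrong verdicts, which is the tradeoff a risk-coverage view is meant to expose.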

Code & Models

Datasets

aminerostane/spatialbench-uc (dataset, 8.4k downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Topic Modeling