GenSpace: Benchmarking Spatially-Aware Image Generation
Zehan Wang, Jiayang Xu, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao

TL;DR
GenSpace introduces a benchmark and evaluation pipeline to assess the spatial awareness of AI image generators, revealing their limitations in understanding 3D scene geometry and object placement.
Contribution
This work presents a benchmark and a specialized evaluation metric for measuring spatial faithfulness in image generation models. It targets a gap in existing assessments: evaluations based on general Vision-Language Models (VLMs) frequently miss detailed spatial errors.
Findings
AI models struggle with 3D object placement and relationships.
Current models lack an accurate understanding of object perspective.
State-of-the-art image generators show limitations in spatial perception.
Abstract
Humans can intuitively compose and arrange scenes in 3D space for photography. However, can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace, a novel benchmark and evaluation pipeline to comprehensively assess the spatial awareness of current image generation models. Furthermore, standard evaluations using general Vision-Language Models (VLMs) frequently fail to capture detailed spatial errors. To address this challenge, we propose a specialized evaluation pipeline and metric, which reconstructs 3D scene geometry using multiple visual foundation models and provides a more accurate and human-aligned metric of spatial faithfulness. Our findings show that while AI models create visually appealing images and can follow general instructions, they struggle with specific 3D details like object…
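To make the idea of scoring spatial faithfulness from reconstructed geometry more concrete, here is a minimal sketch. It is not the GenSpace metric: it assumes a hypothetical upstream stage has already produced one 3D center per object in camera coordinates (x to the right, z pointing away from the camera), and it simply counts how many prompted relations hold. The names `Object3D`, `relation_holds`, `spatial_score`, and the `margin` parameter are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Object3D:
    """Hypothetical output of a 3D reconstruction stage (an assumption)."""
    name: str
    # Assumed camera coordinates: x right, y down, z forward (depth).
    center: Tuple[float, float, float]


def relation_holds(a: Object3D, b: Object3D, relation: str,
                   margin: float = 0.1) -> bool:
    """Check one pairwise relation, with a tolerance margin in scene units."""
    ax, _, az = a.center
    bx, _, bz = b.center
    if relation == "left of":
        return ax < bx - margin
    if relation == "right of":
        return ax > bx + margin
    if relation == "in front of":  # closer to the camera means smaller depth
        return az < bz - margin
    if relation == "behind":
        return az > bz + margin
    raise ValueError(f"unsupported relation: {relation}")


def spatial_score(objects: List[Object3D],
                  constraints: List[Tuple[str, str, str]]) -> float:
    """Fraction of (subject, relation, target) constraints that are satisfied."""
    if not constraints:
        raise ValueError("at least one constraint is required")
    by_name = {obj.name: obj for obj in objects}
    satisfied = sum(
        relation_holds(by_name[subject], by_name[target], relation)
        for subject, relation, target in constraints
    )
    return satisfied / len(constraints)


# Illustrative values only; not data from the paper.
objects = [
    Object3D("cup", (-0.5, 0.0, 2.0)),
    Object3D("laptop", (0.4, 0.0, 1.2)),
]
constraints = [
    ("cup", "left of", "laptop"),
    ("cup", "in front of", "laptop"),
]
score = spatial_score(objects, constraints)
print(score)
```

In this toy scene the cup is to the left of the laptop but farther from the camera, so only one of the two constraints holds. A check on a 2D image alone cannot separate "in front of" from "behind" in this way, which is one reason an evaluation built on reconstructed 3D geometry may catch errors that a general VLM judgment misses.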
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis
