SpatialLock: Precise Spatial Control in Text-to-Image Synthesis

Biao Liu; Yuanzhi Liang

arXiv:2511.04112·cs.CV·November 7, 2025

Open Access

TL;DR

SpatialLock introduces a novel framework for text-to-image synthesis that significantly improves object localization accuracy by leveraging perception signals and grounding information, enabling precise spatial control and higher visual quality.

Contribution

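The paper presents SpatialLock, a method that combines Position-Engaged Injection (PoI), which feeds spatial information into the model through an attention layer, with Position-Guided Learning (PoG), which uses perception-based supervision to refine object localization in text-to-image generation.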

Findings

01 Achieves Intersection-over-Union (IoU) scores above 0.9 across multiple datasets.

02 Sets a new state of the art for object positioning accuracy.

03 Improves the visual quality of generated images.
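
For readers unfamiliar with the metric in the first finding, the sketch below computes the standard IoU between a requested bounding box and the box where an object actually appears. It is only an illustration of the usual definition: the function name `box_iou`, the `(x_min, y_min, x_max, y_max)` box format, and the sample coordinates are assumptions made here, and the paper's actual evaluation protocol is not described on this page.

```python
def box_iou(box_a, box_b):
    """Standard IoU of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption
    for illustration, not something taken from the paper.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes produce no intersection area.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as having no overlap.
    return inter / union if union > 0 else 0.0


# Hypothetical example: requested layout box vs. detected object box.
requested = (0, 0, 100, 100)
detected = (2, 2, 100, 100)
score = box_iou(requested, detected)
print(f"IoU = {score:.4f}")
```

Under this definition, a score above 0.9 means the generated object's box and the requested box overlap almost completely, so even a shift of a few pixels on a 100-pixel box lowers the score noticeably.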

Abstract

Text-to-Image (T2I) synthesis has made significant advancements in recent years, driving applications such as generating datasets automatically. However, precise control over object localization in generated images remains a challenge. Existing methods fail to fully utilize positional information, leading to an inadequate understanding of object spatial layouts. To address this issue, we propose SpatialLock, a novel framework that leverages perception signals and grounding information to jointly control the generation of spatial locations. SpatialLock incorporates two components: Position-Engaged Injection (PoI) and Position-Guided Learning (PoG). PoI directly integrates spatial information through an attention layer, encouraging the model to learn the grounding information effectively. PoG employs perception-based supervision to further refine object localization. Together, these…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis