SpatialLock: Precise Spatial Control in Text-to-Image Synthesis
Biao Liu, Yuanzhi Liang

TL;DR
SpatialLock introduces a novel framework for text-to-image synthesis that significantly improves object localization accuracy by leveraging perception signals and grounding information, enabling precise spatial control and higher visual quality.
Contribution
The paper presents SpatialLock, a new method combining position-guided learning and attention-based injection to enhance spatial control in text-to-image generation.
Findings
Achieves IOU scores above 0.9 across multiple datasets.
Sets a new state-of-the-art for object positioning accuracy.
Improves visual quality of generated images.
Abstract
Text-to-Image (T2I) synthesis has made significant advancements in recent years, driving applications such as generating datasets automatically. However, precise control over object localization in generated images remains a challenge. Existing methods fail to fully utilize positional information, leading to an inadequate understanding of object spatial layouts. To address this issue, we propose SpatialLock, a novel framework that leverages perception signals and grounding information to jointly control the generation of spatial locations. SpatialLock incorporates two components: Position-Engaged Injection (PoI) and Position-Guided Learning (PoG). PoI directly integrates spatial information through an attention layer, encouraging the model to learn the grounding information effectively. PoG employs perception-based supervision to further refine object localization. Together, these…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
