From Pixels to Policies: Reinforcing Spatial Reasoning in Language Models for Content-Aware Layout Design
Sha Li, Stefano Petrangeli, Yu Shen, Xiang Chen

TL;DR
LaySPA is a reinforcement learning framework that enhances large language models with explicit spatial reasoning capabilities for content-aware graphic layout design, improving transparency and performance.
Contribution
It introduces a structured textual environment and a multi-objective critique for LLMs, enabling interpretable and effective spatial reasoning in layout design.
Findings
Outperforms larger proprietary LLMs in layout quality
Achieves comparable results to specialized SOTA layout generators
Requires fewer annotated samples and less computation
Abstract
We introduce LaySPA, a reinforcement learning framework that equips large language models (LLMs) with explicit and interpretable spatial reasoning for content-aware graphic layout design. LaySPA addresses two key challenges: LLMs' limited spatial reasoning and the opacity of design decision making. Instead of operating at the pixel level, we reformulate layout design as a policy learning problem over a structured textual spatial environment that explicitly encodes canvas geometry, element attributes, and inter-element relationships. LaySPA produces dual-level outputs comprising interpretable reasoning traces and structured layout specifications, enabling transparent and controllable design decision making. The layout design policy is optimized via a multi-objective spatial critique that decomposes layout quality into geometric validity, relational coherence, and aesthetic…
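To make the idea of a decomposed critique concrete, here is a minimal sketch of how a multi-objective layout score could be assembled from separate terms. It is not LaySPA's actual reward: the `Box` type, the three scoring functions, the left-edge alignment proxy used for the aesthetic term, and the weights are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical layout element: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def geometric_validity(boxes, canvas_w, canvas_h):
    """Fraction of boxes with positive size that lie fully inside the canvas."""
    if not boxes:
        return 0.0
    ok = sum(
        1 for b in boxes
        if b.w > 0 and b.h > 0 and b.x >= 0 and b.y >= 0
        and b.x + b.w <= canvas_w and b.y + b.h <= canvas_h
    )
    return ok / len(boxes)


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def relational_coherence(boxes):
    """1 minus the mean pairwise overlap, measured relative to the smaller box."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 1.0
    ratios = [
        overlap_area(a, b) / max(min(a.w * a.h, b.w * b.h), 1e-9)
        for a, b in pairs
    ]
    return 1.0 - sum(ratios) / len(ratios)


def alignment(boxes, tol=1.0):
    """Assumed aesthetic proxy: share of boxes whose left edge matches another's."""
    if len(boxes) < 2:
        return 1.0
    aligned = sum(
        1 for i, a in enumerate(boxes)
        if any(abs(a.x - b.x) <= tol for j, b in enumerate(boxes) if i != j)
    )
    return aligned / len(boxes)


def critique(boxes, canvas_w, canvas_h, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three terms; the weights are illustrative only."""
    scores = (
        geometric_validity(boxes, canvas_w, canvas_h),
        relational_coherence(boxes),
        alignment(boxes),
    )
    return sum(w * s for w, s in zip(weights, scores))
```

Under these assumptions, two stacked, left-aligned boxes inside the canvas receive a higher score than two overlapping boxes where one spills past the canvas edge. In a reinforcement learning setup, a scalar of this kind could serve as the reward signal for a sampled layout.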
Taxonomy
Topics: Interactive and Immersive Displays · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis
