Learning to Draw ASCII Improves Spatial Reasoning in Language Models
Shiyuan Huang, Li Liu, Jincheng He, Leilani H. Gilpin

TL;DR
Training language models to construct explicit ASCII spatial layouts from descriptions enhances their spatial reasoning abilities, with benefits transferring to external benchmarks.
Contribution
We introduce Text2Space, a dataset for training models on ASCII layout construction, improving spatial reasoning in language models without requiring ASCII output at inference.
Findings
ASCII layout construction training improves spatial reasoning accuracy.
Models trained on layout construction transfer gains to external spatial benchmarks.
Learning to draw ASCII layouts enhances models' spatial understanding beyond training data.
Abstract
When faced with complex spatial problems, humans naturally sketch layouts to organize their thinking, and the act of drawing further sharpens their understanding. In this work, we ask whether a similar principle holds for Large Language Models (LLMs): can learning to construct explicit visual layouts from spatial descriptions instill genuine spatial understanding? We introduce Text2Space, a dataset that pairs natural language descriptions with ground-truth ASCII grid layouts and spatial QA pairs, enabling us to separate failures in constructing spatial representations from failures in reasoning over them. We adopt ASCII because it is human-readable, operates entirely within the token space of language models, and encodes spatial relations in a structurally verifiable form. Our evaluation reveals a pronounced "Read-Write Asymmetry": LLMs interpret ASCII representations effectively but…
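To make "structurally verifiable" concrete, here is a minimal sketch of how a spatial relation could be checked mechanically against an ASCII grid. The grid format (one character per cell, `.` for empty cells), the relation names, and the helpers `parse_grid` and `holds` are all assumptions made for illustration; they are not the actual Text2Space format or evaluation code.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # (row, col), with row 0 at the top


def parse_grid(layout: str) -> Dict[str, Position]:
    """Map each object symbol in a hypothetical ASCII grid to its (row, col)."""
    positions: Dict[str, Position] = {}
    for r, line in enumerate(layout.strip("\n").splitlines()):
        for c, ch in enumerate(line):
            if ch not in ". ":  # '.' and ' ' are treated as empty cells
                positions[ch] = (r, c)
    return positions


def holds(positions: Dict[str, Position], a: str, relation: str, b: str) -> bool:
    """Check a simple directional relation between two placed objects."""
    (ra, ca), (rb, cb) = positions[a], positions[b]
    checks = {
        "left_of": ca < cb,
        "right_of": ca > cb,
        "above": ra < rb,
        "below": ra > rb,
    }
    return checks[relation]


# Illustrative layout for a description such as
# "A is above and to the left of B; C is below B and to its left."
layout = """
A..
..B
.C.
"""

positions = parse_grid(layout)
print(holds(positions, "A", "left_of", "B"))
print(holds(positions, "C", "below", "B"))
```

Because every object has an explicit cell coordinate, a generated layout can be scored against the description's constraints without any model in the loop, which is the property the abstract attributes to ASCII.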
