Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs
Yiren Zheng, Shibo Li, Jiaming Liu, Haofan Wang, Yiren Song

TL;DR
This paper introduces SVE-ASCII, a framework for eliciting and benchmarking symbolic visual expression within Large Language Models using ASCII art, demonstrating that generative training improves visual understanding.
Contribution
It presents a novel ASCII art dataset, a unified instruction-tuning approach, and empirical evidence of the mutual reinforcement between visual perception and generation in LLMs.
Findings
Generative training enhances visual comprehension in LLMs.
The ASCIIArt-7K dataset enables systematic benchmarking.
A unified framework effectively elicits symbolic visual expression.
Abstract
Current multimodal approaches predominantly treat visual generation as an external process, relying on pixel rendering or code execution, thereby overlooking the native visual representation capabilities latent within Large Language Models (LLMs). In this work, we unlock this potential through ASCII art, a compact, efficient, and text-native visual format. We introduce SVE-ASCII, a unified framework designed to elicit and benchmark Symbolic Visual Expression directly within the pure text space. To address the scarcity of systematic resources, we construct ASCIIArt-7K, a high-quality dataset synthesized via a novel "Seed-and-Evolve" pipeline that augments human-curated anchors through in-context stylistic editing. We further implement a unified instruction-tuning strategy that jointly optimizes for both Generation (Text-to-ASCII) and Understanding (ASCII-to-Text). Crucially, our…
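As a rough illustration of the unified instruction-tuning idea described in the abstract, the sketch below turns one caption/ASCII pair into two training examples, one per direction (Text-to-ASCII and ASCII-to-Text). The prompt wording, the `InstructionExample` type, and the `build_unified_examples` helper are all hypothetical and are not taken from the paper; the actual templates and data format used by SVE-ASCII may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstructionExample:
    """One instruction-tuning record (hypothetical schema, not from the paper)."""
    task: str      # "text_to_ascii" or "ascii_to_text"
    prompt: str    # instruction shown to the model
    response: str  # target output the model is trained to produce


def build_unified_examples(caption: str, ascii_art: str) -> List[InstructionExample]:
    """Create one Generation and one Understanding example from a single pair.

    The prompt templates below are assumptions made for illustration only.
    """
    generation = InstructionExample(
        task="text_to_ascii",
        prompt=f"Draw the following as ASCII art: {caption}",
        response=ascii_art,
    )
    understanding = InstructionExample(
        task="ascii_to_text",
        prompt=f"Describe what this ASCII art depicts:\n{ascii_art}",
        response=caption,
    )
    return [generation, understanding]


# A tiny hand-written sample, not drawn from ASCIIArt-7K.
cat = " /\\_/\\\n( o.o )\n > ^ <"
examples = build_unified_examples("a cat face", cat)
```

Mixing both directions in one training set is what lets a single model be optimized jointly for generation and understanding; how the two tasks are weighted or ordered during training is not specified in this summary.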
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship
