Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs

Yiren Zheng; Shibo Li; Jiaming Liu; Haofan Wang; Yiren Song

arXiv:2603.14505·cs.CV·March 17, 2026

Open Access

TL;DR

This paper introduces SVE-ASCII, a framework for eliciting and benchmarking symbolic visual expression within Large Language Models using ASCII art, demonstrating that generative training improves visual understanding.

Contribution

The paper contributes ASCIIArt-7K, a new ASCII art dataset; a unified instruction-tuning approach; and empirical evidence that visual perception and generation reinforce each other in LLMs.

Findings

01. Generative training enhances visual comprehension in LLMs.

02. The ASCIIArt-7K dataset enables systematic benchmarking.

03. A unified framework effectively elicits symbolic visual expression.

Abstract

Current multimodal approaches predominantly treat visual generation as an external process, relying on pixel rendering or code execution, thereby overlooking the native visual representation capabilities latent within Large Language Models (LLMs). In this work, we unlock this potential through ASCII art, a compact, efficient, and text-native visual format. We introduce SVE-ASCII, a unified framework designed to elicit and benchmark Symbolic Visual Expression directly within the pure text space. To address the scarcity of systematic resources, we construct ASCIIArt-7K, a high-quality dataset synthesized via a novel "Seed-and-Evolve" pipeline that augments human-curated anchors through in-context stylistic editing. We further implement a unified instruction-tuning strategy that jointly optimizes for both Generation (Text-to-ASCII) and Understanding (ASCII-to-Text). Crucially, our…
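
The joint Generation/Understanding setup described in the abstract can be pictured as deriving two instruction-tuning examples from a single caption and ASCII-art pair, one per direction. The sketch below is only an illustration under stated assumptions: the `InstructionExample` class, the `make_bidirectional_examples` helper, and both prompt templates are hypothetical and are not taken from the paper, whose actual data format is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionExample:
    """One instruction-tuning record (hypothetical schema, not the paper's)."""
    task: str       # "generation" (Text-to-ASCII) or "understanding" (ASCII-to-Text)
    prompt: str
    response: str


def make_bidirectional_examples(caption: str, ascii_art: str) -> list[InstructionExample]:
    """Turn one (caption, ASCII art) pair into one example per direction."""
    generation = InstructionExample(
        task="generation",
        # Assumed prompt wording; the paper's templates may differ.
        prompt=f"Draw the following as ASCII art: {caption}",
        response=ascii_art,
    )
    understanding = InstructionExample(
        task="understanding",
        # Assumed prompt wording; the paper's templates may differ.
        prompt=f"Describe this ASCII art:\n{ascii_art}",
        response=caption,
    )
    return [generation, understanding]


# Illustrative pair, made up for this sketch (not drawn from ASCIIArt-7K).
cat = " /\\_/\\\n( o.o )\n > ^ <"
examples = make_bidirectional_examples("a cat face", cat)

for example in examples:
    print(f"[{example.task}]")
    print(example.prompt)
    print("->")
    print(example.response)
    print()
```

Mixing both example types in one training set is one simple way to optimize a single model for both directions; whether the paper balances or weights the two tasks differently is not stated here.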

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship