Thinking with Drafting: Optical Decompression via Logical Reconstruction

Jingxuan Wei; Honghao He; Caijun Jia; Siyuan Li; Zheng Sun; Yuhang Xu; Yuanyuan Lin; Linzhuang Sun; Yuchen Wu; Bihui Yu; Xiangxiang Zhang; Cheng Tan

arXiv:2602.11731 · cs.CL · April 30, 2026

TL;DR

This paper introduces Thinking with Drafting (TwD), a novel approach that reconstructs logical structures from visual tokens to improve reasoning accuracy in multimodal large language models.

Contribution

It proposes a new framework that drafts mental models into executable code for deterministic verification, bridging the gap between perception and logical reasoning.

Findings

01. TwD outperforms standard methods on the VisAlg benchmark.

02. Visual generation is used as a logical verifier, not just creative output.

03. The approach enhances logical accuracy in complex visual reasoning tasks.

Abstract

Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: optical perception systems transcribe symbols without capturing logical topology, while pixel-based generative models produce visual artifacts lacking mathematical exactness. To bridge this gap, we propose that reasoning over visual inputs be reconceptualized as optical decompression, the process of reconstructing latent logical structures from compressed visual tokens. Guided by the axiom that Parsing is Reasoning, we introduce Thinking with Drafting (TwD), which utilizes a minimalist Domain-Specific Language (DSL) as a grounding intermediate representation. Unlike standard approaches that hallucinate answers directly, TwD forces the model to draft its mental model into executable code, rendering…
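
The idea of drafting a mental model into executable code for deterministic verification can be illustrated with a minimal sketch. The toy DSL below (`let` and `check` statements), the function names `evaluate` and `run_draft`, and the example quantities are all hypothetical and chosen only for illustration; the abstract does not specify the paper's actual DSL, so this should not be read as the authors' implementation.

```python
import ast
import operator

# Hypothetical toy DSL (not the paper's). Each line is one of:
#   let <name> = <arithmetic expression>
#   check <expression> == <expression>
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, env):
    """Evaluate a restricted arithmetic AST against the known quantities."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # KeyError if the draft uses an undefined quantity
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    raise ValueError("unsupported expression in draft")


def evaluate(expr, env):
    """Parse one expression and evaluate it without calling eval()."""
    return _eval(ast.parse(expr.strip(), mode="eval").body, env)


def run_draft(draft):
    """Execute a draft line by line; return (environment, failed checks)."""
    env, failures = {}, []
    for line in draft.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, rest = line.split(None, 1)
        if keyword == "let":
            name, expr = rest.split("=", 1)
            env[name.strip()] = evaluate(expr, env)
        elif keyword == "check":
            left, right = rest.split("==", 1)
            if evaluate(left, env) != evaluate(right, env):
                failures.append(line)
        else:
            raise ValueError(f"unknown statement: {line}")
    return env, failures


# A draft that a model might emit for "the long bar is twice the short bar".
draft = """
let short = 12
let long = short * 2
check short + long == 36
"""
env, failures = run_draft(draft)
print(env, failures)
```

Because the draft is executed rather than merely read, an inconsistent draft is caught deterministically: a failed `check` line is returned in `failures` instead of being accepted as a plausible-looking answer.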
