Canvas-of-Thought: Grounding Reasoning via Mutable Structured States
Lingzhuang Sun, Yuxia Zhu, Ruitong Liu, Hao Liang, Zheng Sun, Caijun Jia, Honghao He, Yuchen Wu, Siyuan Li, Jingxuan Wei, Xiangxiang Zhang, Bihui Yu, Wentao Zhang

TL;DR
Canvas-of-Thought introduces an external visual reasoning substrate using HTML Canvas, enabling explicit state management and visual feedback, which enhances multimodal reasoning in large language models for complex tasks.
Contribution
The paper proposes Canvas-CoT, a novel framework that uses an HTML Canvas for explicit, in-place state updates and visual critique, improving reasoning efficiency and accuracy.
Findings
Outperforms existing baselines on VCode, RBench-V, and MathVista datasets.
Enables in-place state revisions without context disruption.
Provides explicit visual feedback to improve reasoning in high-dimensional domains.
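To make the contrast between an append-only reasoning chain and an in-place revision concrete, here is a minimal sketch. It is not the paper's implementation: the `Canvas` class, its `insert`/`modify`/`delete` methods, and the point-based example are all hypothetical names chosen for illustration, and a plain dictionary stands in for an HTML Canvas.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Hypothetical mutable state: elements keyed by id (an assumption, not the paper's API)."""
    elements: dict = field(default_factory=dict)

    def insert(self, elem_id, **attrs):
        # Add a new element with the given attributes.
        self.elements[elem_id] = dict(attrs)

    def modify(self, elem_id, **attrs):
        # In-place revision: only the named attributes change.
        self.elements[elem_id].update(attrs)

    def delete(self, elem_id):
        # Remove an element from the state entirely.
        del self.elements[elem_id]


# Append-only chain: a correction is one more entry; the wrong step stays in context.
chain = []
chain.append("point B is at (4, 0)")
chain.append("correction: point B is actually at (3, 0)")

# Mutable canvas: the same correction overwrites the stored state.
canvas = Canvas()
canvas.insert("A", kind="point", x=0, y=0)
canvas.insert("B", kind="point", x=4, y=0)
canvas.modify("B", x=3)

print(len(chain))            # 2 entries to read and reconcile
print(canvas.elements["B"])  # {'kind': 'point', 'x': 3, 'y': 0}
```

In the chain, a reader has to reconcile both entries to recover the current position of B. In the canvas, the current state is the only state stored, which is the property the findings above describe as revision without context disruption.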
Abstract
While Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), relying solely on linear text sequences remains a bottleneck for complex tasks. We observe that even when auxiliary visual elements are interleaved, they are often treated as static snapshots within a one-dimensional, unstructured reasoning chain. We argue that such approaches treat reasoning history as an immutable stream: correcting a local error necessitates either generating verbose downstream corrections or regenerating the entire context. This forces the model to implicitly maintain and track state updates, significantly increasing token consumption and cognitive load. This limitation is particularly acute in high-dimensional domains, such as geometry and SVG design, where the textual expression of CoT lacks explicit visual guidance, further…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Constraint Satisfaction and Optimization
