Thinking with Visual Abstract: Enhancing Multimodal Reasoning via Visual Abstraction
Dairu Liu, Ziyue Wang, Minyuan Ruan, Fuwen Luo, Chi Chen, Peng Li, Yang Liu

TL;DR
This paper introduces Visual Abstract Thinking (VAT), a prompting paradigm that improves reasoning in multimodal large language models (MLLMs) by focusing them on essential visual elements, yielding better performance with fewer tokens.
Contribution
The paper proposes VAT as a new reasoning paradigm that enhances multimodal reasoning by emphasizing visual abstraction, outperforming traditional methods like Chain-of-Thought.
Findings
VAT improves multimodal reasoning performance by 2.21% over the GPT-5 baseline.
VAT requires fewer tokens while achieving higher accuracy.
VAT outperforms Chain-of-Thought in visual perception tasks.
Abstract
Images usually convey richer detail than text, but often include redundant information, which potentially downgrades multimodal reasoning performance. When faced with lengthy or complex messages, humans tend to employ abstract thinking to convert them into simple and concise abstracts. Inspired by this cognitive strategy, we introduce a novel paradigm to elicit the ability to Think with Visual Abstract (VAT), by prompting Multimodal Large Language Models (MLLMs) with visual abstract instead of explicit verbal thoughts or elaborate guidance, permitting a more efficient visual reasoning mechanism via concentrated perception. VAT encourages models to focus on more essential visual elements, concepts and structural features by undermining redundant information compared with explicit thinking methods, such as Chain-of-thought (CoT) and tool-using approaches, that increase the complexity of…
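As a rough illustration of what "prompting with a visual abstract" could look like in practice, the sketch below reduces a grayscale image to a binary edge map and bundles it with the question. It is not the authors' implementation: the edge-map choice, the function names `edge_abstract` and `build_vat_input`, the threshold value, the payload layout, and the decision to send the original image alongside the abstract are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255, row-major


def edge_abstract(image: Image, threshold: int = 64) -> Image:
    """Return a binary edge map: 255 where local contrast is high, else 0.

    Assumption: an edge map is just one possible kind of visual abstract.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences, clamped at the right and bottom borders.
            right = image[y][min(x + 1, width - 1)]
            below = image[min(y + 1, height - 1)][x]
            gradient = abs(right - image[y][x]) + abs(below - image[y][x])
            out[y][x] = 255 if gradient >= threshold else 0
    return out


def build_vat_input(image: Image, question: str) -> dict:
    """Bundle the image, its abstract, and the question.

    Hypothetical payload layout; a real MLLM API will expect its own format.
    """
    return {
        "images": [image, edge_abstract(image)],
        "text": question,
    }


if __name__ == "__main__":
    # A 3x3 image with a bright column on the right-hand side.
    sample = [[0, 0, 200], [0, 0, 200], [0, 0, 200]]
    payload = build_vat_input(sample, "Which side of the image is brighter?")
    for row in payload["images"][1]:
        print(row)
```

The abstract keeps only the boundary between the dark and bright regions and drops everything else, which is the kind of redundancy reduction the paradigm relies on. In a real pipeline the edge detector would likely be replaced by a library routine or another abstraction style.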
Taxonomy
Topics: Language, Metaphor, and Cognition · Innovative Teaching and Learning Methods
Methods: Focus
