AStar: Boosting Multimodal Reasoning with Automated Structured Thinking

Jinyang Wu; Mingkuan Feng; Guocheng Zhai; Shuai Zhang; Zheng Lian; Fangrui Lv; Pengpeng Shao; Ruihan Jin; Zhengqi Wen; Jianhua Tao

arXiv:2502.02339 · cs.CL · March 3, 2026

TL;DR

AStar is a training-free structured-thinking framework that enhances multimodal reasoning by adaptively retrieving thought cards (high-level reasoning patterns abstracted from prior samples). This improves accuracy and transferability without extensive search or retraining.

Contribution

AStar introduces a novel thought card library and an adaptive retrieval method, enabling efficient, plug-and-play multimodal reasoning without additional training.

Findings

01

Achieves 53.9% accuracy on MathVerse, surpassing GPT-4o.

02

Attains 32.7% accuracy on MathVision, outperforming GPT-4o.

03

Thought cards transfer effectively across different reasoning tasks.

Abstract

Multimodal large language models excel across diverse domains but struggle with complex visual reasoning tasks. To enhance their reasoning capabilities, current approaches typically rely on explicit search or post-training techniques. However, search-based methods are computationally inefficient because they explore an extensive solution space, while post-training methods demand substantial data and computational resources and often exhibit training instability. To address these challenges, we propose AStar, a training-free Automatic Structured thinking paradigm for multimodal reasoning. Specifically, we introduce novel "thought cards", a lightweight library of high-level reasoning patterns abstracted from prior samples. For each test problem, AStar adaptively retrieves the optimal thought cards and seamlessly integrates these…
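The abstract describes thought-card retrieval only at a high level. As a rough illustration of the general idea (a library of reusable reasoning patterns, a per-problem selection step, and the selected patterns fed to the model as guidance), here is a minimal sketch. It assumes that each card and each test problem can be summarized by a small numeric feature vector and that cards are ranked by cosine similarity. The `ThoughtCard` fields, the card names, actions, and feature values, and the functions `retrieve_cards` and `build_prompt` are all hypothetical. This is not AStar's actual representation or selection criterion.

```python
import math
from dataclasses import dataclass


@dataclass
class ThoughtCard:
    """A reusable high-level reasoning pattern (hypothetical representation)."""
    name: str
    actions: list[str]     # ordered reasoning steps the pattern prescribes
    features: list[float]  # assumed vector summarizing problems the card suits


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_cards(problem_features: list[float],
                   library: list[ThoughtCard], k: int = 2) -> list[ThoughtCard]:
    """Return the k cards whose feature vectors are most similar to the problem's."""
    ranked = sorted(library,
                    key=lambda c: cosine(problem_features, c.features),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, cards: list[ThoughtCard]) -> str:
    """Prepend the retrieved reasoning patterns as guidance for the model."""
    lines = [f"Pattern {c.name}: " + " -> ".join(c.actions) for c in cards]
    return "\n".join(lines) + f"\nQuestion: {question}"


# Toy library with made-up cards and feature values (illustration only).
library = [
    ThoughtCard("geometry", ["identify shapes", "recall theorem", "compute", "verify"],
                [1.0, 0.0, 0.2]),
    ThoughtCard("chart", ["read axes", "extract values", "compute"], [0.0, 1.0, 0.1]),
    ThoughtCard("word-problem", ["restate", "set up equation", "solve"], [0.1, 0.2, 1.0]),
]

top = retrieve_cards([0.9, 0.1, 0.3], library, k=2)
print([c.name for c in top])  # → ['geometry', 'word-problem']
print(build_prompt("What is the area of the shaded triangle?", top))
```

Because the cards only prescribe reasoning steps and are injected into the prompt, the model's weights are never touched. That is the sense in which this kind of approach is "training-free". How the real library is built from prior samples and how cards are scored are left to the full paper.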

Videos

AStar: Boosting Multimodal Reasoning with Automated Structured Thinking (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Semantic Web and Ontologies