M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering

Peijin Xie; Zhen Xu; Bingquan Liu; Baoxun Wang

arXiv:2603.08369 · cs.AI · March 10, 2026

TL;DR

This paper introduces M3-ACE, a multi-agent framework that improves multimodal math reasoning by collaboratively rectifying visual perception errors, leading to state-of-the-art performance on several benchmarks.

Contribution

The paper proposes a novel multi-agent context engineering approach that decouples perception and reasoning, enhancing visual evidence extraction in multimodal math reasoning models.

Findings

01 Achieves 89.1% accuracy on the MathVision benchmark.

02 Significantly improves performance on the MathVista and MathVerse benchmarks.

03 Demonstrates the effectiveness of correcting perception errors in multimodal reasoning.

Abstract

Multimodal large language models have recently shown promising progress in visual mathematical reasoning. However, their performance is often limited by a critical yet underexplored bottleneck: inaccurate visual perception. Through systematic analysis, we find that most failures originate from incorrect or incomplete extraction of visual evidence rather than from deficiencies in reasoning capability. Moreover, models tend to remain overly confident in their initial perceptions, which makes standard strategies such as prompt engineering, multi-round self-reflection, or posterior guidance insufficient for reliably correcting these errors. To address this limitation, we propose M3-ACE, a multi-agentic context engineering framework designed to rectify visual perception in multimodal math reasoning. Instead of directly aggregating final answers, our approach decouples perception and reasoning by dynamically…
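
The abstract contrasts aggregating final answers with decoupling perception from reasoning. Because the method description is truncated here, the following is only a minimal Python sketch of that general contrast under stated assumptions, not the authors' implementation: the `PerceptionReport` type, the agent names, the `min_support` consensus rule and the example facts are all hypothetical. Visual facts that enough agents agree on become shared context for a reasoner, while facts with too little support are the ones a rectification step would revisit.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PerceptionReport:
    """Visual facts one hypothetical perception agent read off a figure."""
    agent: str
    facts: frozenset[str]


def vote_answers(answers: list[str]) -> str:
    """Answer-level baseline: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def fact_support(reports: list[PerceptionReport]) -> Counter:
    """Count how many agents reported each visual fact (each agent counts once)."""
    return Counter(fact for report in reports for fact in report.facts)


def merge_evidence(reports: list[PerceptionReport], min_support: int = 2) -> set[str]:
    """Keep facts reported by at least `min_support` agents (hypothetical rule)."""
    return {f for f, n in fact_support(reports).items() if n >= min_support}


def disputed_facts(reports: list[PerceptionReport], min_support: int = 2) -> set[str]:
    """Facts below the support threshold: candidates for re-inspection."""
    return {f for f, n in fact_support(reports).items() if n < min_support}


# Hypothetical reports from three agents looking at the same triangle figure.
reports = [
    PerceptionReport("agent_a", frozenset({"AB = 4", "angle C = 90 deg"})),
    PerceptionReport("agent_b", frozenset({"AB = 4", "angle C = 90 deg", "BC = 3"})),
    PerceptionReport("agent_c", frozenset({"AB = 6", "angle C = 90 deg"})),
]

print(sorted(merge_evidence(reports)))  # agreed facts passed on to the reasoner
print(sorted(disputed_facts(reports)))  # low-support facts to check again
```

Raising `min_support` to 3 keeps only the right-angle fact, so in this toy rule the threshold controls how much unverified evidence reaches the reasoning stage.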

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Advanced Graph Neural Networks