We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Runqi Qiao; Qiuna Tan; Peiqing Yang; Yanzi Wang; Xiaowan Wang; Enhui Wan; Sitong Zhou; Guanting Dong; Yuchen Zeng; Yida Xu; Jie Wang; Chong Sun; Chen Li; Honggang Zhang

arXiv:2508.10433·cs.AI·August 15, 2025

TL;DR

We-Math 2.0 is a comprehensive system that enhances multimodal large language models' mathematical reasoning by integrating a structured knowledge system, diverse datasets, and reinforcement learning training.

Contribution

The paper introduces a hierarchical knowledge system, expanded datasets with difficulty levels, a novel RL training framework, and a comprehensive evaluation benchmark for mathematical reasoning.

Findings

1. MathBook-RL achieves competitive results on standard benchmarks.
2. The system demonstrates strong generalization in mathematical reasoning.
3. Reasoning capabilities improve across diverse knowledge points.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. The paper is well structured and easy to follow. The teaser figure is particularly well structured.
2. The intuition behind the dataset is detailed and inspiring, and could be even more helpful than the dataset itself.
3. The evaluation results show the improvement brought by We-Math 2.0.

Weaknesses

No significant weakness within the dataset scope.

Reviewer 02·Rating 6·Confidence 4

Strengths

- The paper presents a unified framework (We-Math 2.0) that integrates a structured mathematical knowledge system, a model-centric data space, and an RL-based training paradigm.
- The trained model reportedly demonstrates a marginal advantage on some established mathematical multimodal benchmarks.

Weaknesses

- The categorization and comparison in Table 1 appear inconsistent. For instance, comparing the granularity of the proposed 491-point system with datasets like MathV360k (which contains diverse content like charts and general QA) is not an apples-to-apples comparison and may be misleading.

Reviewer 03·Rating 8·Confidence 3

Strengths

1. This work represents substantial engineering effort in an important area and should be applauded for this.
2. The MathBook Knowledge System (MKS) is comprehensive, with 491 knowledge points and 1,819 fundamental principles.
3. MathBook-Standard/Pro is built on MKS, with annotated problems that the later experiments show to be strong in combination with the proposed MathBook-RL.
4. MathBook-RL is presented well, with ablation studies.
5. This paper offers a few insights including the observation that MLLM

Weaknesses

Some more experiments at a few more model scales may be warranted, but considering the amount of work put into the entire system, I do not see this as much of a defect.
