FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation

Zichen Tang; Haihong E; Rongjin Li; Jiacheng Liu; Linwei Jia; Zhuodi Hao; Zhongjun Yang; Yuanze Li; Haolin Tian; Xinyi Hu; Peizhi Zhao; Yuan Liu; Zhengyu Wang; Xianghe Wang; Yiling Huang; Xueyuan Lin; Ruofei Bai; Zijian Xie; Qian Huang; Ruining Cao; Haocheng Gao

arXiv:2512.24903 · cs.CV · March 23, 2026

TL;DR

FinMMDocR is a comprehensive bilingual benchmark designed to evaluate multimodal large language models on complex financial reasoning tasks involving scenario awareness, document understanding, and multi-step computation, reflecting real-world challenges.

Contribution

This work introduces FinMMDocR, a new benchmark that combines diverse financial scenarios, long multi-page documents, and multi-step reasoning, surpassing existing benchmarks in complexity and realism.

Findings

01. Best MLLM achieves 58.0% accuracy on the benchmark.

02. Significant performance variation observed across retrieval-augmented methods.

03. Benchmark emphasizes multi-step, cross-page, and scenario-aware reasoning in financial contexts.

Abstract

We introduce FinMMDocR, a novel bilingual multimodal benchmark for evaluating multimodal large language models (MLLMs) on real-world financial numerical reasoning. Compared to existing benchmarks, our work delivers three major advancements. (1) Scenario Awareness: 57.9% of 1,200 expert-annotated problems incorporate 12 types of implicit financial scenarios (e.g., Portfolio Management), challenging models to perform expert-level reasoning based on assumptions; (2) Document Understanding: 837 Chinese/English documents spanning 9 types (e.g., Company Research) average 50.8 pages with rich visual elements, significantly surpassing existing benchmarks in both breadth and depth of financial documents; (3) Multi-Step Computation: Problems demand 11-step reasoning on average (5.3 extraction + 5.7 calculation steps), with 65.0% requiring cross-page evidence (2.4 pages average). The…
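The percentages in the abstract can be turned into approximate problem counts with simple arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, the inputs are the figures quoted above, and the derived counts are rounded estimates rather than numbers reported by the authors.

```python
# Figures quoted in the abstract. All names here are illustrative.
TOTAL_PROBLEMS = 1200
SCENARIO_SHARE = 0.579      # problems with an implicit financial scenario
CROSS_PAGE_SHARE = 0.650    # problems requiring cross-page evidence
EXTRACTION_STEPS = 5.3      # average extraction steps per problem
CALCULATION_STEPS = 5.7     # average calculation steps per problem
NUM_DOCUMENTS = 837
AVG_PAGES_PER_DOC = 50.8


def approx_count(share: float, total: int) -> int:
    """Convert a reported share into a rounded, approximate count."""
    return round(share * total)


# Derived estimates (assumption: percentages refer to all 1,200 problems).
scenario_problems = approx_count(SCENARIO_SHARE, TOTAL_PROBLEMS)
cross_page_problems = approx_count(CROSS_PAGE_SHARE, TOTAL_PROBLEMS)

# The average step count is the sum of its two reported components.
avg_steps = round(EXTRACTION_STEPS + CALCULATION_STEPS, 1)

# Rough corpus size in pages, from document count times average length.
approx_total_pages = round(NUM_DOCUMENTS * AVG_PAGES_PER_DOC)

if __name__ == "__main__":
    print("scenario-aware problems (approx.):", scenario_problems)
    print("cross-page problems (approx.):", cross_page_problems)
    print("average reasoning steps:", avg_steps)
    print("total pages (approx.):", approx_total_pages)
```

Under these assumptions, roughly 695 problems involve an implicit scenario and 780 need cross-page evidence, and the two step averages sum to the 11 steps stated in the abstract.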

Taxonomy

Topics: Stock Market Forecasting Methods · Topic Modeling · Multimodal Machine Learning Applications