FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

Zichen Tang; Haihong E; Jiacheng Liu; Zhongjun Yang; Rongjin Li; Zihua Rong; Haoyang He; Zhuodi Hao; Xinyang Hu; Kun Ji; Ziyan Ma; Mengyuan Ji; Jun Zhang; Chenghao Ma; Qianhe Zheng; Yang Liu; Yiling Huang; Xinyi Hu; Qing Huang; Zijian Xie; Shiyao Peng

arXiv:2508.04625 · cs.CV · August 7, 2025

TL;DR

FinMMR is a bilingual multimodal benchmark for evaluating and advancing the numerical reasoning of multimodal large language models (MLLMs) on complex financial tasks that span diverse image types and financial subdomains.

Contribution

The paper introduces FinMMR, a benchmark of 4.3K questions and 8.7K images that covers 14 image categories (including tables, bar charts, and ownership structure charts) and 14 financial subdomains, with a focus on multi-step numerical reasoning.

Findings

01. The best model achieves 53.0% accuracy on hard problems.

02. The benchmark covers 14 financial subdomains and multiple data modalities.

03. FinMMR surpasses existing benchmarks in scope and difficulty.

Abstract

We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements. (1) Multimodality: We meticulously transform existing financial reasoning benchmarks, and construct novel questions from the latest Chinese financial research reports. FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts. (2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in financial domain knowledge breadth. (3) Challenge: Models are required to perform multi-step precise numerical reasoning by integrating financial…
