FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation

Chenxi Zhang; Ziliang Gan; Liyun Zhu; Youwei Pang; Qing Zhang; Rongjunchen Zhang

arXiv:2602.03130·cs.CV·February 4, 2026

Open Access

TL;DR

FinMTM introduces a comprehensive multi-turn multimodal benchmark for financial reasoning, addressing limitations of existing single-turn, narrow-format datasets, and evaluates 22 vision-language models on diverse, realistic financial tasks.

Contribution

The paper presents FinMTM, a novel benchmark with extensive bilingual financial QA pairs and diverse task formats, enabling more realistic evaluation of VLMs in finance.

Findings

01 VLMs show limitations in visual perception and reasoning.

02 Multi-turn dialogues reveal challenges in long-context understanding.

03 Agent-based tasks expose gaps in complex workflow reasoning.

Abstract

The financial domain poses substantial challenges for vision-language models (VLMs) due to specialized chart formats and knowledge-intensive reasoning requirements. However, existing financial benchmarks are largely single-turn and rely on a narrow set of question formats, limiting comprehensive evaluation in realistic application scenarios. To address this gap, we propose FinMTM, a multi-turn multimodal benchmark that expands diversity along both data and task dimensions. On the data side, we curate and annotate 11,133 bilingual (Chinese and English) financial QA pairs grounded in financial visuals, including candlestick charts, statistical plots, and report figures. On the task side, FinMTM covers single- and multiple-choice questions, multi-turn open-ended dialogues, and agent-based tasks. We further design task-specific evaluation protocols, including a set-overlap scoring rule…
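
The abstract is cut off before it defines the set-overlap scoring rule, so the exact formula is not available here. As a rough illustration of what a set-overlap score for multiple-choice answers can look like, the sketch below uses the Jaccard index (intersection over union). This is an assumed stand-in, not the paper's rule, and the function name `set_overlap_score` is hypothetical.

```python
def set_overlap_score(predicted, gold):
    """Jaccard overlap between predicted and gold option sets.

    Illustrative assumption only: the FinMTM scoring rule is not
    specified in this excerpt and may differ from Jaccard.
    """
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        # Both sets empty: treat as a perfect match to avoid dividing by zero.
        return 1.0
    return len(pred & ref) / len(pred | ref)


# Model selects A and C; the reference answer is A, B, and C.
score = set_overlap_score({"A", "C"}, {"A", "B", "C"})
print(f"{score:.3f}")
```

A score of this kind gives partial credit for partially correct selections and lowers the score for extra wrong options, unlike exact-match grading, which scores any deviation as zero.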

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Constraint Satisfaction and Optimization