FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
Chenxi Zhang, Ziliang Gan, Liyun Zhu, Youwei Pang, Qing Zhang, Rongjunchen Zhang

TL;DR
FinMTM introduces a comprehensive multi-turn multimodal benchmark for financial reasoning, addressing limitations of existing single-turn, narrow-format datasets, and evaluates 22 vision-language models on diverse, realistic financial tasks.
Contribution
The paper presents FinMTM, a benchmark of 11,133 bilingual (Chinese and English) financial QA pairs spanning single- and multiple-choice questions, multi-turn open-ended dialogues, and agent-based tasks, enabling more realistic evaluation of VLMs in finance.
Findings
VLMs show limitations in visual perception and reasoning.
Multi-turn dialogues reveal challenges in long-context understanding.
Agent-based tasks expose gaps in complex workflow reasoning.
Abstract
The financial domain poses substantial challenges for vision-language models (VLMs) due to specialized chart formats and knowledge-intensive reasoning requirements. However, existing financial benchmarks are largely single-turn and rely on a narrow set of question formats, limiting comprehensive evaluation in realistic application scenarios. To address this gap, we propose FinMTM, a multi-turn multimodal benchmark that expands diversity along both data and task dimensions. On the data side, we curate and annotate 11,133 bilingual (Chinese and English) financial QA pairs grounded in financial visuals, including candlestick charts, statistical plots, and report figures. On the task side, FinMTM covers single- and multiple-choice questions, multi-turn open-ended dialogues, and agent-based tasks. We further design task-specific evaluation protocols, including a set-overlap scoring rule…
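The abstract is cut off before it defines the set-overlap scoring rule, so the exact formula is not available here. As a rough illustration of what a set-overlap score for multiple-choice answers can look like, the sketch below uses the Jaccard index (intersection over union of the selected options). The choice of Jaccard and the function name `set_overlap_score` are assumptions made for illustration, not the rule used in FinMTM.

```python
from typing import Iterable


def set_overlap_score(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Jaccard overlap between predicted and gold option sets.

    Hypothetical stand-in for a set-overlap scoring rule; the actual
    FinMTM rule is not stated in the truncated abstract.
    """
    pred_set, gold_set = set(predicted), set(gold)
    union = pred_set | gold_set
    if not union:
        # Both sets empty: treat as a perfect match by convention.
        return 1.0
    return len(pred_set & gold_set) / len(union)


# Partial credit for a partially correct multiple-choice answer.
full = set_overlap_score(["A", "C"], ["A", "C"])          # exact match
partial = set_overlap_score(["A", "C"], ["A", "B", "C"])  # one option missed
wrong = set_overlap_score(["D"], ["A", "B"])              # no overlap
```

Under this assumed rule, an exact match scores 1.0, a disjoint answer scores 0.0, and a partially correct answer falls in between, which gives partial credit without rewarding a model for selecting every option.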
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Constraint Satisfaction and Optimization
