FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Seunghee Kim, Changhyeon Kim, Taeuk Kim

TL;DR
This paper introduces FCMR, a new benchmark for evaluating multimodal large language models' ability to perform complex, multi-hop reasoning across financial data modalities, revealing current models' limitations.
Contribution
The paper presents FCMR, a challenging, multi-level benchmark for assessing cross-modal reasoning in financial contexts, addressing limitations of previous benchmarks.
Findings
State-of-the-art models achieve only 30.4% accuracy on Hard-level tasks.
Models struggle with three-hop cross-modal reasoning.
Analysis uncovers a bottleneck in information retrieval.
Abstract
Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by urging them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular,…
Taxonomy
Topics: Stock Market Forecasting Methods
