FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Jie Zhu; Yimin Tian; Boyang Li; Kehao Wu; Zhongzhi Liang; Junhui Li; Xianyin Zhang; Lifan Guo; Feng Chen; Yong Liu; Chi Zhang

arXiv:2603.24943 · cs.AI · March 27, 2026

Open Access

TL;DR

FinMCP-Bench is a comprehensive benchmark designed to evaluate large language models' ability to solve real-world financial problems through tool invocation, covering diverse scenarios and complexities.

Contribution

The paper introduces FinMCP-Bench, a new benchmark with diverse samples and scenarios for assessing LLMs' financial reasoning and tool usage capabilities.

Findings

01 Effective evaluation of LLMs on financial tasks

02 Identification of strengths and weaknesses in tool invocation

03 Benchmark sets a new standard for financial LLM assessment

Abstract

This paper introduces FinMCP-Bench, a novel benchmark for evaluating how well large language models (LLMs) solve real-world financial problems by invoking financial tools under the Model Context Protocol (MCP). FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three types of samples (single-tool, multi-tool, and multi-turn), allowing evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
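
The abstract does not spell out how tool invocation accuracy is computed, so the following is only an illustrative sketch of one common way to score predicted tool calls against a reference set: exact-match precision, recall, and F1, ignoring call order. The `ToolCall` structure, the `make_call` and `tool_call_scores` helpers, and the tool names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation (hypothetical structure, not the paper's schema)."""
    name: str
    arguments: tuple  # sorted (key, value) pairs, so the call is hashable


def make_call(name, **arguments):
    """Build a ToolCall with arguments in a canonical (sorted) order."""
    return ToolCall(name, tuple(sorted(arguments.items())))


def tool_call_scores(predicted, reference):
    """Return (precision, recall, F1) over exact-match tool calls.

    Assumption: a predicted call counts as correct only if both its name
    and all of its arguments match a reference call; order is ignored.
    """
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# A multi-tool sample where the model issued only one of two expected calls.
# Tool names and arguments are made up for illustration.
reference = [
    make_call("get_stock_quote", symbol="AAPL"),
    make_call("get_fx_rate", pair="USDCNY"),
]
predicted = [make_call("get_stock_quote", symbol="AAPL")]

precision, recall, f1 = tool_call_scores(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sketch a model that makes only correct calls but omits a required one is rewarded on precision and penalized on recall, which is one way a metric could separate the single-tool and multi-tool sample types. The benchmark's actual metrics may differ.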

Taxonomy

Topics: Stock Market Forecasting Methods · FinTech, Crowdfunding, Digital Finance · Financial Reporting and XBRL