Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models
Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li

TL;DR
This paper introduces MCPGAUGE, a comprehensive evaluation framework for understanding how large language models utilize external resources through the Model Context Protocol, revealing limitations and guiding future improvements.
Contribution
The paper presents the first systematic evaluation framework for MCP-augmented LLMs, covering four dimensions (proactivity, compliance, effectiveness, and overhead) and exposing current limitations of LLM-tool integration.
Findings
LLMs often do not proactively use external tools as expected.
Adherence to tool-use instructions varies significantly across models.
The effectiveness of MCP in improving task performance is limited in current setups.
Abstract
The Model Context Protocol (MCP) enables large language models (LLMs) to access external resources on demand. While commonly assumed to enhance performance, how LLMs actually leverage this capability remains poorly understood. We introduce MCPGAUGE, the first comprehensive evaluation framework for probing LLM-MCP interactions along four key dimensions: proactivity (self-initiated tool use), compliance (adherence to tool-use instructions), effectiveness (task performance post-integration), and overhead (computational cost incurred). MCPGAUGE comprises a 160-prompt suite and 25 datasets spanning knowledge comprehension, general reasoning, and code generation. Our large-scale evaluation, spanning six commercial LLMs, 30 MCP tool suites, and both one- and two-turn interaction settings, comprises around 20,000 API calls and over USD 6,000 in computational cost. This comprehensive study…
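The four dimensions named in the abstract can be made concrete with a small sketch. The code below is an illustration only: the `Trial` record, its fields, and the formulas in `summarize` are assumptions chosen for clarity, not MCPGAUGE's actual definitions or code. It treats proactivity as the tool-call rate when no instruction is given, compliance as the tool-call rate when an instruction is given, effectiveness as the accuracy change relative to a run without MCP, and overhead as the ratio of input tokens consumed.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical model run on a single prompt (all fields are illustrative)."""
    instructed: bool    # was the model explicitly told to use a tool?
    called_tool: bool   # did the model issue at least one tool call?
    correct: bool       # was the final answer judged correct?
    input_tokens: int   # input tokens consumed by the run


def rate(flags):
    """Fraction of True values; 0.0 for an empty sequence."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(with_mcp, without_mcp):
    """Compute assumed versions of the four dimensions from two sets of runs."""
    uninstructed = [t for t in with_mcp if not t.instructed]
    instructed = [t for t in with_mcp if t.instructed]
    baseline_tokens = sum(t.input_tokens for t in without_mcp)
    mcp_tokens = sum(t.input_tokens for t in with_mcp)
    return {
        # Self-initiated tool use: no instruction was given.
        "proactivity": rate(t.called_tool for t in uninstructed),
        # Adherence: the prompt asked for tool use.
        "compliance": rate(t.called_tool for t in instructed),
        # Accuracy with MCP minus accuracy without it.
        "effectiveness_delta": rate(t.correct for t in with_mcp)
        - rate(t.correct for t in without_mcp),
        # Input-token cost relative to the baseline (None if baseline is empty).
        "overhead_ratio": mcp_tokens / baseline_tokens if baseline_tokens else None,
    }


if __name__ == "__main__":
    # Made-up records, purely to exercise the function.
    with_mcp = [
        Trial(instructed=False, called_tool=True, correct=True, input_tokens=900),
        Trial(instructed=False, called_tool=False, correct=False, input_tokens=300),
        Trial(instructed=True, called_tool=True, correct=True, input_tokens=1000),
        Trial(instructed=True, called_tool=True, correct=False, input_tokens=1000),
    ]
    without_mcp = [
        Trial(instructed=False, called_tool=False, correct=c, input_tokens=200)
        for c in (True, False, True, False)
    ]
    print(summarize(with_mcp, without_mcp))
```

Keeping the instructed and uninstructed runs separate matters in this sketch: pooling them would blur the distinction the abstract draws between self-initiated tool use and adherence to explicit instructions.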
Taxonomy
Topics: Topic Modeling
