Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models

Wei Song; Haonan Zhong; Ziqi Ding; Jingling Xue; Yuekang Li

arXiv:2508.12566·cs.AI·August 19, 2025

Open Access

TL;DR

This paper introduces MCPGAUGE, a comprehensive evaluation framework for understanding how large language models utilize external resources through the Model Context Protocol, revealing limitations and guiding future improvements.

Contribution

The paper presents the first systematic evaluation framework for LLMs that use MCP. The framework covers multiple dimensions of tool use and exposes current limitations of AI-tool integration.

Findings

01. LLMs often do not proactively use external tools as expected.

02. Adherence to tool-use instructions varies significantly across models.

03. The effectiveness of MCP in improving task performance is limited in current setups.

Abstract

The Model Context Protocol (MCP) enables large language models (LLMs) to access external resources on demand. While commonly assumed to enhance performance, how LLMs actually leverage this capability remains poorly understood. We introduce MCPGAUGE, the first comprehensive evaluation framework for probing LLM-MCP interactions along four key dimensions: proactivity (self-initiated tool use), compliance (adherence to tool-use instructions), effectiveness (task performance post-integration), and overhead (computational cost incurred). MCPGAUGE comprises a 160-prompt suite and 25 datasets spanning knowledge comprehension, general reasoning, and code generation. Our large-scale evaluation, spanning six commercial LLMs, 30 MCP tool suites, and both one- and two-turn interaction settings, comprises around 20,000 API calls and over USD 6,000 in computational cost. This comprehensive study…
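The four dimensions named in the abstract can be illustrated with a minimal sketch. This is not MCPGAUGE's implementation: the `Trial` record, its field names, and the specific formulas (call rates, accuracy difference, token ratio) are assumptions chosen only to show how such measurements might be aggregated from logged runs.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical logged run; all field names are illustrative assumptions."""
    tool_instructed: bool      # the prompt explicitly asked the model to use a tool
    tool_called: bool          # the model actually issued a tool call
    correct_with_mcp: bool     # answer judged correct with MCP tools available
    correct_without_mcp: bool  # answer judged correct in a no-tool baseline
    tokens_with_mcp: int       # tokens consumed with MCP tools available
    tokens_without_mcp: int    # tokens consumed in the no-tool baseline


def rate(flags):
    """Fraction of True values; 0.0 for an empty sequence."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(trials):
    """Aggregate the four dimensions under the assumed definitions above."""
    uninstructed = [t for t in trials if not t.tool_instructed]
    instructed = [t for t in trials if t.tool_instructed]
    base_tokens = sum(t.tokens_without_mcp for t in trials)
    mcp_tokens = sum(t.tokens_with_mcp for t in trials)
    return {
        # Proactivity: self-initiated tool use when no instruction was given.
        "proactivity": rate(t.tool_called for t in uninstructed),
        # Compliance: tool use when the prompt instructed it.
        "compliance": rate(t.tool_called for t in instructed),
        # Effectiveness: accuracy change after MCP integration.
        "effectiveness_delta": rate(t.correct_with_mcp for t in trials)
        - rate(t.correct_without_mcp for t in trials),
        # Overhead: token cost relative to the no-tool baseline.
        "overhead_ratio": mcp_tokens / base_tokens if base_tokens else float("nan"),
    }
```

Separating instructed from uninstructed prompts keeps proactivity and compliance from being conflated, which matches the distinction the abstract draws between self-initiated tool use and adherence to tool-use instructions.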

Taxonomy

Topics: Topic Modeling