RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation

Tiantian Gan; Qiyao Sun

arXiv:2505.03275·cs.AI·May 7, 2025

TL;DR

RAG-MCP introduces a retrieval-based framework that reduces prompt bloat and improves tool selection accuracy in LLMs by efficiently identifying relevant external tools before engagement.

Contribution

The paper proposes a retrieval-augmented approach that mitigates prompt bloat and improves tool selection accuracy for LLMs that access tools through MCP.

Findings

01 Reduces prompt tokens by over 50%.

02 More than triples tool selection accuracy (43.13% vs 13.62% baseline).

03 Enables scalable and accurate tool integration.

Abstract

Large language models (LLMs) struggle to effectively utilize a growing number of external tools, such as those defined by the Model Context Protocol (MCP) \cite{IntroducingMCP}, due to prompt bloat and selection complexity. We introduce RAG-MCP, a Retrieval-Augmented Generation framework that overcomes this challenge by offloading tool discovery. RAG-MCP uses semantic retrieval to identify the most relevant MCP(s) for a given query from an external index before engaging the LLM. Only the selected tool descriptions are passed to the model, drastically reducing prompt size and simplifying decision-making. Experiments, including an MCP stress test, demonstrate that RAG-MCP significantly cuts prompt tokens (e.g., by over 50%) and more than triples tool selection accuracy (43.13% vs 13.62% baseline) on benchmark tasks. RAG-MCP enables scalable and accurate tool integration for LLMs.
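
The retrieve-then-prompt flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tool registry, the function names (`embed`, `retrieve`, `build_prompt`), and the bag-of-words similarity are all assumptions made for illustration. In particular, the bag-of-words vector is only a stand-in for the semantic retriever the paper refers to.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry; names and descriptions are illustrative only.
TOOLS = {
    "weather_lookup": "Get the current weather forecast for a city",
    "file_search": "Search files on disk by name or content",
    "currency_convert": "Convert an amount of money between two currencies",
    "calendar_create": "Create a calendar event with a date and time",
}

def embed(text):
    """Stand-in embedding (assumption): a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# External index, built once and kept outside the LLM prompt.
INDEX = {name: embed(desc) for name, desc in TOOLS.items()}

def retrieve(query, k=1):
    """Return the names of the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda name: cosine(q, INDEX[name]), reverse=True)
    return ranked[:k]

def build_prompt(query, tool_names):
    """Build a prompt that lists only the given tool descriptions."""
    lines = [f"- {name}: {TOOLS[name]}" for name in tool_names]
    return "Available tools:\n" + "\n".join(lines) + f"\n\nUser query: {query}"

query = "What is the weather forecast in Paris?"
selected = retrieve(query, k=1)                  # tool discovery happens before the LLM call
prompt = build_prompt(query, selected)           # only the retrieved description is included
full_prompt = build_prompt(query, list(TOOLS))   # baseline: every description in the prompt
```

Here `prompt` carries one tool description while `full_prompt` carries all four, which is the source of the prompt-size reduction. With a registry of hundreds of tools the gap grows accordingly, and the model chooses among far fewer candidates.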

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling