A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis
Zhiwei Lin, Bonan Ruan, Jiahao Liu, Weibo Zhao

TL;DR
This paper introduces MCPCorpus, a large-scale, annotated dataset of MCP artifacts, enabling comprehensive analysis of the Model Context Protocol ecosystem's adoption, diversity, and security.
Contribution
We present MCPCorpus, the first extensive, structured dataset of MCP servers and clients with detailed metadata, supporting ecosystem research and security analysis.
Findings
Provides a comprehensive snapshot of the MCP ecosystem
Enables analysis of adoption trends and diversity
Supports security and ecosystem health studies
Abstract
The Model Context Protocol (MCP) has recently emerged as a standardized interface for connecting language models with external tools and data. As the ecosystem rapidly expands, the lack of a structured, comprehensive view of existing MCP artifacts presents challenges for research. To bridge this gap, we introduce MCPCorpus, a large-scale dataset containing around 14K MCP servers and 300 MCP clients. Each artifact is annotated with 20+ normalized attributes capturing its identity, interface configuration, GitHub activity, and metadata. MCPCorpus provides a reproducible snapshot of the real-world MCP ecosystem, enabling studies of adoption trends, ecosystem health, and implementation diversity. To keep pace with the rapid evolution of the MCP ecosystem, we provide utility tools for automated data synchronization, normalization, and inspection. Furthermore, to support efficient exploration…
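As a rough illustration of what "normalized attributes" could mean in practice, the sketch below coerces loosely formatted records into a fixed schema and then counts implementation languages among servers. It is a minimal sketch under stated assumptions: the field names (`name`, `kind`, `language`, `stars`), the helper functions, and the sample records are all hypothetical and are not taken from MCPCorpus or its tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Artifact:
    # Hypothetical fields for illustration; not the actual MCPCorpus schema.
    name: str
    kind: str      # "server" or "client"
    language: str  # primary implementation language
    stars: int     # GitHub star count


def normalize(raw: dict[str, Any]) -> Artifact:
    """Coerce one loosely formatted record into a normalized Artifact."""
    return Artifact(
        name=str(raw.get("name", "")).strip(),
        kind=str(raw.get("kind", "server")).strip().lower(),
        language=str(raw.get("language") or "unknown").strip().lower(),
        stars=int(raw.get("stars") or 0),
    )


def language_counts(artifacts: list[Artifact], kind: str) -> Counter:
    """Count implementation languages among artifacts of one kind."""
    return Counter(a.language for a in artifacts if a.kind == kind)


# Made-up records, used only to exercise the functions above.
raw_records = [
    {"name": " demo-server-a ", "kind": "Server", "language": "Python", "stars": "12"},
    {"name": "demo-server-b", "kind": "server", "language": None, "stars": None},
    {"name": "demo-client", "kind": "client", "language": "TypeScript", "stars": 3},
]
artifacts = [normalize(r) for r in raw_records]
server_languages = language_counts(artifacts, "server")
```

Normalizing once at load time keeps later ecosystem queries (by kind, language, or activity) simple, since every record then has the same types and casing.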
Taxonomy
Topics: Machine Learning and Algorithms · Model-Driven Software Engineering Techniques · Natural Language Processing Techniques
