Can LLM Infer Risk Information From MCP Server System Logs?
Jiayi Fu, Yuansen Zhang, Yinggui Wang

TL;DR
This paper introduces a synthetic benchmark to evaluate LLMs' ability to detect security risks in MCP server system logs, highlighting the effectiveness of reinforcement learning techniques in improving detection accuracy.
Contribution
The paper presents the first benchmark dataset for assessing LLMs' ability to infer risk from system logs and shows that reinforcement learning outperforms supervised fine-tuning on this task.
Findings
Reinforcement learning with GRPO (Group Relative Policy Optimization) improves detection accuracy to 83%.
Smaller models tend to miss risky logs, resulting in high false negatives.
Supervised fine-tuning increases false positives, over-flagging benign logs.
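The findings above separate two failure modes that a single accuracy figure can hide: missed risky logs (false negatives) and over-flagged benign logs (false positives). The following is a minimal sketch of how the three numbers relate, assuming each log carries a binary risky/benign label. The names `DetectionMetrics` and `score_detector` and the toy data are hypothetical illustrations, not the paper's evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    accuracy: float
    false_negative_rate: float  # risky logs predicted benign / all risky logs
    false_positive_rate: float  # benign logs flagged risky / all benign logs


def score_detector(labels: list[bool], predictions: list[bool]) -> DetectionMetrics:
    """labels[i] is True when log i is truly risky; predictions[i] is the model's verdict."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(y and p for y, p in pairs)              # risky, flagged
    tn = sum((not y) and (not p) for y, p in pairs)  # benign, passed
    fp = sum((not y) and p for y, p in pairs)        # benign, flagged
    fn = sum(y and (not p) for y, p in pairs)        # risky, missed
    risky, benign = tp + fn, tn + fp
    return DetectionMetrics(
        accuracy=(tp + tn) / len(pairs) if pairs else 0.0,
        false_negative_rate=fn / risky if risky else 0.0,
        false_positive_rate=fp / benign if benign else 0.0,
    )


# Hypothetical toy data: three risky logs followed by three benign logs.
labels = [True, True, True, False, False, False]
under_flagging = [True, False, False, False, False, False]  # misses risky logs
over_flagging = [True, True, True, True, True, False]       # flags benign logs

missed = score_detector(labels, under_flagging)
flagged = score_detector(labels, over_flagging)
```

In this toy case both detectors have the same accuracy (4 of 6 correct), yet one errs only through false negatives and the other only through false positives, which is why the two rates are worth reporting alongside accuracy.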
Abstract
Large Language Models (LLMs) demonstrate strong capabilities in solving complex tasks when integrated with external tools. The Model Context Protocol (MCP) has become a standard interface for enabling such tool-based interactions. However, these interactions introduce substantial security concerns, particularly when the MCP server is compromised or untrustworthy. While prior benchmarks primarily focus on prompt injection attacks or analyze the vulnerabilities of LLM-MCP interaction trajectories, limited attention has been given to the underlying system logs associated with malicious MCP servers. To address this gap, we present the first synthetic benchmark for evaluating LLMs' ability to identify security risks from system logs. We define nine categories of MCP server risks and generate 1,800 synthetic system logs using ten state-of-the-art LLMs. These logs are embedded in the return…
Taxonomy
Topics: Software System Performance and Reliability · Adversarial Robustness in Machine Learning · Topic Modeling
