More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents
Weimin Xiong, Ke Wang, Yifan Song, Hanchao Liu, Sai Zhou, Wei Peng, Sujian Li

TL;DR
This paper investigates the stability of tool-integrated LLM agents, revealing their high vulnerability to errors during tool invocation, especially in open-source models, and emphasizes the need for stability evaluation in real-world applications.
Contribution
It provides a comprehensive analysis of the stability issues in tool-integrated LLM agents, highlighting vulnerabilities across different stages and model types, which was previously underexplored.
Findings
Agents are highly susceptible to errors at each invocation stage.
Open-source models are more vulnerable than proprietary models.
Increasing model size does not improve and may worsen vulnerability.
Abstract
Current evaluations of tool-integrated LLM agents typically focus on end-to-end tool-usage evaluation while neglecting their stability. This limits their real-world applicability, as various internal or external factors can cause agents to crash or behave abnormally. Our research addresses this by investigating whether agents are vulnerable to errors throughout the entire tool invocation process, including reading tool documentation, selecting tools and generating parameters, and processing the tool's response. Through extensive experiments, we observe that agents are highly susceptible to errors at each stage and agents based on open-source models are more vulnerable than those based on proprietary models. We also find that increasing the model size does not significantly improve tool invocation reasoning and may make agents more vulnerable to attacks resembling normal user…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Advanced Malware Detection Techniques · Multi-Agent Systems and Negotiation
MethodsFocus
