Blue Teaming Function-Calling Agents
Greta Dolcetti, Giulio Zizzo, Sergio Maffeis

TL;DR
This paper evaluates the robustness of four open-source LLMs with function-calling capabilities against three attacks and measures the effectiveness of eight defenses. It finds that the models are not safe by default and that the defenses are not yet practical for real-world applications.
Contribution
It provides an experimental assessment of the security vulnerabilities of four open-source function-calling LLMs under three attacks and evaluates the effectiveness of eight defenses.
Findings
Models are unsafe by default
Defenses are not ready for real-world use
Significant security gaps remain in current LLMs
Abstract
We present an experimental evaluation that assesses the robustness of four open-source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show that these models are not safe by default, and that the defences are not yet employable in real-world scenarios.
Taxonomy
Topics: Security and Verification in Computing · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
