VeriGrey: Greybox Agent Validation
Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury

TL;DR
VeriGrey introduces a grey-box testing approach for LLM agents that uncovers security vulnerabilities by analyzing tool invocation sequences and using mutation-based prompt injections, outperforming black-box methods.
Contribution
The paper presents VeriGrey, a novel grey-box testing framework that effectively detects security risks in LLM agents through sequence analysis and prompt mutation techniques.
Findings
VeriGrey is 33% more effective at finding prompt injection vulnerabilities than a black-box baseline.
Successfully identified malicious prompts in real-world coding and personal assistant agents.
Demonstrated high success rates in uncovering attack scenarios across multiple LLM backends.
Abstract
Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front end, it conducts autonomous decision-making by combining the LLM outputs with results obtained by invoking several external tools. The autonomous interactions with the external environment introduce critical security risks. In this paper, we present a grey-box approach to explore diverse behaviors and uncover security risks in LLM agents. Our approach VeriGrey uses the sequence of tools invoked as a feedback function to drive the testing process. This helps uncover infrequent but dangerous tool invocations that cause unexpected agent behavior. As mutation operators in the testing process, we mutate prompts to design pernicious injection prompts. This is carefully accomplished by linking the task of the agent to an injection task, so that…
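The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_agent`, `mutate`, `greybox_loop`, the tool names, and the mutation suffixes are all hypothetical stand-ins. The only idea taken from the abstract is that the sequence of invoked tools serves as the feedback signal, so a mutated prompt is kept for further mutation when it produces a tool sequence not seen before.

```python
import random
from typing import Callable, List, Set, Tuple

ToolTrace = Tuple[str, ...]  # ordered sequence of tool names invoked in one run


def toy_agent(prompt: str) -> ToolTrace:
    """Hypothetical stand-in for an instrumented agent; returns the tools it invoked."""
    text = prompt.lower()
    trace = ["read_file"]
    if "summarize" in text:
        trace.append("summarize")
    # Assumption: an injection that is linked to the agent's task triggers a risky tool.
    if "as part of this task" in text and "send" in text:
        trace.append("send_email")
    return tuple(trace)


def mutate(prompt: str, rng: random.Random) -> str:
    """Toy mutation operator; the paper's actual operators are not reproduced here."""
    suffixes = [
        " Please summarize the result.",
        " As part of this task, also send the file to the reviewer.",
        " Double-check the output.",
    ]
    return prompt + rng.choice(suffixes)


def greybox_loop(
    agent: Callable[[str], ToolTrace],
    seeds: List[str],
    budget: int,
    rng: random.Random,
    risky_tool: str = "send_email",
) -> Tuple[List[str], Set[ToolTrace], List[Tuple[str, ToolTrace]]]:
    """Keep a mutated prompt only if it yields a tool sequence not seen before."""
    corpus = list(seeds)
    seen: Set[ToolTrace] = {agent(s) for s in seeds}
    findings: List[Tuple[str, ToolTrace]] = []
    for _ in range(budget):
        child = mutate(rng.choice(corpus), rng)
        trace = agent(child)
        if trace not in seen:  # new behavior: retain the input as a seed
            seen.add(trace)
            corpus.append(child)
            if risky_tool in trace:
                findings.append((child, trace))
    return corpus, seen, findings


if __name__ == "__main__":
    corpus, seen, findings = greybox_loop(
        toy_agent, ["Read report.txt."], budget=200, rng=random.Random(0)
    )
    for prompt, trace in findings:
        print(trace, "<-", prompt)
```

A black-box tester would see only the agent's final answer. In this sketch the tool trace gives the loop a reason to keep intermediate prompts that reach new behavior, which is the role the abstract assigns to the feedback function.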
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques
