AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
Yixin Wu, Rui Wen, Chi Cui, Michael Backes, Yang Zhang

TL;DR
AttackPilot is an autonomous LLM-based agent that runs inference attacks against ML services. It reaches near-expert attack performance at an average token cost of $0.627 per run, which lets non-experts assess the risks of an ML service.
Contribution
This work introduces AttackPilot, presented as the first autonomous agent that conducts inference attacks on ML services without human intervention. The agent is powered by an LLM such as GPT-4o.
Findings
100% task completion rate on 20 services
Near-expert attack performance achieved
Low average token cost of $0.627 per run
Abstract
Inference attacks have been widely studied and offer a systematic risk assessment of ML services; however, their implementation and the attack parameters for optimal estimation are challenging for non-experts. The emergence of advanced large language models presents a promising yet largely unexplored opportunity to develop autonomous agents as inference attack experts, helping address this challenge. In this paper, we propose AttackPilot, an autonomous agent capable of independently conducting inference attacks without human intervention. We evaluate it on 20 target services. The evaluation shows that our agent, using GPT-4o, achieves a 100.0% task completion rate and near-expert attack performance, with an average token cost of only $0.627 per run. The agent can also be powered by many other representative LLMs and can adaptively optimize its strategy under service constraints. We…
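To make concrete why attack parameters are hard for non-experts to pick, here is a minimal sketch of one well-known inference attack, a confidence-threshold membership inference attack, with a brute-force sweep over the threshold. It is not AttackPilot's method or code. The function names, the toy confidence values, and the assumption that the service returns a top-class confidence score are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def threshold_attack(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' when the service's confidence is at or above the threshold."""
    return [c >= threshold for c in confidences]


def attack_accuracy(guesses: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Fraction of samples whose membership guess matches the ground truth."""
    correct = sum(g == m for g, m in zip(guesses, is_member))
    return correct / len(is_member)


def sweep_threshold(
    confidences: Sequence[float], is_member: Sequence[bool]
) -> Tuple[float, float]:
    """Try every observed confidence as a threshold; return (threshold, accuracy)."""
    best_threshold, best_accuracy = 0.0, -1.0
    for candidate in sorted(set(confidences)):
        accuracy = attack_accuracy(threshold_attack(confidences, candidate), is_member)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


# Hypothetical toy data: confidences a target service might return for
# samples it was trained on (members) and samples it was not (non-members).
member_conf = [0.99, 0.95, 0.90, 0.85]
non_member_conf = [0.70, 0.60, 0.88, 0.55]

confidences = member_conf + non_member_conf
is_member = [True] * len(member_conf) + [False] * len(non_member_conf)

# A hand-picked threshold versus the swept one.
naive_accuracy = attack_accuracy(threshold_attack(confidences, 0.5), is_member)
best_threshold, best_accuracy = sweep_threshold(confidences, is_member)

print(f"threshold=0.5 accuracy: {naive_accuracy:.3f}")
print(f"swept threshold={best_threshold} accuracy: {best_accuracy:.3f}")
```

On this toy data, a poorly chosen threshold performs no better than guessing, while the sweep finds a much better setting. In practice the attacker has no ground-truth labels for the target's data, so the threshold usually has to be calibrated some other way (for example on a shadow model), which is part of the expertise the abstract refers to.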
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Software System Performance and Reliability
