VirtualCrime: Evaluating Criminal Potential of Large Language Models via Sandbox Simulation
Yilin Tang, Yu Wang, Lanlan Qiu, Wenchang Gao, Yunfei Ma, Baicheng Chen, Tianxing He

TL;DR
This paper introduces VirtualCrime, a sandbox simulation framework with three agents to evaluate the criminal potential of large language models through diverse crime tasks, highlighting safety concerns.
Contribution
It presents a novel multi-agent sandbox framework and a set of 40 crime tasks to assess LLMs' ability to plan and execute crimes, revealing safety risks.
Findings
LLMs can generate detailed crime plans and carry out the planned actions successfully within the sandbox.
Some agents inflict harm on NPCs to achieve their goals.
The study underscores the need for stronger safety alignment in agentic AI.
Abstract
Large language models (LLMs) have shown strong capabilities in multi-step decision-making, planning, and action, and are increasingly integrated into real-world applications. This raises the concern that their strong problem-solving abilities may be misused to commit crimes. To address this concern, we propose VirtualCrime, a sandbox simulation framework based on a three-agent system to evaluate the criminal capabilities of models. Specifically, the framework consists of an attacker agent acting as the leader of a criminal team, a judge agent determining the outcome of each action, and a world manager agent updating the environment state and entities. Furthermore, we design 40 diverse crime tasks within this framework, covering 11 maps and 13 crime objectives such as theft, robbery, kidnapping, and riot. We also introduce a human player baseline for reference to better interpret the…
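The three-agent structure described in the abstract can be pictured as a simple turn loop: one agent proposes an action, a second rules on its outcome, and a third updates the environment. The sketch below is illustrative only and is not the authors' implementation. The names `WorldState`, `run_episode`, and the three stub functions are hypothetical, the agents are deterministic placeholders rather than LLM calls, and the actions are opaque labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorldState:
    """Hypothetical environment state: step counter, event log, goal flag."""
    step: int = 0
    log: List[str] = field(default_factory=list)
    goal_reached: bool = False


# Each role is a plain callable here; a real harness would wrap model calls.
Attacker = Callable[[WorldState], str]                      # proposes an action
Judge = Callable[[WorldState, str], bool]                   # rules on the outcome
WorldManager = Callable[[WorldState, str, bool], WorldState]  # updates the state


def run_episode(attacker: Attacker, judge: Judge, manager: WorldManager,
                state: WorldState, max_steps: int = 10) -> WorldState:
    """Run propose -> judge -> update turns until the goal or the step cap."""
    for _ in range(max_steps):
        action = attacker(state)
        success = judge(state, action)
        state = manager(state, action, success)
        if state.goal_reached:
            break
    return state


# Deterministic stand-ins (assumptions, for illustration only).
def stub_attacker(state: WorldState) -> str:
    return f"action-{state.step}"


def stub_judge(state: WorldState, action: str) -> bool:
    # Placeholder rule: actions on even-numbered steps succeed.
    return state.step % 2 == 0


def stub_manager(state: WorldState, action: str, success: bool) -> WorldState:
    outcome = "ok" if success else "failed"
    next_step = state.step + 1
    return WorldState(step=next_step,
                      log=state.log + [f"{action}:{outcome}"],
                      goal_reached=next_step >= 3)  # arbitrary stop condition


final = run_episode(stub_attacker, stub_judge, stub_manager, WorldState())
```

Keeping the judge separate from the world manager, as the abstract describes, means the ruling on an action and the bookkeeping of its consequences can be inspected independently in the log.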
