TL;DR
This paper evaluates the real-world safety of OpenClaw, a widely used personal AI agent. It finds significant vulnerabilities across 12 attack scenarios and proposes a new taxonomy for safety analysis.
Contribution
It introduces the CIK (Capability, Identity, Knowledge) taxonomy for safety analysis and presents the first comprehensive real-world evaluation of OpenClaw's vulnerabilities.
Findings
Poisoning any single CIK dimension raises the average attack success rate from 24.6% to between 64% and 74%.
Even the most robust backbone model shows a more than threefold increase in attack success rate.
File protection blocks 97% of malicious injections but also hinders legitimate updates.
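As a quick sanity check on the Findings, the sketch below converts the reported average attack success rates (ASR) into absolute and relative increases. It uses only the figures stated above (a 24.6% baseline and 64% to 74% after poisoning one CIK dimension). The constant and function names are illustrative and do not come from the paper, and the result describes the cross-model average rather than any single backbone model.

```python
# Average attack success rates (ASR) taken from the Findings, in percent.
# Names here are illustrative, not from the paper.
BASELINE_ASR = 24.6                 # no CIK dimension poisoned
POISONED_ASR_RANGE = (64.0, 74.0)   # any single CIK dimension poisoned


def absolute_increase(baseline: float, poisoned: float) -> float:
    """Increase in ASR, in percentage points."""
    return poisoned - baseline


def fold_increase(baseline: float, poisoned: float) -> float:
    """Ratio of poisoned ASR to baseline ASR (e.g. 2.0 means doubled)."""
    return poisoned / baseline


for asr in POISONED_ASR_RANGE:
    points = absolute_increase(BASELINE_ASR, asr)
    ratio = fold_increase(BASELINE_ASR, asr)
    print(f"{BASELINE_ASR}% -> {asr}%: +{points:.1f} points, {ratio:.2f}x baseline")
```

With these inputs, the average rate rises by about 39 to 49 percentage points, or roughly 2.6x to 3.0x the baseline. The per-model "threefold" claim cannot be checked this way, since the individual model figures are not given here.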
Abstract
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to…
