Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zijun Wang; Haoqin Tu; Letian Zhang; Hardy Chen; Juncheng Wu; Xiangyan Liu; Zhenlong Yuan; Tianyu Pang; Michael Qizhe Shieh; Fengze Liu; Zeyu Zheng; Huaxiu Yao; Yuyin Zhou; Cihang Xie

arXiv:2604.04759 · cs.CR · April 7, 2026

TL;DR

This paper evaluates the safety of OpenClaw, a widely deployed personal AI agent, on a live instance rather than in a sandbox. Across 12 attack scenarios and four backbone models, it finds that poisoning the agent's persistent state sharply raises the attack success rate.

Contribution

The paper introduces the CIK taxonomy, which unifies an agent's persistent state into three dimensions (Capability, Identity, and Knowledge), and presents the first real-world safety evaluation of OpenClaw.

Findings

01. Poisoning any single CIK dimension raises the average attack success rate from 24.6% to 64-74%.

02. Even the most robust model shows over a threefold increase in vulnerability.

03. File protection blocks 97% of malicious injections but also hinders legitimate updates.
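
For scale, the jump reported in finding 01 can be restated as a fold increase and as a percentage-point gain. The short Python sketch below only reworks the figures already quoted above (24.6% baseline, 64-74% after poisoning). The helper name `fold_increase` is an illustrative assumption, not something taken from the paper.

```python
def fold_increase(baseline_pct: float, attacked_pct: float) -> float:
    """Ratio of the attacked success rate to the baseline success rate."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return attacked_pct / baseline_pct


# Figures quoted in finding 01 (average attack success rate, in percent).
BASELINE = 24.6
POISONED_RANGE = (64.0, 74.0)

low, high = (fold_increase(BASELINE, p) for p in POISONED_RANGE)
gain_low, gain_high = (p - BASELINE for p in POISONED_RANGE)

print(f"fold increase: {low:.2f}x to {high:.2f}x")
print(f"absolute gain: {gain_low:.1f} to {gain_high:.1f} percentage points")
```

On these averages the attack success rate grows by roughly 2.6x to 3.0x, or about 39 to 49 percentage points.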

Abstract

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to…
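
As a rough illustration of how the CIK taxonomy could organize an evaluation like the one described, the Python sketch below tags each trial with the dimension that was poisoned and computes an attack success rate per condition. Everything in it is an assumption made for illustration: the `Trial` record, the `attack_success_rate` helper, the scenario names, and the toy outcomes are invented here. They do not reproduce the paper's harness or its results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class CIK(Enum):
    """The three persistent-state dimensions named in the abstract."""
    CAPABILITY = "capability"
    IDENTITY = "identity"
    KNOWLEDGE = "knowledge"


@dataclass(frozen=True)
class Trial:
    """One attack attempt (hypothetical record layout)."""
    scenario: str
    poisoned: Optional[CIK]  # None means baseline: no dimension poisoned
    attack_succeeded: bool


def attack_success_rate(trials: Sequence[Trial], poisoned: Optional[CIK]) -> float:
    """Percentage of successful attacks among trials with the given condition."""
    selected = [t for t in trials if t.poisoned is poisoned]
    if not selected:
        raise ValueError("no trials recorded for this condition")
    return 100.0 * sum(t.attack_succeeded for t in selected) / len(selected)


# Toy data, invented for this example only; not results from the paper.
toy_trials = [
    Trial("scenario-a", None, False),
    Trial("scenario-b", None, False),
    Trial("scenario-a", CIK.IDENTITY, True),
    Trial("scenario-b", CIK.IDENTITY, False),
]

print(attack_success_rate(toy_trials, None))          # baseline condition
print(attack_success_rate(toy_trials, CIK.IDENTITY))  # identity poisoned
```

Keeping the poisoned dimension as an explicit field makes it straightforward to compare a baseline run against each single-dimension poisoning condition, which is the comparison the abstract reports.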
