A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang

TL;DR
This paper evaluates the safety of Clawdbot, a versatile AI agent, across multiple risk dimensions using a trajectory-based approach, identifying common failure modes and security vulnerabilities in various scenarios.
Contribution
It introduces a trajectory-centric safety evaluation framework for Clawdbot, combining automated and human assessments across adapted and custom scenarios.
Findings
Safety performance varies across tasks
Failures often occur with ambiguous or open-ended prompts
Most vulnerabilities are linked to misinterpretations in benign contexts
Abstract
Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent,…
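As a rough illustration of the logging described in the abstract, the sketch below models one trajectory as a list of steps (messages, actions, tool-call arguments and outputs) with two verdicts attached, one from an automated judge and one from a human reviewer. All class, field, and function names here are hypothetical and are not taken from the paper or from Clawdbot. The idea of flagging judge/human disagreements is likewise an assumption about how the two assessments might be compared.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Step:
    """One logged event in a trajectory (hypothetical schema)."""
    role: str                                   # "user", "agent", or "tool"
    message: str = ""                           # natural-language content, if any
    tool_name: Optional[str] = None             # set when the step is a tool call
    tool_args: Dict[str, Any] = field(default_factory=dict)
    tool_output: Optional[str] = None           # raw output returned by the tool


@dataclass
class Trajectory:
    """A complete interaction for one test case, plus both verdicts."""
    case_id: str
    risk_dimension: str
    steps: List[Step] = field(default_factory=list)
    judge_unsafe: Optional[bool] = None         # automated judge verdict
    human_unsafe: Optional[bool] = None         # human reviewer verdict


def disagreements(trajectories: List[Trajectory]) -> List[str]:
    """Return case ids where both verdicts exist and differ."""
    return [
        t.case_id
        for t in trajectories
        if t.judge_unsafe is not None
        and t.human_unsafe is not None
        and t.judge_unsafe != t.human_unsafe
    ]


def unsafe_counts_by_dimension(trajectories: List[Trajectory]) -> Dict[str, int]:
    """Count human-labelled unsafe cases per risk dimension."""
    counts: Dict[str, int] = {}
    for t in trajectories:
        counts.setdefault(t.risk_dimension, 0)
        if t.human_unsafe:
            counts[t.risk_dimension] += 1
    return counts
```

Keeping tool-call arguments and outputs on each step, rather than only the final response, is what lets a reviewer judge the whole trajectory instead of just its outcome.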
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning · Occupational Health and Safety Research
