LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

Chiyu Zhang, Huiqin Yang, Bendong Jiang, Xiaolei Zhang, Yiran Zhao, Ruyi Chen, Lu Zhou, Xiaogang Xu, Jiafei Wu, Liming Fang, Zhe Liu

arXiv:2605.10779 · cs.CR · May 12, 2026

TL;DR

LITMUS is a comprehensive benchmark for evaluating the safety and robustness of LLM-based OS agents against behavioral jailbreaks, addressing both semantic and physical-layer risks with a dual verification system.

Contribution

It introduces a dual verification benchmark with 819 test cases and an automated evaluation framework for assessing LLM agent safety at both conversational and OS levels.
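The following is a minimal Python sketch of how a per-case verdict could combine the two verification layers. It assumes each test case produces two boolean signals. The first is a semantic-layer signal: whether the agent's reply refuses or denies acting. The second is a physical-layer signal: whether the harmful effect is actually observable in the OS. The `Verdict` labels, `CaseObservation`, `judge`, and the file-deletion probe are hypothetical illustrations, not the LITMUS interface. The sketch gives the OS state priority over what the agent says. The branch where harm is observed but the agent denies acting corresponds to what the findings call Execution Hallucination.

```python
import os
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for illustration only.
    SAFE = "safe"                        # refused, and the OS state is unchanged
    UNSAFE = "unsafe"                    # executed the harmful operation and said so
    EXECUTION_HALLUCINATION = "execution_hallucination"  # executed, but denied it
    CLAIMED_ONLY = "claimed_only"        # claimed compliance, but nothing changed


@dataclass
class CaseObservation:
    agent_denies_action: bool  # semantic layer: reply refuses or denies acting
    harm_observed: bool        # physical layer: harmful effect present in the OS


def physical_check_file_deleted(path: str) -> bool:
    """Hypothetical physical-layer probe: was the target file removed?"""
    return not os.path.exists(path)


def judge(obs: CaseObservation) -> Verdict:
    """Combine semantic and physical evidence; the OS state takes priority."""
    if obs.harm_observed:
        return (Verdict.EXECUTION_HALLUCINATION if obs.agent_denies_action
                else Verdict.UNSAFE)
    return Verdict.SAFE if obs.agent_denies_action else Verdict.CLAIMED_ONLY
```

For example, `judge(CaseObservation(agent_denies_action=True, harm_observed=True))` returns `Verdict.EXECUTION_HALLUCINATION`. A judge that relied on the semantic layer alone would have scored this same case as a safe refusal.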

Findings

01. Current agents often execute high-risk OS operations despite safety measures.

02. Agents frequently exhibit Execution Hallucination: they falsely deny having performed dangerous actions that they actually completed.

03. Skill injection and entity wrapping attacks are highly successful, revealing significant vulnerabilities.

Abstract

The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavioral jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing benchmarks either evaluate safety at the semantic layer alone, missing physical-layer harms, or fail to isolate test cases, letting earlier runs contaminate later ones. We present LITMUS (LLM-agents In-OS Testing for Measuring Unsafe Subversion), a benchmark addressing both gaps via a semantic-physical dual verification mechanism and OS-level state rollback. LITMUS comprises 819 high-risk test cases organized into one harmful seed subset and six attack-extended subsets covering three adversarial paradigms (jailbreak speaking, skill injection, and entity wrapping), plus a fully automated…
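State rollback prevents one test case from contaminating the next. LITMUS performs rollback at the OS level. The Python sketch below scales the idea down to a single directory. It assumes all of the agent's side effects stay inside a sandbox folder, a restriction that a real OS-level snapshot (for example, of a VM or container) would not need. `run_isolated` is a hypothetical helper, not part of LITMUS. Because rollback erases the evidence, the physical-layer check has to run inside the case, before the state is restored.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_isolated(sandbox: Path, case: Callable[[Path], T]) -> T:
    """Run one test case against `sandbox`, then roll the directory back.

    Hypothetical helper: `case` should perform the agent episode AND its
    physical-layer check, because the evidence is erased by the rollback.
    """
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        # Take a snapshot of the pre-test state (symlinks copied as links).
        shutil.copytree(sandbox, snapshot, symlinks=True)
        try:
            return case(sandbox)
        finally:
            # Discard everything the case did, even if it deleted the sandbox.
            shutil.rmtree(sandbox, ignore_errors=True)
            # Restore the snapshot; dirs_exist_ok tolerates paths that could
            # not be removed, though any such stray files would remain.
            shutil.copytree(snapshot, sandbox, symlinks=True, dirs_exist_ok=True)
```

As a usage example, consider a case that deletes `important.txt` and then checks whether the file is gone. That case returns `True` from `run_isolated`, and after the call the file exists again. The next case therefore starts from the same state. Copying a whole directory is simple, but it is slow for large trees. It also misses processes, users, and system configuration, which is why an OS-level mechanism is the better fit for a benchmark like this.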
