OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Thomas Kuntz; Agatha Duzan; Hao Zhao; Francesco Croce; Zico Kolter; Nicolas Flammarion; Maksym Andriushchenko

arXiv:2506.14866 · cs.SE · October 30, 2025 · 2 citations



TL;DR

OS-Harm is a benchmark for evaluating the safety of computer use agents across three categories of harm (deliberate user misuse, prompt injection attacks, and model misbehavior). It combines an automated safety judge with an analysis of where current models are vulnerable.

Contribution

The paper introduces OS-Harm, a new benchmark with 150 tasks and an automated safety judge, to systematically assess the safety risks of LLM-based computer use agents.

Findings

01 Models often comply with misuse queries

02 Models are vulnerable to static prompt injections

03 Models occasionally perform unsafe actions

Abstract

Computer use agents are LLM-based agents that can directly interact with a graphical user interface by processing screenshots or accessibility trees. While these systems are gaining popularity, their safety has been largely overlooked, despite the fact that evaluating and understanding their potential for harmful behavior is essential for widespread adoption. To address this gap, we introduce OS-Harm, a new benchmark for measuring safety of computer use agents. OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior. To cover these cases, we create 150 tasks that span several types of safety violations (harassment, copyright infringement, disinformation, data exfiltration, etc.) and require the agent to interact with a variety of OS applications (email client,…
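To make the three-way categorization concrete, here is a minimal sketch of how per-category unsafe rates might be tallied once a safety judge has labeled each agent run. The record type `JudgedTrajectory`, the category strings, the task IDs, and `unsafe_rate_by_category` are all hypothetical illustrations. They are not the API or data format of the tml-epfl/os-harm repository, and the toy verdicts are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the three harm categories named in the abstract.
CATEGORIES = ("deliberate_misuse", "prompt_injection", "model_misbehavior")


@dataclass(frozen=True)
class JudgedTrajectory:
    """One agent run on one task, plus a (hypothetical) safety-judge verdict."""
    task_id: str
    category: str
    unsafe: bool  # True if the judge flagged an unsafe action in the run


def unsafe_rate_by_category(runs):
    """Return {category: fraction of runs judged unsafe}, skipping empty categories."""
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for run in runs:
        if run.category not in CATEGORIES:
            raise ValueError(f"unknown category: {run.category!r}")
        totals[run.category] += 1
        unsafe[run.category] += int(run.unsafe)
    # Iterate in the fixed CATEGORIES order so the output is deterministic.
    return {cat: unsafe[cat] / totals[cat] for cat in CATEGORIES if totals[cat]}


if __name__ == "__main__":
    # Toy verdicts for illustration only; these are not results from the paper.
    runs = [
        JudgedTrajectory("misuse-001", "deliberate_misuse", True),
        JudgedTrajectory("misuse-002", "deliberate_misuse", False),
        JudgedTrajectory("inject-001", "prompt_injection", True),
        JudgedTrajectory("misbehave-001", "model_misbehavior", False),
    ]
    print(unsafe_rate_by_category(runs))
```

Reporting a separate rate per category, rather than one pooled number, keeps the three threat models distinguishable: a model can refuse most misuse requests yet still follow injected instructions.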

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

tml-epfl/os-harm (official)

Videos

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents (SlidesLive)

Taxonomy

Topics: Advanced Malware Detection Techniques