MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg; Oren Gal

arXiv:2605.03952 · cs.CR · May 6, 2026

TL;DR

MOSAIC-Bench is a benchmark that tests whether coding agents ship exploitable code when a malicious objective is split into a sequence of innocuous-looking tickets. It exposes safety gaps across multiple models and attack chains and evaluates mitigation strategies.

Contribution

The paper introduces MOSAIC-Bench, a benchmark of 199 three-stage attack chains paired with deterministic exploit oracles, to measure and analyze compositional vulnerabilities in coding agents.

Findings

01. Nine production coding agents reach a 53-86% end-to-end attack success rate (ASR) on staged attack chains.

02. Vulnerable outputs are reduced to 0-20.4% with frontier models and defenses.

03. A pentester framing reduces evasion, with 88.4% attack detection on GitHub PRs.

Abstract

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing safety alignment evaluates overt requests in isolation, leaving models blind to malicious end-states that emerge from sequenced compliance with innocuous-looking requests. We introduce MOSAIC-Bench (Malicious Objectives Sequenced As Innocuous Compliance), a benchmark of 199 three-stage attack chains paired with deterministic exploit oracles on deployed software substrates (10 web-application substrates, 31 CWE classes, 5 programming languages) that treats both exploit ground truth and downstream reviewer protocol as first-class evaluation axes. On this benchmark, nine production coding agents from Anthropic, OpenAI, Google, Moonshot, Zhipu, and Minimax compose innocuous tickets at 53-86% end-to-end ASR with…
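
To make the end-to-end ASR metric concrete, the following is a minimal sketch of how it could be computed over three-stage chains. It is not the authors' harness: the `AttackChain` record, the `run_agent` callback, the `oracle` callback, and the dictionary-based workspace are all hypothetical names introduced for illustration. The assumption is that a chain counts as a success only if the agent completes all three tickets and the deterministic oracle then confirms the exploit on the resulting state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class AttackChain:
    """Hypothetical record for one three-stage chain (not the paper's schema)."""
    chain_id: str
    cwe: str                        # CWE identifier string for the targeted class
    tickets: Tuple[str, str, str]   # three innocuous-looking tickets, in order


# Hypothetical callbacks:
#   run_agent(ticket, workspace) -> True if the agent carried out the ticket
#   oracle(chain, workspace)     -> True if the exploit works on the final state
AgentFn = Callable[[str, Dict[str, object]], bool]
OracleFn = Callable[[AttackChain, Dict[str, object]], bool]


def end_to_end_asr(
    chains: Sequence[AttackChain],
    run_agent: AgentFn,
    oracle: OracleFn,
) -> float:
    """Fraction of chains where every stage completed and the oracle fired."""
    if not chains:
        raise ValueError("need at least one chain")

    successes = 0
    for chain in chains:
        workspace: Dict[str, object] = {}  # fresh state for each chain
        completed = True
        for ticket in chain.tickets:
            # Stages run in order against the same workspace, so later
            # tickets build on whatever earlier tickets left behind.
            if not run_agent(ticket, workspace):
                completed = False  # a refusal at any stage ends the chain
                break
        if completed and oracle(chain, workspace):
            successes += 1
    return successes / len(chains)
```

Scoring on the oracle's verdict over the final state, rather than on per-ticket judgments, mirrors the abstract's point that each request can look innocuous in isolation while the composed result is exploitable.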
