FragBench: Cross-Session Attacks Hidden in Benign-Looking Fragments

Astha Mehta; Niruthiha Selvanayagam; Cedric Lam; Hengxu Li; Phuc-Nguyen Nguyen; Raymond Lee; Olivia McGoffin; My (Isabella) Luong; Arthur Collé; Jamie Johnson; David Williams-King; Linh Le

arXiv:2605.11029 · cs.CR · May 13, 2026

TL;DR

FragBench is a benchmark for detecting malicious goals that are split across separate LLM sessions. It emphasizes modeling interaction graphs over analyzing prompts in isolation.

Contribution

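The paper presents a benchmark built from 24 real-world cyber-incident campaigns, with two paired tasks: an adversarial rewriter that hardens fragments against a single-turn safety judge (FragBench Attack) and a graph-based user-level detector (FragBench Defense). It reports that graph-based models detect attacks that single-turn judges miss.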

Findings

01. Graph-based detectors achieve F1 scores of 0.88 to 0.96.

02. Single-turn safety judges perform near chance on cross-session attacks.

03. Cross-session interaction modeling is crucial for LLM safety.

Abstract

An attacker can split a malicious goal into sub-prompts that each look benign on their own and only become harmful in combination. Existing LLM safety benchmarks evaluate prompts one at a time, or across turns of a single chat, and so do not look for a malicious signal spread across separate sessions with no shared context. We build FragBench, a benchmark drawn from 24 real-world cyber-incident campaigns, which keeps the full attack trail: the multi-fragment kill chain, the per-fragment safety-judge verdicts, sandboxed execution traces, and a matched set of benign cover sessions. FragBench splits this trail into two paired tasks: an adversarial rewriter that hardens fragments against a single-turn safety judge (FragBench Attack), and a graph-based user-level detector trained on the resulting interactions (FragBench Defense). The single-turn judge is near chance on the released corpus by…
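The contrast the abstract draws, judging each fragment alone versus pooling evidence per user across sessions, can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in rather than FragBench's schema or detector: the `Fragment` type, the capability tags, the `KILL_CHAIN` set, and both flagging functions are assumptions made for illustration, and a plain per-user set takes the place of the interaction graph the paper trains on.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical capability tags that only become harmful in combination.
KILL_CHAIN = {"recon", "credential_access", "exfiltration"}


@dataclass(frozen=True)
class Fragment:
    user: str
    session: str
    capability: str  # hypothetical label for what the prompt asks for


def single_turn_flags(fragments):
    """Judge each fragment alone: flag only if one prompt covers the whole chain."""
    return [KILL_CHAIN <= {f.capability} for f in fragments]


def user_level_flags(fragments):
    """Pool capabilities per user across all sessions, then check chain coverage."""
    seen = defaultdict(set)
    for f in fragments:
        seen[f.user].add(f.capability)
    return {user: KILL_CHAIN <= caps for user, caps in seen.items()}


# u1 spreads a full chain over three sessions; u2 only repeats a benign-looking step.
fragments = [
    Fragment("u1", "s1", "recon"),
    Fragment("u1", "s2", "credential_access"),
    Fragment("u1", "s3", "exfiltration"),
    Fragment("u2", "s4", "recon"),
    Fragment("u2", "s5", "recon"),
]

print(single_turn_flags(fragments))
print(user_level_flags(fragments))
```

In this toy setup no individual fragment is flagged, while pooling by user flags `u1` and leaves `u2` alone. A learned graph-based detector would replace the hand-written subset check with features over sessions and their relationships.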

Code & Models

Repositories

LidaSafety/fragbench (GitHub)
