Rethinking Fraud Safety Evaluation: Multi-Round Attacks Reveal Safety-Utility Tradeoffs in Graph-Context LLM Defenders
Laura Jiang, Reza Ryan, Qian Li, Nasim Ferdosian

TL;DR
This paper evaluates graph-context LLM fraud defenders under multi-round attacks, revealing a safety-utility tradeoff and showing why fraud safety evaluation should cover multiple rounds and measure refusal timing.
Contribution
It introduces a comprehensive multi-round evaluation framework for fraud defenders, highlighting how structured context influences safety-utility tradeoffs and refusal behavior.
Findings
Graph-context defenders improve early safe refusal but cause more benign over-refusal.
The over-refusal cost is localized to how the defender LLM consumes structured context, not to graph-encoder quality.
Temporal graph context appears stronger than static graph context, but its advantage on refusal metrics is not conclusive.
Abstract
Single-turn safety evaluation is a poor proxy for real fraud defense, where attackers escalate across multiple rounds. This paper evaluates fraud defenders under replay and adaptive multi-round attacks and measures when a defender refuses, not just whether it eventually refuses. On a frozen multi-round suite built from Fraud-R1, graph-context defenders improve early safe refusal relative to text-only baselines under both replay and adaptive fraud pressure, but they also produce substantially more benign over-refusal. Direct probing of the trained graph encoder, together with paired shuffle-risk ablations on both fraud and benign sides replicated across two seeds on the Qwen-1.5B backbone, localises this cost to how the defender LLM consumes structured context rather than to graph-encoder quality: the encoder cleanly separates fraud from benign, while the LLM responds primarily to the…
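The distinction between when a defender refuses and whether it eventually refuses can be illustrated with a minimal sketch. This is not the paper's metric code: the function names, the per-round boolean encoding of refusals, and the cutoff `k` are all assumptions made here for illustration, and the paper's exact metric definitions may differ.

```python
from typing import Optional, Sequence

# Hypothetical encoding: each conversation is a list of booleans,
# one per round, where True means the defender refused in that round.
Conversation = Sequence[bool]


def first_refusal_round(refusals: Conversation) -> Optional[int]:
    """Return the 1-indexed round of the first refusal, or None if there is none."""
    for round_number, refused in enumerate(refusals, start=1):
        if refused:
            return round_number
    return None


def eventual_refusal_rate(conversations: Sequence[Conversation]) -> float:
    """Fraction of conversations with a refusal in any round."""
    if not conversations:
        raise ValueError("need at least one conversation")
    return sum(1 for c in conversations if any(c)) / len(conversations)


def early_refusal_rate(conversations: Sequence[Conversation], k: int) -> float:
    """Fraction of conversations whose first refusal occurs within the first k rounds."""
    if not conversations:
        raise ValueError("need at least one conversation")
    hits = 0
    for c in conversations:
        first = first_refusal_round(c)
        if first is not None and first <= k:
            hits += 1
    return hits / len(conversations)


# Toy data (invented, not from the paper).
fraud = [
    [False, False, True],   # refuses only in round 3
    [True, False, False],   # refuses in round 1
    [False, False, False],  # never refuses
    [False, True, True],    # first refuses in round 2
]
benign = [
    [False, False, False],  # correctly helps throughout
    [False, True, False],   # over-refuses in round 2
]

early_fraud = early_refusal_rate(fraud, k=2)        # timing-aware safety
eventual_fraud = eventual_refusal_rate(fraud)       # timing-blind safety
benign_over_refusal = eventual_refusal_rate(benign) # utility cost
```

On this toy data the timing-blind rate credits three of four fraud conversations, while the timing-aware rate with `k=2` credits only two, which is the kind of gap a multi-round evaluation is meant to expose. Applying the same refusal count to benign conversations gives a simple over-refusal measure for the utility side of the tradeoff.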
