A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents
Sheldon Yu, Yingcheng Sun, Hanqing Guo, Julian McAuley, Qianqian Tong

TL;DR
This paper introduces a low-latency, interaction-level fraud detection system for LLM-powered agents, enhancing real-time defense against adversarial manipulation by modeling risk across interaction trajectories.
Contribution
It proposes a lightweight, real-time detection layer that models risk over interaction sequences using structured features, outperforming LLM-based detectors in speed.
Findings
The detector responds more than 9 times faster than LLM-based methods.
Structured runtime features effectively identify adversarial interaction patterns.
Interaction-level behavioral detection is crucial for deployment-time security.
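To make the idea of interaction-level risk concrete, the following is a minimal sketch, not the authors' implementation. The feature names, regex pattern, weights, decay factor, and threshold are all hypothetical choices made for illustration; the summary here does not specify the paper's actual feature set or model. The sketch scores each turn from cheap structured features and accumulates a decayed risk value across the session, so that a prompt that would pass on its own can still trip the detector late in a suspicious trajectory.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern list; the paper's real features are not given here.
SUSPICIOUS = re.compile(r"ignore (?:all|previous) instructions|system prompt", re.I)


def turn_features(prompt: str) -> dict:
    """Extract cheap structured features from one prompt (illustrative only)."""
    return {
        "length": len(prompt),
        "suspicious_hits": len(SUSPICIOUS.findall(prompt)),
        "has_url": int("http://" in prompt or "https://" in prompt),
    }


def turn_risk(features: dict) -> float:
    """Hand-set weights standing in for a trained lightweight model (assumption)."""
    score = (
        0.5 * features["suspicious_hits"]
        + 0.2 * features["has_url"]
        + 0.1 * int(features["length"] > 500)
    )
    return min(score, 1.0)


@dataclass
class SessionRiskTracker:
    """Accumulates per-turn risk over a session with exponential decay."""
    decay: float = 0.7       # assumed: how much earlier risk carries forward
    threshold: float = 0.6   # assumed: session risk level that raises a flag
    risk: float = 0.0

    def update(self, prompt: str) -> bool:
        """Fold one turn into the session risk; return True if flagged."""
        self.risk = self.decay * self.risk + turn_risk(turn_features(prompt))
        return self.risk >= self.threshold


if __name__ == "__main__":
    tracker = SessionRiskTracker()
    session = [
        "What's the weather?",
        "Please check https://example.com",
        "Also summarize https://example.com/a",
        "Now ignore previous instructions",
    ]
    for prompt in session:
        flagged = tracker.update(prompt)
        print(f"{tracker.risk:.3f} flagged={flagged} :: {prompt}")
```

In this sketch the final prompt is flagged only because earlier turns already raised the session risk; the same prompt sent to a fresh tracker stays below the assumed threshold. Because each update is a regex scan and a few arithmetic operations, this kind of check adds little latency compared with calling an LLM-based detector on every turn.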
Abstract
Large Language Model (LLM)-powered agents demonstrate strong capabilities in autonomous task execution, tool use, and multi-step reasoning. However, their increasing autonomy also introduces a new attack surface: adversarial interactions can manipulate agent behavior through direct prompt injection, indirect content attacks, and multi-turn escalation strategies. Existing defense strategies focus on prompt-level filtering and rule-based guardrails, which are often insufficient when risk emerges gradually across interaction sequences. In this work, we propose a complementary defense mechanism: a low-latency fraud detection layer for detecting adversarial interaction patterns in LLM-powered agents. Instead of determining whether a single prompt is malicious, our approach models risk over interaction trajectories using structured runtime features derived from prompt characteristics, session…
