Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack

Prathamesh Vasudeo Naik; Naresh Dintakurthi; Yue Wang

arXiv:2605.11232·cs.AI·May 13, 2026

TL;DR

This paper presents an LLMOps stack specialized for fraud and AML compliance workloads. Workload-aware runtime tuning and system design raise throughput and GPU utilization and cut P99 latency.

Contribution

The paper introduces a workload-aware LLM serving architecture for compliance tasks that combines runtime tuning, prefix caching, multi-adapter serving, and adapter- and prompt-length-aware batching with quality gates.

Findings

01. Throughput increased from 612-650 to 3,600 requests/hour.

02. P99 latency reduced from 31-38 seconds to 6.4-8.7 seconds.

03. GPU utilization improved from 12% to 78%.
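
The reported ranges can be turned into improvement factors with simple arithmetic. The sketch below is illustrative only: the helper name improvement_bounds is hypothetical, and because this summary does not say which baseline value pairs with which result, it computes the widest bounds the ranges allow rather than a single figure.

```python
def improvement_bounds(before, after, lower_is_better=False):
    """Return (min, max) improvement factor for (low, high) before/after ranges.

    Assumption: any baseline value may pair with any result value, so the
    bounds are the extremes over all pairings.
    """
    b_lo, b_hi = before
    a_lo, a_hi = after
    if lower_is_better:
        # Factor is before / after (e.g. latency): smallest when the baseline
        # is at its low end and the result is at its high end.
        return b_lo / a_hi, b_hi / a_lo
    # Factor is after / before (e.g. throughput, utilization).
    return a_lo / b_hi, a_hi / b_lo


# Figures taken from the findings listed above.
throughput = improvement_bounds((612, 650), (3600, 3600))
p99_latency = improvement_bounds((31, 38), (6.4, 8.7), lower_is_better=True)
gpu_util = improvement_bounds((12, 12), (78, 78))

for name, (low, high) in [
    ("throughput", throughput),
    ("P99 latency", p99_latency),
    ("GPU utilization", gpu_util),
]:
    print(f"{name}: {low:.2f}x to {high:.2f}x")
```

Under these assumptions the throughput gain falls between roughly 5.5x and 5.9x, the P99 latency reduction between roughly 3.6x and 5.9x, and the GPU utilization gain is 6.5x.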

Abstract

Fraud detection and anti-money-laundering (AML) compliance are high-value domains for large language models (LLMs), but their serving requirements differ sharply from generic chat workloads. Compliance prompts are often prefix-heavy, schema-constrained, and evidence-rich, combining reusable policy instructions, risk taxonomies, transaction or document context, and short structured outputs such as JSON labels or risk factors. These properties make prefix reuse, KV-cache efficiency, runtime tuning, model orchestration, and output validation first-order systems concerns. This paper introduces a workload-aware LLMOps stack for fraud and AML workloads using self-hosted open-weight models such as Meta Llama and Alibaba Qwen. The stack combines vLLM-style runtime tuning, PagedAttention, Automatic Prefix Caching, multi-adapter serving, adapter and prompt-length-aware batching, sleep/wake…
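
One way to picture adapter and prompt-length-aware batching is as a grouping step that runs before requests reach the serving runtime. The following is a minimal sketch, not the authors' implementation: the Request fields, the 512-token bucket width, and the max_batch_size default are all assumptions chosen for illustration, and no real serving library's API is used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    adapter_id: str      # hypothetical: which fine-tuned adapter the request targets
    prompt_tokens: int   # prompt length in tokens


def length_bucket(prompt_tokens: int, bucket_width: int = 512) -> int:
    """Map a prompt length to a coarse bucket index (assumed bucket width)."""
    return prompt_tokens // bucket_width


def build_batches(requests, max_batch_size: int = 8, bucket_width: int = 512):
    """Group requests so each batch shares one adapter and one length bucket."""
    groups = defaultdict(list)
    for req in requests:
        key = (req.adapter_id, length_bucket(req.prompt_tokens, bucket_width))
        groups[key].append(req)

    batches = []
    for key in sorted(groups):  # deterministic order for reproducibility
        members = groups[key]
        # Split oversized groups into chunks of at most max_batch_size.
        for start in range(0, len(members), max_batch_size):
            batches.append(members[start:start + max_batch_size])
    return batches


requests = [
    Request("a", "fraud", 100),
    Request("b", "aml", 2000),
    Request("c", "fraud", 300),
    Request("d", "fraud", 1500),
]
batches = build_batches(requests, max_batch_size=2)
for batch in batches:
    print([r.request_id for r in batch])
```

Grouping by adapter avoids mixing adapters within a batch, and grouping by length bucket keeps short prompts from waiting on much longer ones. A production scheduler would also need to account for queueing delay and memory limits, which this sketch ignores.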
