MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Ishrith Gowda (University of California, Berkeley)

arXiv:2605.03482·cs.CR·May 8, 2026

TL;DR

This paper introduces MEMSAD, a calibration-based anomaly detection defense against memory poisoning in retrieval-augmented LLM agents. The defense is grounded in a gradient coupling theorem, comes with formal guarantees, and is evaluated in a unified framework spanning three attack classes.

Contribution

The paper presents MEMSAD, a novel calibration-based defense grounded in a gradient coupling theorem, with theoretical optimality and practical effectiveness against memory poisoning attacks.

Findings

01

MEMSAD achieves a 100% true positive rate and a 0% false positive rate in the reported experiments.

02

Faithful evaluation increases measured attack success fourfold (ASR-R: $0.25 \to 1.00$).

03

Online calibration bounds and a formal characterization of a synonym-invariance loophole are provided.
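
For readers less familiar with the metrics in the findings above, the following is a minimal sketch of how a true positive rate, a false positive rate, and the fourfold ASR-R ratio are computed. The function name `detection_rates` and the toy labels are hypothetical and invented purely for illustration; they are not the paper's data or code. Only the two ASR-R values (0.25 and 1.00) come from the abstract.

```python
def detection_rates(labels, flags):
    """Return (TPR, FPR) for a detector.

    labels[i] is True if memory entry i is poisoned (ground truth).
    flags[i] is True if the detector flagged entry i as anomalous.
    """
    pairs = list(zip(labels, flags))
    tp = sum(1 for y, f in pairs if y and f)          # poisoned, flagged
    fn = sum(1 for y, f in pairs if y and not f)      # poisoned, missed
    fp = sum(1 for y, f in pairs if not y and f)      # benign, flagged
    tn = sum(1 for y, f in pairs if not y and not f)  # benign, passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example (assumed data): two poisoned and three benign entries,
# with a detector that flags exactly the poisoned ones.
labels = [True, True, False, False, False]
flags = [True, True, False, False, False]
tpr, fpr = detection_rates(labels, flags)

# ASR-R values quoted in the abstract: 0.25 under the original protocol,
# 1.00 under the corrected ("faithful") protocol.
asr_original, asr_faithful = 0.25, 1.00
ratio = asr_faithful / asr_original
```

A perfect detector on this toy set yields a TPR of 1.0 and an FPR of 0.0, which is the shape of the first finding; the ratio reproduces the fourfold increase in the second.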

Abstract

Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We formalize memory poisoning attacks on retrieval-augmented agents as a Stackelberg game with a unified evaluation framework spanning three attack classes with escalating access assumptions. Correcting an evaluation protocol inconsistency in the triggered-query specification of Chen et al. (2024), we show faithful evaluation increases measured attack success by $4\times$ (ASR-R: $0.25 \to 1.00$). Our primary contribution is MEMSAD (Semantic Anomaly Detection), a calibration-based defense grounded in a gradient coupling theorem: under encoder regularity, the anomaly score gradient and the retrieval objective gradient are provably identical, so any continuous perturbation that reduces detection risk necessarily degrades retrieval rank. This…
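
The coupling claim in the abstract can be unpacked with a first-order argument. The notation here is an assumption, since the abstract does not give the paper's own symbols: let $x$ be the embedding of a memory entry, $s(x)$ its anomaly score (higher means more suspicious), $r(x)$ the retrieval objective (higher means better rank), and $\delta$ a small perturbation. If the two gradients coincide, $\nabla s(x) = \nabla r(x)$, then

$$
r(x+\delta) - r(x) \;=\; \langle \nabla r(x), \delta \rangle + o(\|\delta\|) \;=\; \langle \nabla s(x), \delta \rangle + o(\|\delta\|) \;=\; s(x+\delta) - s(x) + o(\|\delta\|).
$$

Under these assumptions, any sufficiently small step with $\langle \nabla s(x), \delta \rangle < 0$ (less detectable) also gives $r(x+\delta) < r(x)$ (less retrievable). This is only a sketch of the intuition; the paper's theorem and its encoder regularity conditions are not reproduced here.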
