Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning

Jiachen Qian

arXiv:2604.16966 · cs.CR · April 21, 2026

TL;DR

This paper reveals Visual Inception, a new vulnerability in agentic recommender systems in which poisoned images stored in long-term memory can hijack long-term planning, and proposes CognitiveGuard, a defense framework that mitigates the threat.

Contribution

The paper introduces Visual Inception, a novel attack that exploits long-term memory in agentic RecSys, and proposes CognitiveGuard, a dual-process defense inspired by human cognition.

Findings

01. Visual Inception achieves about 85% Goal-Hit Rate in experiments.

02. CognitiveGuard reduces the attack success rate to around 10%.

03. The defense mechanism maintains system quality with configurable latency.

Abstract

The evolution from static ranking models to Agentic Recommender Systems (Agentic RecSys) empowers AI agents to maintain long-term user profiles and autonomously plan service tasks. While this paradigm shift enhances personalization, it introduces a vulnerability: reliance on Long-term Memory (LTM). In this paper, we uncover a threat termed "Visual Inception." Unlike traditional adversarial attacks that seek immediate misclassification, Visual Inception injects triggers into user-uploaded images (e.g., lifestyle photos) that act as "sleeper agents" within the system's memory. When retrieved during future planning, these poisoned memories hijack the agent's reasoning chain, steering it toward adversary-defined goals (e.g., promoting high-margin products) without prompt injection. To mitigate this, we propose CognitiveGuard, a dual-process defense framework inspired by human cognition. It…
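The "sleeper agent" mechanism described in the abstract can be pictured with a toy sketch. This is not the paper's method: it assumes a hypothetical text-only memory store in which each uploaded image has already been reduced to a caption, it uses a simple bag-of-words similarity in place of a real multimodal retriever, and it does not model how a trigger is embedded in an image. All names (`MemoryEntry`, `LongTermMemory`, `retrieve`) are illustrative assumptions. The sketch only shows that an entry written at one time can resurface in the planning context of a later, unrelated request.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    caption: str    # hypothetical text the agent derived from an uploaded image
    poisoned: bool  # ground-truth label, kept only for this illustration


def embed(text):
    # Bag-of-words stand-in for a real multimodal embedding (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LongTermMemory:
    """Toy long-term memory: append-only writes, similarity-based reads."""

    def __init__(self):
        self.entries = []

    def write(self, entry):
        self.entries.append(entry)

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda e: cosine(q, embed(e.caption)),
            reverse=True,
        )
        return ranked[:k]


ltm = LongTermMemory()
# Earlier sessions: benign memories plus one entry derived from a poisoned upload.
ltm.write(MemoryEntry("user enjoys hiking in the mountains", False))
ltm.write(MemoryEntry("user cooks vegetarian dinner at home", False))
ltm.write(MemoryEntry("user prefers brand x premium hiking boots", True))

# Later session: the planner retrieves context for a new task.
retrieved = ltm.retrieve("plan hiking gear for the mountains trip", k=2)
poisoned_in_context = any(e.poisoned for e in retrieved)
print([e.caption for e in retrieved], poisoned_in_context)
```

In this toy run the poisoned entry is the second-ranked result, so it lands in the planner's context even though the current request contains no injected prompt. That delayed effect, which an input-time filter on the current request alone would not observe, is the behavior the abstract attributes to the attack.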
