When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents
Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye, Haibo Hu

TL;DR
This paper shows that routine interactions with personalized LLM agents can unintentionally poison their long-term state, creating security vulnerabilities, and introduces a benchmark (ULSPB) and a defense (StateGuard) to address the issue.
Contribution
It formalizes the risk of long-term state poisoning, introduces the ULSPB benchmark, and proposes StateGuard as an effective mitigation strategy.
Findings
Routine conversations can significantly poison long-term state.
StateGuard effectively reduces authorization drift and tool-use escalation.
Synthetic and real-world interactions confirm the poisoning risk.
Abstract
Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet, this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To systematically study it, we introduce the Unintended Long-Term State Poisoning Bench (ULSPB), a bilingual benchmark comprising settings spanning five assistance categories, seven interaction patterns, 24-turn routine interactions, and matched single-injection counterparts. Furthermore, we define the Harm Score (HS), a state-centric metric that quantifies authorization drift, tool-use…
