When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jiahe Guo; Xiangran Guo; Yulin Hu; Zimo Long; Xingyu Sui; Xuda Zhi; Yongbo Huang; Hao He; Weixiang Zhao; Yanyan Zhao; Bing Qin

arXiv:2601.17887 · cs.AI · May 19, 2026

TL;DR

This paper uncovers a safety vulnerability in personalized dialogue agents called intent legitimation, where benign memories bias models to legitimize harmful queries, and introduces a benchmark and detection method to address this issue.

Contribution

The paper introduces PS-Bench, a benchmark for measuring intent legitimation, and proposes a detection-reflection method to mitigate safety risks in personalized LLM agents.

Findings

01 Personalization increases attack success rates by 15.8% to 243.7% relative to stateless baselines.

02 Intent legitimation is evidenced in internal model representations.

03 A lightweight detection-reflection method reduces safety degradation.

Abstract

Long-term memory enables large language model (LLM) agents to support personalized and sustained interactions. However, most work on personalized agents prioritizes utility and user experience, treating memory as a neutral component and largely overlooking its safety implications. In this paper, we reveal intent legitimation, a previously underexplored safety failure in personalized agents, where benign personal memories bias intent inference and cause models to legitimize inherently harmful queries. To study this phenomenon, we introduce PS-Bench, a benchmark designed to identify and quantify intent legitimation in personalized interactions. Across multiple memory-augmented agent frameworks and base LLMs, personalization increases attack success rates by 15.8% to 243.7% relative to stateless baselines. We further provide mechanistic evidence for intent legitimation from internal…
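The 15.8% to 243.7% figure is a relative change, not a difference in percentage points. A minimal sketch of that arithmetic follows, assuming attack success rate (ASR) is the fraction of harmful queries for which the agent produces a harmful response; this page does not spell out the definition. The function names and the sample judgements are hypothetical and are not taken from PS-Bench or the paper's results.

```python
def attack_success_rate(judgements):
    """Fraction of harmful queries judged to have succeeded (True = attack succeeded)."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(1 for j in judgements if j) / len(judgements)


def relative_increase_pct(baseline_asr, personalized_asr):
    """Relative change of the personalized ASR over the stateless baseline, in percent."""
    if baseline_asr <= 0:
        raise ValueError("baseline ASR must be positive for a relative change")
    return (personalized_asr - baseline_asr) / baseline_asr * 100.0


# Hypothetical judgements for illustration only (not data from the paper).
stateless = [True] * 2 + [False] * 8      # 2 of 10 attacks succeed without memory
personalized = [True] * 5 + [False] * 5   # 5 of 10 succeed with benign memories present

baseline = attack_success_rate(stateless)
with_memory = attack_success_rate(personalized)
print(f"baseline ASR:      {baseline:.2f}")
print(f"personalized ASR:  {with_memory:.2f}")
print(f"relative increase: {relative_increase_pct(baseline, with_memory):.1f}%")
```

In this made-up case the ASR rises by 30 percentage points, while the relative increase is much larger because the baseline is low. The same effect means large relative figures can correspond to modest absolute changes when the stateless ASR is small.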

Code & Models

Repositories

MuyuenLP/PS-Bench
github

Datasets

molmohsen/awesome-ai-agent-papers
dataset · 36 dl

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · AI in Service Interactions