When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents
Jiahe Guo, Xiangran Guo, Yulin Hu, Zimo Long, Xingyu Sui, Xuda Zhi, Yongbo Huang, Hao He, Weixiang Zhao, Yanyan Zhao, Bing Qin

TL;DR
This paper uncovers a safety vulnerability in personalized dialogue agents called intent legitimation, where benign memories bias models to legitimize harmful queries, and introduces a benchmark and detection method to address this issue.
Contribution
It introduces PS-Bench, a benchmark to measure intent legitimation, and proposes a detection-reflection method to mitigate safety risks in personalized LLM agents.
Findings
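Across multiple memory-augmented agent frameworks and base LLMs, personalization increases attack success rates by 15.8% to 243.7% relative to stateless baselines.
Internal model representations provide mechanistic evidence for intent legitimation.
A lightweight detection-reflection method reduces the resulting safety degradation.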
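The percentages in the findings above are relative increases over a stateless baseline, so a figure above 100% means the attack success rate more than doubled. The sketch below shows one way such a figure could be computed. It assumes attack success rate is the fraction of harmful queries on which the attack succeeded, which is a common definition but is not confirmed by this summary. The function names and the outcome lists are hypothetical and are not taken from the paper or from PS-Bench.

```python
def attack_success_rate(outcomes):
    """Fraction of harmful queries on which the attack succeeded.

    `outcomes` is a list of booleans, one per query (assumed definition).
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(1 for succeeded in outcomes if succeeded) / len(outcomes)


def relative_increase_pct(baseline_asr, personalized_asr):
    """Relative change of the personalized ASR over the baseline, in percent."""
    if baseline_asr <= 0:
        raise ValueError("baseline_asr must be positive for a relative change")
    return (personalized_asr - baseline_asr) / baseline_asr * 100.0


# Hypothetical outcomes for illustration only; these are not paper results.
stateless = [True, False, False, False]      # 1 of 4 attacks succeed
personalized = [True, True, False, False]    # 2 of 4 attacks succeed

baseline = attack_success_rate(stateless)
with_memory = attack_success_rate(personalized)
print(relative_increase_pct(baseline, with_memory))
```

Because the increase is relative, the same absolute gain looks much larger when the stateless baseline is already low, which is worth keeping in mind when reading a range as wide as 15.8% to 243.7%.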
Abstract
Long-term memory enables large language model (LLM) agents to support personalized and sustained interactions. However, most work on personalized agents prioritizes utility and user experience, treating memory as a neutral component and largely overlooking its safety implications. In this paper, we reveal intent legitimation, a previously underexplored safety failure in personalized agents, where benign personal memories bias intent inference and cause models to legitimize inherently harmful queries. To study this phenomenon, we introduce PS-Bench, a benchmark designed to identify and quantify intent legitimation in personalized interactions. Across multiple memory-augmented agent frameworks and base LLMs, personalization increases attack success rates by 15.8%–243.7% relative to stateless baselines. We further provide mechanistic evidence for intent legitimation from internal…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · AI in Service Interactions
