Sell Me This Stock: Unsafe Recommendation Drift in LLM Agents

Zekun Wu; Adriano Koshiyama; Sahan Bulathwela; Maria Perez-Ortiz

arXiv:2603.12564 · cs.CL · April 16, 2026

TL;DR

This paper shows that multi-turn LLM recommendation agents trust manipulated tool data and recommend unsuitable products, while standard quality metrics stay near-perfect. The authors call this gap between measured quality and actual safety "evaluation blindness."

Contribution

The paper identifies counterintuitive failure modes in LLM recommendation agents. Stronger models are not necessarily safer, and a model's internal detection of manipulated data is not enough on its own to ensure safe recommendations.

Findings

01. Stronger models are not necessarily safer: the best-performing model has the highest quality score yet the most suitability violations.

02. Most violations (95%) are caused by the current turn's manipulated data, not by errors accumulated over earlier turns.

03. A model's internal detection of data manipulation does not translate into safer outputs.

Abstract

When a multi-turn LLM recommendation agent consumes incorrect tool data, it recommends unsuitable products while standard quality metrics stay near-perfect, a pattern we call evaluation blindness. We replay 23-turn financial advisory conversations across eight language models and find three counterintuitive failure modes. First, stronger models are not safer: the best-performing model has the highest quality score yet the worst suitability violation rate (violations in 99.1% of turns). This points to an alignment-grounding tension: the same property that makes it an effective agent, faithfully grounding its reasoning in tool data, makes it the most reliable executor of bad data. Across all models, 80% of risk-score citations repeat the manipulated value verbatim, and not one of the 1,840 turns questions the tool outputs. Second, the failures are not cumulative: 95% of violations stem from the current…
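The idea behind evaluation blindness, that a metric rewarding faithful grounding in tool output cannot see an unsuitable recommendation, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy assumption rather than the paper's setup or metrics: the `Turn` record, the 1 to 7 risk scale, the `grounding_quality` score, the `is_violation` suitability check, and the `summarize` helper. The quality score compares the agent's cited risk to the tool's value, while the suitability check compares the product's ground-truth risk to the client's tolerance.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy model of evaluation blindness. The 1-7 risk scale, field names,
# and both metrics are illustrative assumptions, not the paper's implementation.


@dataclass
class Turn:
    client_risk_tolerance: int  # highest product risk the client can accept
    true_risk: int              # ground-truth risk score of the recommended product
    tool_risk: int              # risk score reported by the (possibly manipulated) tool
    cited_risk: Optional[int]   # risk score the agent quotes in its reply, if any


def grounding_quality(turn: Turn) -> float:
    """Toy quality metric: full credit when the reply quotes the tool's value."""
    return 1.0 if turn.cited_risk == turn.tool_risk else 0.0


def is_violation(turn: Turn) -> bool:
    """Suitability check against ground truth rather than against tool output."""
    return turn.true_risk > turn.client_risk_tolerance


def summarize(turns: List[Turn]) -> dict:
    n = len(turns)
    cited = [t for t in turns if t.cited_risk is not None]
    # Citations that repeat a tool value which differs from the ground truth.
    verbatim_bad = [
        t for t in cited if t.cited_risk == t.tool_risk and t.tool_risk != t.true_risk
    ]
    return {
        "quality": sum(grounding_quality(t) for t in turns) / n,
        "violation_rate": sum(is_violation(t) for t in turns) / n,
        "verbatim_manipulated_rate": len(verbatim_bad) / len(cited) if cited else 0.0,
    }


if __name__ == "__main__":
    turns = [
        # Manipulated turns: the tool understates risk and the agent repeats it.
        Turn(client_risk_tolerance=3, true_risk=6, tool_risk=2, cited_risk=2),
        Turn(client_risk_tolerance=3, true_risk=5, tool_risk=1, cited_risk=1),
        # Clean turn: the tool value matches the ground truth.
        Turn(client_risk_tolerance=3, true_risk=2, tool_risk=2, cited_risk=2),
    ]
    print(summarize(turns))
```

In this toy run the grounding-based quality score is a perfect 1.0, yet two of the three turns violate suitability. A metric that checks consistency with tool output, rather than with ground truth, has no way to register the manipulation.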