Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs
Andreas Maier, Jeta Sopa, Gozde Gul Sahin, Paula Perez-Toro, Siming Bayer

TL;DR
A thirty-token user prompt sharply reduces sponsored recommendations across twelve large language models, and reproducing the prior evaluation shows that a prose description of an evaluation pipeline is not enough for accurate reproduction.
Contribution
Shows that a thirty-token user prompt sharply reduces sponsored recommendations across twelve LLMs, and surfaces three silent implementation failures that each shifted a reported rate by tens of percentage points.
Findings
A thirty-token prompt reduces sponsored recommendations from ~50% to near zero.
A prose description of the prior evaluation pipeline is not sufficient for accurate reproduction: three silent implementation failures each shifted a reported rate by tens of percentage points.
The central claims of the original study generalize across models and evaluation setups.
Abstract
Wu et al. (2026) showed that most frontier large language models (LLMs) recommend a sponsored, roughly twice-as-expensive flight when their system prompt contains a soft sponsorship cue. We reproduce their evaluation on ten open-weight chat models plus the two of their twenty-three models that are still reachable today (gpt-3.5-turbo, gpt-4o). All reported rates in this paper are produced under the same judge the original paper used (gpt-4o); we additionally store every label under an open-weight (gpt-oss-120b) and a smaller proprietary (gpt-4o-mini) judge for an ablation. Three findings emerge. First, a prose description of an LLM evaluation pipeline is not, on its own, sufficient for accurate reproduction: we surfaced three silent implementation failures that each shifted a reported rate by tens of percentage points. Second, the central claims do generalise - the gpt-3.5-turbo…
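One way to keep silent failures from shifting a reported rate is to count invalid judge labels separately instead of folding them into the "not sponsored" bucket. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the label strings, the `RateReport` class, the function names, and the 5% threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label vocabulary, chosen for illustration only.
SPONSORED = "sponsored"
NOT_SPONSORED = "not_sponsored"


@dataclass(frozen=True)
class RateReport:
    sponsored: int      # trials labelled as recommending the sponsored option
    not_sponsored: int  # trials labelled as recommending something else
    invalid: int        # empty, missing, or unparseable labels

    @property
    def rate(self) -> float:
        """Share of sponsored recommendations among validly labelled trials."""
        valid = self.sponsored + self.not_sponsored
        if valid == 0:
            raise ValueError("no valid labels; refusing to report a rate")
        return self.sponsored / valid


def sponsored_rate(labels, max_invalid_fraction=0.05):
    """Score one judge's labels, failing loudly if too many are invalid."""
    labels = list(labels)
    if not labels:
        raise ValueError("no trials to score")
    # Anything outside the two known labels (None, "", typos) counts as invalid.
    counts = Counter(
        label if label in (SPONSORED, NOT_SPONSORED) else "invalid"
        for label in labels
    )
    invalid_fraction = counts["invalid"] / len(labels)
    if invalid_fraction > max_invalid_fraction:
        raise ValueError(
            f"{invalid_fraction:.0%} of labels are invalid; check the pipeline"
        )
    return RateReport(counts[SPONSORED], counts[NOT_SPONSORED], counts["invalid"])


def rates_by_judge(labels_by_judge, max_invalid_fraction=0.05):
    """Compute one rate per judge from a mapping of judge name to labels."""
    return {
        judge: sponsored_rate(labels, max_invalid_fraction).rate
        for judge, labels in labels_by_judge.items()
    }
```

Passing a mapping such as `{"gpt-4o": [...], "gpt-oss-120b": [...], "gpt-4o-mini": [...]}` to `rates_by_judge` gives one rate per stored judge, which is the shape a judge ablation would need. Raising on a high invalid fraction is a deliberate design choice here: a run with many empty or unparseable labels stops instead of quietly reporting a lower rate.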
