Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs

Andreas Maier; Jeta Sopa; Gozde Gul Sahin; Paula Perez-Toro; Siming Bayer

arXiv:2605.12772·cs.CV·May 14, 2026

TL;DR

A thirty-token user prompt that asks for a table cuts sponsored recommendations from roughly 50% to near zero across twelve large language models. The reproduction also shows that a prose description of an evaluation pipeline is not, on its own, enough to reproduce reported rates accurately.

Contribution

Shows that a minimal user prompt can sharply reduce sponsored recommendations across twelve LLMs, and surfaces three silent implementation failures, each of which shifted a reported rate by tens of percentage points.

Findings

01. A thirty-token user prompt reduces sponsored recommendations from ~50% to near zero.

02. A prose description of the evaluation pipeline was not sufficient for accurate reproduction: three silent implementation failures each shifted a reported rate by tens of percentage points.

03. The central claims of the original study (Wu et al., 2026) generalize across models and evaluation setups.

Abstract

Wu et al. (2026) showed that most frontier large language models (LLMs) recommend a sponsored, roughly twice-as-expensive flight when their system prompt contains a soft sponsorship cue. We reproduce their evaluation on ten open-weight chat models plus the two of their twenty-three models that are still reachable today (gpt-3.5-turbo, gpt-4o). All reported rates in this paper are produced under the same judge the original paper used (gpt-4o); we additionally store every label under an open-weight (gpt-oss-120b) and a smaller proprietary (gpt-4o-mini) judge for an ablation. Three findings emerge. First, a prose description of an LLM evaluation pipeline is not, on its own, sufficient for accurate reproduction: we surfaced three silent implementation failures that each shifted a reported rate by tens of percentage points. Second, the central claims do generalise - the gpt-3.5-turbo…
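The abstract describes storing every label under several judges and warns that silent failures can shift a reported rate. The sketch below illustrates one way such a rate could be computed per judge while keeping failed labels visible. It is not the authors' pipeline: the record layout, the label strings `sponsored` / `not_sponsored`, and the function names are all hypothetical, and treating a missing label (`None`) as a judge failure is an assumption made only for illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical label values; the paper's actual label schema is not shown here.
SPONSORED = "sponsored"
NOT_SPONSORED = "not_sponsored"


def sponsored_rate(labels: list[Optional[str]]) -> dict:
    """Compute the sponsored rate over valid labels and count invalid ones.

    Anything other than the two known label strings (including None) is
    counted as invalid instead of being silently treated as not sponsored.
    """
    counts = Counter(labels)
    valid = counts[SPONSORED] + counts[NOT_SPONSORED]
    invalid = len(labels) - valid
    rate = counts[SPONSORED] / valid if valid else float("nan")
    return {"rate": rate, "valid": valid, "invalid": invalid}


def rates_by_judge(records: list[dict]) -> dict:
    """Group stored labels by judge name and compute one rate per judge."""
    by_judge: dict[str, list[Optional[str]]] = {}
    for rec in records:
        for judge, label in rec["labels"].items():
            by_judge.setdefault(judge, []).append(label)
    return {judge: sponsored_rate(ls) for judge, ls in by_judge.items()}


if __name__ == "__main__":
    # Toy records for illustration only; these are not results from the paper.
    toy_records = [
        {"labels": {"gpt-4o": SPONSORED, "gpt-oss-120b": SPONSORED}},
        {"labels": {"gpt-4o": NOT_SPONSORED, "gpt-oss-120b": None}},
        {"labels": {"gpt-4o": SPONSORED, "gpt-oss-120b": NOT_SPONSORED}},
        {"labels": {"gpt-4o": None, "gpt-oss-120b": NOT_SPONSORED}},
    ]
    for judge, stats in rates_by_judge(toy_records).items():
        print(judge, stats)
```

Reporting the `invalid` count next to each rate is the design choice that matters here: dividing by the total number of responses instead would fold every failed label into the "not sponsored" bucket and lower the rate without any visible error.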

Code & Models

Repositories

akmaier/Paper-LLM-Ads (GitHub)

Datasets

akmaier/LLM-Ads (dataset, 161 downloads)
