In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations

Mohammad Aflah Khan; Mahsa Amani; Soumi Das; Bishwamittra Ghosh; Qinyuan Wu; Krishna P. Gummadi; Manish Gupta; Abhilasha Ravichander

arXiv:2602.15456·cs.CL·February 18, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates how large language model (LLM) agents display systematic source preferences when selecting information, revealing biases that influence what users see and highlighting the need for transparency and control mechanisms.

Contribution

The paper uncovers latent source preferences in LLMs, a previously underexplored phenomenon, and demonstrates their consistency, their sensitivity to context, and their impact on how information is presented.

Findings

01. LLMs exhibit strong, predictable source preferences.

02. Preferences are influenced by contextual framing.

03. Preferences persist despite explicit prompts to avoid them.

Abstract

Agents based on Large Language Models (LLMs) are increasingly being deployed as interfaces to information on online platforms. These agents filter, prioritize, and synthesize information retrieved from the platforms' back-end databases or via web search. In these scenarios, LLM agents govern the information users receive by drawing users' attention to particular instances of retrieved information at the expense of others. While much prior work has focused on biases in the information LLMs themselves generate, less attention has been paid to the factors that influence what information LLMs select and present to users. We hypothesize that when information is attributed to specific sources (e.g., particular publishers, journals, or platforms), current LLMs exhibit systematic latent source preferences; that is, they prioritize information from some sources over others. Through controlled…
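One way to make the hypothesis above concrete is a paired-choice measurement: show the same content under different source labels and count how often each label is picked. The sketch below is an illustration only and is not the authors' protocol. The source labels, the `toy_chooser` function (a simulated stand-in for an LLM call), and its bias weights are all hypothetical.

```python
import random
from collections import Counter
from itertools import permutations

# Hypothetical source labels; a real study would use actual publishers or platforms.
SOURCES = ["Source A", "Source B", "Source C"]


def toy_chooser(labels, rng):
    """Simulated stand-in for an LLM choice; returns the index of the picked label.

    The weights below are invented to mimic a latent preference.
    A real experiment would query a model here instead.
    """
    weights = {"Source A": 3.0, "Source B": 1.5, "Source C": 1.0}
    return rng.choices(range(len(labels)), weights=[weights[s] for s in labels], k=1)[0]


def selection_rates(sources, chooser, trials_per_pair=500, seed=0):
    """Estimate how often each source is picked when content is held identical.

    Every ordered pair is shown, so each source appears equally often
    in the first and second position, which controls for position effects.
    """
    rng = random.Random(seed)
    wins = Counter()
    shown = Counter()
    for left, right in permutations(sources, 2):
        for _ in range(trials_per_pair):
            labels = (left, right)
            picked = labels[chooser(labels, rng)]
            wins[picked] += 1
            shown[left] += 1
            shown[right] += 1
    return {s: wins[s] / shown[s] for s in sources}


rates = selection_rates(SOURCES, toy_chooser)
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
print({s: round(r, 3) for s, r in rates.items()})
```

With no preference, every rate would sit near 0.5; a stable ordering of rates across repeated runs and prompts is the kind of signal the hypothesis predicts.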

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

The authors did a bunch of experiments.

Weaknesses

The paper has no clear take-away insights. It is a better fit for a Scientific Reports kind of paper than for an ICLR paper.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The core research question, “whether LLM-based agents carry latent source preferences that systematically influence which items they trust and retrieve”, is largely novel. This is a specific type of model bias that has not been systematically studied by prior work, and it also appears timely and highly relevant to realistic LLM applications.
2. The paper is well-structured and easy to follow. Each research question is stated up front and directly answered with matched experiments and analyses, m

Weaknesses

1. The evaluation may be vulnerable to prompt-induced shortcutting: if the same phrasing (for instance, “select the article based on journalistic standards”) is used across direct and indirect tests, models might be reacting to that cue rather than expressing a stable, content-independent source prior. Concretely, a model could learn that the phrase “journalistic standards” often co-occurs with examples from mainstream outlets during pretraining or instruction tuning and therefore surface those

Reviewer 03 · Rating 4 · Confidence 5

Strengths

1. Introduces and formalizes the idea of “latent source preferences.”
2. 12 models, 6 providers, multiple domains, and both synthetic and real-world data.
3. Consistent results with rank correlation and contextual sensitivity analyses.
4. Ties directly to alignment, fairness, and trustworthiness of LLM-based agents.
5. Appendices include detailed prompt templates, datasets, and code release commitment.

Weaknesses

The paper stops short of causal analysis: it does not probe which stages of training (pretraining vs. instruction tuning) contribute most to preference formation. While the phenomenon is well characterized, the mitigation aspect is limited to showing that prompting fails. A deeper exploration of possible control mechanisms (e.g., debiasing or preference regularization) would strengthen the work. Some statistical results (e.g., rationality correlations in Fig. 5) could be better explained with

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification