Seeing the Goal, Missing the Truth: Human Accountability for AI Bias
Sean Cao, Wei Jiang, Hui Xu

TL;DR
This paper investigates how human-defined goals influence LLM behavior. It finds that disclosing the downstream goal can induce bias and overfitting, and argues that responsibility for this bias lies with humans rather than with algorithmic flaws.
Contribution
It demonstrates that goal-aware prompting causes bias and overfitting in LLMs, highlighting the role of human accountability in AI bias.
Findings
Disclosing the downstream goal leads the LLM to produce biased sentiment and competition measures.
Purpose leakage improves performance on data before the LLM's knowledge cutoff but provides no advantage after it.
The bias can arise from users' unintentional conversational cues.
Abstract
This research explores how human-defined goals influence the behavior of Large Language Models (LLMs) through purpose-conditioned cognition. Using financial prediction tasks, we show that revealing the downstream use (e.g., predicting stock returns or earnings) of LLM outputs leads the LLM to generate biased sentiment and competition measures, even though these measures are intended to be downstream task-independent. Goal-aware prompting shifts these intermediate measures toward the disclosed downstream objective, producing in-sample overfitting. Specifically, purpose leakage improves performance on data prior to the LLM's knowledge cutoff, but provides no advantage after the cutoff. This bias is strong enough that regularization of prompt instructions cannot fully address this form of overfitting. We further show that the bias can arise from users' unintentional conversational context…
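The comparison the abstract describes (goal-blind versus goal-aware prompting, evaluated separately before and after the knowledge cutoff) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt wording, the function names `build_prompt`, `split_by_cutoff`, and `hit_rate`, and the sign-agreement metric are all assumptions introduced here.

```python
from datetime import date

# Hypothetical prompt templates; the paper's actual wording is not shown here.
GOAL_BLIND = "Rate the sentiment of the following text from -1 to 1.\n\n{text}"
GOAL_AWARE = (
    "The score will be used to predict stock returns. "
    "Rate the sentiment of the following text from -1 to 1.\n\n{text}"
)


def build_prompt(text, disclose_goal):
    """Return a goal-aware prompt if disclose_goal is True, else a goal-blind one."""
    template = GOAL_AWARE if disclose_goal else GOAL_BLIND
    return template.format(text=text)


def split_by_cutoff(records, cutoff):
    """Split (date, score, outcome) records into pre-cutoff and post-cutoff lists."""
    pre = [r for r in records if r[0] <= cutoff]
    post = [r for r in records if r[0] > cutoff]
    return pre, post


def hit_rate(records):
    """Share of records where the sign of the score matches the sign of the outcome.

    An assumed stand-in metric, not the paper's evaluation measure.
    """
    if not records:
        return float("nan")
    hits = sum(1 for _, score, outcome in records if (score > 0) == (outcome > 0))
    return hits / len(records)
```

Under these assumptions, one would score the same texts with both prompt variants, then compare `hit_rate` on the pre-cutoff and post-cutoff subsets. A gap that appears only before the cutoff would match the overfitting pattern the abstract reports.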
