Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
Kaiqu Liang, Haimin Hu, Xuandong Zhao, Dawn Song, Thomas L. Griffiths, Jaime Fernández Fisac

TL;DR
This paper introduces the concept of machine bullshit in large language models and proposes a new metric and a taxonomy for characterizing and evaluating it. It finds that certain training methods (notably RLHF) and prompting strategies (notably chain-of-thought) increase untruthful outputs.
Contribution
It presents the Bullshit Index and a taxonomy of bullshit types, along with empirical evaluations on multiple datasets and benchmarks, highlighting factors that exacerbate machine bullshit in LLMs.
Findings
RLHF training increases bullshit prevalence
Chain-of-thought prompting amplifies specific bullshit types
Prevalent use of weasel words in political contexts
Abstract
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs' indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our…
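The abstract on this page does not give the formula for the Bullshit Index. The sketch below assumes it is defined as BI = 1 - |r_pb|, where r_pb is the point-biserial correlation between the model's internal belief p (a probability in [0, 1] that a statement is true) and its explicit claim y (1 if the model asserts the statement, 0 otherwise). Under that assumption, BI near 0 means the model's claims track its beliefs, and BI near 1 means its claims are statistically indifferent to what it believes. The function names are hypothetical and this is an illustration, not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def point_biserial(beliefs, claims):
    """Point-biserial correlation between continuous beliefs and binary claims.

    Uses the population standard deviation, which makes the result equal to the
    Pearson correlation between the beliefs and the 0/1 claims.
    """
    if len(beliefs) != len(claims):
        raise ValueError("beliefs and claims must have the same length")
    asserted = [b for b, c in zip(beliefs, claims) if c == 1]
    not_asserted = [b for b, c in zip(beliefs, claims) if c == 0]
    if not asserted or not not_asserted:
        raise ValueError("claims must contain both 0 and 1")
    spread = pstdev(beliefs)
    if spread == 0:
        raise ValueError("beliefs must not all be identical")
    p = len(asserted) / len(beliefs)  # fraction of statements asserted
    q = 1.0 - p
    return (mean(asserted) - mean(not_asserted)) / spread * math.sqrt(p * q)


def bullshit_index(beliefs, claims):
    """Hypothetical Bullshit Index, assuming BI = 1 - |r_pb(belief, claim)|."""
    return 1.0 - abs(point_biserial(beliefs, claims))


# Claims that follow beliefs give an index near 0.
print(round(bullshit_index([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]), 3))
# Claims unrelated to beliefs give an index of 1.
print(bullshit_index([0.9, 0.1, 0.9, 0.1], [1, 1, 0, 0]))
```

Taking the absolute value means a model that systematically asserts the opposite of what it believes also scores near 0: it is lying, but not indifferent to truth, which matches Frankfurt's distinction between lying and bullshit.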
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Computational and Text Analysis Methods
