Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models

Kaiqu Liang; Haimin Hu; Xuandong Zhao; Dawn Song; Thomas L. Griffiths; Jaime Fernández Fisac

arXiv:2507.07484 · cs.CL · July 11, 2025 · 2 cites

TL;DR

This paper introduces the concept of machine bullshit in large language models and proposes a new metric and taxonomy to characterize and evaluate it. Its evaluations indicate that certain training methods and prompting strategies increase untruthful outputs.

Contribution

It presents the Bullshit Index and a taxonomy of bullshit types, along with empirical evaluations on multiple datasets and benchmarks, highlighting factors that exacerbate machine bullshit in LLMs.

Findings

01 RLHF training increases bullshit prevalence

02 Chain-of-thought prompting amplifies specific bullshit types

03 Weasel words are prevalent in political contexts

Abstract

Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs' indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our…

Code & Models

Datasets

kaiquliang/BullshitEval (dataset, 26 downloads)

Taxonomy

Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Computational and Text Analysis Methods