The simulation of judgment in LLMs
Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Alessandro Santirocchi, Roberto Atzeni, Matteo Cinelli, Vincenzo Cestari, Clelia Rossi-Arnaud, Walter Quattrociocchi

TL;DR
This paper investigates how large language models make evaluative judgments compared with humans, revealing differences in criteria and reasoning that could affect their role in information assessment.
Contribution
It introduces a structured framework for comparing LLMs and humans in evaluative tasks and highlights systematic differences in their reasoning processes.
Findings
LLMs rely more than humans on lexical associations and statistical priors.
Models and humans differ in the evaluation criteria they apply.
The paper identifies a phenomenon it calls epistemia, in which surface plausibility mimics reliability.
Abstract
Large Language Models (LLMs) are increasingly embedded in evaluative processes, from information filtering to assessing and addressing knowledge gaps through explanation and credibility judgments. This raises the need to examine how such evaluations are built, what assumptions they rely on, and how their strategies diverge from those of humans. We benchmark six LLMs against expert ratings--NewsGuard and Media Bias/Fact Check--and against human judgments collected through a controlled experiment. We use news domains purely as a controlled benchmark for evaluative tasks, focusing on the underlying mechanisms rather than on news classification per se. To enable direct comparison, we implement a structured agentic framework in which both models and nonexpert participants follow the same evaluation procedure: selecting criteria, retrieving content, and producing justifications. Despite…
Taxonomy
Topics: Law, AI, and Intellectual Property · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
Methods: LLaMA
