TruthTensor: Evaluating LLMs through Human Imitation on Prediction Market under Drift and Holistic Reasoning

Shirin Shahabi; Spencer Graham; Haruna Isah

arXiv:2601.13545 · cs.AI · January 27, 2026

TL;DR

TruthTensor is a comprehensive evaluation framework for language models that assesses their reasoning and human imitation capabilities in dynamic, real-world environments like prediction markets, emphasizing robustness, calibration, and decision-making under drift.

Contribution

The paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures LLMs as human-imitation systems in socially-grounded, high-entropy environments, integrating probabilistic scoring and drift diagnostics.

Findings

01  Models with similar accuracy can differ in calibration and risk sensitivity.

02  Evaluation reveals divergence in model behavior under distribution shifts.

03  TruthTensor provides a holistic, reproducible assessment of LLMs in real-world scenarios.
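
The first finding above, that models with similar accuracy can differ in calibration, can be illustrated with a small sketch. This summary does not say which scoring rules TruthTensor uses, so the Brier score below is an assumption: it is a standard proper scoring rule for probabilistic forecasts, chosen here only for illustration. The two forecasters (`cautious`, `overconfident`), their probabilities, and the outcomes are invented toy data, not results from the paper.

```python
# Illustrative sketch only: toy data, and the Brier score is an assumed
# stand-in for the paper's probabilistic scoring, which is not specified here.

def accuracy(probs, outcomes, threshold=0.5):
    """Fraction of binary events where the thresholded forecast matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)

def brier_score(probs, outcomes):
    """Mean squared error between forecast probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical resolved binary markets (1 = event happened, 0 = it did not).
outcomes = [1, 0, 1, 1, 0]

# Two hypothetical forecasters that make the same thresholded calls,
# but state very different levels of confidence.
cautious = [0.7, 0.3, 0.6, 0.4, 0.2]
overconfident = [0.99, 0.01, 0.99, 0.01, 0.01]

for name, probs in [("cautious", cautious), ("overconfident", overconfident)]:
    print(f"{name}: accuracy={accuracy(probs, outcomes):.2f}, "
          f"brier={brier_score(probs, outcomes):.4f}")
```

Both forecasters get four of the five events right, but the overconfident one is penalized heavily for its single confident miss, so its Brier score is worse. Accuracy alone would not separate the two.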

Abstract

Evaluating language models and AI agents remains fundamentally challenging because static benchmarks fail to capture real-world uncertainty, distribution shift, and the gap between isolated task accuracy and human-aligned decision-making under evolving conditions. This paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures reasoning models not only as prediction engines but as human-imitation systems operating in socially-grounded, high-entropy environments. Building on forward-looking, contamination-free tasks, our framework anchors evaluation to live prediction markets and combines probabilistic scoring to provide a holistic view of model behavior. TruthTensor complements traditional correctness metrics with drift-centric diagnostics and explicit robustness checks for reproducibility. It specifies human vs. automated evaluation roles, annotation protocols,…

Code & Models

Datasets

inferencelabs/TruthTensor
dataset · 30 dl

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education