TruthTensor: Evaluating LLMs through Human Imitation on Prediction Market under Drift and Holistic Reasoning
Shirin Shahabi, Spencer Graham, Haruna Isah

TL;DR
TruthTensor is a comprehensive evaluation framework for language models that assesses their reasoning and human imitation capabilities in dynamic, real-world environments like prediction markets, emphasizing robustness, calibration, and decision-making under drift.
Contribution
The paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures LLMs as human-imitation systems in socially-grounded, high-entropy environments, integrating probabilistic scoring and drift diagnostics.
Findings
Models with similar accuracy can differ in calibration and risk sensitivity.
Evaluation reveals divergence in model behavior under distribution shifts.
TruthTensor provides a holistic, reproducible assessment of LLMs in real-world scenarios.
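The first finding can be illustrated with a small sketch. It assumes the Brier score as the probabilistic scoring rule; the summary here does not name the rule TruthTensor uses, and the forecasts, outcomes, and the names model_a and model_b are hypothetical values chosen only for illustration.

```python
# Illustrative sketch only: the Brier score is an assumed scoring rule,
# and all data below is made up for the example.

def accuracy(probs, outcomes, threshold=0.5):
    """Fraction of events where the thresholded forecast matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)

def brier_score(probs, outcomes):
    """Mean squared error between forecast probability and binary outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Resolved binary market outcomes (1 = event happened).
outcomes = [1, 0, 1, 1, 0]

# Two hypothetical forecasters that miss the same event (the fourth one).
model_a = [0.90, 0.10, 0.80, 0.40, 0.20]   # moderately confident, mild miss
model_b = [0.99, 0.01, 0.99, 0.01, 0.01]   # very confident, severe miss

acc_a, acc_b = accuracy(model_a, outcomes), accuracy(model_b, outcomes)
brier_a, brier_b = brier_score(model_a, outcomes), brier_score(model_b, outcomes)

print(f"model_a: accuracy={acc_a:.2f}, brier={brier_a:.4f}")
print(f"model_b: accuracy={acc_b:.2f}, brier={brier_b:.4f}")
```

Both forecasters get four of five events right, yet the overconfident one is penalized far more heavily for its single miss, which is the kind of gap an accuracy-only benchmark would hide.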
Abstract
Evaluating language models and AI agents remains fundamentally challenging because static benchmarks fail to capture real-world uncertainty, distribution shift, and the gap between isolated task accuracy and human-aligned decision-making under evolving conditions. This paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures reasoning models not only as prediction engines but as human-imitation systems operating in socially-grounded, high-entropy environments. Building on forward-looking, contamination-free tasks, our framework anchors evaluation to live prediction markets and combines probabilistic scoring to provide a holistic view of model behavior. TruthTensor complements traditional correctness metrics with drift-centric diagnostics and explicit robustness checks for reproducibility. It specifies human vs. automated evaluation roles, annotation protocols,…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
