Computational Turing Test Reveals Systematic Differences Between Human and AI Language

Nicolò Pagan; Petter Törnberg; Christopher A. Bail; Anikó Hannák; Christopher Barrie

arXiv:2511.04195 · cs.CL · November 26, 2025

Open Access

TL;DR

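This paper introduces a computational Turing test that combines aggregate metrics (BERT-based detectability and semantic similarity) with interpretable linguistic features (stylistic markers and topical patterns) to evaluate how well large language models mimic human language. It finds that LLM output remains distinguishable from human text and that calibration strategies involve trade-offs.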

Contribution

The paper presents a validation framework for assessing the realism of LLM-generated text and systematically compares multiple models and calibration methods against human language benchmarks.

Findings

01. LLMs remain distinguishable from humans even after calibration.

02. Instruction tuning can reduce model performance in mimicking human-like affect.

03. Scaling model size does not necessarily improve human-likeness.

Abstract

Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior, based on the assumption that they can generate realistic, human-like text. Yet this assumption remains largely untested. Existing validation efforts rely heavily on human-judgment-based evaluations -- testing whether humans can distinguish AI from human output -- despite evidence that such judgments are blunt and unreliable. As a result, the field lacks robust tools for assessing the realism of LLM-generated text or for calibrating models to real-world data. This paper makes two contributions. First, we introduce a computational Turing test: a validation framework that integrates aggregate metrics (BERT-based detectability and semantic similarity) with interpretable linguistic features (stylistic markers and topical patterns) to assess how closely LLMs approximate human language within…
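The two aggregate metrics named in the abstract, detectability and semantic similarity, can be illustrated with a small sketch. This is not the authors' implementation: the paper uses BERT-based measures, while the code below swaps in bag-of-words counts, cosine similarity, and a nearest-centroid classifier so that it runs with the Python standard library alone. All function names and the example texts are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased whitespace token counts (a crude stand-in for BERT features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum of count vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def detectability(human_texts, ai_texts):
    """Leave-one-out accuracy of a nearest-centroid human-vs-AI classifier.

    Higher values mean the AI text is easier to tell apart from the
    human text under these (assumed) features.
    """
    data = [(bow(t), "human") for t in human_texts]
    data += [(bow(t), "ai") for t in ai_texts]
    correct = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the text being classified
        scores = {
            lab: cosine(vec, centroid([v for v, l in rest if l == lab]))
            for lab in ("human", "ai")
        }
        correct += max(scores, key=scores.get) == label
    return correct / len(data)

# Hypothetical toy data, invented for illustration only.
HUMAN_EXAMPLES = [
    "lol no way that happened",
    "omg lol that game was wild",
    "no way lol i cant even",
]
AI_EXAMPLES = [
    "I appreciate your thoughtful perspective on this",
    "Thank you for sharing your perspective on this topic",
    "I appreciate you sharing this thoughtful topic",
]

if __name__ == "__main__":
    print("detectability:", detectability(HUMAN_EXAMPLES, AI_EXAMPLES))
    print("similarity:", cosine(bow(HUMAN_EXAMPLES[0]), bow(AI_EXAMPLES[0])))
```

In a setup closer to the one the abstract describes, `bow` would presumably be replaced by a BERT-style encoder and the nearest-centroid rule by a fine-tuned classifier; the held-out evaluation idea stays the same.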

Taxonomy

Topics: Mental Health via Writing · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods