Large Language Models Pass the Turing Test

Cameron R. Jones; Benjamin K. Bergen

arXiv:2503.23674 · cs.CL · April 1, 2025

TL;DR

This study empirically demonstrates that advanced large language models, notably GPT-4.5, can pass a standard Turing test by convincingly mimicking human conversation in controlled experiments.

Contribution

Provides the first empirical evidence that a large language model, GPT-4.5, can pass a standard three-party Turing test in a controlled setting, highlighting progress in the humanlike conversational abilities of AI systems.

Findings

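1. GPT-4.5, prompted to adopt a humanlike persona, was judged to be the human 73% of the time.
2. LLaMa-3.1-405B, with the same prompt, was judged to be the human 56% of the time.
3. The baseline models, ELIZA and GPT-4o, achieved win rates significantly below chance (23% and 21%, respectively).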

Abstract

We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5-minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time, significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time (not significantly more or less often than the humans it was being compared to), while the baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21%, respectively). The results constitute the first empirical evidence that any artificial system passes a standard…
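Because each game pits one AI witness against one human witness, the AI's win rate and the rate at which interrogators pick the real human sum to 100%. A win rate above 50% therefore means the AI was chosen more often than the human, and "below chance" means below 50%. The sketch below shows one common way to check such a rate against 50% with an exact two-sided binomial test. It is an illustration under stated assumptions, not necessarily the analysis the authors ran; the function names and the 73-of-100 and 56-of-100 counts are hypothetical and chosen only for demonstration.

```python
from math import comb


def win_rate(wins: int, games: int) -> float:
    """Fraction of games in which the interrogator picked the AI witness as human."""
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games


def binomial_two_sided_p(wins: int, games: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test of H0: true win rate == p0 (hypothetical helper).

    Sums the probabilities of every outcome that is no more likely than the
    observed one (the usual 'minimum likelihood' two-sided definition).
    Uses floating point, so it is intended for modest game counts only.
    """
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    pmf = [comb(games, k) * p0**k * (1 - p0) ** (games - k) for k in range(games + 1)]
    observed = pmf[wins]
    # Small relative tolerance so outcomes tied with the observed one are counted
    # despite floating-point rounding.
    p = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    for name, wins, games in [("model A", 73, 100), ("model B", 56, 100)]:
        rate = win_rate(wins, games)
        p = binomial_two_sided_p(wins, games)
        print(f"{name}: win rate {rate:.2f}, two-sided p vs 50% = {p:.4g}")
```

With these made-up counts, 73/100 is far from 50% and yields a very small p-value, while 56/100 does not differ significantly from 50%. Whether a real result reaches significance depends on the actual number of games, which this sketch does not assume.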
