Large Language Models Pass the Turing Test
Cameron R. Jones, Benjamin K. Bergen

TL;DR
In two randomised, controlled, pre-registered experiments, GPT-4.5 passed a standard Turing test when prompted to adopt a humanlike persona: interrogators judged it to be the human more often than they selected the real human participant.
Contribution
Provides the first empirical evidence that an artificial system, GPT-4.5, passes a standard Turing test in a controlled setting, showing how humanlike the conversation of current large language models has become.
Findings
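GPT-4.5, prompted to adopt a humanlike persona, was judged to be the human 73% of the time, significantly more often than the real human participants.
LLaMa-3.1, given the same prompt, was judged to be the human 56% of the time, not significantly different from the humans it was compared against.
The baseline systems, ELIZA and GPT-4o, were judged to be the human 23% and 21% of the time respectively, significantly below chance.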
Abstract
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard…
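The abstract reports each win rate as significantly above, not different from, or significantly below chance. One common way to check such a claim is an exact two-sided binomial test against a 50% chance rate. The sketch below illustrates that idea only: the choice of test is an assumption rather than the authors' stated analysis, and the trial count of 100 is hypothetical, since the summary above gives percentages but no sample sizes.

```python
from math import comb


def binomial_two_sided_p(wins: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value for `wins` out of `trials` against `chance`."""
    # Probability of every possible number of wins under the chance hypothesis.
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[wins]
    # Sum the probability of all outcomes at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# HYPOTHETICAL counts: 100 trials per system is an illustrative assumption,
# chosen only so the counts match the reported percentages.
hypothetical_wins = {"GPT-4.5": 73, "LLaMa-3.1": 56, "ELIZA": 23, "GPT-4o": 21}
HYPOTHETICAL_TRIALS = 100

for system, wins in hypothetical_wins.items():
    p = binomial_two_sided_p(wins, HYPOTHETICAL_TRIALS)
    print(f"{system}: win rate {wins / HYPOTHETICAL_TRIALS:.2f}, p = {p:.4g}")
```

With real data, the counts would be the number of games in which the interrogator picked the AI system as the human, and the number of games played against that system. Whether a given rate differs significantly from chance depends on the actual sample size, so the p-values printed here say nothing about the study itself.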
Taxonomy
Topics: AI in Service Interactions · Language and cultural evolution · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Layer Normalization · Label Smoothing · Residual Connection · Adam · Dropout
