Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap

Mengmi Zhang; Elisa Pavarino; Xiao Liu; Giorgia Dellaferrera; Ankur Sikarwar; Caishun Chen; Marcelo Armendariz; Noga Mudrik; Prachi Agrawal; Spandan Madan; Mranmay Shetty; Andrei Barbu; Haochen Yang; Tanishq Kumar; Shui'Er Han; Aman Raj Singh; Meghna Sadwani; Stella Dellaferrera; Michele Pizzochero; Brandon Tang; Yew Soon Ong; Hanspeter Pfister; Gabriel Kreiman

arXiv:2211.13087 · cs.CV · September 9, 2025

TL;DR

This study benchmarks how well AI systems imitate humans across three language tasks and three vision tasks. It finds that current systems are approaching the ability to deceive human judges, and it argues that imitation ability should be evaluated independently of conventional performance metrics.

Contribution

The paper introduces large-scale datasets and Turing-like tests for evaluating AI's human-likeness in language and vision, providing new benchmarks and insights into AI imitation abilities.

Findings

01. Current AI systems are approaching the ability to convincingly imitate humans in language and vision tasks.

02. Even simple AI judges outperform human judges at distinguishing AI responses from human responses.

03. Imitation ability shows minimal correlation with conventional AI performance metrics.

Abstract

As AI becomes increasingly embedded in daily life, ascertaining whether an agent is human is critical. We systematically benchmark AI's ability to imitate humans in three language tasks (image captioning, word association, conversation) and three vision tasks (color estimation, object detection, attention prediction), collecting data from 636 humans and 37 AI agents. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Current AIs are approaching the ability to convincingly impersonate humans and deceive human judges in both language and vision. Even simple AI judges outperformed humans in distinguishing AI from human responses. Imitation ability showed minimal correlation with conventional AI performance metrics, suggesting that passing as human is an important independent evaluation criterion. The large-scale Turing datasets and metrics introduced here…
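
To make the two quantities behind these claims concrete, here is a minimal sketch of how a batch of Turing-like trials could be scored. It is an illustration under stated assumptions, not the paper's evaluation code: the `Trial` record, the function names `judge_accuracy` and `deception_rate`, and the toy trials are all hypothetical, and the paper's own metrics may be defined differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One Turing-like test (hypothetical record layout)."""
    truth: str    # "human" or "ai": who actually produced the response
    verdict: str  # "human" or "ai": what the judge decided


def judge_accuracy(trials):
    """Fraction of trials in which the judge's verdict matches the truth."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.truth == t.verdict for t in trials) / len(trials)


def deception_rate(trials):
    """Fraction of AI-produced responses that the judge labeled as human."""
    ai_trials = [t for t in trials if t.truth == "ai"]
    if not ai_trials:
        raise ValueError("need at least one AI-produced trial")
    return sum(t.verdict == "human" for t in ai_trials) / len(ai_trials)


# Made-up toy trials for illustration only; these are not data from the paper.
toy_trials = [
    Trial("ai", "human"),     # AI response mistaken for human
    Trial("ai", "ai"),        # AI response correctly identified
    Trial("human", "human"),  # human response correctly identified
    Trial("human", "ai"),     # human response mistaken for AI
]

print("judge accuracy:", judge_accuracy(toy_trials))
print("deception rate:", deception_rate(toy_trials))
```

Under this framing, comparing human and AI judges amounts to computing `judge_accuracy` separately for each judge type on the same responses, while an agent's imitation ability corresponds to its `deception_rate`.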

Taxonomy

Topics: Face Recognition and Perception · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning

Methods: Test