X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents

Weiqi Wu; Hongqiu Wu; Hai Zhao

arXiv:2408.09853·cs.CL·May 30, 2025

TL;DR

This paper proposes X-Turing, an improved Turing test for long-term dialogue agents that uses burst dialogues and pseudo-dialogues to better evaluate AI human-likeness over extended interactions while reducing human effort.

Contribution

It introduces X-Turing with burst dialogue patterns and pseudo-dialogues, along with the X-Turn Pass-Rate metric, to more effectively assess long-term AI conversational capabilities.

Findings

1. LLMs such as GPT-4 perform well early on, reaching pass rates of 51.9% at 3 turns and 38.9% at 10 turns.

2. LLM performance declines as dialogues grow longer, highlighting the difficulty of maintaining consistency.

3. X-Turing reduces the human workload of evaluating long-term dialogue agents.
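
As a rough illustration of how per-turn pass rates like those in the findings could be tallied, here is a minimal Python sketch. It assumes a pass rate is the fraction of test dialogues of a given length in which the human judge did not identify the agent as an AI; the exact definition of the X-Turn Pass-Rate is not given on this page, and the function name `x_turn_pass_rate` and the sample judgments are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def x_turn_pass_rate(judgments):
    """Return {turn_count: pass_rate} from (turn_count, passed) pairs.

    Assumption: `passed` is True when the judge failed to identify
    the agent as an AI in a dialogue of `turn_count` turns.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for turns, passed in judgments:
        totals[turns] += 1
        if passed:
            passes[turns] += 1
    # Report rates in order of increasing dialogue length.
    return {t: passes[t] / totals[t] for t in sorted(totals)}


# Hypothetical judgments for illustration only (not data from the paper).
example = [(3, True), (3, False), (10, False), (10, False), (10, True), (10, False)]
rates = x_turn_pass_rate(example)
```

Grouping by turn count keeps each dialogue length separate, which is what allows a decline over longer dialogues to show up in the results.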

Abstract

The Turing test examines whether AIs exhibit human-like behaviour in natural language conversations. The traditional setting limits each participant to one message at a time and requires constant human participation. This fails to reflect a natural conversational style and hinders the evaluation of dialogue agents based on Large Language Models (LLMs) in complex and prolonged interactions. This paper proposes X-Turing, which enhances the original test with a burst dialogue pattern, allowing more dynamic exchanges using consecutive messages. It further reduces human workload by iteratively generating dialogues that simulate the long-term interaction between the agent and a human to compose the majority of the test process. With the pseudo-dialogue history, the agent then engages in a shorter dialogue with a real human, which is paired with a…

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Ferroelectric and Negative Capacitance Devices

Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax