TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation

Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, and Dongwon Lee

arXiv:2109.13296 · cs.CL · September 29, 2021

TL;DR

TuringBench is a comprehensive benchmark environment, with datasets, tasks, and leaderboards, for evaluating methods that distinguish machine-generated text from human-written text. It addresses the need for systematic Turing Test studies of neural text generation.

Contribution

The paper introduces the first benchmark environment for the Turing Test in neural text generation, consisting of a large dataset, two benchmark tasks, and a public leaderboard.

Findings

1. FAIR_wmt20 and GPT-3 produce the most human-like texts.
2. State-of-the-art Turing Test (TT) detectors struggle to distinguish these texts from human-written ones.
3. The benchmark provides a new standard for evaluating neural text generation models.
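A finding such as "detectors struggle" is normally quantified with a classification metric. This page does not say which metric the leaderboard uses, so the sketch below assumes binary F1 with label 1 (machine-generated) as the positive class. The function name `f1_machine` is hypothetical and is not part of any TuringBench code.

```python
from typing import List


def f1_machine(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 where 1 = machine-generated (positive) and 0 = human-written.

    Assumption: F1 on the machine class is used only for illustration; the
    page does not state the leaderboard's actual metric.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correctly detected machine text: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One hit, one miss, one false alarm: precision = recall = 0.5.
    print(f1_machine([1, 1, 0, 0], [1, 0, 1, 0]))
```

In the usage example the detector gets one machine text right, misses one, and flags one human text as machine, so precision and recall are both 0.5 and F1 is 0.5.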

Abstract

Recent progress in generative language models has enabled machines to generate astonishingly realistic texts. While there are many legitimate applications of such models, there is also a rising need to distinguish machine-generated texts from human-written ones (e.g., fake news detection). However, to the best of our knowledge, there is currently no benchmark environment with datasets and tasks to systematically study the so-called "Turing Test" problem for neural text generation methods. In this work, we present the TuringBench benchmark environment, which comprises (1) a dataset with 200K human- or machine-generated samples across 20 labels {Human, GPT-1, GPT-2_small, GPT-2_medium, GPT-2_large, GPT-2_xl, GPT-2_PyTorch, GPT-3, GROVER_base, GROVER_large, GROVER_mega, CTRL, XLM, XLNET_base, XLNET_large, FAIR_wmt19, FAIR_wmt20, TRANSFORMER_XL, PPLM_distil, PPLM_gpt2}, (2) two benchmark…
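The 20-label set in the abstract consists of one human class plus 19 text generators. A natural way to frame a Turing-Test-style experiment on such data is a binary "human vs. one specific generator" split. The sketch below assumes a hypothetical in-memory format of `(text, label)` pairs and hypothetical helpers `make_pairwise_task` and `make_all_pairwise_tasks`. It is not the official TuringBench loader or file layout, and the paper's actual task definitions are not reproduced here.

```python
from typing import Dict, List, Tuple

# The 20 labels listed in the abstract: "Human" plus 19 generators.
LABELS: List[str] = [
    "Human", "GPT-1", "GPT-2_small", "GPT-2_medium", "GPT-2_large", "GPT-2_xl",
    "GPT-2_PyTorch", "GPT-3", "GROVER_base", "GROVER_large", "GROVER_mega",
    "CTRL", "XLM", "XLNET_base", "XLNET_large", "FAIR_wmt19", "FAIR_wmt20",
    "TRANSFORMER_XL", "PPLM_distil", "PPLM_gpt2",
]
GENERATORS: List[str] = [label for label in LABELS if label != "Human"]

# Hypothetical record format (assumption): (text, label) pairs.
Sample = Tuple[str, str]


def make_pairwise_task(samples: List[Sample], generator: str) -> List[Tuple[str, int]]:
    """Keep only Human and `generator` samples; map Human -> 0, machine -> 1."""
    if generator not in GENERATORS:
        raise ValueError(f"unknown generator: {generator}")
    return [
        (text, 0 if label == "Human" else 1)
        for text, label in samples
        if label in ("Human", generator)
    ]


def make_all_pairwise_tasks(samples: List[Sample]) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary human-vs-machine dataset per generator (19 in total)."""
    return {g: make_pairwise_task(samples, g) for g in GENERATORS}


if __name__ == "__main__":
    toy = [
        ("a human sentence", "Human"),
        ("a gpt-3 sentence", "GPT-3"),
        ("a ctrl sentence", "CTRL"),
    ]
    # Only the Human and GPT-3 rows survive, relabeled 0 and 1.
    print(make_pairwise_task(toy, "GPT-3"))
```

Keeping the human samples in every split means each generator is judged against the same human reference text. The full 20-way label set could instead be used directly for a multi-class setup.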

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Gradient Clipping · Linear Warmup · Dropout · Layer Normalization