Beyond Turing Test: Can GPT-4 Sway Experts' Decisions?
Takehiro Takayanagi, Hiroya Takamura, Kiyoshi Izumi, and Chung-Chi Chen

TL;DR
This paper investigates how GPT-4-generated text influences decision-making among both experts and amateurs, proposing a new evaluation approach based on audience reactions rather than traditional indistinguishability metrics.
Contribution
It introduces a novel perspective on evaluating LLMs by analyzing their impact on human decisions and releases a dataset for future research in this area.
Findings
GPT-4 can significantly sway expert and amateur decisions.
Audience reactions correlate highly with scores from multi-dimensional evaluation metrics.
The paper proposes a new evaluation paradigm based on reader responses.
Abstract
In the post-Turing era, evaluating large language models (LLMs) involves assessing generated text based on readers' reactions rather than merely its indistinguishability from human-produced content. This paper explores how LLM-generated text impacts readers' decisions, focusing on both amateur and expert audiences. Our findings indicate that GPT-4 can generate persuasive analyses affecting the decisions of both amateurs and professionals. Furthermore, we evaluate the generated text from the aspects of grammar, convincingness, logical coherence, and usefulness. The results highlight a high correlation between real-world evaluation through audience reactions and the current multi-dimensional evaluators commonly used for generative models. Overall, this paper shows the potential and risk of using generated text to sway human decisions and also points out a new direction for evaluating…
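The correlation claim above can be made concrete with a small sketch. It computes a Spearman rank correlation between per-text evaluator scores and a measure of how much readers changed their decisions. This summary does not say which statistic the paper uses, so the choice of Spearman is an assumption, and the names `evaluator_scores` and `decision_shift` and all numbers are made-up toy values, not data from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values (hypothetical): one entry per generated analysis.
evaluator_scores = [2.0, 3.5, 3.0, 4.5, 4.0, 5.0]        # e.g. a convincingness score
decision_shift = [0.10, 0.30, 0.20, 0.40, 0.50, 0.60]    # share of readers who changed decision

rho = spearman(evaluator_scores, decision_shift)
print(f"Spearman rho = {rho:.3f}")
```

A value of `rho` near 1 would mean that texts rated higher by the automatic evaluator also tend to move more readers, which is the kind of agreement the abstract describes.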
Taxonomy
Topics: Computability, Logic, AI Algorithms
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
