A Statistical Turing Test for Generative Models
Hayden Helm, Carey E. Priebe, Weiwei Yang

TL;DR
This paper introduces a statistical framework to evaluate how closely AI-generated content resembles human content, aiding in assessing the progression of generative models towards human-like abilities.
Contribution
It provides a formal statistical approach to quantify differences between human and machine-generated content, enabling systematic evaluation of generative model progress.
Findings
Framework effectively measures content similarity
Current methods are contextualized within the framework
Facilitates evaluation of generative models' human-likeness
Abstract
The emergence of human-like abilities of AI systems for content generation in domains such as text, audio, and vision has prompted the development of classifiers to determine whether content originated from a human or a machine. Implicit in these efforts is an assumption that the generation properties of a human differ from those of a machine. In this work, we provide a framework in the language of statistical pattern recognition that quantifies the difference between the distributions of human and machine-generated content conditioned on an evaluation context. We describe current methods in the context of the framework and demonstrate how to use the framework to evaluate the progression of generative models towards human-like capabilities, among many axes of analysis.
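One common way to put a number on the gap between two content distributions is a classifier two-sample test: train a detector on labeled samples and read its held-out accuracy as a measure of distinguishability. The sketch below is illustrative only and is not taken from the paper. It assumes content has already been reduced to a one-dimensional feature, that all samples come from a single fixed evaluation context, and that synthetic Gaussian draws can stand in for real data. All function names are hypothetical. With balanced classes, `2 * accuracy - 1` estimates a lower bound on the total variation distance between the two distributions, so values near 0 suggest the detector cannot tell the sources apart.

```python
import random


def sample_features(mean, n, rng):
    """Draw n one-dimensional stand-in 'content features' from N(mean, 1).

    Synthetic data is an assumption made for illustration; real features
    would come from an embedding or hand-built statistic of the content.
    """
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def held_out_accuracy(human, machine):
    """Fit a midpoint-threshold detector on the first half of each sample
    and return its accuracy on the second half."""
    h_train, h_test = human[: len(human) // 2], human[len(human) // 2:]
    m_train, m_test = machine[: len(machine) // 2], machine[len(machine) // 2:]

    h_mean = sum(h_train) / len(h_train)
    m_mean = sum(m_train) / len(m_train)
    threshold = (h_mean + m_mean) / 2
    machine_is_higher = m_mean >= h_mean

    def predict_machine(x):
        # Label x as machine if it falls on the machine side of the threshold.
        return (x > threshold) == machine_is_higher

    correct = sum(not predict_machine(x) for x in h_test)
    correct += sum(predict_machine(x) for x in m_test)
    return correct / (len(h_test) + len(m_test))


def distinguishability(human, machine):
    """Map held-out accuracy to [0, 1]: 0 is chance level, 1 is perfectly separable."""
    return max(0.0, 2 * held_out_accuracy(human, machine) - 1)


rng = random.Random(0)
human = sample_features(0.0, 2000, rng)
weak_model = sample_features(2.0, 2000, rng)    # hypothetical model far from human
strong_model = sample_features(0.1, 2000, rng)  # hypothetical model close to human

score_weak = distinguishability(human, weak_model)
score_strong = distinguishability(human, strong_model)
print(f"weak model: {score_weak:.3f}, strong model: {score_strong:.3f}")
```

Tracking such a score across successive model versions, one evaluation context at a time, is one possible way to chart progress towards human-like output. A low score only shows that this particular detector fails, not that the distributions are identical.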
Taxonomy
Topics: Language and cultural evolution · Music and Audio Processing
