Does a Large Language Model Really Speak in Human-Like Language?
Mose Park, Yunjin Choi, Jong-June Jeon

TL;DR
This paper investigates whether large language models produce text that truly resembles human language by comparing the latent community structures of LLM-generated and human-written text within a hypothesis testing framework.
Contribution
It introduces a novel statistical hypothesis testing method to compare the latent community structures of LLM-generated and human-written texts, revealing persistent differences.
Findings
GPT-generated text remains distinct from human text
The similarity between LLM-generated and human text does not increase significantly as the LLM parameter controlling text variability is adjusted
The proposed method effectively compares latent structures across text datasets
Abstract
Large Language Models (LLMs) have recently emerged, attracting considerable attention due to their ability to generate highly natural, human-like text. This study compares the latent community structures of LLM-generated text and human-written text within a hypothesis testing procedure. Specifically, we analyze three text sets: original human-written texts, their LLM-paraphrased versions, and a twice-paraphrased set derived from the LLM-paraphrased versions. Our analysis addresses two key questions: (1) Is the difference in latent community structures between the original and the LLM-paraphrased texts the same as that between the LLM-paraphrased and the twice-paraphrased texts? (2) Do the LLM-paraphrased texts become more similar to the original texts as the LLM parameter controlling text variability is adjusted? The first question is based on the assumption that if LLM-generated text truly…
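The abstract is cut off before it defines the test statistic, so what follows is only an illustrative sketch of the logic behind question (1), not the authors' procedure. It assumes, hypothetically, that every text has already been assigned to a latent community by some upstream model, and that documents are aligned by index across the three sets (document i of the paraphrased set is the paraphrase of original i, and so on). It swaps in a Rand-index distance between partitions, which does not depend on how communities are labeled, and a paired percentile bootstrap over documents. The names `rand_distance`, `gap_statistic`, and `paired_bootstrap_ci` are illustrative. If the two gaps are equal, the statistic T = d(original, paraphrased) - d(paraphrased, twice-paraphrased) should be close to zero.

```python
import random
from itertools import combinations


def rand_distance(a, b):
    """1 - Rand index between two partitions of the same documents.

    Counts the fraction of document pairs on which the partitions disagree
    about whether the two documents share a community, so the result does
    not depend on how community labels are named.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned partitions of at least 2 documents")
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def gap_statistic(o, g, s):
    """Hypothetical statistic T = d(O, G) - d(G, S), expected near 0 if the gaps match."""
    return rand_distance(o, g) - rand_distance(g, s)


def paired_bootstrap_ci(o, g, s, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for T, resampling document triples jointly."""
    rng = random.Random(seed)
    n = len(o)
    stats = []
    for _ in range(n_boot):
        # Resample document indices so each (original, paraphrase,
        # twice-paraphrase) triple stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(gap_statistic([o[i] for i in idx],
                                   [g[i] for i in idx],
                                   [s[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy community labels for 8 documents (made up, not data from the paper).
    o = [0, 0, 0, 1, 1, 1, 2, 2]  # original human-written texts
    g = [0, 0, 1, 1, 2, 2, 2, 0]  # LLM paraphrases of o
    s = [3, 3, 4, 4, 5, 5, 5, 3]  # paraphrases of g: same grouping, new labels
    print("T =", gap_statistic(o, g, s))
    print("95% bootstrap CI for T:", paired_bootstrap_ci(o, g, s))
```

In this sketch, an interval that excludes zero would suggest the two gaps differ. With real data, the community assignments, the distance, and the resampling scheme would all need to follow the paper's own definitions.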
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
