Does a Large Language Model Really Speak in Human-Like Language?

Mose Park; Yunjin Choi; Jong-June Jeon

arXiv:2501.01273·cs.CL·January 3, 2025

TL;DR

This paper investigates whether large language models produce text that truly resembles human language. It does so by comparing the latent community structures of LLM-generated text with those of human-written text within a hypothesis testing framework.

Contribution

The paper introduces a novel statistical hypothesis testing method for comparing the latent community structures of LLM-generated and human-written texts, and it finds persistent differences between them.

Findings

01. GPT-generated text remains distinct from human text

02. The similarity between LLM and human text does not significantly increase with parameter adjustments

03. The proposed method effectively compares latent structures across text datasets

Abstract

Large Language Models (LLMs) have recently emerged, attracting considerable attention due to their ability to generate highly natural, human-like text. This study compares the latent community structures of LLM-generated text and human-written text within a hypothesis testing procedure. Specifically, we analyze three text sets: original human-written texts ($O$), their LLM-paraphrased versions ($G$), and a twice-paraphrased set ($S$) derived from $G$. Our analysis addresses two key questions: (1) Is the difference in latent community structures between $O$ and $G$ the same as that between $G$ and $S$? (2) Does $G$ become more similar to $O$ as the LLM parameter controlling text variability is adjusted? The first question is based on the assumption that if LLM-generated text truly…
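
As a reading aid, the two questions in the abstract can be written as hypotheses. This is only a sketch under stated assumptions: $d(\cdot,\cdot)$ is a hypothetical measure of discrepancy between the latent community structures of two text sets, $\tau$ is a placeholder for the LLM parameter controlling text variability, and $G_\tau$ denotes the paraphrased set generated at setting $\tau$. None of this notation comes from the excerpt, and the paper's actual test statistic may differ.

$$
\begin{aligned}
\text{(1)}\quad & H_0:\ d(O, G) = d(G, S) \quad \text{versus} \quad H_1:\ d(O, G) \neq d(G, S), \\
\text{(2)}\quad & \text{does } d(O, G_\tau) \text{ decrease as } \tau \text{ is adjusted?}
\end{aligned}
$$

Under this reading, rejecting $H_0$ in (1) would indicate that paraphrasing human text changes its latent structure differently than paraphrasing LLM text does, and a $d(O, G_\tau)$ that stays flat in (2) would correspond to the stated finding that similarity does not significantly increase with parameter adjustments.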

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training