Do LLMs produce texts with "human-like" lexical diversity?
Kelly Kendro, Jeffrey Maloney, Scott Jarvis

TL;DR
This study compares lexical diversity in texts generated by four ChatGPT models with human writing. It finds that current models do not produce human-like lexical diversity and that newer models are less human-like in this respect.
Contribution
It analyzes six dimensions of lexical diversity across four ChatGPT models and compares them with texts by L1 and L2 English writers, highlighting differences and trends across model versions.
Findings
ChatGPT texts differ significantly from human texts on every lexical diversity variable measured.
Newer models such as ChatGPT-4.5 show higher lexical diversity than older models.
No model produces human-like lexical diversity, and the latest versions are the least human-like.
Abstract
The degree to which large language models (LLMs) produce writing that is truly human-like remains unclear despite the extensive empirical attention that this question has received. The present study addresses this question from the perspective of lexical diversity. Specifically, the study investigates patterns of lexical diversity in LLM-generated texts from four ChatGPT models (ChatGPT-3.5, ChatGPT-4, ChatGPT-o4 mini, and ChatGPT-4.5) in comparison with texts written by L1 and L2 English participants (n = 240) across four education levels. Six dimensions of lexical diversity were measured in each text: volume, abundance, variety-repetition, evenness, disparity, and dispersion. Results from one-way MANOVAs, one-way ANOVAs, and Support Vector Machines revealed that the ChatGPT-generated texts differed significantly from human-written texts for each variable, with ChatGPT-o4 mini and…
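To make the idea of measuring several dimensions of lexical diversity more concrete, here is a minimal Python sketch. It is not the authors' method: the function name `lexical_diversity_proxies`, the whitespace tokenizer, and the specific formulas (token count for volume, type count for abundance, type-token ratio for variety-repetition, normalized Shannon entropy for evenness) are simple stand-ins assumed for illustration. Disparity and dispersion are left out because they need semantic and positional information that a toy example cannot supply reliably.

```python
import math
from collections import Counter


def lexical_diversity_proxies(text):
    """Return rough, illustrative proxies for four lexical diversity dimensions.

    These are assumptions made for this sketch, not the measures used in the study.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokens = text.lower().split()
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)

    # Variety-repetition proxy: type-token ratio.
    ttr = n_types / n_tokens if n_tokens else 0.0

    # Evenness proxy: Shannon entropy of the type frequencies,
    # normalized by its maximum possible value, log(n_types).
    if n_types > 1:
        entropy = -sum(
            (c / n_tokens) * math.log(c / n_tokens) for c in counts.values()
        )
        evenness = entropy / math.log(n_types)
    else:
        evenness = 0.0

    return {
        "volume": n_tokens,          # total number of tokens
        "abundance": n_types,        # number of distinct word types
        "variety_repetition": ttr,
        "evenness": evenness,
    }


if __name__ == "__main__":
    print(lexical_diversity_proxies("the cat saw the dog"))
```

Type-token ratio is sensitive to text length, so a real comparison between human and model texts would need length-controlled measures. The sketch only shows that the dimensions capture different properties of the same text.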
