Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
Pedro Reviriego, Javier Conde, Elena Merino-Gómez, Gonzalo Martínez, José Alberto Hernández

TL;DR
This study compares the vocabulary and lexical diversity of ChatGPT and humans across different tasks, finding that ChatGPT uses fewer distinct words and has lower lexical richness, with implications for language evolution and AI development.
Contribution
It provides an initial analysis of lexical differences between ChatGPT and humans, highlighting potential impacts on language use and evolution due to AI-generated text.
Findings
ChatGPT uses fewer distinct words than humans.
ChatGPT exhibits lower lexical richness.
Results are preliminary and require further validation.
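The two quantities named in the findings, the number of distinct words and lexical richness, can be illustrated with a minimal sketch. This is not the paper's method: the excerpt here does not say which metrics or tokenizer the authors used, so the type-token ratio (distinct words divided by total words), the simple letter-run tokenizer, and the two sample sentences are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters as word tokens.

    Assumption: a deliberately simple tokenizer. Digits and punctuation are
    dropped, so a contraction like "don't" splits into "don" and "t".
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def distinct_words(text):
    """Return the set of distinct word forms in the text."""
    return set(tokenize(text))


def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for an empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample texts, invented for this sketch (not data from the paper).
sample_a = "the cat sat on the mat and the cat slept"
sample_b = "a weary tabby dozed upon a threadbare rug"

for name, text in (("sample_a", sample_a), ("sample_b", sample_b)):
    print(name, len(distinct_words(text)), round(type_token_ratio(text), 3))
```

The type-token ratio tends to fall as a text gets longer, so a comparison of this kind is only meaningful between texts of similar length or with a length-corrected measure.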
Abstract
The introduction of Artificial Intelligence (AI) generative language models such as GPT (Generative Pre-trained Transformer) and tools such as ChatGPT has triggered a revolution that can transform how text is generated. This has many implications. For example, as AI-generated text becomes a significant fraction of all text, would this affect the language capabilities of readers and the training of newer AI tools? Would it affect the evolution of languages? Focusing on one specific aspect of language, words: will the use of tools such as ChatGPT increase or reduce the vocabulary used, or the lexical richness? This matters for individual words, as those not included in AI-generated content will tend to become less and less popular and may eventually be lost. In this work, we perform an initial comparison of the vocabulary and lexical richness of ChatGPT and humans when…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Discriminative Fine-Tuning · Adam · Softmax · Cosine Annealing · Layer Normalization · Linear Layer · Residual Connection
