Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
Gonzalo Martínez, José Alberto Hernández, Javier Conde, Pedro Reviriego, Elena Merino

TL;DR
This paper investigates the lexical diversity of conversational LLMs, specifically ChatGPT, highlighting how linguistic features vary with model parameters and emphasizing the importance of evaluating language use in AI-generated texts.
Contribution
It introduces a methodology for assessing lexical richness in LLM outputs and provides a comprehensive analysis of how model parameters influence linguistic diversity.
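To make "lexical richness" concrete, here is a minimal sketch of two standard measures: the type-token ratio (TTR) and root TTR. This summary does not say which metrics the paper uses, so the choice of metrics, the simplistic regex tokenizer, and the function names below are all illustrative assumptions rather than the authors' method.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and extract word tokens (simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def root_ttr(text):
    """Types divided by the square root of tokens; less sensitive to text length."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(type_token_ratio(sample))
    print(root_ttr(sample))
```

Plain TTR tends to fall as texts get longer, so comparisons across model outputs are more meaningful when the texts have similar lengths or when a length-adjusted measure is used.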
Findings
Lexical richness varies with ChatGPT versions and parameters
Presence penalty affects diversity of generated text
Role assignment impacts linguistic features
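As a rough illustration of why presence penalty can affect diversity, the sketch below applies a presence penalty the way it is commonly described for sampling-based LLM APIs: a fixed amount is subtracted from the score (logit) of every token that has already appeared, regardless of how often. The function name, the dictionary-based logits, and the toy vocabulary are hypothetical and are not taken from the paper.

```python
def apply_presence_penalty(logits, generated_tokens, presence_penalty):
    """Return a copy of logits with a flat penalty on already-used tokens.

    logits: dict mapping token -> score (hypothetical toy representation)
    generated_tokens: tokens produced so far
    presence_penalty: amount subtracted once per token that has appeared
    """
    seen = set(generated_tokens)
    return {
        token: (score - presence_penalty if token in seen else score)
        for token, score in logits.items()
    }


if __name__ == "__main__":
    logits = {"cat": 2.0, "dog": 1.5, "bird": 1.0}
    adjusted = apply_presence_penalty(logits, ["cat"], presence_penalty=1.0)
    # Pick the highest-scoring token after the penalty is applied.
    print(max(adjusted, key=adjusted.get))
```

With a positive penalty, tokens that have already been used become less likely to be chosen again, which nudges generation toward new words; a negative value would have the opposite effect.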
Abstract
The performance of conversational Large Language Models (LLMs) in general, and of ChatGPT in particular, is currently being evaluated on many different tasks, from logical reasoning or maths to answering questions on a myriad of topics. In contrast, much less attention is being devoted to the study of the linguistic features of the texts generated by these LLMs. This is surprising since LLMs are models for language, and understanding how they use language is important. Indeed, conversational LLMs are poised to have a significant impact on the evolution of languages, as they may eventually dominate the creation of new text. This means that, for example, if conversational LLMs do not use a word, it may become less and less frequent and eventually stop being used altogether. Therefore, evaluating the linguistic features of the text they produce and how those depend on the model parameters is…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
