Contrasting Linguistic Patterns in Human and LLM-Generated News Text

Alberto Muñoz-Ortiz; Carlos Gómez-Rodríguez; David Vilares

arXiv:2308.09067 · cs.CL · September 4, 2024 · 5 cites

Open Access

TL;DR

This study quantitatively compares human-written and LLM-generated English news text across morphological, syntactic, psychometric, and sociolinguistic dimensions, finding measurable differences in language use, emotional expression, and biases.

Contribution

The paper provides a multi-dimensional analysis of linguistic and emotional differences between human-written text and the output of six LLMs spanning three families and four sizes, highlighting biases and stylistic variation.

Findings

01

Human texts show more scattered sentence length distributions and a more varied vocabulary.

02

Human texts express stronger negative emotions (such as fear and disgust) and less joy than LLM-generated texts.

03

LLM outputs tend to use more numbers, symbols, auxiliaries, and pronouns.

Abstract

We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in total. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric, and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Human texts exhibit more scattered sentence length distributions, more variety of vocabulary, a distinct use of dependency and constituent types, shorter constituents, and more optimized dependency distances. Humans tend to exhibit stronger negative emotions (such as fear and disgust) and less joy compared to text generated by LLMs, with the toxicity of these models increasing as their size grows. LLM outputs use more numbers, symbols and auxiliaries…
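Three of the dimensions named in the abstract (sentence length spread, vocabulary variety, and dependency distance) can be illustrated with simple proxies. The following is a minimal sketch, not the authors' pipeline: the function names, the regex-based sentence and word splitting, and the use of population standard deviation and type-token ratio as measures are all assumptions made here for illustration. The dependency heads are supplied by hand, whereas a real analysis would obtain them from a parser.

```python
import re
import statistics


def sentence_lengths(text):
    """Word counts per sentence, using a naive split after ., ! or ?.

    Assumption: a real study would use a proper sentence tokenizer.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_spread(text):
    """Population standard deviation of sentence lengths (0.0 if empty)."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) if lengths else 0.0


def type_token_ratio(text):
    """Distinct lowercase word forms divided by total word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_dependency_distance(heads):
    """Mean absolute distance between each token and its head.

    heads[i] is the 1-based position of the head of token i + 1;
    a head of 0 marks the root, which is skipped.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    sample = "Markets fell sharply. Why? Analysts blamed rising rates and weak demand."
    print(sentence_lengths(sample))
    print(length_spread(sample))
    print(type_token_ratio(sample))
    # Hand-written heads for a three-token sentence whose root is token 2.
    print(mean_dependency_distance([2, 0, 2]))
```

Comparing these values between a human-written article and an LLM continuation of the same headline would give a rough, small-scale analogue of the contrasts the abstract reports. Note that type-token ratio is sensitive to text length, so the compared texts should be of similar size.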

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification