Comparing LLM-generated and human-authored news text using formal syntactic theory
Olga Zamaraeva, Dan Flickinger, Francis Bond, Carlos Gómez-Rodríguez

TL;DR
This paper compares New York Times-style text generated by six large language models with human-written NYT articles, using Head-driven Phrase Structure Grammar (HPSG) to analyze both, and finds systematic differences in the distributions of grammar types.
Contribution
It introduces a formal syntactic comparison of LLM and human news texts using HPSG, highlighting key grammatical distinctions within the NYT genre.
Findings
Systematic grammatical differences between LLM and human texts
Distinct distributions of HPSG grammar types in LLM vs. human writing
A deeper understanding of the syntactic behavior of both LLMs and human writers within the NYT genre
Abstract
This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of LLMs as well as humans, within the NYT genre.
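The comparison of grammar type distributions described above can be illustrated with a minimal sketch. The abstract does not say which statistic the authors use, so Jensen-Shannon divergence is an assumption here, chosen only as a common way to compare two frequency distributions. The function names and the type labels in the example are hypothetical and are not taken from the paper or from any HPSG grammar.

```python
from collections import Counter
from math import log2


def relative_frequencies(type_tokens):
    """Turn a list of grammar type labels (one per use in a parse) into relative frequencies."""
    counts = Counter(type_tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions given as dicts.

    Returns 0.0 for identical distributions and 1.0 for distributions with
    disjoint support. This metric is an assumption of the sketch, not the
    paper's stated method.
    """
    jsd = 0.0
    for t in set(p) | set(q):
        pt = p.get(t, 0.0)
        qt = q.get(t, 0.0)
        mt = 0.5 * (pt + qt)  # mixture distribution
        if pt > 0:
            jsd += 0.5 * pt * log2(pt / mt)
        if qt > 0:
            jsd += 0.5 * qt * log2(qt / mt)
    return jsd


def largest_differences(p, q, k=3):
    """Return the k types whose relative frequency differs most between p and q."""
    diffs = {t: p.get(t, 0.0) - q.get(t, 0.0) for t in set(p) | set(q)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical type labels standing in for the types used in parses
    # of human-authored and LLM-generated sentences.
    human_types = ["head_subj", "head_comp", "head_comp", "adj_head", "relative_clause"]
    llm_types = ["head_subj", "head_comp", "head_comp", "head_comp", "adj_head"]

    human_dist = relative_frequencies(human_types)
    llm_dist = relative_frequencies(llm_types)

    print(jensen_shannon_divergence(human_dist, llm_dist))
    print(largest_differences(human_dist, llm_dist, k=2))
```

In this sketch a positive difference means the type is relatively more frequent in the first (human) sample and a negative one means it is more frequent in the second (LLM) sample. Ranking types by that difference is one simple way to surface which constructions drive an overall divergence.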
Taxonomy
Topics: Natural Language Processing Techniques
