Comparing LLM-generated and human-authored news text using formal syntactic theory

Olga Zamaraeva; Dan Flickinger; Francis Bond; Carlos Gómez-Rodríguez

arXiv:2506.01407 · cs.CL · June 3, 2025

TL;DR

This paper compares New York Times-style text generated by six LLMs with human-authored NYT articles, using HPSG-based syntactic analysis, and finds systematic differences in the distributions of grammar types.

Contribution

It introduces a formal syntactic comparison of LLM and human news texts using HPSG, highlighting key grammatical distinctions within the NYT genre.

Findings

01 Systematic grammatical differences between LLM and human texts

02 Distinct distributions of HPSG grammar types in LLM vs. human writing

03 A clearer picture of the syntactic behavior of both LLMs and human writers within the NYT genre

Abstract

This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of LLMs as well as humans, within the NYT genre.
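As a rough illustration of what comparing distributions of grammar types can look like in practice, the sketch below counts type labels in two toy corpora, normalizes them to relative frequencies, and reports which types differ most. It is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each text has already been parsed into a flat list of type labels, the labels and counts are invented placeholders, and Jensen-Shannon divergence is used here only as one plausible summary statistic (the abstract does not say which measure the paper uses).

```python
from collections import Counter
import math


def relative_frequencies(type_labels):
    """Turn a list of grammar type labels into a {label: proportion} dict."""
    counts = Counter(type_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions.

    Both arguments are dicts mapping labels to probabilities. Labels missing
    from one dict are treated as having probability 0 there.
    """
    jsd = 0.0
    for label in set(p) | set(q):
        p_l = p.get(label, 0.0)
        q_l = q.get(label, 0.0)
        m_l = 0.5 * (p_l + q_l)  # mixture distribution
        if p_l > 0:
            jsd += 0.5 * p_l * math.log2(p_l / m_l)
        if q_l > 0:
            jsd += 0.5 * q_l * math.log2(q_l / m_l)
    return jsd


def largest_differences(p, q, k=3):
    """Return the k labels whose proportions differ most (p minus q)."""
    diffs = {label: p.get(label, 0.0) - q.get(label, 0.0)
             for label in set(p) | set(q)}
    # Sort by absolute difference, breaking ties alphabetically.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:k]


# Invented toy data: placeholder labels, NOT real grammar types or real counts.
human_types = ["subj-head"] * 4 + ["head-comp"] * 4 + ["adj-head"] * 2
llm_types = ["subj-head"] * 5 + ["head-comp"] * 5

human_dist = relative_frequencies(human_types)
llm_dist = relative_frequencies(llm_types)

divergence = jensen_shannon_divergence(human_dist, llm_dist)
top_diffs = largest_differences(human_dist, llm_dist)

print(f"JSD (bits): {divergence:.4f}")
for label, diff in top_diffs:
    print(f"{label}: human minus LLM = {diff:+.3f}")
```

With real data, the lists of labels would come from a parser's output for each corpus, and a per-type breakdown like `top_diffs` is what would point to the specific constructions that one group of authors uses more than the other.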

Taxonomy

Topics: Natural Language Processing Techniques