A Novel Corpus of Discourse Structure in Humans and Computers
Babak Hemmatian, Sheridan Feucht, Rachel Avram, Alexander Wey, Muskaan Garg, Kate Spitalnic, Carsten Eickhoff, Ellie Pavlick, Bjorn Sandstede, Steven Sloman

TL;DR
This paper introduces a new annotated corpus of human and AI-generated texts, enabling detailed analysis of discourse structures and coherence, which can inform improvements in AI text generation quality.
Contribution
It provides a comprehensive, annotated corpus of discourse structures in both human and AI-generated texts, facilitating nuanced discourse analysis and comparison.
Findings
Fewer and shorter clause relations correlate with lower perceived quality of computer-generated narratives and arguments.
Incoherent clause relations are more common in lower-quality AI texts.
The corpus enables detailed analysis of discourse coherence in AI-generated content.
Abstract
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for detailed discourse analysis of text generation by providing preliminary evidence that less numerous, shorter and more often incoherent clause relations are associated with lower perceived quality of computer-generated narratives and arguments.
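The three relation-level signals named in the abstract (how many coherence relations a document has, how long they are, and how often they are incoherent) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Relation` record, its field names, the choice to measure relation length as the number of clauses spanned, and the toy document are all hypothetical and are not the corpus's actual annotation schema or the authors' analysis code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Relation:
    """Hypothetical record for one annotated coherence relation."""
    start: int       # index of the first clause the relation covers (assumed field)
    end: int         # index of the last clause the relation covers (assumed field)
    coherent: bool   # False if the annotator marked the relation as incoherent


def relation_features(n_clauses, relations):
    """Summarise a document's relations: density, mean span, incoherent share."""
    if n_clauses <= 0:
        raise ValueError("n_clauses must be positive")
    # Assumption: a relation's length is the number of clauses it spans.
    lengths = [r.end - r.start + 1 for r in relations]
    n_incoherent = sum(not r.coherent for r in relations)
    return {
        "relations_per_clause": len(relations) / n_clauses,
        "mean_relation_length": mean(lengths) if lengths else 0.0,
        "incoherent_share": n_incoherent / len(relations) if relations else 0.0,
    }


# Toy document with six clauses and four relations (made-up values).
doc_relations = [
    Relation(0, 1, True),
    Relation(1, 4, True),
    Relation(2, 3, False),
    Relation(4, 5, True),
]
features = relation_features(6, doc_relations)
print(features)
```

Under these assumptions, a document with lower `relations_per_clause`, lower `mean_relation_length`, and higher `incoherent_share` would match the profile that the abstract associates with lower perceived quality.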
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Adam · Attention Dropout · Layer Normalization
