Examining the rhetorical capacities of neural language models
Zining Zhu, Chuer Pan, Mohamed Abdalla, Frank Rudzicz

TL;DR
This paper introduces a method to evaluate the ability of neural language models to understand and encode rhetorical structures in discourse, revealing differences among models like BERT, GPT-2, and XLNet.
Contribution
It presents a novel quantitative approach to assess the rhetorical understanding of neural language models based on Rhetorical Structure Theory.
Findings
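BERT-based models outperform the other Transformer LMs tested, with richer discourse knowledge in their intermediate layer representations.
GPT-2 and XLNet apparently encode less rhetorical knowledge.
The method gives a quantitative way to measure the rhetorical capacities of neural LMs.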
Abstract
Recently, neural language models (LMs) have demonstrated impressive abilities in generating high-quality discourse. While many recent papers have analyzed the syntactic aspects encoded in LMs, there has been no analysis to date of the inter-sentential, rhetorical knowledge. In this paper, we propose a method that quantitatively evaluates the rhetorical capacities of neural LMs. We examine how well neural LMs understand the rhetoric of discourse by evaluating their abilities to encode a set of linguistic features derived from Rhetorical Structure Theory (RST). Our experiments show that BERT-based LMs outperform other Transformer LMs, revealing the richer discourse knowledge in their intermediate layer representations. In addition, GPT-2 and XLNet apparently encode less rhetorical knowledge, and we suggest an explanation drawing from linguistic philosophy. Our method shows an…
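The abstract describes testing how well layer representations encode RST-derived features. One common way to run such a test is a probe: a small model trained to predict the feature from frozen representations. The sketch below is only an illustration under stated assumptions, not the authors' procedure. It uses a closed-form ridge regression probe on synthetic data; the layer names, the `make_synthetic_layer` helper, and the single placeholder RST feature are all hypothetical stand-ins for real LM activations and real RST annotations.

```python
import numpy as np

def fit_linear_probe(X_train, y_train, l2=1e-2):
    """Closed-form ridge regression; the last weight acts as a bias term."""
    Xb = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y_train)

def probe_mse(w, X, y):
    """Mean squared error of the probe's predictions on held-out data."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w - y) ** 2))

def make_synthetic_layer(rng, rst_feature, dim, signal):
    """Hypothetical stand-in for one layer's document representations.

    The feature is embedded along a random direction with the given
    strength and then buried in unit Gaussian noise.
    """
    direction = rng.normal(size=dim)
    noise = rng.normal(size=(rst_feature.shape[0], dim))
    return signal * np.outer(rst_feature, direction) + noise

rng = np.random.default_rng(0)
n_docs, dim, split = 400, 16, 300

# Placeholder for a standardized RST-derived feature, one value per document.
rst_feature = rng.normal(size=n_docs)

# Two hypothetical layers: one encodes the feature weakly, one strongly.
layers = {
    "layer_weak": make_synthetic_layer(rng, rst_feature, dim, signal=0.1),
    "layer_strong": make_synthetic_layer(rng, rst_feature, dim, signal=1.0),
}

# Train a probe per layer on the first `split` documents, score on the rest.
results = {}
for name, X in layers.items():
    w = fit_linear_probe(X[:split], rst_feature[:split])
    results[name] = probe_mse(w, X[split:], rst_feature[split:])

for name, mse in results.items():
    print(f"{name}: held-out MSE = {mse:.3f}")
```

A lower held-out error suggests the representation makes the feature easier to recover. With real models, the synthetic matrices would be replaced by per-layer activations and the placeholder feature by values computed from RST parses.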
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Linear Warmup With Cosine Annealing · Attention Dropout · GPT-2 · Dense Connections
