Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models
Patrick Huber, Giuseppe Carenini

TL;DR
This paper investigates how well large pre-trained and fine-tuned language models capture discourse structure in long documents, and introduces new methods to analyze and compare the discourse information in their internal representations.
Contribution
It presents a novel approach to infer discourse structures from long texts and analyzes the extent and accuracy of discourse information in BERT and BART models.
Findings
Discourse structures can be inferred from long documents using the proposed method.
Pre-trained models capture some discourse information, but with varying accuracy.
Generated discourse structures differ between models and from baseline structures.
Abstract
With a growing body of BERTology work analyzing different components of pre-trained language models, we extend this line of research through an in-depth analysis of discourse information in pre-trained and fine-tuned language models. We move beyond prior work along three dimensions: First, we describe a novel approach to infer discourse structures from arbitrarily long documents. Second, we propose a new type of analysis to explore where and how accurately intrinsic discourse is captured in the BERT and BART models. Finally, we assess how similar the generated structures are to a variety of baselines, as well as how they are distributed within and between models.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Byte Pair Encoding · Dense Connections · Attention Dropout · Weight Decay
