Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models

Patrick Huber; Giuseppe Carenini

arXiv:2204.04289 · cs.CL · April 12, 2022 · 1 citation

TL;DR

This paper investigates how much discourse information pre-trained and fine-tuned language models (BERT and BART) capture. It introduces a method to infer discourse structures from arbitrarily long documents, then analyzes where in the models that information resides and how the resulting structures compare.

Contribution

It presents a novel approach to infer discourse structures from arbitrarily long documents and analyzes where and how accurately discourse information is captured in the BERT and BART models.

Findings

01

Discourse structures can be inferred from long documents using the proposed method.

02

Pre-trained models capture some discourse information, but with varying accuracy.

03

Generated discourse structures differ between models and from baseline structures.

Abstract

With a growing body of BERTology work analyzing different components of pre-trained language models, we extend this line of research through an in-depth analysis of discourse information in pre-trained and fine-tuned language models. We move beyond prior work along three dimensions: First, we describe a novel approach to infer discourse structures from arbitrarily long documents. Second, we propose a new type of analysis to explore where and how accurately intrinsic discourse is captured in the BERT and BART models. Finally, we assess how similar the generated structures are to a variety of baselines as well as their distribution within and between models.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Byte Pair Encoding · Dense Connections · Attention Dropout · Weight Decay