Predicting Discourse Trees from Transformer-based Neural Summarizers
Wen Xiao, Patrick Huber, Giuseppe Carenini

TL;DR
This paper shows that transformer-based neural summarizers encode discourse structure that can be extracted from their self-attention matrices, suggesting that the known benefit of discourse information for summarization also runs in the other direction.
Contribution
It introduces a method to infer discourse trees from pre-trained summarizers' self-attention, showing they encode both dependency and constituency discourse information.
Findings
Summarizers learn discourse structures in their self-attention matrices.
Discourse information is typically encoded in a single attention head, covering both long- and short-distance discourse dependencies.
The learned discourse representations are transferable across domains.
Abstract
Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional, by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head, covering long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
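To make the idea of reading a tree out of an attention matrix concrete, here is a minimal sketch in Python. It assumes the attention of one head has already been aggregated to a square matrix over elementary discourse units (EDUs), and it uses a CKY-style dynamic program to pick the unlabeled binary tree with the highest total score. The function name `build_tree` and the scoring rule (mean attention exchanged between the two halves of each split) are illustrative assumptions, not necessarily the procedure used in the paper.

```python
from functools import lru_cache


def build_tree(attn):
    """Return an unlabeled binary tree over EDUs 0..n-1 as nested tuples.

    attn is a square list of lists; attn[i][j] is the attention that
    EDU i pays to EDU j (hypothetical input, e.g. one head's attention
    averaged over the tokens of each EDU).
    """
    n = len(attn)
    if n == 0:
        raise ValueError("attention matrix must not be empty")

    def cross(i, k, j):
        # Assumed scoring rule: mean attention, in both directions,
        # between the left span [i..k] and the right span [k+1..j].
        total = 0.0
        for a in range(i, k + 1):
            for b in range(k + 1, j + 1):
                total += attn[a][b] + attn[b][a]
        return total / (2 * (k - i + 1) * (j - k))

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best (score, tree) for the span [i..j]; a single EDU is a leaf.
        if i == j:
            return 0.0, i
        options = []
        for k in range(i, j):
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k + 1, j)
            score = cross(i, k, j) + left_score + right_score
            options.append((score, (left_tree, right_tree)))
        # Compare on score only, so ties never compare tree structures.
        return max(options, key=lambda option: option[0])

    return best(0, n - 1)[1]


# Toy example: EDUs 0 and 1 attend to each other strongly, as do 2 and 3.
attn = [
    [0.00, 0.90, 0.05, 0.05],
    [0.90, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.00, 0.90],
    [0.05, 0.05, 0.90, 0.00],
]
print(build_tree(attn))  # ((0, 1), (2, 3))
```

The dynamic program considers every split point of every span, so it runs in cubic time in the number of EDUs (plus the cost of scoring each split), which is manageable for document-level inputs. A dependency-style tree would need a different decoder over the same matrix.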
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
