Understanding Neural Abstractive Summarization Models via Uncertainty
Jiacheng Xu, Shrey Desai, Greg Durrett

TL;DR
This paper investigates the uncertainty in neural abstractive summarization models, revealing how entropy relates to token copying, sentence position, and syntactic factors, and linking attention mechanisms to these uncertainties.
Contribution
The paper introduces an analysis of model uncertainty in summarization that connects prediction entropy to token copying behavior and to attention, offering insight into model interpretability.
Findings
Low entropy correlates with token copying.
Uncertainty varies with sentence position and syntactic distance.
Attention mechanisms relate to observed uncertainty patterns.
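Both the copying and the sentence-position findings come down to grouping per-step entropies by some property of each decoding step. The following is a minimal sketch, not the authors' code. It assumes one entropy value per summary token is already available. It approximates "copying" by whether the bigram (previous token, current token) also occurs in the source document, which is our assumption rather than a confirmed detail of the paper. It also finds sentence boundaries with a hypothetical punctuation heuristic. The helper names `copy_labels`, `positions_in_sentence`, and `mean_entropy_by` are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Sequence


def copy_labels(summary: Sequence[str], source: Sequence[str]) -> List[str]:
    """Label each summary step 'existing' if its bigram with the previous token
    also occurs in the source (assumed proxy for copying), else 'novel'.
    Step 0 has no previous token and is labeled 'first'."""
    source_bigrams = set(zip(source, source[1:]))
    labels = ["first"]
    for prev, cur in zip(summary, summary[1:]):
        labels.append("existing" if (prev, cur) in source_bigrams else "novel")
    return labels[: len(summary)]  # an empty summary yields an empty list


def positions_in_sentence(summary: Sequence[str],
                          sentence_end=frozenset({".", "!", "?"})) -> List[int]:
    """0-based index of each token within its sentence.
    Hypothetical heuristic: a sentence ends at '.', '!' or '?'."""
    positions, pos = [], 0
    for tok in summary:
        positions.append(pos)
        pos = 0 if tok in sentence_end else pos + 1
    return positions


def mean_entropy_by(keys: Sequence[Hashable],
                    entropies: Sequence[float]) -> Dict[Hashable, float]:
    """Average the per-step entropies of all steps that share the same key."""
    groups = defaultdict(list)
    for key, h in zip(keys, entropies):
        groups[key].append(h)
    return {key: mean(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    source = "the cat sat on the mat . it was warm .".split()
    summary = "the cat slept . it was warm .".split()
    # Made-up entropy values, one per summary token (not results from the paper).
    toy_entropies = [2.0, 0.5, 3.0, 1.0, 2.5, 0.6, 0.4, 0.2]
    print(mean_entropy_by(copy_labels(summary, source), toy_entropies))
    print(mean_entropy_by(positions_in_sentence(summary), toy_entropies))
```

Averaging the "existing" and "novel" groups, or the per-position groups, over a whole dataset is the kind of aggregate comparison the findings describe. The toy values in the demo are invented and say nothing about PEGASUS or BART.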
Abstract
An advantage of seq2seq abstractive summarization models is that they generate text in a free-form manner, but this flexibility makes it difficult to interpret model behavior. In this work, we analyze summarization decoders in both blackbox and whitebox ways by studying the entropy, or uncertainty, of the model's token-level predictions. For two strong pre-trained models, PEGASUS and BART, on two summarization datasets, we find a strong correlation between low prediction entropy and where the model copies tokens rather than generating novel text. The decoder's uncertainty also connects to factors like sentence position and syntactic distance between adjacent pairs of tokens, giving a sense of what factors make a context particularly selective for the model's next output token. Finally, we study the relationship between decoder uncertainty and attention behavior to understand how attention…
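The token-level entropy discussed here is the Shannon entropy of the decoder's next-token distribution at step t, H_t = -Σ_v p_t(v) log p_t(v), summed over the vocabulary. Applying the same formula to an attention distribution over source positions gives an attention entropy. A minimal pure-Python sketch, assuming you have already extracted per-step vocabulary logits and normalized attention weights from the decoder (how to obtain them depends on the framework and is not shown). The function names are illustrative, not from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; entries with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prediction_entropies(step_logits: Sequence[Sequence[float]]) -> List[float]:
    """Entropy of the next-token distribution at each decoding step.

    step_logits[t] is assumed to hold the decoder's vocabulary logits at step t.
    """
    return [entropy(softmax(logits)) for logits in step_logits]


def attention_entropies(step_attention: Sequence[Sequence[float]]) -> List[float]:
    """Entropy of an already-normalized attention distribution over source
    positions at each decoding step (e.g. one head of one layer)."""
    return [entropy(weights) for weights in step_attention]


if __name__ == "__main__":
    # Toy 4-word vocabulary: a flat distribution vs. a sharply peaked one.
    flat, peaked = prediction_entropies([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
    print(f"flat:   {flat:.4f} nats")    # equals ln 4, the maximum for 4 outcomes
    print(f"peaked: {peaked:.4f} nats")  # close to 0
```

Using nats (natural log) rather than bits only rescales every value by a constant, so it does not change which steps count as high- or low-entropy. For multi-head attention, one option is to compute the entropy per head; which layers and heads to aggregate is a modeling choice this sketch leaves open.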
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: PEGASUS · Linear Layer · Adam · Byte Pair Encoding · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Tanh Activation · Dropout
