Parallel Hierarchical Transformer with Attention Alignment for Abstractive Multi-Document Summarization
Ye Ma, Lu Zong

TL;DR
This paper introduces a Parallel Hierarchical Transformer (PHT) with attention alignment for abstractive multi-document summarization. It combines word- and paragraph-level attention with attention-calibrated beam search to improve the coverage and quality of generated summaries.
Contribution
The study proposes a hierarchical Transformer architecture with attention alignment for multi-document summarization, aiming to improve dependency modeling and source coverage relative to existing models.
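To make the idea of word- and paragraph-level attention concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the authors' implementation: it uses a single head with no learned projections, and the function name `two_level_attention` and its arguments are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_level_attention(query, paragraphs):
    """Hypothetical single-head sketch of hierarchical attention.

    query:      (d,) decoder state vector.
    paragraphs: list of (n_i, d) arrays, one row per token.
    Returns the (d,) context vector and the paragraph-level weights.
    """
    d = query.shape[0]
    contexts = []
    for tokens in paragraphs:
        # Word level: attend over the tokens inside one paragraph.
        word_weights = softmax(tokens @ query / np.sqrt(d))
        contexts.append(word_weights @ tokens)
    contexts = np.stack(contexts)  # (num_paragraphs, d)
    # Paragraph level: attend over the per-paragraph summaries.
    para_weights = softmax(contexts @ query / np.sqrt(d))
    return para_weights @ contexts, para_weights

rng = np.random.default_rng(0)
docs = [rng.normal(size=(n, 8)) for n in (5, 3, 7)]
context, para_weights = two_level_attention(rng.normal(size=8), docs)
```

Because each paragraph is processed independently at the word level, the per-paragraph step could run in parallel, which is one plausible reading of "parallel" in the model's name.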
Findings
Higher ROUGE scores than baseline models on the WikiSum data.
Higher-quality summaries in human evaluation.
Efficient processing with low computational cost.
Abstract
In comparison to single-document summarization, abstractive Multi-Document Summarization (MDS) poses challenges for the representation and coverage of its lengthy and linked sources. This study develops a Parallel Hierarchical Transformer (PHT) with attention alignment for MDS. By incorporating word- and paragraph-level multi-head attention, the hierarchical architecture of PHT allows better processing of dependencies at both the token and document levels. To guide decoding towards better coverage of the source documents, an attention-alignment mechanism is then introduced to calibrate beam search with predicted optimal attention distributions. Based on the WikiSum data, a comprehensive evaluation is conducted to test improvements on MDS by the proposed architecture. By better handling the inner- and cross-document information, results in both ROUGE and human evaluation suggest that…
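The calibration of beam search described in the abstract can be sketched as a re-ranking step. The sketch below is an assumption-laden illustration, not the paper's formula: it scores each candidate by its log-probability plus a weighted cosine similarity between the candidate's average paragraph attention and a predicted target distribution. The names `alignment_score`, `rerank`, and the `weight` parameter are hypothetical.

```python
import numpy as np

def alignment_score(attn_steps, target):
    # Cosine similarity between the mean paragraph attention of a
    # candidate (averaged over decoding steps) and the target
    # distribution. Cosine similarity is an assumed choice of measure.
    summary_attn = np.mean(np.asarray(attn_steps, dtype=float), axis=0)
    target = np.asarray(target, dtype=float)
    denom = np.linalg.norm(summary_attn) * np.linalg.norm(target)
    return float(summary_attn @ target / denom)

def rerank(beams, target, weight=1.0):
    # Sort candidates by log-probability plus weighted alignment,
    # best first. `weight` is a hypothetical trade-off parameter.
    def score(beam):
        return beam["log_prob"] + weight * alignment_score(beam["attn"], target)
    return sorted(beams, key=score, reverse=True)

target = [1 / 3, 1 / 3, 1 / 3]  # stand-in for a predicted distribution
beams = [
    {"text": "A", "log_prob": -1.0, "attn": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]},
    {"text": "B", "log_prob": -1.2, "attn": [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]},
]
best_calibrated = rerank(beams, target, weight=1.0)[0]["text"]
best_plain = rerank(beams, target, weight=0.0)[0]["text"]
```

In this toy setup, candidate A is slightly more likely but attends to a single paragraph, so adding the alignment term favors candidate B, whose attention is spread across the sources in a way closer to the target.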
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Residual Connection
