HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization
Shuyang Cao, Lu Wang

TL;DR
HIBRIDS introduces hierarchical biases into Transformer attention to better encode document structure, improving long document summarization and hierarchical question-summary generation with new annotated datasets.
Contribution
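The paper proposes HIBRIDS, a method that injects hierarchical biases into the attention score calculation of a Transformer, and introduces a new task, hierarchical question-summary generation, together with a dataset of 6,153 question-summary hierarchies annotated on long government reports.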
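As a rough illustration of what adding a structural bias to attention scores can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `biased_attention`, the `relation_ids` encoding, and the scalar `bias_table` are all assumptions made for this example, and the summary above does not say how the paper parameterizes its biases.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, relation_ids, bias_table):
    """Single-head scaled dot-product attention with an additive bias.

    q, k, v:      (n, d) arrays of query, key, and value vectors.
    relation_ids: (n, n) integer array; entry [i, j] is a hypothetical
                  discrete code for how token i's section relates to
                  token j's section in the document hierarchy.
    bias_table:   (num_relations,) array of scalar biases; these would be
                  learned in a real model, here they are fixed by hand.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # standard attention logits
    scores = scores + bias_table[relation_ids]  # add structure-dependent bias
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy setup (assumed): 4 tokens, the first two in one section, the last two in another.
sections = np.array([0, 0, 1, 1])
# Relation 0 = same section, relation 1 = different section.
relation_ids = (sections[:, None] != sections[None, :]).astype(int)
bias_table = np.array([1.0, 0.0])  # favor same-section tokens

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
output, weights = biased_attention(q, k, v, relation_ids, bias_table)
```

Setting every entry of `bias_table` to zero recovers plain scaled dot-product attention, which makes the effect of the bias easy to isolate in this sketch.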
Findings
On hierarchical question-summary generation, HIBRIDS produces better hierarchies than comparison models in both hierarchy quality and content coverage.
The model also improves long-form summary generation, as measured by ROUGE scores.
Human judges favor HIBRIDS-generated hierarchies and summaries.
Abstract
Document structure is critical for efficient information consumption. However, it is challenging to encode it efficiently into the modern Transformer architecture. In this work, we present HIBRIDS, which injects Hierarchical Biases foR Incorporating Document Structure into the calculation of attention scores. We further present a new task, hierarchical question-summary generation, for summarizing salient content in the source document into a hierarchy of questions and summaries, where each follow-up question inquires about the content of its parent question-summary pair. We also annotate a new dataset with 6,153 question-summary hierarchies labeled on long government reports. Experiment results show that our model produces better question-summary hierarchies than comparisons on both hierarchy quality and content coverage, a finding also echoed by human judges. Additionally, our model…
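The question-summary hierarchy described in the abstract can be pictured as a tree in which each follow-up question hangs off its parent question-summary pair. The sketch below is only one possible representation: the class name `QSNode`, its fields, and the helper `flatten` are hypothetical, and the strings are placeholders rather than dataset content.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class QSNode:
    """One question-summary pair plus its follow-up pairs (hypothetical layout)."""
    question: str
    summary: str
    children: List["QSNode"] = field(default_factory=list)

    def add_follow_up(self, question: str, summary: str) -> "QSNode":
        # A follow-up question asks about the content of this (parent) pair.
        child = QSNode(question, summary)
        self.children.append(child)
        return child

def flatten(node: QSNode, depth: int = 0) -> List[Tuple[int, str]]:
    """Return (depth, question) pairs in depth-first order."""
    rows = [(depth, node.question)]
    for child in node.children:
        rows.extend(flatten(child, depth + 1))
    return rows

# Placeholder content, for illustration only.
root = QSNode("Q1", "Summary answering Q1")
follow_up = root.add_follow_up("Q1.1", "Summary answering Q1.1")
follow_up.add_follow_up("Q1.1.1", "Summary answering Q1.1.1")
root.add_follow_up("Q1.2", "Summary answering Q1.2")

for depth, question in flatten(root):
    print("  " * depth + question)
```

A depth-first traversal like `flatten` is one simple way to linearize such a tree; whether the paper linearizes its hierarchies this way is not stated in the text above.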
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout
