HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal, Nazneen Fatema Rajani, Wenhao Liu, Wojciech Kryściński

TL;DR
HydraSum introduces a multi-decoder summarization model that automatically learns and controls diverse summary styles without extra supervision, enabling flexible and stylistically varied outputs.
Contribution
The paper proposes HydraSum, a multi-decoder architecture that disentangles stylistic features in summarization, allowing explicit style control and diversity without additional supervision.
Findings
HydraSum outperforms baseline models on three summarization datasets (CNN, Newsroom and XSum).
Decoders learn contrasting styles automatically.
Modified gating enforces specific style partitions.
Abstract
Summarization systems make numerous "decisions" about summary properties during inference, e.g. degree of copying, specificity and length of outputs, etc. However, these are implicitly encoded within model parameters and specific styles cannot be enforced. To address this, we introduce HydraSum, a new summarization architecture that extends the single decoder framework of current models to a mixture-of-experts version with multiple decoders. We show that HydraSum's multiple decoders automatically learn contrasting summary styles when trained under the standard training objective without any extra supervision. Through experiments on three summarization datasets (CNN, Newsroom and XSum), we show that HydraSum provides a simple mechanism to obtain stylistically-diverse summaries by sampling from either individual decoders or their mixtures, outperforming baseline models. Finally, we…
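The mixture-of-decoders idea in the abstract can be illustrated with a minimal sketch. It assumes that each decoder produces next-token logits over a shared vocabulary and that a gate assigns one weight per decoder; the function names (`softmax`, `mix_decoders`) and the toy numbers are hypothetical and are not taken from the paper's code. Setting the gate to a one-hot vector corresponds to sampling from an individual decoder, while intermediate weights correspond to sampling from a mixture.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_decoders(decoder_logits, gate):
    """Gate-weighted mixture of per-decoder next-token distributions.

    decoder_logits: list of K logit lists, one per decoder (hypothetical input).
    gate: list of K non-negative weights that sum to 1 (hypothetical input).
    """
    if len(decoder_logits) != len(gate):
        raise ValueError("need exactly one gate weight per decoder")
    if any(g < 0 for g in gate) or abs(sum(gate) - 1.0) > 1e-9:
        raise ValueError("gate weights must be non-negative and sum to 1")
    probs = [softmax(logits) for logits in decoder_logits]
    vocab_size = len(probs[0])
    # Mixture probability of token v: sum over decoders of gate_k * p_k(v).
    return [
        sum(g * p[v] for g, p in zip(gate, probs))
        for v in range(vocab_size)
    ]


# Toy example: two decoders that prefer different tokens in a 3-token vocabulary.
decoder_a = [2.0, 0.0, -1.0]
decoder_b = [0.0, 3.0, 0.0]

only_a = mix_decoders([decoder_a, decoder_b], [1.0, 0.0])  # single decoder
blend = mix_decoders([decoder_a, decoder_b], [0.5, 0.5])   # even mixture
```

With a one-hot gate the output equals the chosen decoder's own distribution, so moving the gate weights is one simple way to interpolate between the styles the decoders have learned.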
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Byte Pair Encoding
