Nutribullets Hybrid: Multi-document Health Summarization
Darsh J Shah, Lili Yu, Tao Lei, Regina Barzilay

TL;DR
This paper introduces a hybrid multi-document summarization method that generates comparative health summaries by combining relation extraction, deterministic content aggregation, and a text-infilling language model. The design targets domains with limited training data.
Contribution
The paper presents a novel hybrid approach for multi-document summarization that separates content selection from realization, enabling effective training with limited annotations in the health domain.
Findings
Produces more faithful and relevant summaries
Improves aggregation sensitivity in summaries
Maintains fluency comparable to traditional methods
Abstract
We present a method for generating comparative summaries that highlights similarities and contradictions in input documents. The key challenge in creating such summaries is the lack of large parallel training data required for training typical summarization systems. To this end, we introduce a hybrid generation approach inspired by traditional concept-to-text systems. To enable accurate comparison between different sources, the model first learns to extract pertinent relations from input documents. The content planning component uses deterministic operators to aggregate these relations after identifying a subset for inclusion into a summary. The surface realization component lexicalizes this information using a text-infilling language model. By separately modeling content selection and realization, we can effectively train them with limited annotations. We implemented and tested the…
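To make the pipeline in the abstract concrete, the following is a minimal sketch of the content planning and surface realization steps, not the authors' implementation. It assumes relations have already been extracted as (subject, effect, object) tuples. The `Relation` type, the `aggregate` and `realize` functions, the three aggregation labels, and the fixed sentence templates are all hypothetical; in particular, the templates merely stand in for the text-infilling language model the paper describes.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical container for one extracted relation."""
    subject: str
    effect: str   # e.g. "raises" or "lowers"
    obj: str
    source: str   # illustrative document identifier


def aggregate(relations):
    """Group relations by (subject, obj) and label each group deterministically."""
    groups = defaultdict(list)
    for r in relations:
        groups[(r.subject, r.obj)].append(r)

    plan = []
    for (subject, obj), rs in groups.items():
        effects = sorted({r.effect for r in rs})
        if len(rs) == 1:
            label = "single"          # only one source mentions this pair
        elif len(effects) == 1:
            label = "agreement"       # several sources, same effect
        else:
            label = "contradiction"   # several sources, differing effects
        plan.append((label, subject, effects, obj))
    return plan


def realize(plan):
    """Template-based stand-in for the text-infilling realization step."""
    sentences = []
    for label, subject, effects, obj in plan:
        if label == "contradiction":
            joined = " or ".join(effects)
            sentences.append(f"Studies disagree on whether {subject} {joined} {obj}.")
        elif label == "agreement":
            sentences.append(f"Studies agree that {subject} {effects[0]} {obj}.")
        else:
            sentences.append(f"One study suggests that {subject} {effects[0]} {obj}.")
    return sentences


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    relations = [
        Relation("coffee", "raises", "blood pressure", "doc1"),
        Relation("coffee", "lowers", "blood pressure", "doc2"),
        Relation("oatmeal", "lowers", "cholesterol", "doc1"),
        Relation("oatmeal", "lowers", "cholesterol", "doc3"),
    ]
    for sentence in realize(aggregate(relations)):
        print(sentence)
```

Keeping aggregation as a deterministic function over extracted relations means that only the extraction and realization steps would need learned models, which is consistent with the abstract's point about training with limited annotations.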
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
