Adding more data does not always help: A study in medical conversation summarization with PEGASUS
Varun Nair, Namit Katariya, Xavier Amatriain, Ilya Valmianski, Anitha Kannan

TL;DR
This study investigates how dataset size impacts medical conversation summarization with PEGASUS and finds that performance plateaus, with active learning strategies offering no significant advantage over simple dataset expansion.
Contribution
The paper characterizes how dataset size and active learning strategies affect medical conversation summarization with PEGASUS, and highlights the limitations of these approaches in the low-data regime.
Findings
Model performance saturates as dataset size increases.
Active learning strategies perform similarly to simple dataset size increase.
Naive pseudo-labeling does not improve results and may slightly worsen them.
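The findings contrast three ways of growing a training set: adding randomly chosen examples, selecting examples with an active learning rule, and naive pseudo-labeling. This page does not say which acquisition rules or models were used, so what follows is only a minimal Python sketch under stated assumptions. `select_batch`, `pseudo_label`, and `labeling_round` are hypothetical helpers; "uncertainty" sampling stands in for whichever active learning strategies the paper evaluated; and the `uncertainty`, `summarize`, and `oracle` callables are placeholders for a model's confidence score, a PEGASUS-style summarizer, and a human annotator.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration: a conversation and a summary are plain strings.
Conversation = str
Summary = str


def select_batch(
    pool: Sequence[Conversation],
    k: int,
    strategy: str,
    uncertainty: Callable[[Conversation], float],
    rng: random.Random,
) -> List[int]:
    """Return indices of up to k pool items to send for human labeling.

    "random" mimics simple dataset expansion. "uncertainty" is one common
    active learning acquisition rule (an assumption, not necessarily one the
    paper used): pick the items with the highest uncertainty score.
    """
    k = min(k, len(pool))
    if strategy == "random":
        return rng.sample(range(len(pool)), k)
    if strategy == "uncertainty":
        ranked = sorted(range(len(pool)), key=lambda i: uncertainty(pool[i]), reverse=True)
        return ranked[:k]
    raise ValueError(f"unknown strategy: {strategy}")


def pseudo_label(
    unlabeled: Sequence[Conversation],
    summarize: Callable[[Conversation], Summary],
) -> List[Tuple[Conversation, Summary]]:
    """Naive pseudo-labeling: treat the model's own summaries as gold labels."""
    return [(conv, summarize(conv)) for conv in unlabeled]


def labeling_round(labeled, pool, k, strategy, uncertainty, oracle, rng):
    """One iteration: pick k items, ask the (hypothetical) oracle for summaries,
    append them to the labeled set, and remove them from the unlabeled pool."""
    picked = set(select_batch(pool, k, strategy, uncertainty, rng))
    new_labeled = list(labeled) + [(pool[i], oracle(pool[i])) for i in sorted(picked)]
    remaining = [conv for i, conv in enumerate(pool) if i not in picked]
    return new_labeled, remaining
```

For instance, with `len` as a toy uncertainty score and a placeholder `oracle`, `labeling_round([], ["conv a", "conversation b is longer", "c"], 1, "uncertainty", len, oracle, random.Random(0))` moves the longest conversation into the labeled set. Random selection is the baseline the findings compare against. Pseudo-labeling differs in that no human summary is ever collected, so any errors in the model's own outputs become training targets.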
Abstract
Medical conversation summarization is integral to capturing information gathered during interactions between patients and physicians. Summarized conversations are used to facilitate patient hand-offs between physicians and as part of providing future care. Summaries, however, can be time-consuming to produce and require domain expertise. Modern pre-trained NLP models such as PEGASUS have emerged as capable alternatives to human summarization, reaching state-of-the-art performance on many summarization benchmarks. However, many downstream tasks still require at least moderately sized datasets to achieve satisfactory performance. In this work we (1) explore the effect of dataset size on transfer learning for medical conversation summarization using PEGASUS and (2) evaluate various iterative labeling strategies in the low-data regime, following their success in the classification…
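Measuring the effect of dataset size typically means fine-tuning on training subsets of increasing size and comparing a score at each size. The abstract does not say how the subsets were drawn, so the following is a minimal Python sketch that assumes nested random subsets (each smaller subset contained in every larger one) and one scalar score per size, for example ROUGE, a common summarization metric. `nested_subsets` and `marginal_gains` are hypothetical helper names, and the fine-tuning step itself is omitted.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def nested_subsets(data: Sequence[T], sizes: Sequence[int], seed: int = 0) -> Dict[int, List[T]]:
    """Hypothetical helper: build training subsets where each smaller subset is a
    prefix of every larger one (nesting is an assumption, not the paper's stated protocol).

    Shuffling once and taking prefixes keeps the comparison between sizes about
    the amount of data rather than about which examples happened to be drawn.
    """
    order = list(data)
    random.Random(seed).shuffle(order)
    subsets: Dict[int, List[T]] = {}
    for n in sorted(sizes):
        if n > len(order):
            raise ValueError(f"requested {n} examples but only {len(order)} are available")
        subsets[n] = order[:n]
    return subsets


def marginal_gains(scores: Dict[int, float]) -> List[float]:
    """Hypothetical helper: score improvement per additional training example
    between consecutive dataset sizes. Gains shrinking toward zero is what a
    plateau looks like on a learning curve."""
    sizes = sorted(scores)
    return [(scores[b] - scores[a]) / (b - a) for a, b in zip(sizes, sizes[1:])]
```

Normalizing by the number of added examples makes gains comparable when the sizes are spaced unevenly, such as doubling the dataset at each step.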
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: PEGASUS
