PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

Hao Li; Yuping Wu; Viktor Schlegel; Riza Batista-Navarro; Thanh-Tung Nguyen; Abhinav Ramesh Kashyap; Xiaojun Zeng; Daniel Beck; Stefan Winkler; Goran Nenadic

arXiv:2306.02754·cs.CL·June 6, 2023·5 cites

Open Access · 1 Repo

TL;DR

This paper presents PULSAR, a method that combines LLM-based data augmentation with a specialized pre-training objective to improve automatic summarization of patient problems from medical progress notes. It ranked second in BioNLP 2023 Shared Task 1A.

Contribution

We introduce PULSAR, an approach that combines LLM-based data augmentation with a novel pre-training task to improve summarization of patient problems.

Findings

01 Our model outperforms larger models by up to 3.1 points on unknown data.

02 PULSAR is more robust and effective in summarizing patient problems.

03 Ranked second in the BioNLP 2023 Shared Task 1A.

Abstract

Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The…
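The abstract does not spell out the pre-training objective, so the following is only a minimal sketch of one plausible reading of "pre-training with extracted healthcare terms": term spans in a note are replaced by sentinel tokens and the model is trained to generate the masked terms. The function name `mask_terms`, the T5-style `<extra_id_N>` sentinels, and the assumption that the term list comes from some external extractor are all illustrative choices, not details confirmed by this page.

```python
import re


def mask_terms(text, terms):
    """Build a (masked_input, target) pair by masking healthcare terms.

    Hypothetical helper: `terms` is assumed to come from an external
    term extractor. Sentinels follow the T5 `<extra_id_N>` convention.
    """
    # Longest terms first, so "acute renal failure" wins over "renal failure".
    ordered = sorted(set(terms), key=len, reverse=True)
    if not ordered:
        return text, ""

    # Word boundaries avoid masking a term inside a longer word.
    alternatives = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(rf"\b(?:{alternatives})\b", flags=re.IGNORECASE)

    target_parts = []

    def replace(match):
        # Each masked span gets the next sentinel; the target pairs
        # that sentinel with the original surface form.
        sentinel = f"<extra_id_{len(target_parts)}>"
        target_parts.append(f"{sentinel} {match.group(0)}")
        return sentinel

    masked = pattern.sub(replace, text)
    return masked, " ".join(target_parts)
```

For example, `mask_terms("Patient with acute renal failure and sepsis.", ["renal failure", "acute renal failure", "sepsis"])` returns the input `"Patient with <extra_id_0> and <extra_id_1>."` and the target `"<extra_id_0> acute renal failure <extra_id_1> sepsis"`. A sequence-to-sequence model trained on such pairs must recover clinical terms from context, which is the intuition behind term-focused pre-training.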

Code & Models

Repositories

yuping-wu/pulsar
pytorch

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: Test