Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

Ruixiang Tang; Xiaotian Han; Xiaoqian Jiang; Xia Hu

arXiv:2303.04360 · cs.CL · April 12, 2023 · 80 cites

Open Access

TL;DR

This paper explores using synthetic data generated by ChatGPT to improve clinical text mining, addressing privacy issues and enhancing model performance in extracting biological entities and relations from healthcare texts.

Contribution

The study introduces a novel training paradigm that leverages ChatGPT to generate labeled synthetic data for fine-tuning local models in clinical text mining tasks.

Findings

01. F1-score for named entity recognition improved from 23.37% to 63.99%.

02. F1-score for relation extraction increased from 75.86% to 83.59%.

03. Synthetic data generation reduces data collection time and privacy concerns.
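
For readers unfamiliar with the metric behind the first two findings, F1 is the harmonic mean of precision and recall. The sketch below computes it for entity extraction under an assumed exact-match criterion on (text, type) pairs. The paper's actual evaluation protocol is not described in this summary, and the function name and example entities are illustrative only, not taken from the paper.

```python
def f1_score(gold, predicted):
    """Exact-match F1 between gold and predicted (text, type) pairs.

    Assumption: an entity counts as correct only if both its text and
    its type match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Also covers empty gold or empty predictions.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example entities, invented for illustration.
gold = {("aspirin", "CHEMICAL"), ("headache", "DISEASE"), ("fever", "DISEASE")}
predicted = {("aspirin", "CHEMICAL"), ("headache", "DISEASE"), ("nausea", "DISEASE")}

score = f1_score(gold, predicted)
print(f"F1 = {score:.4f}")
```

Here two of three predictions are correct and two of three gold entities are found, so precision and recall are both 2/3 and F1 is also 2/3.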

Abstract

Recent advancements in large language models (LLMs) have led to the development of highly potent models like OpenAI's ChatGPT. These models have exhibited exceptional performance in a variety of tasks, such as question answering, essay composition, and code generation. However, their effectiveness in the healthcare sector remains uncertain. In this study, we seek to investigate the potential of ChatGPT to aid in clinical text mining by examining its ability to extract structured information from unstructured healthcare texts, with a focus on biological named entity recognition and relation extraction. Our preliminary results indicate that employing ChatGPT directly for these tasks results in poor performance and raises privacy concerns associated with uploading patients' information to the ChatGPT API. To overcome these limitations, we propose a new training paradigm that…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling