Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing

Enshuo Hsu; Kirk Roberts

arXiv:2406.06723·cs.CL·April 2, 2025·1 cites

Open Access

TL;DR

This paper introduces a method that fine-tunes large language models to generate weakly-labeled data for clinical NLP tasks, significantly reducing the need for extensive annotated datasets while maintaining high performance.

Contribution

It presents a novel approach combining fine-tuned LLMs with weak supervision to improve clinical NLP without relying on large domain-specific labeled data.

Findings

01. Weakly supervised models outperform conventionally supervised models when only minimal gold-standard data is available.

02. Using only 10 gold notes, the models outperform PubMedBERT by up to 47.9% in F1 score.

03. With just 50 gold notes, the models achieve close to full performance.

Abstract

The performance of deep learning-based natural language processing systems depends on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails traditional supervised methods with moderate amounts of gold-standard data. In addition, inference with LLMs is computationally heavy. We propose an approach that leverages LLM fine-tuning and weak supervision with virtually no domain knowledge and still achieves consistently dominant performance. Using a prompt-based approach, the LLM is used to generate weakly-labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold-standard data. We evaluate this approach…
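
The two-stage recipe described in the abstract (LLM-generated weak labels, then a small gold-standard fine-tune) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_weak_label` is a hypothetical keyword rule standing in for the prompted, fine-tuned LLM, `BowLogReg` is a tiny bag-of-words logistic regression standing in for the downstream BERT model, and all notes and labels are made-up toy data.

```python
import math
from collections import defaultdict


def llm_weak_label(note: str) -> int:
    """Hypothetical stand-in for the prompted LLM: returns a noisy 0/1 label.

    A real system would send a prompt plus the note to a fine-tuned LLM
    and parse its answer; a keyword rule keeps this sketch runnable.
    """
    return int("pneumonia" in note.lower())


class BowLogReg:
    """Toy bag-of-words logistic regression (stand-in for the BERT model)."""

    def __init__(self):
        self.w = defaultdict(float)  # one weight per token
        self.b = 0.0                 # bias term

    def _prob(self, note: str) -> float:
        z = self.b + sum(self.w[t] for t in note.lower().split())
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, notes, labels, epochs=20, lr=0.5):
        # Weights are never reset, so a second call to fit() continues
        # training from where the first call stopped (i.e. fine-tuning).
        for _ in range(epochs):
            for note, y in zip(notes, labels):
                err = self._prob(note) - y
                self.b -= lr * err
                for t in note.lower().split():
                    self.w[t] -= lr * err

    def predict(self, note: str) -> int:
        return int(self._prob(note) >= 0.5)


# Unlabeled notes (toy data).
unlabeled = [
    "chest xray shows pneumonia in left lobe",
    "patient denies cough no fever",
    "pneumonia resolved after antibiotics",
    "routine follow up visit normal exam",
]

# A handful of gold-standard notes with human labels (toy data).
gold_notes = ["no evidence of pneumonia", "infiltrate consistent with pneumonia"]
gold_labels = [0, 1]

# Stage 1: label the unlabeled notes with the "LLM", then train on them.
weak_labels = [llm_weak_label(n) for n in unlabeled]
model = BowLogReg()
model.fit(unlabeled, weak_labels)
stage1_preds = [model.predict(n) for n in unlabeled]

# Stage 2: continue training the same model on the small gold set.
model.fit(gold_notes, gold_labels)
gold_preds = [model.predict(n) for n in gold_notes]
```

The keyword rule labels any note that mentions "pneumonia" as positive, including negated mentions. That is the kind of label noise the gold-standard stage is meant to correct. In the toy run, the second `fit` call lets the model classify "no evidence of pneumonia" as negative even though the weak labeler would mark it positive.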

Taxonomy

Topics: Medical Imaging and Analysis · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Dropout · Weight Decay · Linear Layer · Multi-Head Attention · Dropout