LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications

Saranya Krishnamoorthy; Ayush Singh; Shabnam Tafreshi

arXiv:2404.16294·cs.CL·April 26, 2024

TL;DR

This paper evaluates the effectiveness of large language models, especially GPT-4, in identifying relevant sections in electronic health records, showing strong performance in controlled settings but challenges in real-world applications.

Contribution

It demonstrates GPT-4's superior zero-shot and few-shot performance in section identification for EHRs, highlights the gap in real-world effectiveness, and proposes new benchmarks.

Findings

01. GPT-4 outperforms state-of-the-art methods in controlled settings.

02. GPT-4 struggles on real-world, annotated datasets.

03. Zero-shot and few-shot capabilities are promising but not sufficient for practical use.

Abstract

Electronic health records (EHRs), though a boon for healthcare practitioners, are growing longer and more convoluted every day. Sifting through these lengthy EHRs is taxing and becomes a cumbersome part of physician-patient interaction. Several approaches have been proposed to alleviate this prevalent issue, either via summarization or sectioning; however, only a few have truly been helpful in the past. With the rise of automated methods, machine learning (ML) has shown promise in solving the task of identifying relevant sections in EHRs. However, most ML methods rely on labeled data, which is difficult to obtain in healthcare. Large language models (LLMs), on the other hand, have performed impressive feats in natural language processing (NLP), even in a zero-shot manner, i.e., without any labeled data. To that end, we propose using LLMs to identify relevant section headers. We…
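To make the zero-shot idea in the abstract concrete, here is a minimal sketch of how a section-identification prompt might be built and how a model's free-text answer might be mapped back to a label. It is not the authors' implementation: the label list, the prompt wording, and the helper names `build_zero_shot_prompt` and `parse_label` are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from typing import List, Optional

# Hypothetical label set for illustration only; the paper's actual
# section inventory is not given in this summary.
SECTION_LABELS: List[str] = [
    "Chief Complaint",
    "History of Present Illness",
    "Medications",
    "Assessment and Plan",
]


def build_zero_shot_prompt(snippet: str, labels: List[str] = SECTION_LABELS) -> str:
    """Build a prompt that asks for one section header, with no labeled examples."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "You are given a snippet from an electronic health record.\n"
        "Choose the single section header that best describes it.\n"
        f"Options:\n{options}\n\n"
        f"Snippet:\n{snippet}\n\n"
        "Answer with one option only."
    )


def parse_label(response: str, labels: List[str] = SECTION_LABELS) -> Optional[str]:
    """Return the first label (in list order) whose text appears in the response.

    Matching is case-insensitive; None is returned when no label is found.
    """
    lowered = response.lower()
    for label in labels:
        if label.lower() in lowered:
            return label
    return None
```

In use, the string from `build_zero_shot_prompt` would be sent to whichever model is being evaluated, and the reply passed to `parse_label`. A few-shot variant would prepend a handful of labeled snippets to the same prompt.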

Code & Models

Repositories

inqbator-evicore/llm_section_identifiers (Official)

Taxonomy

Topics: Educational Technology and Assessment · Machine Learning and Data Classification

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Adam