Estimating Redundancy in Clinical Text

Thomas Searle; Zina Ibrahim; James Teo; Richard JB Dobson

arXiv:2105.11832 · cs.CL · February 28, 2023

TL;DR

This paper quantifies redundancy in clinical notes using an information-theoretic measure and a lexicosyntactic and semantic model. It reveals substantial duplication in clinical notes and shows that clinical text is less efficient for language models than open-domain text.

Contribution

It introduces two novel strategies for measuring redundancy in clinical text and evaluates them with Transformer-based language models trained on large-scale clinical datasets.

Findings

1. Clinical text is 1.5 to 3 times less efficient for language models than open-domain text.
2. Manual evaluation shows high correlation between the redundancy measures and actual text duplication.
3. The redundancy measures can help improve clinical documentation and NLP applications.
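
Finding 2 refers to actual text duplication between notes, which the abstract attributes to clinicians copying an existing note forward and then updating it. As a rough illustration of how such copy-forward duplication can be measured lexically, here is a minimal sketch that estimates what share of a new note's tokens were matched against the previous note. It is not the paper's lexicosyntactic and semantic model: whitespace tokenisation and Python's difflib.SequenceMatcher are assumptions chosen for simplicity, copied_fraction is a hypothetical helper, and the two example notes are invented.

```python
from difflib import SequenceMatcher


def copied_fraction(previous_note: str, new_note: str) -> float:
    """Share of new_note's tokens covered by blocks matched against previous_note.

    Hypothetical helper: a crude lexical proxy for copy-forward duplication,
    not the redundancy measure used in the paper.
    """
    prev_tokens = previous_note.split()  # assumption: whitespace tokenisation
    new_tokens = new_note.split()
    if not new_tokens:
        return 0.0
    # autojunk=False stops difflib from ignoring frequent tokens in long notes
    matcher = SequenceMatcher(None, prev_tokens, new_tokens, autojunk=False)
    # get_matching_blocks() ends with a size-0 sentinel, so summing is safe
    copied = sum(block.size for block in matcher.get_matching_blocks())
    return copied / len(new_tokens)


# Invented example notes (not taken from the paper's datasets)
previous = "Patient stable overnight. Continue IV antibiotics. Plan review bloods tomorrow."
current = "Patient stable overnight. Continue IV antibiotics. Plan discharge tomorrow if afebrile."
print(round(copied_fraction(previous, current), 2))  # 0.64 (7 of 11 tokens matched)
```

A purely lexical score like this misses paraphrased duplication, which is one reason a semantic component is useful alongside it.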

Abstract

The current mode of use of Electronic Health Records (EHRs) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes and then updating them. Such data duplication can propagate errors and inconsistencies and lead to misreporting of care. Quantifying information redundancy can therefore play an essential role in evaluating innovations that operate on clinical narratives. This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two strategies to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. We evaluate the measures by training large Transformer-based language models on clinical text from a large, openly available US-based ICU dataset and from a large multi-site UK-based Trust. By comparing the information-theoretic content of the trained models with…
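
One standard information-theoretic definition of redundancy is Shannon's R = 1 - H / H_max, where H is the entropy of the text per symbol and H_max is the largest entropy possible over the same set of symbols. The paper estimates information content with trained Transformer language models; the minimal sketch below instead assumes a simple unigram token distribution, only to show how the quantity behaves. unigram_redundancy is a hypothetical name, and taking H_max over the observed vocabulary is an assumption (a fixed model vocabulary would give a different H_max).

```python
import math
from collections import Counter


def unigram_redundancy(tokens: list[str]) -> float:
    """Shannon redundancy R = 1 - H / H_max under a unigram token model.

    H is the empirical entropy in bits per token and H_max = log2(V) is the
    entropy of a uniform distribution over the V distinct token types seen.
    Hypothetical helper for illustration; the paper uses Transformer LMs.
    """
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts)
    if vocab_size == 1:
        # H = H_max = 0 here; by convention treat one repeated token as fully redundant
        return 1.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(vocab_size)


print(unigram_redundancy("a b c d".split()))  # 0.0: all types equally likely
print(round(unigram_redundancy("a a a b".split()), 2))  # 0.19: skewed, more predictable
```

Higher R means each token carries less new information on average; a corpus that repeats itself heavily scores closer to 1.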

Code & Models

Repositories

tomolopolis/clinical_sum (PyTorch, official)