Clinical Note Bloat Reduction for Efficient LLM Use

Jordan L. Cahoon; Chloe Stanwyck; Asad Aali; Rachel Madding; Emma Sun; Yixing Jiang; Renumathy Dhanasekaran; Emily Alsentzer

arXiv:2604.16364·cs.CY·April 21, 2026

TL;DR

This paper presents TRACE, a scalable method to reduce clinical note bloat by leveraging EHR metadata and deduplication, significantly decreasing computational costs while maintaining model performance.

Contribution

The paper introduces TRACE, a preprocessing pipeline that reduces note bloat in clinical text by using EHR attribution metadata to identify templated and copied content, and by applying frequency-based deduplication when that metadata is unavailable.

Findings

01. TRACE removed 47.3% of chart text

02. Preserved performance for information extraction and outcome prediction

03. Estimated $9.5 million annual cost savings at a medical center

Abstract

Health systems are rapidly deploying large language models (LLMs) that use clinical notes for clinical decision support applications. However, modern documentation practices rely heavily on templates, copy-paste shortcuts, and auto-populated fields, producing extensive duplicated text ("note bloat") that dilutes clinically meaningful signal and substantially increases the computational cost of LLM use. We introduce TRACE, a scalable preprocessing pipeline that removes note bloat by leveraging EHR attribution metadata to identify templated and copied content and applying frequency-based deduplication when metadata are unavailable. We evaluated TRACE across four real-world clinical cohorts spanning liver transplantation, obstetrics, and inpatient care (5.3 million notes) using blinded physician review and downstream modeling tasks. TRACE removed 47.3% of chart text while preserving…
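
The frequency-based fallback mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the line-level segmentation, the 50% document-frequency threshold, and the function names (`split_segments`, `find_frequent_segments`, `strip_frequent_segments`) are all assumptions chosen for illustration. The idea shown is simply that text recurring across a large share of notes is likely templated and can be dropped before the notes reach an LLM.

```python
from collections import Counter


def split_segments(note: str) -> list[str]:
    """Split a note into non-empty lines with whitespace normalized."""
    return [" ".join(line.split()) for line in note.splitlines() if line.strip()]


def find_frequent_segments(notes: list[str], min_fraction: float = 0.5) -> set[str]:
    """Return lowercased segments found in at least min_fraction of the notes.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    doc_freq = Counter()
    for note in notes:
        # Count each segment once per note (document frequency).
        doc_freq.update({seg.lower() for seg in split_segments(note)})
    threshold = min_fraction * len(notes)
    return {seg for seg, count in doc_freq.items() if count >= threshold}


def strip_frequent_segments(note: str, frequent: set[str]) -> str:
    """Drop segments flagged as frequent, keeping the remaining lines in order."""
    kept = [seg for seg in split_segments(note) if seg.lower() not in frequent]
    return "\n".join(kept)


# Toy notes (invented for this example) sharing one templated line.
notes = [
    "Patient seen and examined.\nReports new abdominal pain.",
    "Patient seen and examined.\nPain resolved after treatment.",
    "Patient seen and examined.\nDischarge planned tomorrow.",
]
frequent = find_frequent_segments(notes, min_fraction=0.5)
cleaned = [strip_frequent_segments(note, frequent) for note in notes]
```

In this toy input the shared opening line is removed from every note and the note-specific lines are kept. A real pipeline would need a more careful choice of segment granularity and threshold, since clinically meaningful phrases can also recur often.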
