Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan
Kenichiro Ando, Takashi Okumura, Mamoru Komachi, Hiromasa Horiguchi, Yuji Matsumoto

TL;DR
This study investigates the optimal level of detail for extractive summarization of unstructured health records, finding that clinical segments provide the best accuracy among tested granularities, aiding automated discharge summary generation.
Contribution
It introduces a method to define and automatically split clinical segments, and compares their effectiveness with sentences and clauses for extractive summarization in medical texts.
Findings
Clinical segments outperform sentences and clauses in summarization accuracy.
Machine learning-based splitting of clinical segments achieved a high F1 score of 0.846.
Splitting sentences into clinical segments, a granularity finer than whole sentences, improves extractive summarization of health records.
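As a rough illustration of what an F1 score for a segment splitter measures, the sketch below scores predicted split points against gold split points. It assumes boundaries are compared as exact positions (for example, character offsets); the paper's actual evaluation protocol is not described on this page, and the function name `boundary_f1` and the sample offsets are hypothetical.

```python
def boundary_f1(gold, predicted):
    """F1 between two collections of split positions (exact match).

    Assumption: a predicted boundary counts as correct only if it
    coincides exactly with a gold boundary.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # No overlap (or an empty input): define F1 as 0.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical offsets: the splitter finds every gold boundary
# but also proposes one spurious split at position 10.
gold_boundaries = [3, 7, 12]
predicted_boundaries = [3, 7, 10, 12]
score = boundary_f1(gold_boundaries, predicted_boundaries)
```

In this made-up case precision is 3/4 and recall is 3/3, so the score is 6/7, roughly 0.857.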
Abstract
Automated summarization of clinical texts can reduce the burden of medical professionals. "Discharge summaries" are one promising application of the summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20-31% of the descriptions in discharge summaries overlap with the content of the inpatient records. However, it remains unclear how the summaries should be generated from the unstructured source. To decompose the physician's summarization process, this study aimed to identify the optimal granularity in summarization. We first defined three types of summarization units with different granularities to compare the performance of the discharge summary generation: whole sentences, clinical segments, and clauses. We defined clinical segments in this study, aiming to express the smallest medically meaningful concepts. To obtain the…
