Computing patient similarity based on unstructured clinical notes
Petr Zelina, Marko Řeháček, Jana Halámková, Lucia Bohovicová, Martin Rusinko, Vít Nováček

TL;DR
This paper presents a matrix-based method to compute patient similarity from unstructured clinical notes, enabling applications like personalized therapy and toxicity prediction.
Contribution
It introduces a novel approach that represents patients as matrices of embedded notes, facilitating robust similarity measures for clinical data analysis.
Findings
Effective similarity computation for 4,267 breast cancer patients
Matrix-based measures outperform traditional methods in certain facets
Method supports downstream clinical decision tasks
Abstract
Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that represents each patient as a matrix built from aggregated embeddings of all their notes, enabling robust patient similarity computation based on their latent low-rank representations. Using clinical notes of 4,267 Czech breast-cancer patients and expert similarity labels from Masaryk Memorial Cancer Institute, we evaluate several matrix-based similarity measures and analyze their strengths and limitations across different similarity facets, such as clinical history, treatment, and adverse events. The results demonstrate the usefulness of the presented method for downstream tasks, such as personalized therapy recommendations or toxicity warnings.
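As a rough illustration of the idea in the abstract, the sketch below treats each patient as a matrix whose rows are note embeddings and compares two patients through their low-rank subspaces. The abstract does not specify the similarity measures used, so everything here is an assumption: the function names `patient_subspace` and `subspace_similarity`, the choice of truncated SVD, the rank of 2, the score based on principal angles, and the synthetic embeddings standing in for real note embeddings.

```python
import numpy as np


def patient_subspace(notes: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x k) of the top right-singular subspace.

    `notes` is an (n_notes x d) matrix with one embedded note per row.
    k is `rank`, or fewer if the patient has fewer notes than `rank`.
    """
    _, _, vt = np.linalg.svd(notes, full_matrices=False)
    return vt[:rank].T


def subspace_similarity(a: np.ndarray, b: np.ndarray, rank: int = 2) -> float:
    """Mean squared cosine of the principal angles between two subspaces.

    Hypothetical measure chosen for illustration: 1.0 means the two
    low-rank subspaces coincide, 0.0 means they are orthogonal.
    """
    ua = patient_subspace(a, rank)
    ub = patient_subspace(b, rank)
    k = min(ua.shape[1], ub.shape[1])
    # The singular values of ua.T @ ub are the cosines of the principal
    # angles, so the squared Frobenius norm is the sum of squared cosines.
    return float(np.linalg.norm(ua.T @ ub, "fro") ** 2 / k)


# Synthetic stand-ins for note embeddings (d = 8), not real data.
rng = np.random.default_rng(0)
basis_1 = np.eye(8)[:2]   # directions shared by patients A and B
basis_2 = np.eye(8)[2:4]  # directions used only by patient C

patient_a = rng.normal(size=(5, 2)) @ basis_1  # 5 notes
patient_b = rng.normal(size=(7, 2)) @ basis_1  # 7 notes, same subspace
patient_c = rng.normal(size=(4, 2)) @ basis_2  # 4 notes, orthogonal subspace

sim_ab = subspace_similarity(patient_a, patient_b)
sim_ac = subspace_similarity(patient_a, patient_c)
print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

A measure of this kind does not depend on the number or order of notes per patient, which is one reason a subspace view can suit records of varying length. Whether it matches any of the measures evaluated in the paper is not stated in the abstract.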
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · AI in cancer detection
