Clinical semantics for lung cancer prediction
Luis H. John, Jan A. Kors, Jenna M. Reps, Peter R. Rijnbeek, Egill A. Fridgeirsson

TL;DR
This study embeds clinical knowledge from the SNOMED hierarchy into hyperbolic space and integrates it into deep learning models for lung cancer prediction, yielding modest gains in discrimination and improved calibration in ResNet models.
Contribution
It introduces a method for using Poincaré embeddings of the SNOMED hierarchy within deep learning models to improve clinical prediction.
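For readers unfamiliar with Poincaré embeddings, the sketch below shows the standard distance function of the Poincaré ball model, which is what lets a hierarchy be laid out in a low-dimensional hyperbolic space. It is a minimal illustration only: the function name `poincare_distance` and the `eps` guard are assumptions made here, not code from the paper.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Hyperbolic distance between two points inside the unit Poincaré ball.

    Illustrative helper (hypothetical name, not from the paper).
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps guards against division by a value that underflows to zero
    # for points extremely close to the boundary.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


if __name__ == "__main__":
    origin = [0.0, 0.0]
    near = [0.5, 0.0]
    far = [0.95, 0.0]
    # Distances grow rapidly toward the boundary of the ball.
    print(poincare_distance(origin, near))
    print(poincare_distance(origin, far))
```

Because distances grow rapidly near the boundary, general concepts can sit close to the origin while their many descendants spread out toward the edge of the ball, which is why few dimensions can suffice for tree-like data.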
Findings
Modest improvements in discrimination with Poincaré embeddings.
Enhanced calibration in ResNet models using 10D hyperbolic embeddings.
Stable calibration in Transformer models across configurations.
Abstract
Background: Existing clinical prediction models often represent patient data using features that ignore the semantic relationships between clinical concepts. This study integrates domain-specific semantic information by mapping the SNOMED medical term hierarchy into a low-dimensional hyperbolic space using Poincaré embeddings, with the aim of improving lung cancer onset prediction. Methods: Using a retrospective cohort from the Optum EHR dataset, we derived a clinical knowledge graph from the SNOMED taxonomy and generated Poincaré embeddings via Riemannian stochastic gradient descent. These embeddings were then incorporated into two deep learning architectures, a ResNet and a Transformer model. Models were evaluated for discrimination (area under the receiver operating characteristic curve) and calibration (average absolute difference between observed and predicted probabilities)…
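The calibration measure named in the abstract, an average absolute difference between observed and predicted probabilities, can be illustrated with a simple binned estimate. The abstract does not say how observed probabilities were estimated, so the equal-size binning below, the function name `mean_abs_calibration_error`, and the `n_bins` parameter are all assumptions for illustration, not the authors' procedure.

```python
def mean_abs_calibration_error(y_true, y_prob, n_bins=10):
    """Binned average absolute gap between observed and predicted risk.

    Illustrative helper (hypothetical name and binning scheme).
    y_true: iterable of 0/1 outcomes; y_prob: predicted probabilities.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one prediction")
    n_bins = min(n_bins, n)  # ensures every bin is non-empty
    total = 0.0
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        # Weight each bin by its size so the result is a per-patient average.
        total += abs(observed - mean_pred) * len(chunk)
    return total / n


if __name__ == "__main__":
    outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
    risks = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
    print(mean_abs_calibration_error(outcomes, risks, n_bins=4))
```

Lower values indicate that predicted risks track observed event rates more closely; a smoothed calibration curve would be a common alternative to fixed bins.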
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling
