SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology
Paul Minchella, Loïc Verlingue, Stéphane Chrétien, Rémi Vaucher, Guillaume Metzler

TL;DR
SigBERT introduces a novel framework combining narrative medical reports and rough path signature theory to improve survival risk estimation in oncology, effectively capturing complex temporal dynamics from textual data.
Contribution
The paper presents SigBERT, a new method that leverages signature extraction from rough path theory to process sequential medical reports for survival analysis, outperforming existing approaches.
Findings
Achieved a C-index of 0.75 on real-world oncology data.
Effectively captures temporal dynamics from narrative reports.
Enhances survival risk estimation accuracy.
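For readers unfamiliar with the C-index quoted above, the following is a minimal sketch of Harrell's concordance index, the usual definition of that metric. It is an illustration only, not the authors' evaluation code: the function name `concordance_index` and the toy data are hypothetical, and the summary does not state which C-index variant the paper uses.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) strictly before patient j's recorded time.
    The pair is concordant when the model gives i the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: C-index is undefined")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.3, 0.5, 0.1]
print(concordance_index(times, events, risk_scores))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.75 means roughly three out of four comparable pairs are ordered correctly.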
Abstract
Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a…
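The two feature-extraction steps named in the abstract (averaging word embeddings into a sentence embedding, then taking the signature of the resulting time series) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SigBERT implementation: the function names are hypothetical, the signature is truncated at depth 2, the path is treated as piecewise linear between consecutive reports, and the toy path stands in for real embeddings. The truncation depth and any preprocessing the authors apply are not given in this summary.

```python
import numpy as np


def sentence_embedding(word_vectors):
    """Average word embeddings (n_words x dim) into one vector of length dim."""
    return np.asarray(word_vectors, dtype=float).mean(axis=0)


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (n_points x dim).

    Level 1 is the total increment of the path. Level 2 collects the
    iterated integrals S[i, j] = integral of (X^i_t - X^i_0) dX^j_t,
    accumulated segment by segment with Chen's identity.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for delta in np.diff(path, axis=0):
        # Cross term with everything seen so far, plus the segment's own
        # contribution (a straight line contributes delta (x) delta / 2).
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


# Toy example (hypothetical): one patient, three reports, 2-dimensional embeddings.
patient_path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(signature_depth2(patient_path))  # [1.  1.  0.5 1.  0.  0.5]
```

The output has a fixed length (dim + dim² at depth 2) regardless of how many reports a patient has, which is what makes such features usable as inputs to a standard survival model.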
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling
