Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts
Aditya Kumar, Simon Rauch, Mario Cypko, Oliver Amft

TL;DR
This paper presents med-gte-hybrid, a novel transformer-based model for extracting actionable insights from unstructured clinical texts, improving predictive accuracy and patient stratification in healthcare applications.
Contribution
The paper introduces med-gte-hybrid, a new contextual embedding model combining contrastive learning and autoencoder tuning, tailored for clinical text analysis and outperforming existing models.
Findings
Improves performance on clinical prediction tasks in MIMIC-IV patient cohorts: CKD prognosis, eGFR prediction, and patient mortality prediction.
Enhances patient stratification, clustering, and text retrieval.
Outperforms state-of-the-art models on the MTEB benchmark.
Abstract
We introduce med-gte-hybrid, a novel contextual embedding model derived from the gte-large sentence transformer, to extract information from unstructured clinical narratives. Our model tuning strategy for med-gte-hybrid combines contrastive learning and a denoising autoencoder. To evaluate the performance of med-gte-hybrid, we investigate several clinical prediction tasks in large patient cohorts extracted from the MIMIC-IV dataset, including Chronic Kidney Disease (CKD) patient prognosis, estimated glomerular filtration rate (eGFR) prediction, and patient mortality prediction. Furthermore, we demonstrate that the med-gte-hybrid model improves patient stratification, clustering, and text retrieval, thereby outperforming current state-of-the-art models on the Massive Text Embedding Benchmark (MTEB). While some of our evaluations focus on CKD, our hybrid tuning of sentence transformers…
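The abstract names two tuning ingredients, contrastive learning and a denoising autoencoder, without giving their exact form. As a rough illustration only, the sketch below shows one common contrastive objective (InfoNCE with in-batch negatives) and one common denoising corruption (random token deletion). The function names `info_nce` and `delete_tokens`, the `temperature` and `ratio` values, and the choice of these particular variants are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] should be most similar to positives[i]; every other
    positive in the batch acts as a negative (in-batch negatives).
    Assumed variant, not confirmed by the source.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cos(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def delete_tokens(tokens, ratio=0.6, seed=0):
    """Corrupt a token list by dropping roughly `ratio` of its tokens.

    A denoising autoencoder would be trained to reconstruct the original
    tokens from the embedding of this corrupted input. The deletion
    ratio is a hypothetical default.
    """
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= ratio]
    # Keep at least one token so the encoder never sees an empty input.
    return kept or [rng.choice(tokens)]
```

In this sketch, matched pairs yield a lower `info_nce` value than mismatched ones, and `delete_tokens` returns a shortened token list from the original note text. How the two objectives are weighted or combined in med-gte-hybrid is not stated in the text above, so that step is left out.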
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
Methods: Contrastive Learning · Focus
