CardioEmbed: Domain-Specialized Text Embeddings for Clinical Cardiology

Richard J. Young; Alice M. Matthews

arXiv:2511.10930 · cs.CL · November 17, 2025

Open Access

TL;DR

CardioEmbed is a domain-specific text embedding model trained on cardiology textbooks. It reaches 99.60% accuracy on cardiac-specific semantic retrieval, 15.94 percentage points above MedTE.

Contribution

This paper introduces CardioEmbed, an embedding model based on Qwen3-Embedding-8B and trained with contrastive learning on seven comprehensive cardiology textbooks, to improve text understanding and retrieval in clinical cardiology.

Findings

01. Achieves 99.60% retrieval accuracy on cardiology tasks
02. Outperforms MedTE by +15.94 percentage points in accuracy
03. Demonstrates competitive results on biomedical benchmarks

Abstract

Biomedical text embeddings have primarily been developed using research literature from PubMed, yet clinical cardiology practice relies heavily on procedural knowledge and specialized terminology found in comprehensive textbooks rather than research abstracts. This research-practice gap limits the effectiveness of existing embedding models for clinical applications in cardiology. This study trained CardioEmbed, a domain-specialized embedding model based on Qwen3-Embedding-8B, using contrastive learning on a curated corpus of seven comprehensive cardiology textbooks totaling approximately 150,000 sentences after deduplication. The model employs InfoNCE loss with in-batch negatives and achieves 99.60% retrieval accuracy on cardiac-specific semantic retrieval tasks, a +15.94 percentage point improvement over MedTE, the current state-of-the-art medical embedding model. On MTEB medical…
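
The abstract names InfoNCE loss with in-batch negatives as the training objective. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation. The function name `info_nce_in_batch`, the use of cosine similarity, and the temperature of 0.05 are assumptions made for illustration; the listing does not state the paper's actual similarity function or hyperparameters.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_in_batch(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    positives[i] is the matching text for anchors[i]; every other row of
    `positives` serves as a negative for anchors[i] (in-batch negatives).
    The temperature value is an illustrative assumption.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Similarity of anchor i to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy batch: each anchor is close to its own positive and far from the others.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
shuffled = aligned[1:] + aligned[:1]  # deliberately mismatched pairs

loss_aligned = info_nce_in_batch(anchors, aligned)
loss_shuffled = info_nce_in_batch(anchors, shuffled)
```

In this toy batch the correctly paired inputs give a much lower loss than the mismatched ones, which is the signal contrastive training uses to pull matching texts together and push the other texts in the batch apart.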

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Biomedical Text Mining and Ontologies