DisEmbed: Transforming Disease Understanding through Embeddings
Salman Faroz

TL;DR
DisEmbed is a disease-focused embedding model trained on a synthetic dataset. It outperforms general medical models on disease-specific tasks and in retrieval-augmented generation applications.
Contribution
The paper introduces DisEmbed, a novel disease-specific embedding model trained on curated synthetic data to improve disease understanding and the differentiation of similar diseases.
Findings
DisEmbed outperforms existing models on disease-specific benchmarks.
It excels in identifying disease contexts and differentiating similar diseases.
DisEmbed shows robustness in retrieval-augmented generation tasks.
Abstract
The medical domain is vast and diverse, with many existing embedding models focused on general healthcare applications. However, these models often struggle to capture a deep understanding of diseases due to their broad generalization across the entire medical field. To address this gap, I present DisEmbed, a disease-focused embedding model. DisEmbed is trained on a synthetic dataset specifically curated to include disease descriptions, symptoms, and disease-related Q&A pairs, making it uniquely suited for disease-related tasks. For evaluation, I benchmarked DisEmbed against existing medical models using disease-specific datasets and the triplet evaluation method. My results demonstrate that DisEmbed outperforms other models, particularly in identifying disease-related contexts and distinguishing between similar diseases. This makes DisEmbed highly valuable for disease-specific use…
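The abstract names the triplet evaluation method without defining it. The sketch below assumes the common formulation: each test item is an (anchor, positive, negative) triplet of embeddings, and a triplet counts as correct when the anchor is more similar to the positive than to the negative. The use of cosine similarity, the function names, and the toy 2-D vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets in which the
    anchor is closer to the positive than to the negative."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)


# Hypothetical toy vectors standing in for real model embeddings.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: incorrect
]

accuracy = triplet_accuracy(toy_triplets)
print(f"triplet accuracy: {accuracy:.3f}")
```

In a real comparison, the vectors would come from encoding disease texts with each model under test, and the model with the higher triplet accuracy separates related from unrelated disease contexts more reliably on that dataset.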
Taxonomy
Topics: Mental Health and Psychiatry
