DisEmbed: Transforming Disease Understanding through Embeddings

Salman Faroz

arXiv:2412.15258·cs.CL·December 23, 2024

Open Access · 1 Model · 1 Dataset

TL;DR

DisEmbed is a disease-focused embedding model trained on a synthetic dataset, outperforming general medical models in disease-specific tasks and retrieval-augmented generation applications.

Contribution

The paper introduces DisEmbed, a novel disease-specific embedding model trained on curated synthetic data, enhancing disease understanding and differentiation.

Findings

01. DisEmbed outperforms existing models on disease-specific benchmarks.

02. It excels in identifying disease contexts and differentiating similar diseases.

03. DisEmbed shows robustness in retrieval-augmented generation tasks.

Abstract

The medical domain is vast and diverse, with many existing embedding models focused on general healthcare applications. However, these models often struggle to capture a deep understanding of diseases due to their broad generalization across the entire medical field. To address this gap, I present DisEmbed, a disease-focused embedding model. DisEmbed is trained on a synthetic dataset specifically curated to include disease descriptions, symptoms, and disease-related Q&A pairs, making it uniquely suited for disease-related tasks. For evaluation, I benchmarked DisEmbed against existing medical models using disease-specific datasets and the triplet evaluation method. My results demonstrate that DisEmbed outperforms other models, particularly in identifying disease-related contexts and distinguishing between similar diseases. This makes DisEmbed highly valuable for disease-specific use…
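The abstract names the triplet evaluation method without spelling it out in this excerpt. A common form of it, sketched below under that assumption, counts an (anchor, positive, negative) triplet as correct when the anchor embedding is more similar to the positive than to the negative. The function names, the use of cosine similarity, and the toy 2-d vectors are illustrative assumptions, not details taken from the paper; in practice the vectors would come from the embedding model under test.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets where the anchor
    is strictly more similar to the positive than to the negative."""
    correct = sum(
        1 for anchor, pos, neg in triplets
        if cosine(anchor, pos) > cosine(anchor, neg)
    )
    return correct / len(triplets)


# Hypothetical toy embeddings standing in for model outputs.
toy_triplets = [
    # Positive points the same way as the anchor: counted as correct.
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    # Negative is closer to the anchor than the positive: counted as wrong.
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),
]

accuracy = triplet_accuracy(toy_triplets)
print(f"triplet accuracy: {accuracy:.2f}")
```

Under this reading, a model that separates similar diseases well would place a disease description closer to its own symptoms (positive) than to those of a look-alike disease (negative), which raises the score.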

Code & Models

Models

SalmanFaroz/DisEmbed-v1
model · 169 downloads

Datasets

SalmanFaroz/DisEmbed-Symptom-Disease-v1
dataset · 38 downloads

Taxonomy

Topics: Mental Health and Psychiatry