Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data
Gal Beeri, Benoit Chamot, Elena Latchem, Shruthi Venkatesh, Sarah Whalan, Van Zyl Kruger, David Martino

TL;DR
This study explores combining a biomedical named entity recognition model with large language models to improve automated disease phenotyping from survey data, aiming for better data harmonization and cohort profiling.
Contribution
It introduces a novel approach that integrates BERN2 with LLMs through prompt engineering, Retrieval-Augmented Generation (RAG), and Instructional Fine-Tuning (IFT) to improve the extraction and normalization of disease mentions from survey data.
Findings
BERN2 achieved high accuracy in extracting disease mentions
LLMs with RAG and few-shot inference improved performance further
The method shows promise for large-scale cohort data harmonization
Abstract
This exploratory pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated disease phenotyping from research survey data. Motivated by the need for efficient and accurate methods to harmonize the growing volume of survey data with standardized disease ontologies, we employed BERN2, a biomedical named entity recognition and normalization model, to extract disease information from the ORIGINS birth cohort survey data. After rigorously evaluating BERN2's performance against a manually curated ground truth dataset, we integrated various LLMs using prompt engineering, Retrieval-Augmented Generation (RAG), and Instructional Fine-Tuning (IFT) to refine the model's outputs. BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with Few Shot…
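The evaluation step described in the abstract, comparing BERN2's normalized disease mentions with a manually curated ground truth, can be illustrated with a minimal sketch. The summary does not say which metrics the authors used, so this sketch assumes micro-averaged precision, recall, and F1 over per-response sets of ontology concept IDs. The function name `micro_prf`, the response IDs, and the concept IDs are all hypothetical placeholders, not the authors' code or data.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-response concept-ID sets.

    `gold` and `pred` map a survey response ID to the set of normalized
    disease concept IDs assigned to that response (assumed data layout).
    """
    tp = fp = fn = 0
    # Iterate over every response seen in either the gold or predicted set.
    for response_id in gold.keys() | pred.keys():
        g = gold.get(response_id, set())
        p = pred.get(response_id, set())
        tp += len(g & p)  # concepts both annotators and the model assigned
        fp += len(p - g)  # concepts only the model assigned
        fn += len(g - p)  # concepts the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up example: placeholder response IDs and concept IDs.
gold = {"r1": {"C1"}, "r2": {"C2", "C3"}, "r3": set()}
pred = {"r1": {"C1"}, "r2": {"C2"}, "r3": {"C4"}}

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring sets of concept IDs evaluates extraction and normalization together; scoring mention spans separately would need character offsets, which this sketch does not model.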
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Adam · Linear Layer · Attention Dropout · Dropout · Weight Decay · Dense Connections · Byte Pair Encoding · BART
