Employing General-Purpose and Biomedical Large Language Models with Advanced Prompt Engineering for Pharmacoepidemiologic Study Design
Xinyao Zhang, Nicole Sonne Heckmann, Manuela Del Castillo Suero, Francesco Paolo Speca, Maurizio Sessa

TL;DR
This study compares general-purpose and biomedical large language models in supporting pharmacoepidemiologic study design, finding that general-purpose models with advanced prompting outperform biomedical models in relevance and reasoning.
Contribution
It provides a systematic evaluation of LLMs in pharmacoepidemiology, highlighting the effectiveness of prompt engineering and the superior performance of general-purpose models.
Findings
GPT-4o-LTM achieved median relevance score of 4 in most questions.
General-purpose LLMs outperformed biomedical LLMs in relevance and justification.
Prompt strategies significantly affected LLM performance.
Abstract
Background: The potential of large language models (LLMs) to automate and support pharmacoepidemiologic study design is an emerging area of interest, yet their reliability remains insufficiently characterized. General-purpose LLMs often display inaccuracies, while the comparative performance of specialized biomedical LLMs in this domain remains unknown. Methods: This study evaluated general-purpose LLMs (GPT-4o and DeepSeek-R1) versus biomedically fine-tuned LLMs (QuantFactory/Bio-Medical-Llama-3-8B-GGUF and Irathernotsay/qwen2-1.5B-medical_qa-Finetune) using 46 protocols (2018-2024) from the HMA-EMA Catalogue and Sentinel System. Performance was assessed across relevance, logic of justification, and ontology-code agreement across multiple coding systems using Least-to-Most (LTM) and Active Prompting strategies. Results: GPT-4o and DeepSeek-R1 paired with LTM prompting achieved the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
