Closing the gap between open-source and commercial large language models for medical evidence summarization
Gongbo Zhang, Qiao Jin, Yiliang Zhou, Song Wang, Betina R. Idnay, Yiming Luo, Elizabeth Park, Jordan G. Nestor, Matthew E. Spotnitz, Ali Soroush, Thomas Campion, Zhiyong Lu, Chunhua Weng, Yifan Peng

TL;DR
This study demonstrates that fine-tuning open-source large language models significantly narrows the performance gap with proprietary models in medical evidence summarization, offering more transparent and customizable options.
Contribution
It investigates the effectiveness of fine-tuning open-source LLMs like PRIMERA, LongT5, and Llama-2 for medical summarization, achieving performance close to proprietary models.
Findings
Fine-tuning improved ROUGE-L, METEOR, and chrF scores.
Fine-tuned LongT5 approaches the performance of GPT-3.5 in zero-shot settings.
Smaller fine-tuned models sometimes outperform larger zero-shot models.
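ROUGE-L, the first metric listed above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with the reference summary. The sketch below computes a sentence-level ROUGE-L F1 on a 0-100 scale. It assumes lowercase whitespace tokenization with no stemming; the paper's exact tokenizer and ROUGE configuration are not given here, so this sketch will not reproduce the reported numbers exactly. The function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    # prev[j] holds the LCS length of the tokens seen so far in `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 (0-100) between a generated summary and a reference summary.

    Assumption: naive lowercase whitespace tokenization, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 100 * 2 * precision * recall / (precision + recall)
```

For example, `rouge_l("the drug reduced mortality in adults", "the drug probably reduced mortality")` finds an LCS of four tokens, giving precision 4/6, recall 4/5, and an F1 of about 72.73. Real evaluations usually rely on an established package such as `rouge-score`, which adds options like stemming and the summary-level ROUGE-Lsum variant.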
Abstract
Large language models (LLMs) hold great promise for summarizing medical evidence. Most recent studies focus on applying proprietary LLMs, which introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance still lags behind that of proprietary ones. In this study, we investigated to what extent fine-tuning open-source LLMs can further improve their performance in summarizing medical evidence. Using a benchmark dataset, MedReview, consisting of 8,161 pairs of systematic reviews and summaries, we fine-tuned three widely used open-source LLMs, namely PRIMERA, LongT5, and Llama-2. Overall, the fine-tuned LLMs obtained an increase of 9.89 in ROUGE-L (95% confidence interval: 8.94-10.81), 13.21 in METEOR score (95% confidence interval: 12.05-14.37),…
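The abstract reports each improvement with a 95% confidence interval. This page does not say how those intervals were computed; one common approach, assumed here purely for illustration, is a paired percentile bootstrap over per-document score differences between a fine-tuned model and its baseline. The function name `bootstrap_mean_diff_ci` and its defaults (1,000 resamples, fixed seed) are hypothetical.

```python
import random
import statistics


def bootstrap_mean_diff_ci(
    before: list[float],
    after: list[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Mean paired improvement (after - before) with a percentile bootstrap CI.

    Assumption: scores are paired per test document (same document scored
    before and after fine-tuning). Illustrative only; not necessarily the
    procedure used in the paper.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(after, before)]
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(diffs)
    # Resample documents with replacement; keep the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    # Percentile bounds, e.g. the 2.5th and 97.5th percentiles for alpha=0.05.
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(diffs), means[lo_idx], means[hi_idx]
```

Called with per-document ROUGE-L scores for a baseline and a fine-tuned model, it returns the mean improvement followed by the lower and upper interval bounds. Pairing by document matters here: it measures the change on the same inputs rather than comparing two independent score distributions.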
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Multi-Head Attention · Cosine Annealing · Weight Decay
