Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks

Samy Ateia; Udo Kruschwitz

arXiv:2407.13511·cs.CL·July 19, 2024·3 cites

TL;DR

This study evaluates the zero-shot and few-shot performance of open-source and commercial GPT models on biomedical retrieval-augmented generation tasks, finding that few-shot examples narrow the gap between them, particularly on domain-specific tasks.

Contribution

The paper provides a comparative analysis of current GPT models and open-source alternatives in biomedical NLP, highlighting the effectiveness of few-shot learning in closing performance gaps.

Findings

01 Mixtral 8x7B is competitive in 10-shot settings.

02 Zero-shot performance of open-source models is significantly lower than that of commercial models.

03 Few-shot examples improve domain-specific task performance.

Abstract

Commercial large language models (LLMs), like OpenAI's GPT-4 powering ChatGPT and Anthropic's Claude 3 Opus, have dominated natural language processing (NLP) benchmarks across different domains. New competing open-source alternatives like Mixtral 8x7B or Llama 3 have emerged and seem to be closing the gap while often offering higher throughput and being less costly to use. Open-source LLMs can also be self-hosted, which makes them interesting for enterprise and clinical use cases where sensitive data should not be processed by third parties. We participated in the 12th BioASQ challenge, which is a retrieval-augmented generation (RAG) setting, and explored the performance of the current GPT models Claude 3 Opus, GPT-3.5-turbo, and Mixtral 8x7B with in-context learning (zero-shot, few-shot) and QLoRA fine-tuning. We also explored how additional relevant knowledge from Wikipedia added to the…

Code & Models

Repositories

samyateia/bioasq2024 (Official)

Taxonomy

Topics: Scientific Computing and Data Management

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Cosine Annealing · Label Smoothing · Linear Layer · BART · Weight Decay · Softmax