ProtoMed-LLM: An Automatic Evaluation Framework for Large Language Models in Medical Protocol Formulation

Seungjun Yi; Jaeyoung Lim; Juyong Yoon

arXiv:2410.04601·cs.CL·April 15, 2025

TL;DR

This paper introduces ProtoMed-LLM, an automatic, flexible evaluation framework for assessing large language models' ability to generate scientific protocols in biology, reducing reliance on human evaluation.

Contribution

The paper presents ProtoMed-LLM, a novel automatic evaluation framework that compares pseudocode extracted by a target model against a GPT-4 baseline, with Llama-3 acting as the evaluator, and introduces the BIOPROT 2.0 dataset for protocol formulation assessment.

Findings

01. GPT and Cohere excel at protocol formulation

02. The framework is adaptable and cost-free

03. BIOPROT 2.0 dataset supports evaluation and development

Abstract

Automated generation of scientific protocols executable by robots can significantly accelerate scientific research processes. Large Language Models (LLMs) excel at Scientific Protocol Formulation Tasks (SPFT), but the evaluation of their capabilities relies on human evaluation. Here, we propose a flexible, automatic framework to evaluate LLMs' capability on SPFT: ProtoMed-LLM. This framework prompts the target model and GPT-4 to extract pseudocode from biology protocols using only predefined lab actions, and evaluates the output of the target model using LLAM-EVAL, with the pseudocode generated by GPT-4 serving as a baseline and Llama-3 acting as the evaluator. Our adaptable prompt-based evaluation method, LLAM-EVAL, offers significant flexibility in the choice of evaluation model, material, and criteria, and is free of cost. We evaluate GPT variations, Llama, Mixtral, Gemma, Cohere, and Gemini.…
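
The evaluation flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the action list, the prompt wording, and the function names (`build_extraction_prompt`, `build_eval_prompt`, `evaluate_protocol`) are all hypothetical, and each model is represented as a plain callable that maps a prompt string to a response string.

```python
from typing import Callable, Sequence

# A model is assumed to be any function from prompt text to response text.
Model = Callable[[str], str]

# Hypothetical action vocabulary; the paper's actual predefined lab
# actions are not listed in this summary.
LAB_ACTIONS = ("add", "mix", "incubate", "centrifuge")


def build_extraction_prompt(protocol: str, actions: Sequence[str]) -> str:
    """Ask a model to rewrite a protocol as pseudocode over fixed actions."""
    return (
        "Convert the protocol below into pseudocode using only these "
        f"lab actions: {', '.join(actions)}.\n\nProtocol:\n{protocol}"
    )


def build_eval_prompt(
    baseline: str, candidate: str, criteria: Sequence[str]
) -> str:
    """Ask the evaluator to rate the candidate against the baseline."""
    return (
        f"Criteria: {', '.join(criteria)}\n\n"
        f"Baseline pseudocode:\n{baseline}\n\n"
        f"Candidate pseudocode:\n{candidate}\n\n"
        "Rate the candidate against the baseline on each criterion."
    )


def evaluate_protocol(
    protocol: str,
    target_model: Model,     # model under evaluation
    baseline_model: Model,   # GPT-4 in the paper's setup
    evaluator_model: Model,  # Llama-3 in the paper's setup
    criteria: Sequence[str],
) -> str:
    """Run extraction with both models, then score the target's output."""
    prompt = build_extraction_prompt(protocol, LAB_ACTIONS)
    candidate = target_model(prompt)
    baseline = baseline_model(prompt)
    return evaluator_model(build_eval_prompt(baseline, candidate, criteria))
```

Because the models are passed in as callables, swapping the evaluator or changing the criteria only requires a different argument, which mirrors the flexibility the abstract attributes to LLAM-EVAL.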

Taxonomy

Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Data Quality and Management

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Transformer · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing