ProtoMed-LLM: An Automatic Evaluation Framework for Large Language Models in Medical Protocol Formulation
Seungjun Yi, Jaeyoung Lim, Juyong Yoon

TL;DR
This paper introduces ProtoMed-LLM, an automatic, flexible evaluation framework for assessing large language models' ability to generate scientific protocols in biology, reducing reliance on human evaluation.
Contribution
The paper presents ProtoMed-LLM, a novel automatic evaluation framework using pseudocode and GPT-4, and introduces the BIOPROT 2.0 dataset for protocol formulation assessment.
Findings
GPT and Cohere excel at protocol formulation
The framework is adaptable and cost-free
BIOPROT 2.0 dataset supports evaluation and development
Abstract
Automated generation of scientific protocols executable by robots can significantly accelerate scientific research processes. Large Language Models (LLMs) excel at Scientific Protocol Formulation Tasks (SPFT), but the evaluation of their capabilities relies on human evaluation. Here, we propose a flexible, automatic framework to evaluate LLMs' capability on SPFT: ProtoMed-LLM. This framework prompts the target model and GPT-4 to extract pseudocode from biology protocols using only predefined lab actions, and evaluates the output of the target model using LLAM-EVAL, with the pseudocode generated by GPT-4 serving as a baseline and Llama-3 acting as the evaluator. Our adaptable prompt-based evaluation method, LLAM-EVAL, offers significant flexibility in the choice of evaluation model, material, and criteria, and is free of cost. We evaluate GPT variations, Llama, Mixtral, Gemma, Cohere, and Gemini.…
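The evaluation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the action vocabulary, the criteria names, the prompt wording, and the `name: score` reply format are all assumptions made for the example, and the evaluator is passed in as a plain callable so that any model (Llama-3 in the paper) could stand behind it.

```python
import re
from typing import Callable, Dict, List

# Hypothetical action vocabulary; the paper's actual predefined lab actions are not listed here.
ALLOWED_ACTIONS = {"add", "mix", "incubate", "centrifuge", "wash"}

# Matches a leading function-style call such as "mix(tube)".
CALL_PATTERN = re.compile(r"^\s*([A-Za-z_]\w*)\s*\(")


def undefined_actions(pseudocode: str) -> List[str]:
    """Return action names used in the pseudocode that are not in ALLOWED_ACTIONS."""
    unknown: List[str] = []
    for line in pseudocode.splitlines():
        match = CALL_PATTERN.match(line)
        if match:
            name = match.group(1)
            if name not in ALLOWED_ACTIONS and name not in unknown:
                unknown.append(name)
    return unknown


def build_eval_prompt(baseline: str, candidate: str, criteria: List[str]) -> str:
    """Assemble an evaluator prompt (assumed wording) from baseline, candidate, and criteria."""
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    return (
        "Compare the candidate pseudocode against the baseline pseudocode.\n"
        "Score each criterion from 1 to 5, one per line, as 'name: score'.\n"
        f"Criteria:\n{criteria_text}\n\n"
        f"Baseline:\n{baseline}\n\n"
        f"Candidate:\n{candidate}\n"
    )


def evaluate(
    baseline: str,
    candidate: str,
    criteria: List[str],
    evaluator: Callable[[str], str],
) -> Dict[str, int]:
    """Send the prompt to the evaluator and parse 'name: score' lines from its reply."""
    reply = evaluator(build_eval_prompt(baseline, candidate, criteria))
    scores: Dict[str, int] = {}
    for line in reply.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name in criteria and value.isdigit():
            scores[name] = int(value)
    return scores
```

In use, `baseline` would hold the GPT-4 pseudocode, `candidate` the target model's pseudocode, and `evaluator` a function that sends the prompt to the evaluator model and returns its text reply. Because the model, the criteria list, and the source material are all parameters, swapping any of them does not require changing the loop itself.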
Taxonomy
Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Data Quality and Management
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Transformer · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing
