BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology
Odhran O'Donoghue, Aleksandar Shtedritski, John Ginger, Ralph Abboud, Ali Essa Ghareeb, Justin Booth, Samuel G Rodriques

TL;DR
This paper introduces BioPlanner, an automatic evaluation framework for LLMs in biology protocol planning, utilizing pseudocode representations to assess and improve their multi-step scientific reasoning capabilities.
Contribution
It presents BioProt, a novel dataset of biology protocols with pseudocode, and a framework for evaluating LLMs' ability to generate and reconstruct scientific protocols.
Findings
GPT-4 outperforms GPT-3 in protocol reconstruction
Generated protocols were successfully executed in a biological lab
Pseudocode improves evaluation and generation of scientific protocols
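To illustrate why a pseudocode representation makes automatic evaluation tractable, here is a minimal Python sketch. It assumes a protocol can be written as an ordered list of function calls, and it scores a predicted protocol against a reference by the overlap of function names. The function names (add_reagent, incubate, centrifuge, vortex), their arguments, and the precision/recall scoring are all hypothetical choices made for this illustration; they are not taken from BioProt or from the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pseudocode call: a function name plus its keyword arguments."""
    function: str
    arguments: tuple  # sorted (name, value) pairs, kept hashable


def step(function, **kwargs):
    """Build a Step from a function name and keyword arguments."""
    return Step(function, tuple(sorted(kwargs.items())))


def function_precision_recall(reference, predicted):
    """Compare the sets of function names used by two protocols.

    This ignores step order and argument values; it is an illustrative
    metric, not necessarily the one used in the paper.
    """
    ref_names = {s.function for s in reference}
    pred_names = {s.function for s in predicted}
    overlap = ref_names & pred_names
    precision = len(overlap) / len(pred_names) if pred_names else 0.0
    recall = len(overlap) / len(ref_names) if ref_names else 0.0
    return precision, recall


# Hypothetical reference protocol (all names and values are made up).
reference = [
    step("add_reagent", name="lysis buffer", volume_ul=200),
    step("incubate", temperature_c=37, minutes=30),
    step("centrifuge", speed_g=12000, minutes=5),
]

# Hypothetical model output: skips the incubation, adds a vortex step.
predicted = [
    step("add_reagent", name="lysis buffer", volume_ul=200),
    step("centrifuge", speed_g=12000, minutes=5),
    step("vortex", seconds=10),
]

precision, recall = function_precision_recall(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because both protocols are reduced to discrete function calls, they can be compared mechanically, whereas two free-text descriptions of the same experiment may differ in wording while being equally correct.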
Abstract
The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial for designing scientific experiments. Moreover, evaluation of the accuracy of scientific protocols is challenging, because experiments can be described correctly in many different ways, require expert knowledge to evaluate, and cannot usually be executed automatically. Here we present an automatic evaluation framework for the task of planning experimental protocols, and we introduce BioProt: a dataset of biology protocols with corresponding pseudocode representations. To measure performance on…
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Absolute Position Encodings · Layer Normalization · Dense Connections · Linear Layer · Attention Dropout · Multi-Head Attention
