BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology

Odhran O'Donoghue; Aleksandar Shtedritski; John Ginger; Ralph Abboud; Ali Essa Ghareeb; Justin Booth; Samuel G Rodriques

arXiv:2310.10632 · cs.CL · October 17, 2023 · 6 cites

TL;DR

This paper introduces BioPlanner, an automatic evaluation framework for LLMs on protocol planning in biology. It uses pseudocode representations of protocols to assess and improve the models' multi-step scientific reasoning.

Contribution

It presents BioProt, a novel dataset of biology protocols with pseudocode, and a framework for evaluating LLMs' ability to generate and reconstruct scientific protocols.
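As a rough illustration of why a pseudocode representation makes protocols easier to score automatically, the sketch below treats a protocol as an ordered list of function calls and compares a predicted protocol against a reference one. Everything here is an assumption made for illustration: the `Step` type, the function names, the toy protocol, and the set-overlap score are hypothetical and are not taken from BioProt or from the paper's actual metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pseudocode call: a function name plus (name, value) argument pairs."""
    function: str
    args: tuple = ()


def function_precision_recall(reference, predicted):
    """Precision and recall over the sets of function names used."""
    ref = {s.function for s in reference}
    pred = {s.function for s in predicted}
    overlap = len(ref & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


def inadmissible_calls(predicted, admissible):
    """Function names the model used that are not in the allowed set."""
    return [s.function for s in predicted if s.function not in admissible]


# Hypothetical admissible functions and toy protocols (invented, not from BioProt).
ADMISSIBLE = {"add_reagent", "incubate", "centrifuge", "discard_supernatant"}

reference = [
    Step("add_reagent", (("name", "lysis buffer"), ("volume_ul", 200))),
    Step("incubate", (("minutes", 10),)),
    Step("centrifuge", (("minutes", 5),)),
    Step("discard_supernatant"),
]
predicted = [
    Step("add_reagent", (("name", "lysis buffer"), ("volume_ul", 200))),
    Step("centrifuge", (("minutes", 5),)),
    Step("vortex", (("seconds", 30),)),
]

precision, recall = function_precision_recall(reference, predicted)
bad = inadmissible_calls(predicted, ADMISSIBLE)
print(f"precision={precision:.2f} recall={recall:.2f} inadmissible={bad}")
```

Because both protocols are structured calls rather than free text, the comparison needs no expert reader: two differently worded descriptions of the same step map to the same function name. A fuller evaluation would presumably also check step order and argument values, which this sketch ignores.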

Findings

1. GPT-4 outperforms GPT-3 in protocol reconstruction
2. Generated protocols were successfully executed in a biological lab
3. Pseudocode improves evaluation and generation of scientific protocols

Abstract

The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial for designing scientific experiments. Moreover, evaluation of the accuracy of scientific protocols is challenging, because experiments can be described correctly in many different ways, require expert knowledge to evaluate, and cannot usually be executed automatically. Here we present an automatic evaluation framework for the task of planning experimental protocols, and we introduce BioProt: a dataset of biology protocols with corresponding pseudocode representations. To measure performance on…

Code & Models

Repositories

bioplanner/bioplanner (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Absolute Position Encodings · Layer Normalization · Dense Connections · Linear Layer · Attention Dropout · Multi-Head Attention