Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models

Camilo Thorne; Christian Druckenbrodt; Kinga Szarkowska; Deepika Goyal; Pranita Marajan; Vijay Somanath; Corey Harper; Mao Yan; Tony Scerri

arXiv:2409.05486·cs.CL·September 18, 2024

Open Access

TL;DR

This paper evaluates the performance of large language models in chemistry, biology, and health domains through human assessments, highlighting their strengths and limitations in specialized scientific fields.

Contribution

It introduces a comprehensive human evaluation framework for large language models in scientific domains, providing insights into their capabilities and gaps.

Findings

1. Models perform well on general scientific questions.
2. Significant gaps remain in specialized domain knowledge.
3. Human evaluation reveals nuanced strengths and weaknesses.

Abstract

arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission.

Taxonomy

Topics: Genetics, Bioinformatics, and Biomedical Research · Health, Environment, Cognitive Aging · Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Residual Connection · Attention Dropout · Linear Layer · Multi-Head Attention · Dense Connections · Cosine Annealing · Linear Warmup With Cosine Annealing