Can Large Language Models Replace Data Scientists in Biomedical Research?

Zifeng Wang; Benjamin Danek; Ziwei Yang; Zheng Chen; Jimeng Sun

arXiv:2410.21591 · cs.AI · April 10, 2025

TL;DR

This study evaluates large language models' ability to perform biomedical data science tasks, developing a benchmark and methods to improve their accuracy, and demonstrating their potential to support medical professionals in research workflows.

Contribution

The paper introduces a new benchmark for biomedical data science tasks and demonstrates effective prompting techniques to enhance LLM performance in this domain.

Findings

1. Chain-of-thought prompting improves code accuracy by 21%.
2. Self-reflection refines buggy code, increasing accuracy by 11%.
3. LLMs significantly streamline programming for medical professionals.
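The first two findings name two adaptation strategies: giving the model a step-by-step analysis plan in the prompt (chain-of-thought) and letting it revise code that failed (self-reflection). The sketch below shows one way such a loop could be wired up, assuming a hypothetical `llm` callable that maps a prompt string to Python source. The names `build_cot_prompt`, `run_code`, and `self_reflect`, the prompt wording, and the round limit are illustrative assumptions, not the authors' implementation.

```python
import traceback
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface (assumption): prompt string in, Python source out.
LLM = Callable[[str], str]


def build_cot_prompt(task: str, plan_steps: List[str]) -> str:
    """Chain-of-thought style prompt: prepend a numbered analysis plan to the task."""
    plan = "\n".join(f"{i}. {step}" for i, step in enumerate(plan_steps, start=1))
    return f"Task: {task}\nFollow this plan step by step:\n{plan}\nWrite Python code."


def run_code(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return the traceback text on failure, else None."""
    try:
        exec(code, {})
    except Exception:
        return traceback.format_exc()
    return None


def self_reflect(llm: LLM, prompt: str, max_rounds: int = 3) -> Tuple[str, bool]:
    """Generate code, then feed each error back to the model for up to max_rounds revisions.

    Returns the last code produced and whether it ran without raising.
    """
    code = llm(prompt)
    error = run_code(code)
    rounds = 0
    while error is not None and rounds < max_rounds:
        prompt = (
            f"{prompt}\n\nYour previous code:\n{code}\n"
            f"It failed with this error:\n{error}\nPlease fix it."
        )
        code = llm(prompt)
        error = run_code(code)
        rounds += 1
    return code, error is None


# Demo with a scripted stand-in for the model: the first reply is buggy, the second is fixed.
_replies = iter(["result = 1 / 0", "result = 42"])
code, ok = self_reflect(
    lambda p: next(_replies),
    build_cot_prompt("Compute a value", ["Load data", "Compute result"]),
)
print(ok, code)  # True result = 42
```

Here success only means "ran without raising"; a real evaluation would also check the outputs. In practice, model-generated code should run in an isolated process or container rather than through `exec` in the host interpreter.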

Abstract

Data science plays a critical role in biomedical research, but it requires professionals with expertise in coding and medical data analysis. Large language models (LLMs) have shown great potential in supporting medical tasks and perform well in general coding tests. However, existing evaluations fail to assess their capability in biomedical data science, particularly in handling diverse data types such as genomics and clinical datasets. To address this gap, we developed a benchmark of data science coding tasks derived from the analyses of 39 published studies. This benchmark comprises 293 coding tasks (128 in Python and 165 in R) performed on real-world TCGA-type genomics and clinical data. Our findings reveal that vanilla prompting of LLMs yields suboptimal performance due to drawbacks in following input instructions, understanding target data, and adhering to standard analysis…
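The results above are reported as code accuracy over the benchmark's tasks. A minimal sketch of such a metric, assuming each Python task ships with assertion code that checks the variables a correct solution should define, is the fraction of tasks whose generated code and tests both run without error. The `Task` record, its fields, and the toy tasks are hypothetical and do not reflect the actual BioDSBench format; R tasks would need an R interpreter rather than `exec`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task record (assumption), not the actual BioDSBench schema."""
    task_id: str
    instruction: str
    test_code: str  # assertions run against the namespace left by the generated code


def passes(generated_code: str, test_code: str) -> bool:
    """True if the generated code and the task's assertions both execute without error."""
    namespace: Dict[str, object] = {}
    try:
        exec(generated_code, namespace)
        exec(test_code, namespace)
    except Exception:
        return False
    return True


def code_accuracy(tasks: List[Task], solutions: Dict[str, str]) -> float:
    """Fraction of tasks whose solution passes; a missing solution counts as an empty program."""
    if not tasks:
        return 0.0
    solved = sum(passes(solutions.get(t.task_id, ""), t.test_code) for t in tasks)
    return solved / len(tasks)


# Toy example with made-up tasks: one solution is correct, one is wrong.
tasks = [
    Task("t1", "Compute the mean age", "assert mean_age == 50"),
    Task("t2", "Count the samples", "assert n == 3"),
]
solutions = {"t1": "mean_age = sum([40, 60]) / 2", "t2": "n = 2"}
print(code_accuracy(tasks, solutions))  # 0.5
```

Treating any exception as a failure keeps the metric simple; a finer-grained version could separate runtime errors from wrong answers, which is useful when diagnosing why a model fails.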

Code & Models

Datasets

zifeng-ai/BioDSBench
dataset · 56 downloads

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare