Can Large Language Models Replace Data Scientists in Biomedical Research?
Zifeng Wang, Benjamin Danek, Ziwei Yang, Zheng Chen, Jimeng Sun

TL;DR
This study evaluates how well large language models perform biomedical data science tasks. It develops a benchmark and prompting methods that improve their accuracy, and it demonstrates their potential to support medical professionals in research workflows.
Contribution
The paper introduces a new benchmark for biomedical data science tasks and demonstrates effective prompting techniques to enhance LLM performance in this domain.
Findings
Chain-of-thought prompting improves code accuracy by 21%.
Self-reflection refines buggy code, increasing accuracy by 11%.
LLMs significantly streamline programming for medical professionals.
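The findings above credit self-reflection with refining buggy code. The sketch below shows one possible form of such a loop. The model generates code, the code is executed, and any error message is passed back so the model can try again.

It makes several assumptions that do not come from the paper. The model is treated as a plain function that receives the task, its previous code, and the last error. The names `self_reflect`, `run_snippet`, and `toy_model` are hypothetical, and `toy_model` is a scripted stand-in for a real LLM. Only Python snippets are executed, although the benchmark also contains R tasks. A real harness would run generated code in a sandbox instead of calling `exec` directly.

```python
from typing import Callable, Optional

# Hypothetical signature: (task, previous_code, last_error) -> new code
Generator = Callable[[str, Optional[str], Optional[str]], str]


def run_snippet(code: str) -> Optional[str]:
    """Execute generated Python code; return an error message, or None on success."""
    try:
        exec(code, {})  # assumption: no sandboxing in this sketch
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_reflect(generate: Generator, task: str, max_rounds: int = 3) -> tuple[Optional[str], int]:
    """Generate code, run it, and feed any error back to the model for a revision."""
    code: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_rounds + 1):
        code = generate(task, code, error)
        error = run_snippet(code)
        if error is None:
            return code, attempt  # working code and the round it succeeded in
    return None, max_rounds  # gave up after max_rounds failed attempts


def toy_model(task: str, previous_code: Optional[str], error: Optional[str]) -> str:
    """Scripted stand-in for an LLM: buggy on the first try, fixed after seeing an error."""
    if error is None:
        return "result = sum(valuez)"  # typo triggers a NameError
    return "values = [1, 2, 3]\nresult = sum(values)"


if __name__ == "__main__":
    code, attempts = self_reflect(toy_model, "sum the values")
    print(attempts)  # → 2
```

In this sketch the only feedback signal is the execution error. A fuller setup might also return failing test outputs or data summaries to the model.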
Abstract
Data science plays a critical role in biomedical research, but it requires professionals with expertise in coding and medical data analysis. Large language models (LLMs) have shown great potential in supporting medical tasks and perform well on general coding tests. However, existing evaluations fail to assess their capability in biomedical data science, particularly in handling diverse data types such as genomics and clinical datasets. To address this gap, we developed a benchmark of data science coding tasks derived from the analyses of 39 published studies. This benchmark comprises 293 coding tasks (128 in Python and 165 in R) performed on real-world TCGA-type genomics and clinical data. Our findings reveal that vanilla prompting of LLMs yields suboptimal performance due to shortcomings in following input instructions, understanding the target data, and adhering to standard analysis…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
