How Scientists Use Large Language Models to Program
Gabrielle O'Brien

TL;DR
This paper examines how scientists use large language models for programming tasks, focusing on their use as an information retrieval tool, the strategies scientists use to verify generated code, and the potential effects on scientific analyses.
Contribution
It characterizes the behavior of early adopters and discusses verification practices and potential vulnerabilities in scientific code generated with language models.
Findings
Scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries.
Scientists apply a range of strategies to verify the correctness of generated code.
Code generation practices may unknowingly influence the parameters of scientific analyses.
Abstract
Scientists across disciplines write code for critical activities like data collection and generation, statistical modeling, and visualization. As large language models that can generate code have become widely available, scientists may increasingly use these models during research software development. We investigate the characteristics of scientists who are early adopters of code generating models and conduct interviews with scientists at a public, research-focused university. Through interviews and reviews of user interaction logs, we see that scientists often use code generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries. We present findings about their verification strategies and discuss potential vulnerabilities that may emerge from code generation practices unknowingly influencing the parameters of scientific analyses.
