An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process
Nicholas M. Synovic, Karolina Ryzka, Alessandra V. Vellucci Solari, Kenny Lyons, James C. Davis, George K. Thiruvathukal

TL;DR
This study empirically analyzes how natural scientists reuse pre-trained deep learning models, revealing prevalent patterns, field-specific trends, and the significant impact on the testing phase of scientific research.
Contribution
First empirical evaluation of PTM reuse in the natural sciences, quantifying reuse patterns and their impact, and offering insights into future implementation and scientific implications.
Findings
Biochemistry, Genetics and Molecular Biology lead in PTM reuse.
Adaptation is the most common PTM reuse pattern.
The 'Test' stage is most influenced by PTM integration.
Abstract
Deep learning has achieved recognition for its impact within the natural sciences; however, scientists are inhibited by the prohibitive technical cost and computational complexity of training project-specific models from scratch. Following software engineering community guidance, natural scientists are reusing pre-trained deep learning models (PTMs) to amortize these costs. While prior works recommend PTM reuse patterns, to our knowledge, little work has been done to empirically evaluate their usage and impact within the natural sciences. We present the first empirical study of PTM reuse patterns in the natural sciences, quantifying the utilization and impact of conceptual, adaptation, and deployment reuse within the scientific process. Leveraging an automated, large language model-driven pipeline, we analyze 17,511 peer-reviewed, open-access papers to identify PTM reuse by scientific…
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Cell Image Analysis Techniques
