Best Practices for Machine Learning-Assisted Protein Engineering
Fabio Herrera-Rocha, David Medina-Ortiz, Fabian Mauz, Juergen Pleiss, Mehdi D. Davari

TL;DR
This paper outlines best practices and guidelines for developing reliable, reproducible, and transparent machine learning models in protein engineering, emphasizing software engineering principles and practical resources.
Contribution
It provides a comprehensive set of guidelines and resources to improve ML development, evaluation, and publication standards in protein engineering.
Findings
Guidelines cover every stage, from data acquisition to model deployment.
Emphasizes software engineering best practices.
Aims to promote transparency and reproducibility.
Abstract
Data-driven modeling based on Machine Learning (ML) is becoming a central component of protein engineering workflows. This perspective presents the elements necessary to develop effective, reliable, and reproducible ML models, together with a set of guidelines for ML development in protein engineering. This includes a critical discussion of software engineering good practices for the development and evaluation of ML-based protein engineering projects, with an emphasis on supervised learning. These guidelines cover all the necessary steps for ML development, from data acquisition to model deployment. Additionally, the present perspective provides practical resources for implementing the outlined guidelines. These recommendations are also intended to support editors and scientific journals in enforcing good practices in ML-based protein engineering publications, promoting high standards across the…
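One small piece of the reproducibility the abstract calls for in supervised learning is a train/test split and a sequence encoding that come out the same on every run. The sketch below is illustrative only and is not the authors' code or a resource from the paper. It assumes labeled data arrives as (sequence, label) pairs of canonical amino-acid strings and numeric labels. The names `one_hot` and `reproducible_split` and the default `seed=42` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical fixed ordering of the 20 canonical amino acids, so the
# encoding is identical across runs and machines.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

Record = Tuple[str, float]  # assumed format: (protein sequence, measured label)


def one_hot(sequence: str) -> List[int]:
    """Flattened one-hot encoding with 20 entries per residue (hypothetical helper)."""
    width = len(AMINO_ACIDS)
    vector = [0] * (len(sequence) * width)
    for position, residue in enumerate(sequence.upper()):
        if residue not in _INDEX:
            # Fail loudly instead of silently encoding unexpected symbols.
            raise ValueError(f"non-canonical residue {residue!r} at position {position}")
        vector[position * width + _INDEX[residue]] = 1
    return vector


def reproducible_split(
    records: Sequence[Record], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Deterministically split records into (train, test).

    Sorting first makes the result independent of the input order, and a
    dedicated random.Random(seed) leaves the global random state untouched.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(records)
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[n_test:], ordered[:n_test]
```

Calling `reproducible_split(data, test_fraction=0.2, seed=7)` on five records returns four training and one test record. It returns the same split when the input list is reordered, so reporting the seed and the split fraction alongside a model is enough for someone else to regenerate the evaluation set. A random split like this one does not address sequence similarity between the training and test sets. That remains a separate design decision.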
