Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE
Evan Komp, Kristoffer E Johansson, Nicholas P Gauthier, Japheth E Gado, Kresten Lindorff-Larsen, Gregg T Beckham

TL;DR
AIDE is a Python tool that simplifies and standardizes protein property prediction using machine learning, with support for both labeled and unlabeled data.
Contribution
AIDE introduces a modular, scikit-learn compatible API for accessible and reproducible protein property prediction.
Findings
AIDE provides a standardized API for integrating various zero-shot and supervised prediction methods.
The tool supports reproducible workflows for analyzing protein variants and homologs.
AIDE is compatible with scikit-learn transformers and pipelines for streamlined use.
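To illustrate what "compatible with scikit-learn transformers and pipelines" means in practice, here is a minimal sketch built only on scikit-learn itself. The class `OneHotSequenceEmbedding`, the toy sequences, and the activity values are hypothetical and are not taken from AIDE; the sketch only shows the general pattern of a sequence-to-feature transformer placed in front of a standard regressor.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class OneHotSequenceEmbedding(BaseEstimator, TransformerMixin):
    """Hypothetical transformer (not AIDE's API): one-hot encodes
    equal-length protein sequences into a flat feature vector."""

    def fit(self, X, y=None):
        # Record the common sequence length seen during fitting.
        lengths = {len(seq) for seq in X}
        if len(lengths) != 1:
            raise ValueError("All sequences must have the same length.")
        self.length_ = lengths.pop()
        return self

    def transform(self, X):
        index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
        out = np.zeros((len(X), self.length_ * len(AMINO_ACIDS)))
        for row, seq in enumerate(X):
            for pos, aa in enumerate(seq):
                out[row, pos * len(AMINO_ACIDS) + index[aa]] = 1.0
        return out


# Toy variants and made-up labels, for illustration only.
sequences = ["MKTAY", "MKTAF", "MKSAY", "MRTAY"]
activity = np.array([1.0, 0.8, 0.5, 0.2])

# Because the transformer follows the fit/transform contract,
# it can be chained with any scikit-learn estimator.
model = Pipeline([
    ("embed", OneHotSequenceEmbedding()),
    ("regress", Ridge(alpha=1.0)),
])
model.fit(sequences, activity)
predictions = model.predict(["MKTAY", "MRSAF"])
```

Any object that implements `fit` and `transform` in this way can be swapped into the `embed` step without changing the rest of the pipeline, which is the kind of modularity the findings above describe.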
Abstract
Protein property prediction via machine learning, with and without labeled data, is becoming increasingly powerful, yet methods are disparate and capabilities vary widely across applications. The software presented here, “Artificial Intelligence Driven protein Estimation (AIDE)”, enables instantiating, optimizing, and testing many zero-shot and supervised property prediction methods for variants and variable-length homologs in a single, reproducible notebook or script. It does so by defining a modular, standardized application programming interface (API) that is drop-in compatible with scikit-learn transformers and pipelines. AIDE is an importable Python package that inherits from scikit-learn classes and follows the scikit-learn API, and it can be installed on Windows, Mac, and Linux. Many of the models wrapped by AIDE are effectively unusable without a GPU, and some assume CUDA. The newest stable, tested…
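The "optimizing and testing" workflow for variable-length sequences can be sketched with standard scikit-learn tools. The example below is an assumption-laden illustration, not AIDE's implementation: `CompositionEmbedding` is a hypothetical transformer that maps sequences of any length to a fixed-size amino-acid composition vector, the sequences are randomly generated, and the target is a synthetic hydrophobic fraction. Hyperparameters are tuned with `GridSearchCV` and scored by Spearman rank correlation, a common choice for ranking protein variants.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class CompositionEmbedding(BaseEstimator, TransformerMixin):
    """Hypothetical transformer (not AIDE's API): amino-acid frequencies,
    giving a 20-dimensional vector regardless of sequence length."""

    def fit(self, X, y=None):
        return self  # stateless: nothing is learned from the data

    def transform(self, X):
        return np.array(
            [[seq.count(aa) / len(seq) for aa in AMINO_ACIDS] for seq in X]
        )


def spearman_score(y_true, y_pred):
    # Index 0 of the result is the rank correlation coefficient.
    return spearmanr(y_true, y_pred)[0]


# Synthetic data: random sequences of length 8 to 15, for illustration only.
rng = np.random.default_rng(0)
sequences = [
    "".join(rng.choice(list(AMINO_ACIDS), size=int(rng.integers(8, 16))))
    for _ in range(30)
]
# Synthetic target: fraction of hydrophobic residues in each sequence.
target = np.array(
    [sum(seq.count(aa) for aa in "AILMFVW") / len(seq) for seq in sequences]
)

pipeline = Pipeline([
    ("embed", CompositionEmbedding()),
    ("regress", Ridge()),
])
alpha_grid = [0.001, 0.01, 0.1]
search = GridSearchCV(
    pipeline,
    param_grid={"regress__alpha": alpha_grid},
    scoring=make_scorer(spearman_score),
    cv=3,
)
search.fit(sequences, target)
new_predictions = search.predict(["MKTAYIAKQR", "GGSGGS"])
```

Since the whole pipeline is a single estimator, the same search, cross-validation, and scoring utilities apply unchanged if the embedding step is replaced by a different feature extractor.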
Figure 1
Figure 2
Figure 3
Taxonomy
Topics: Protein Structure and Dynamics · Genetics, Bioinformatics, and Biomedical Research · Machine Learning in Bioinformatics
