Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins
David Medina-Ortiz, Sebastian Contreras, Juan Amado-Hinojosa, Jorge Torres-Almonacid, Juan A. Asenjo, Marcelo Navarrete, Álvaro Olivera-Nappa

TL;DR
This paper introduces a novel approach combining signal processing and machine learning on physicochemical property encodings to improve protein mutation effect prediction, outperforming existing models.
Contribution
It develops a method that uses clustering, embedding, and FFT on physicochemical properties for encoding, and creates an assembled predictive model with superior performance.
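As a rough illustration of the FFT-based encoding idea, the sketch below maps a sequence onto one physicochemical property, zero-pads it to a fixed length, and takes the magnitude spectrum. It is a minimal sketch, not the authors' implementation: the Kyte-Doolittle hydropathy scale stands in for a property selected from AAIndex, and the function names, the padding length, and the two example sequences are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale. Assumption: used here only as a stand-in
# for a physicochemical property chosen from AAIndex.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_property(sequence, scale, length):
    """Map each residue to its property value and zero-pad to `length`."""
    values = np.array([scale[aa] for aa in sequence], dtype=float)
    if values.size > length:
        raise ValueError("sequence is longer than the padded length")
    padded = np.zeros(length)
    padded[: values.size] = values
    return padded


def fft_spectrum(signal):
    """Return the one-sided magnitude spectrum of a real-valued signal."""
    return np.abs(np.fft.rfft(signal))


# Hypothetical wild-type sequence and a single-residue variant (R -> L).
wild_type = "MKTAYIAKQR"
mutant = "MKTAYIAKQL"

wt_signal = encode_property(wild_type, KYTE_DOOLITTLE, length=16)
wt_spectrum = fft_spectrum(wt_signal)
mut_spectrum = fft_spectrum(encode_property(mutant, KYTE_DOOLITTLE, length=16))
```

Padding every sequence to the same length gives spectra of equal size, so they can serve as fixed-width feature vectors for a downstream model. A single substitution changes the spectrum across frequencies rather than at one position.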
Findings
Assembled models outperform single-encoding models.
Method achieves better performance than previous approaches.
Proposed Python library is available for non-commercial use.
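One simple way to picture an assembled model is to train one regressor per property encoding and average their predictions. The sketch below does this with closed-form ridge regression on synthetic data. It is an assumption-laden illustration only: this summary does not state how the paper combines its models, and the averaging rule, the function names, and the random toy data are all hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(model, X):
    weights, intercept = model
    return X @ weights + intercept


def assembled_predict(models, encodings):
    """Average the predictions of one model per encoding (assumed rule)."""
    predictions = [predict(m, X) for m, X in zip(models, encodings)]
    return np.mean(predictions, axis=0)


# Synthetic toy data: three hypothetical encodings of the same 40 variants.
rng = np.random.default_rng(0)
n_variants = 40
encodings = [rng.normal(size=(n_variants, 5)) for _ in range(3)]
y = rng.normal(size=n_variants)

models = [fit_ridge(X, y) for X in encodings]
y_hat = assembled_predict(models, encodings)
```

The toy data carries no signal, so the sketch shows only the mechanics of combining per-encoding models and says nothing about the performance gains reported in the paper.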
Abstract
Predicting the effect of mutations in proteins is one of the most critical challenges in protein engineering; by knowing the effect that a substitution of one (or several) residues in the protein's sequence has on its overall properties, one could design a variant with a desirable function. New strategies and methodologies to create predictive models are continually being developed. However, those that claim to be general often do not reach adequate performance, and those that aim at a particular task improve their predictive performance at the cost of the method's generality. Moreover, these approaches typically require a particular decision on how to encode the amino acid sequence, without an explicit methodological agreement on that choice. To address these issues, in this work, we applied clustering, embedding, and dimensionality reduction techniques to the AAIndex database to select meaningful…
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · RNA and protein synthesis mechanisms
