Ensembles of phalanxes across assessment metrics for robust ranking of homologous proteins
Jabed H Tomal, William J Welch, Ruben H Zamar

TL;DR
This paper introduces an ensemble approach combining models based on feature subsets and multiple assessment metrics to improve the ranking of homologous proteins, enhancing robustness in highly unbalanced classification tasks.
Contribution
The novel ensemble of phalanxes and metrics effectively identifies diverse feature subsets and evaluation criteria, improving protein homology ranking accuracy.
Findings
Enhanced robustness in ranking both close and distant homologues.
Effective handling of highly unbalanced classification.
Improved ranking performance of homologous proteins.
Abstract
Two proteins are homologous if they have a common evolutionary origin, and the binary classification problem is to identify proteins in a candidate set that are homologous to a particular native protein. The feature (explanatory) variables available for classification are various measures of similarity of proteins. There are multiple classification problems of this type for different native proteins and their respective candidate sets. Homologous proteins are rare in a single candidate set, giving a highly unbalanced two-class problem. The goal is to rank proteins in a candidate set according to the probability of being homologous to the set's native protein. An ideal classifier will place all the homologous proteins at the head of such a list. Our approach uses an ensemble of models in a classifier and an ensemble of assessment metrics. For a given metric a classifier combines models,…
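The abstract combines several models into one ranking and judges that ranking with an assessment metric. The sketch below shows the general idea on a toy candidate set. It makes two assumptions that are not taken from the paper. First, it uses average precision as one example of a ranking metric; the specific metrics used by the authors are not listed in this excerpt. Second, it combines models by plainly averaging their predicted probabilities. This is only a stand-in and does not reproduce the authors' phalanx-forming or metric-ensembling procedure. The function names `average_precision` and `ensemble_scores` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k taken at the rank of each homologous (label 1) candidate.

    Illustrative metric only (assumption): it equals 1.0 exactly when every
    homologue sits at the head of the ranked list.
    """
    # Rank candidates by descending score (higher = more likely homologous).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def ensemble_scores(model_probs: Sequence[Sequence[float]]) -> list[float]:
    """Average each candidate's predicted probability across models (assumed combiner)."""
    n_models = len(model_probs)
    return [sum(col) / n_models for col in zip(*model_probs)]


# Toy candidate set: 6 candidates, 2 homologues (highly unbalanced, as in the abstract).
labels = [0, 1, 0, 0, 1, 0]
model_a = [0.10, 0.80, 0.30, 0.20, 0.15, 0.05]  # ranks the 2nd homologue 4th
model_b = [0.20, 0.60, 0.10, 0.50, 0.70, 0.10]  # ranks both homologues first

for name, s in [("A", model_a), ("B", model_b), ("ensemble", ensemble_scores([model_a, model_b]))]:
    print(name, average_precision(s, labels))
```

On this toy data, model A alone places the second homologue fourth. Averaging its scores with model B's moves both homologues to the head of the list. This is the kind of robustness gain an ensemble aims for, although a real gain depends on how diverse the combined models are.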
