Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space
Georgiana Ifrim, Carsten Wiuf

TL;DR
This paper introduces a bounded coordinate-descent algorithm for biological sequence classification that efficiently handles high-dimensional predictor spaces, producing interpretable models comparable to state-of-the-art methods.
Contribution
The paper presents a novel coordinate-descent framework with gradient bounding for fast discriminative subsequence selection in high-dimensional spaces, applicable to logistic regression and SVMs.
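The gradient-bounding idea can be illustrated with a small branch-and-bound search. This is a sketch, not the authors' implementation. It looks for the substring feature with the largest absolute gradient and makes three assumptions: features are binary presence indicators of contiguous substrings, each per-example loss derivative `coef[i]` has the sign opposite to `labels[i]` (true, e.g., for the logistic loss), and the name `best_subsequence` and the `max_len` cap are hypothetical. A right-extension of a substring can only occur in sequences that already contain it. Its gradient is therefore bounded by the larger of the positive-class and negative-class coefficient mass, so whole branches can be skipped.

```python
from typing import List, Sequence, Tuple

def best_subsequence(
    seqs: Sequence[str],
    labels: Sequence[int],
    coef: Sequence[float],
    max_len: int = 10,
) -> Tuple[str, float]:
    """Return (substring, gradient) with the largest |gradient|, using pruning.

    Assumption: coef[i] is the loss derivative for example i, with
    coef[i] <= 0 for labels[i] = +1 and coef[i] >= 0 for labels[i] = -1.
    The gradient of binary feature s is the sum of coef[i] over sequences containing s.
    """
    alphabet = sorted({ch for s in seqs for ch in s})
    best: Tuple[str, float] = ("", 0.0)

    def grow(prefix: str, support: List[int]) -> None:
        nonlocal best
        for ch in alphabet:
            cand = prefix + ch
            # Sequences containing cand are a subset of those containing prefix.
            sup = [i for i in support if cand in seqs[i]]
            if not sup:
                continue
            g = sum(coef[i] for i in sup)
            if abs(g) > abs(best[1]):
                best = (cand, g)
            # Any extension of cand occurs only in a subset of sup, so its
            # |gradient| is at most the larger one-sided coefficient mass.
            pos = sum(abs(coef[i]) for i in sup if labels[i] > 0)
            neg = sum(abs(coef[i]) for i in sup if labels[i] < 0)
            if max(pos, neg) > abs(best[1]) and len(cand) < max_len:
                grow(cand, sup)

    grow("", list(range(len(seqs))))
    return best
```

For example, with `seqs = ["ACGT", "ACGA", "TTTT", "TTGA"]`, `labels = [1, 1, -1, -1]` and logistic-loss coefficients at β = 0 (`coef[i] = -labels[i] / 2`), the search returns `("AC", -1.0)`. It expands only the branch under "A" and prunes every other first-level branch. Ties between equally good features are broken by search order.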
Findings
Achieves performance comparable to kernel SVMs on protein remote homology detection and remote fold recognition
Produces interpretable models as lists of weighted discriminative subsequences
Efficiently handles high-dimensional predictor spaces
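Because the model is just a weighted list of subsequences, classifying a sequence amounts to summing the weights of the listed subsequences it contains. The sketch below assumes a linear model with an intercept and substring-presence features. The `Model` layout, the function names and the toy weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical model format: an intercept plus (subsequence, weight) pairs.
Model = Tuple[float, List[Tuple[str, float]]]

def score(model: Model, seq: str) -> float:
    """Linear score: intercept plus the weights of all listed subsequences found in seq."""
    intercept, features = model
    return intercept + sum(w for sub, w in features if sub in seq)

def predict(model: Model, seq: str) -> int:
    """Class label in {-1, +1} from the sign of the score."""
    return 1 if score(model, seq) > 0 else -1

def explain(model: Model, seq: str) -> List[Tuple[str, float]]:
    """Subsequences that fire on seq, ordered by decreasing |weight|."""
    _, features = model
    hits = [(sub, w) for sub, w in features if sub in seq]
    return sorted(hits, key=lambda p: -abs(p[1]))

# Toy weights for illustration only; not taken from the paper.
toy_model: Model = (-0.2, [("GKT", 1.5), ("PLW", -0.8), ("HH", 0.3)])
```

Here `predict(toy_model, "MAGKTHH")` returns `1`, and `explain(toy_model, "MAGKTHH")` returns `[("GKT", 1.5), ("HH", 0.3)]`. The explanation is read directly off the model, with no kernel expansion or support vectors involved.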
Abstract
We present a framework for discriminative sequence classification where the learner works directly in the high-dimensional predictor space of all subsequences in the training set. This is possible by employing a new coordinate-descent algorithm coupled with bounding the magnitude of the gradient for fast selection of discriminative subsequences. We characterize the loss functions for which our generic learning algorithm can be applied and present concrete implementations for logistic regression (binomial log-likelihood loss) and support vector machines (squared hinge loss). Application of our algorithm to protein remote homology detection and remote fold recognition results in performance comparable to that of state-of-the-art methods (e.g., kernel support vector machines). Unlike state-of-the-art classifiers, the resulting classification models are simply lists of weighted discriminative…
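The abstract names two concrete losses. The sketch below, which is not the paper's code, writes both per-example losses as functions of the score f = β·x together with their derivatives in f. Both derivatives are zero or have the sign opposite to y, which is the kind of property a sign-based gradient bound needs. One coordinate update is then shown. Its step rule is a generic Armijo backtracking line search, which is an assumption; the paper's update rule may differ. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence

def logistic_loss(y: int, f: float) -> float:
    """Binomial log-likelihood loss log(1 + exp(-y*f)), computed stably."""
    m = -y * f
    return m + math.log1p(math.exp(-m)) if m > 0 else math.log1p(math.exp(m))

def logistic_dloss(y: int, f: float) -> float:
    """Derivative w.r.t. f: -y / (1 + exp(y*f)), computed without overflow."""
    z = y * f
    if z >= 0:
        e = math.exp(-z)
        return -y * e / (1.0 + e)
    return -y / (1.0 + math.exp(z))

def sq_hinge_loss(y: int, f: float) -> float:
    """Squared hinge loss max(0, 1 - y*f)^2 (differentiable, unlike the plain hinge)."""
    return max(0.0, 1.0 - y * f) ** 2

def sq_hinge_dloss(y: int, f: float) -> float:
    """Derivative w.r.t. f: -2*y*max(0, 1 - y*f)."""
    return -2.0 * y * max(0.0, 1.0 - y * f)

Loss = Callable[[int, float], float]

def coordinate_step(y: Sequence[int], f: Sequence[float], xj: Sequence[int],
                    loss: Loss, dloss: Loss, step: float = 1.0,
                    shrink: float = 0.5, c: float = 1e-4, max_tries: int = 30) -> float:
    """Update delta for one coordinate beta_j with 0/1 feature column xj.

    Scores change as f_i -> f_i + delta * xj[i]. The step is shrunk until the
    Armijo sufficient-decrease condition holds (assumed step rule).
    """
    g = sum(dloss(yi, fi) for yi, fi, xi in zip(y, f, xj) if xi)
    if g == 0.0:
        return 0.0
    base = sum(loss(yi, fi) for yi, fi in zip(y, f))
    for _ in range(max_tries):
        delta = -step * g
        new = sum(loss(yi, fi + delta * xi) for yi, fi, xi in zip(y, f, xj))
        if new <= base - c * step * g * g:
            return delta
        step *= shrink
    return 0.0
```

Swapping the logistic pair for the squared-hinge pair changes only the loss functions passed in, not the update code. This mirrors the abstract's point that one generic algorithm serves both models.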
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Gene expression and cancer classification
