Feature extraction in protein sequences classification: a new stability measure
Rabie Saidi, Sabeur Aridhi, Mondher Maddouri, Engelbert Mephu Nguifo

TL;DR
This paper introduces a new stability measure to evaluate the robustness of motif extraction methods in protein sequence classification, emphasizing the importance of reliable feature extraction for biological data analysis.
Contribution
It proposes a novel stability measure for motif extraction methods and compares four existing methods based on this robustness criterion.
Findings
The stability measure effectively assesses the robustness of motif extraction methods.
Some methods are more stable than others when the input data are perturbed.
Robust motif extraction improves reliability of biological sequence classification.
Abstract
Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step consists, for example, of transforming the biological sequences into vectors of motifs, where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. Hence, we obtain an object-property table where objects are sequences and properties are motifs extracted from the sequences. This output can then be fed to standard machine learning tools to perform data mining tasks such as classification. Several previous works have described feature extraction methods for bio-sequence classification, but none of them has discussed the robustness of these methods when the input data are perturbed. In this work, we introduce the notion of stability of the generated motifs in order to study the robustness of motif extraction methods. We express this…
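To make the object-property table concrete, here is a minimal Python sketch. It assumes fixed-length n-grams as the motifs, which is only one simple choice and not necessarily any of the methods the paper compares. The function names (extract_ngrams, build_table, jaccard_overlap) and the toy sequences are hypothetical. The Jaccard overlap at the end is an illustrative proxy for how much the motif set changes under a perturbation; it is not the stability measure defined in the paper.

```python
from typing import Dict, List, Set, Tuple


def extract_ngrams(sequence: str, n: int = 3) -> Set[str]:
    """Return the set of all length-n subsequences (assumed motif type)."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def build_table(
    sequences: Dict[str, str], n: int = 3
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary object-property table.

    Rows (objects) are sequences, columns (properties) are motifs;
    a cell is 1 if the motif occurs in the sequence, else 0.
    """
    motif_sets = {name: extract_ngrams(seq, n) for name, seq in sequences.items()}
    motifs = sorted(set().union(*motif_sets.values()))
    table = {
        name: [1 if m in found else 0 for m in motifs]
        for name, found in motif_sets.items()
    }
    return motifs, table


def jaccard_overlap(a: Set[str], b: Set[str]) -> float:
    """Illustrative proxy only: shared motifs divided by all motifs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy data (hypothetical sequences, not taken from the paper).
original = {"seq1": "MKVLA", "seq2": "KVLAG"}
# Perturbation: one residue substituted in seq1 (K -> R).
perturbed = {"seq1": "MRVLA", "seq2": "KVLAG"}

motifs, table = build_table(original)
perturbed_motifs, _ = build_table(perturbed)

print(motifs)         # ['KVL', 'LAG', 'MKV', 'VLA']
print(table["seq1"])  # [1, 0, 1, 1]
print(table["seq2"])  # [1, 1, 0, 1]
print(jaccard_overlap(set(motifs), set(perturbed_motifs)))  # 0.5
```

The rows of the table can be passed directly to a standard classifier as feature vectors. In this toy case a single substitution halves the overlap between the two motif sets, which shows why the robustness of an extraction method to input perturbations is worth measuring.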
