Mining Negative Sequential Patterns to Improve Viral Genomic Feature Representation and Classification
Wenxi Zhu, Wensheng Gan, Zhenlian Qi

TL;DR
This paper introduces GeneNSPCla, a novel viral genome classification framework utilizing negative sequential patterns to improve accuracy and interpretability in viral sequence analysis.
Contribution
The study develops a new negative pattern mining algorithm, GONPM+, and demonstrates its effectiveness in extracting biologically meaningful absence-based features for viral classification.
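The summary does not spell out how a negative sequential pattern (NSP) is matched or mined. The sketch below is not GONPM+. It is a naive, brute-force illustration that assumes one common reading of a three-item NSP (a, ¬b, c): some occurrence of `a` is followed later by `c`, with no `b` strictly between them. It also assumes no gap constraints and a DNA-style `ACGT` alphabet. The helper names `matches`, `support` and `mine_nsps` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: RNA genomes stored with T in place of U


def matches(seq: str, pattern: tuple) -> bool:
    """Assumed NSP semantics: `a ... c` occurs with no `neg` strictly between."""
    a, neg, c = pattern
    last_a = -1  # index of the most recent `a` seen so far
    for j, ch in enumerate(seq):
        # The latest `a` before j gives the shortest window, so it is the
        # best witness: if `neg` is absent there, the pattern is satisfied.
        if ch == c and last_a >= 0 and neg not in seq[last_a + 1:j]:
            return True
        if ch == a:
            last_a = j
    return False


def support(seqs, pattern) -> float:
    """Fraction of sequences that satisfy the pattern."""
    return sum(matches(s, pattern) for s in seqs) / len(seqs)


def mine_nsps(seqs, min_sup: float) -> dict:
    """Brute-force enumeration of (a, not-b, c) patterns above min_sup."""
    result = {}
    for a, neg, c in product(ALPHABET, repeat=3):
        if neg in (a, c):
            continue  # such patterns reduce to the positive pattern a ... c
        sup = support(seqs, (a, neg, c))
        if sup >= min_sup:
            result[(a, neg, c)] = sup
    return result
```

For example, with the sequences `AGT`, `ACGT` and `AGGT`, the pattern `("A", "C", "T")` holds in two of the three: only `ACGT` has a `C` between its `A` and `T`. A practical miner would add pruning, longer patterns and gap constraints rather than enumerating every candidate.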
Findings
GONPM+ improves average accuracy by 10.03% over the original algorithm.
GeneNSPCla improves classification accuracy by 24.75% compared with positive-pattern-based methods.
Incorporating absence-based features offers a new perspective for viral genome analysis.
Abstract
Viruses are the most abundant biological entities on Earth and play a pivotal role in microbial ecosystems, yet, as prominent human pathogens, they are closely linked to human morbidity and mortality. Accurate identification of viral sequences from genomic data is therefore essential, but existing genome-based classification models, which largely rely on composition- or frequency-based subsequence features, often suffer from limited interpretability and reduced accuracy, particularly on complex or imbalanced datasets. To address these limitations, we propose GeneNSPCla (Genomic Negative Sequential Pattern-based Classification), a novel viral classification framework based on Negative Sequential Patterns (NSPs) that extracts discriminative absence-based features from nucleotide sequences of RNA viral genomes. By transforming these NSPs into numerical feature vectors and…
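The abstract says NSPs are transformed into numerical feature vectors but does not give the encoding. The sketch below shows one plausible choice, a binary presence/absence encoding with one column per pattern. It relies on the same assumed three-item NSP semantics: `a ... c` occurs with no `neg` between them. The function names and the encoding itself are hypothetical and are not taken from the paper.

```python
def matches(seq: str, pattern: tuple) -> bool:
    """Assumed NSP semantics: `a ... c` occurs with no `neg` strictly between."""
    a, neg, c = pattern
    last_a = -1  # most recent `a`; it yields the shortest candidate window
    for j, ch in enumerate(seq):
        if ch == c and last_a >= 0 and neg not in seq[last_a + 1:j]:
            return True
        if ch == a:
            last_a = j
    return False


def to_feature_vectors(seqs, patterns):
    """One row per sequence, one 0/1 column per NSP (1 = pattern satisfied)."""
    return [[int(matches(s, p)) for p in patterns] for s in seqs]


patterns = [("A", "C", "T"), ("G", "A", "T")]
X = to_feature_vectors(["AGT", "ACGT"], patterns)
print(X)  # [[1, 1], [0, 1]]
```

Rows of `X` can then be passed to any standard classifier. The summary does not say which classifier GeneNSPCla uses, and count- or position-weighted encodings are equally possible alternatives to the 0/1 scheme assumed here.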
