Hilbert Curve Based Molecular Sequence Analysis
Sarwan Ali, Tamkanat E Ali, Imdad Ullah Khan, Murray Patterson

TL;DR
This paper introduces a novel Hilbert curve-based image representation for molecular sequences, enhancing deep learning classification accuracy by capturing spatial information more effectively than traditional methods.
Contribution
It proposes a universal, alignment-free Chaos Game Representation (CGR) method that uses Hilbert curves and a new alphabetic index mapping, improving molecular sequence classification performance.
Findings
Achieved 94.5% accuracy on a lung cancer dataset
Outperformed existing state-of-the-art methods
Demonstrated effectiveness of image-based DL models for sequences
Abstract
Accurate molecular sequence analysis is a key task in the field of bioinformatics. To apply molecular sequence classification algorithms, we first need to generate appropriate representations of the sequences. Traditional numeric sequence representation techniques are mostly based on sequence alignment, which suffers from limited accuracy. Although several alignment-free techniques have also been introduced, their tabular data form results in low performance when used with Deep Learning (DL) models, compared to the competitive performance observed with image-based data. To address this problem and to let DL models reach their full potential while capturing the important spatial information in the sequence data, we propose a universal Hilbert curve-based Chaos Game Representation (CGR) method. This method is a…
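To make the idea of a Hilbert curve-based image concrete, here is a minimal Python sketch of how a sequence could be laid out along a Hilbert curve so that characters close together in the sequence stay close together in the image. It is an illustration under stated assumptions, not the authors' implementation: the function names `hilbert_d2xy` and `sequence_to_hilbert_image` are hypothetical, and the character-to-integer mapping (position in the alphabet plus one) is a simple stand-in for the paper's alphabetic index mapping, whose details are not given in this summary.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order x 2**order grid.

    Standard iterative construction: at each scale, pick the quadrant from
    two bits of d, then rotate/flip the partial coordinates accordingly.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip within the current s x s block.
                x = s - 1 - x
                y = s - 1 - y
            # Transpose the block.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_hilbert_image(seq, alphabet="ACGT"):
    """Place each character of seq along a Hilbert curve (hypothetical helper).

    Assumption: each character is encoded as its 1-based position in
    `alphabet`; 0 marks empty cells and unknown characters. The paper's
    actual index mapping may differ.
    """
    index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    # Smallest curve order whose grid has room for the whole sequence.
    order = 1
    while (1 << order) ** 2 < len(seq):
        order += 1
    n = 1 << order
    image = [[0] * n for _ in range(n)]
    for d, ch in enumerate(seq):
        x, y = hilbert_d2xy(order, d)
        image[y][x] = index.get(ch, 0)
    return image


if __name__ == "__main__":
    for row in sequence_to_hilbert_image("ACGTTGCA"):
        print(row)
```

For the four-character input "ACGT" the sketch fills a 2 x 2 grid as `[[1, 4], [2, 3]]`: the curve visits (0, 0), (0, 1), (1, 1), (1, 0) in turn. The resulting grid can be passed to an image-based model as a single-channel input. Passing a different `alphabet` (for example, the amino acid letters) is one way to approximate the "universal" aspect described above, though that choice is an assumption here.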
Taxonomy
Topics: Computational Drug Discovery Methods · Molecular spectroscopy and chirality · Analytical Chemistry and Chromatography
