Locality Sensitive Hashing-based Sequence Alignment Using Deep Bidirectional LSTM Models
Neda Tavakoli

TL;DR
This paper introduces a novel approach using deep bidirectional LSTM models to learn locality-sensitive hashing features for sequence alignment, demonstrating improved accuracy in aligning short reads to a reference genome.
Contribution
The paper presents a new LSTM-based method for generating LSH features, enhancing sequence alignment accuracy over traditional methods.
Findings
Higher alignment accuracy with increased epochs
Effective sequence modeling with deep bidirectional LSTM
Feasibility demonstrated on human genome data
Abstract
Bidirectional Long Short-Term Memory (LSTM) is a special kind of Recurrent Neural Network (RNN) architecture designed to model sequences and their long-range dependencies more precisely than conventional RNNs. This paper proposes using a deep bidirectional LSTM for sequence modeling as the basis for locality-sensitive hashing (LSH)-based sequence alignment. In particular, we use the deep bidirectional LSTM to learn LSH features. The resulting LSH can then be used to perform sequence alignment. We demonstrate the feasibility of modeling sequences with the proposed LSTM-based model by aligning short read queries to the reference genome. We use the human reference genome as our training dataset, together with a set of short reads generated using Illumina sequencing technology. The ultimate goal is to align query sequences to a reference genome. We first decompose…
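The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea, not the authors' implementation. It assumes: one-hot encoding of nucleotides; a single-layer bidirectional LSTM with random, untrained weights standing in for the paper's deep, trained model; random-hyperplane (SimHash) LSH over the concatenated final hidden states, a standard LSH family swapped in for illustration; and decomposition of the reference into overlapping fixed-length windows, which is an assumption because the abstract's description of the decomposition step is truncated. All names (`BiLSTMEncoder`, `SimHash`, `build_index`, `candidate_positions`) are hypothetical.

```python
import math
import random

# Hypothetical sketch: BiLSTM embeddings + random-hyperplane LSH for read seeding.
# Weights are random and untrained; this is NOT the paper's trained model.

BASES = "ACGT"


def one_hot(base):
    """Encode a nucleotide as a one-hot vector ordered (A, C, G, T)."""
    return [1.0 if base == b else 0.0 for b in BASES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix per gate (input, forget, output, candidate), each
        # acting on the concatenation [x_t, h_{t-1}], plus a zero bias.
        self.W = {g: [[rng.gauss(0, 0.5) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def run(self, xs):
        """Run the cell over a list of input vectors; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            g = [math.tanh(a) for a in self._affine("c", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class BiLSTMEncoder:
    """Concatenate the final states of a forward and a backward LSTM pass."""

    def __init__(self, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(4, hidden_dim, rng)
        self.bwd = LSTMCell(4, hidden_dim, rng)

    def encode(self, seq):
        xs = [one_hot(b) for b in seq]
        return self.fwd.run(xs) + self.bwd.run(xs[::-1])


class SimHash:
    """Random-hyperplane LSH: one sign bit per random hyperplane."""

    def __init__(self, dim, n_bits=16, seed=1):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hash(self, v):
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in self.planes)


def build_index(reference, k, encoder, lsh):
    """Map each LSH bucket to the start offsets of reference windows of length k."""
    index = {}
    for pos in range(len(reference) - k + 1):
        key = lsh.hash(encoder.encode(reference[pos:pos + k]))
        index.setdefault(key, []).append(pos)
    return index


def candidate_positions(read, index, encoder, lsh):
    """Return reference offsets whose windows share the read's bucket."""
    return index.get(lsh.hash(encoder.encode(read)), [])


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice(BASES) for _ in range(200))
    k = 20
    encoder = BiLSTMEncoder(hidden_dim=8, seed=0)
    lsh = SimHash(dim=16, n_bits=16, seed=1)
    index = build_index(reference, k, encoder, lsh)
    read = reference[10:10 + k]  # an error-free read taken from offset 10
    hits = candidate_positions(read, index, encoder, lsh)
    print(hits)  # offset 10 is guaranteed to be among the hits
```

In this sketch the read length equals the window length, and with untrained weights only exact window matches are guaranteed to collide. The purpose of training the LSTM, as the paper proposes, is to make similar sequences (for example, reads with sequencing errors) land in the same bucket as well; in a typical LSH seeding pipeline, the candidate offsets would then be checked with a conventional alignment step.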
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
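For reference, the sigmoid and tanh activations listed above enter a standard (non-peephole) LSTM cell as shown below. The paper's exact cell variant is not stated here, so treat this as the textbook formulation rather than the authors' specification. Here $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication. A bidirectional layer runs one such cell forward and one backward over the sequence and concatenates their hidden states; a deep model stacks several such layers.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
h_t^{\text{bi}} &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right]
\end{aligned}
```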
