Locality Sensitive Hashing-based Sequence Alignment Using Deep Bidirectional LSTM Models

Neda Tavakoli

arXiv:2004.02094 · cs.LG · April 7, 2020 · 1 cites

Open Access

TL;DR

This paper introduces a novel approach using deep bidirectional LSTM models to learn locality-sensitive hashing features for sequence alignment, demonstrating improved accuracy in aligning short reads to a reference genome.

Contribution

The paper presents a new LSTM-based method for generating LSH features, enhancing sequence alignment accuracy over traditional methods.

Findings

01. Higher alignment accuracy with increased epochs

02. Effective sequence modeling with deep bidirectional LSTM

03. Feasibility demonstrated on human genome data

Abstract

Bidirectional Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model sequences and their long-range dependencies more precisely than conventional RNNs. This paper proposes using a deep bidirectional LSTM for sequence modeling as an approach to locality-sensitive hashing (LSH)-based sequence alignment. In particular, we use the deep bidirectional LSTM to learn the features of LSH. The resulting LSH can then be used to perform sequence alignment. We demonstrate the feasibility of modeling sequences with the proposed LSTM-based model by aligning short read queries against the reference genome. We use the human reference genome as our training dataset, in addition to a set of short reads generated using Illumina sequencing technology. The ultimate goal is to align query sequences to a reference genome. We first decompose…
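
For readers unfamiliar with LSH-based alignment, the idea of matching a short read to a reference through hashed signatures can be illustrated with a classical, hand-crafted scheme. The sketch below is not the paper's learned LSTM features: it assumes MinHash over k-mers as a stand-in LSH, and the function names (`kmers`, `minhash_signature`, `estimated_similarity`, `best_window`), the k-mer length, the number of hash functions, and the toy sequences are all hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all overlapping length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=32):
    """MinHash signature: for each salted hash function, keep the
    smallest hash value seen over the k-mers of seq.
    Assumes len(seq) >= k so that at least one k-mer exists."""
    kmer_set = kmers(seq, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for km in kmer_set
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, which estimates the
    Jaccard similarity of the two underlying k-mer sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def best_window(reference, read, k=4, num_hashes=32):
    """Slide a read-length window over the reference and return the
    (offset, similarity) pair with the highest estimated similarity."""
    read_sig = minhash_signature(read, k, num_hashes)
    best = (0, -1.0)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        sim = estimated_similarity(
            read_sig, minhash_signature(window, k, num_hashes)
        )
        if sim > best[1]:
            best = (offset, sim)
    return best


# Toy data (hypothetical): the read is an exact copy of reference[10:26].
reference = "ACGTTGCAAGCTTAGGCTAACGGATCCGATTACAGGCT"
read = reference[10:26]
offset, similarity = best_window(reference, read)
print(offset, similarity)
```

Because identical k-mer sets always yield identical signatures, the window at the read's true offset scores 1.0 here. A practical aligner would bucket signatures into hash tables rather than scan every window; the linear scan is kept only to keep the sketch short. The approach described in the abstract replaces the fixed hash functions with features learned by a deep bidirectional LSTM.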

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory