Towards the Latent Transcriptome
Assya Trofimov, Francis Dutil, Claude Perreault, Sebastien Lemieux, Yoshua Bengio, Joseph Paul Cohen

TL;DR
This paper introduces a novel method to generate continuous embeddings of kmers from raw RNA-seq data using an RNN, capturing sequence similarity, abundance, and genomic abnormalities without reference genome alignment.
Contribution
The work presents the Latent Transcriptome, a new embedding space that encodes sequence and abundance information directly from raw RNA-seq data, enabling structural and abnormality detection.
Findings
Embeddings recover exon information from raw data.
Latent space detects genomic translocations.
Model captures both sequence similarity and abundance.
Abstract
In this work we propose a method to compute continuous embeddings for kmers from raw RNA-seq data, without the need for alignment to a reference genome. The approach uses an RNN to transform kmers of the RNA-seq reads into a 2 dimensional representation that is used to predict abundance of each kmer. We report that our model captures information of both DNA sequence similarity as well as DNA sequence abundance in the embedding latent space, that we call the Latent Transcriptome. We confirm the quality of these vectors by comparing them to known gene sub-structures and report that the latent space recovers exon information from raw RNA-Seq data from acute myeloid leukemia patients. Furthermore we show that this latent space allows the detection of genomic abnormalities such as translocations as well as patient-specific mutations, making this representation space both useful for…
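As a rough illustration of the pipeline the abstract describes, the sketch below splits reads into overlapping kmers, counts them as a simple stand-in for abundance, and passes each kmer through a small recurrent network whose final 2-dimensional hidden state serves as the embedding. It is not the authors' implementation. The one-hot encoding, the plain Elman-style recurrence, the random untrained weights, and all function names (`extract_kmers`, `count_kmers`, `init_rnn`, `embed_kmer`) are assumptions made for illustration, and the abundance-prediction head and training loop are omitted.

```python
import math
import random
from collections import Counter

# One-hot encoding of the four nucleotides (an assumed input encoding).
ONE_HOT = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
}


def extract_kmers(read, k):
    """Return every overlapping substring of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count kmer occurrences across all reads (a simple abundance proxy)."""
    counts = Counter()
    for read in reads:
        counts.update(extract_kmers(read, k))
    return counts


def init_rnn(hidden_size=2, seed=0):
    """Create random, untrained weights for a tiny Elman-style RNN."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(4)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    return w_in, w_rec


def embed_kmer(kmer, w_in, w_rec):
    """Feed the kmer through the RNN one base at a time.

    The final hidden state is returned as the kmer's embedding.
    """
    size = len(w_rec)
    h = [0.0] * size
    for base in kmer:
        x = ONE_HOT[base]
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(4))
                + sum(w_rec[j][i] * h[i] for i in range(size))
            )
            for j in range(size)
        ]
    return h


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTA"]  # toy reads, not real RNA-seq data
    counts = count_kmers(reads, k=3)
    w_in, w_rec = init_rnn(hidden_size=2, seed=0)
    for kmer, n in counts.items():
        print(kmer, n, embed_kmer(kmer, w_in, w_rec))
```

In a trained version of such a model, the 2-dimensional vector would feed a predictor of each kmer's abundance, so that the embedding is shaped by both sequence and count. With the random weights used here, the coordinates carry no biological meaning.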
Taxonomy
Topics: Molecular Biology Techniques and Applications · RNA modifications and cancer · RNA Research and Splicing
