Efficient Algorithms for de novo Assembly of Alternative Splicing Events from RNA-seq Data
Gustavo Sacomoto

TL;DR
This thesis introduces KisSplice, an efficient, scalable algorithm for identifying and quantifying alternative splicing events from RNA-seq data without a reference genome; it identifies more correct events than general-purpose transcriptome assemblers.
Contribution
It presents a novel exact method for extracting splicing variants from de Bruijn graphs and introduces scalable algorithms with reduced memory usage for bubble enumeration.
Findings
KisSplice detects more correct splicing events than general transcriptome assemblers.
A new polynomial-delay algorithm for bubble enumeration is several orders of magnitude faster.
Memory usage is reduced by 30-40% with minimal impact on construction time.
Abstract
In this thesis, we address the problem of identifying and quantifying variants (alternative splicing and genomic polymorphism) in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the fundamental idea that each variant corresponds to a recognizable pattern, a bubble, in a de Bruijn graph constructed from the RNA-seq reads, we propose a general model for all variants in such graphs. We then introduce an exact method, called KisSplice, to extract alternative splicing events. Finally, we show that it identifies more correct events than general-purpose transcriptome assemblers. To deal with ever-increasing volumes of NGS data, we put extra effort into making KisSplice as scalable as possible. First, to improve its running time, we propose a new polynomial-delay algorithm to enumerate bubbles. We show that it is several…
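The central idea in the abstract, that a variant shows up as a bubble in a de Bruijn graph built from the reads, can be illustrated with a small sketch. This is not KisSplice and not the polynomial-delay algorithm mentioned above: it is a naive brute-force search, written under several simplifying assumptions (error-free reads, no reverse complements, nodes are k-mers, and a bubble is taken to be a pair of internally vertex-disjoint paths sharing their first and last node). The function names `build_de_bruijn`, `simple_paths`, and `find_bubbles`, and the `max_len` cutoff, are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_de_bruijn(reads, k):
    """Node-centric de Bruijn graph: nodes are k-mers, and an edge joins two
    k-mers that appear consecutively in a read (overlap of k-1 characters).
    Assumption: reverse complements and sequencing errors are ignored."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for a, b in zip(kmers, kmers[1:]):
            graph[a].add(b)
    return graph


def simple_paths(graph, source, max_len):
    """Yield every simple path starting at `source` that has between
    2 and `max_len` nodes (iterative depth-first search)."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) < max_len:
            for nxt in sorted(graph.get(path[-1], ())):
                if nxt not in path:  # keep the path simple
                    stack.append(path + [nxt])


def find_bubbles(graph, max_len=12):
    """Naive bubble search (exponential in the worst case): report pairs of
    paths with the same first and last node and disjoint internal nodes."""
    bubbles = []
    for source in sorted(graph):
        if len(graph[source]) < 2:  # a bubble must start at a branching node
            continue
        by_target = defaultdict(list)
        for path in simple_paths(graph, source, max_len):
            by_target[path[-1]].append(path)
        for paths in by_target.values():
            for p, q in combinations(paths, 2):
                if set(p[1:-1]).isdisjoint(q[1:-1]):
                    bubbles.append((p, q))
    return bubbles
```

As a toy usage example, the two made-up reads `"ATCGGATTCACGCTAAG"` and `"ATCGGAGCTAAG"` differ only by the inserted segment `TTCAC`, mimicking an exon inclusion and its skipping. Calling `find_bubbles(build_de_bruijn(reads, 4))` on them reports a single bubble that opens at the k-mer `CGGA` and closes at `GCTA`, with one path of 10 nodes and one of 5. The brute-force enumeration is only practical on tiny graphs, which is the kind of cost a dedicated enumeration algorithm is meant to avoid.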
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Algorithms and Data Compression
