Navigating in a sea of repeats in RNA-seq without drowning

Gustavo Sacomoto; Blerina Sinaimeri; Camille Marchet; Vincent Miele; Marie-France Sagot; Vincent Lacroix

arXiv:1406.1022·cs.DS·June 5, 2014


TL;DR

This paper introduces a formal model for high copy number repeats in RNA-seq data, shows that identifying repeat-associated subgraphs in a de Bruijn graph is NP-complete, and proposes an algorithm that assembles alternative splicing events located outside repetitive regions.

Contribution

It provides a formal model for repeats in RNA-seq data, proves that identifying repeat-associated subgraphs in a de Bruijn graph is NP-complete, and offers an algorithm for the local assembly of alternative splicing events that are not inside repeats.

Findings

01. NP-completeness of identifying repeat subgraphs in de Bruijn graphs

02. Algorithm to identify alternative splicing events outside repeats

03. Validation results on synthetic and real data

Abstract

The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which transposable elements are one of the main examples. Most transcriptome assemblers are based on de Bruijn graphs and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are twofold. First, we introduce a formal model for representing high copy number repeats in RNA-seq data and exploit its properties for inferring a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying in a de Bruijn graph a subgraph with this characteristic is NP-complete. In a second step, we show that in the specific case of a local assembly of…
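
To make the de Bruijn graph setting concrete, here is a minimal Python sketch of how a sequence shared by two transcripts shows up as branching nodes in the graph. It is an illustration only and is not the paper's repeat model or its algorithm: the function names `build_de_bruijn` and `branching_nodes`, the toy sequences, and the choice of k = 4 are all assumptions made for this example, and reverse complements and sequencing errors are ignored.

```python
def build_de_bruijn(reads, k):
    """Build a node-centric de Bruijn graph (hypothetical helper).

    Nodes are the distinct k-mers found in the reads; there is an edge
    u -> v whenever the last k-1 characters of u equal the first k-1
    characters of v.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = {kmer: set() for kmer in kmers}
    for kmer in kmers:
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in kmers:
                graph[kmer].add(successor)
    return graph


def branching_nodes(graph):
    """Return the nodes with more than one successor or more than one predecessor."""
    in_degree = {node: 0 for node in graph}
    for successors in graph.values():
        for successor in successors:
            in_degree[successor] += 1
    return {
        node
        for node, successors in graph.items()
        if len(successors) > 1 or in_degree[node] > 1
    }


# Two toy "transcripts" that share the sequence CGATTACA (an assumed example).
transcripts = ["ACCGATTACAGGT", "TGCGATTACATCA"]
graph = build_de_bruijn(transcripts, k=4)

# The two paths merge when entering the shared sequence and split when leaving it.
print(sorted(branching_nodes(graph)))  # prints ['CGAT', 'TACA']
```

With only two copies, the shared sequence produces just one merge and one split. The abstract's concern is high copy number repeats, where many such paths converge on the same region and the local structure of the graph becomes much denser.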
