# An Index for Sequencing Reads Based on The Colored de Bruijn Graph

**Authors:** Diego Díaz-Domínguez

arXiv: 1908.02211 · 2019-12-02

## TL;DR

This paper introduces a space-efficient index for massive sets of sequencing reads, built on a colored de Bruijn graph. The index supports read reconstruction and contig assembly while using about half the space of the plain representation of the reads.

## Contribution

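It presents an algorithm that produces a smaller and sparser color matrix for a colored de Bruijn graph, combining an incomplete coloring of the graph with a greedy strategy that reuses colors across different strings. It also presents two algorithms that run on top of the index: one for read reconstruction and one for contig assembly.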

## Key findings

- The index uses about half the space of the plain representation of the read set (1 byte per DNA symbol).
- More than 99% of the reads can be reconstructed from the index alone.
- Read reconstruction and contig assembly run directly on top of the index.

## Abstract

In this article, we show how to transform a colored de Bruijn graph (dBG) into a practical index for processing massive sets of sequencing reads. As in previous work, we encode an instance of a colored dBG of the set using BOSS and a color matrix C. To reduce the space requirements, we devise an algorithm that produces a smaller and sparser version of C. The novelties in this algorithm are (i) an incomplete coloring of the graph and (ii) a greedy coloring approach that tries to reuse the same colors for different strings when possible. We also propose two algorithms that work on top of the index: one for reconstructing reads and the other for contig assembly. Experimental results show that our data structure uses about half the space of the plain representation of the set (1 byte per DNA symbol) and that more than 99% of the reads can be reconstructed just from the index.
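The idea of greedily reusing colors can be illustrated with a small sketch. This is not the paper's algorithm: it stores k-mers in a plain dictionary rather than BOSS, colors every node a read visits (so it ignores the incomplete coloring), and the names `kmers` and `greedy_color` are hypothetical. It only shows the core rule that two reads may share a color as long as they never pass through the same node.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the k-mers of a read, in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def greedy_color(reads, k):
    """Give each read the smallest color not yet present on any of its k-mers.

    Illustrative assumption: nodes are k-mers held in a dict, and every
    node of a read is colored. Returns the color chosen for each read and
    the mapping from k-mer to the set of colors stored on it.
    """
    node_colors = defaultdict(set)  # k-mer -> colors of reads through it
    read_color = []
    for read in reads:
        nodes = kmers(read, k)
        # Colors already taken by earlier reads that share a node.
        used = set()
        for node in nodes:
            used |= node_colors[node]
        # Smallest free color: reused whenever no conflict exists.
        color = 0
        while color in used:
            color += 1
        read_color.append(color)
        for node in nodes:
            node_colors[node].add(color)
    return read_color, dict(node_colors)


reads = ["ACGTAC", "GTACGG", "TTTTTT"]
colors, node_colors = greedy_color(reads, k=3)
```

In this toy input the first two reads share the k-mers `GTA` and `TAC`, so they need distinct colors, while the third read overlaps neither and can reuse the first read's color. Three reads are therefore covered by two colors, which is the kind of saving that keeps the color matrix small.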

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02211/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02211/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1908.02211/full.md

---
Source: https://tomesphere.com/paper/1908.02211