Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

Yun Qin; Fei Zhu; Bo Xi

arXiv:2210.11106 · cs.IR · October 21, 2022

Open Access

TL;DR

This paper introduces a deep neural network-based method for robust DNA sequence reconstruction that effectively handles contaminated clusters and sequencing errors, improving accuracy in DNA data storage applications.

Contribution

A novel deep learning model that reconstructs DNA sequences from contaminated, noisy read clusters. It addresses a limitation of existing methods, which assume that every read in a cluster is a noisy copy of the same reference.
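To see why contaminated reads matter, consider a simple non-learned baseline. The sketch below is not the authors' DNN. It assumes substitution-only errors so that all reads stay aligned, and shows how an unweighted position-wise majority vote can be pulled off the reference by contaminants. It then applies a crude outlier filter. `majority_consensus`, `filtered_consensus`, the `max_frac` threshold, and the toy cluster are all hypothetical illustrations.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Position-wise majority vote; assumes equal-length, substitution-only reads."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("sketch assumes all reads have the same length")
    # For each column, keep the most frequent base (ties go to the first seen).
    return "".join(
        Counter(r[i] for r in reads).most_common(1)[0][0] for i in range(length)
    )


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def filtered_consensus(reads: list[str], max_frac: float = 0.3) -> str:
    """Hypothetical outlier filter: drop reads far from a first-pass consensus.

    A read is treated as contaminated if it differs from the first-pass
    consensus at more than max_frac of positions; the remaining reads vote again.
    """
    first = majority_consensus(reads)
    limit = max_frac * len(first)
    kept = [r for r in reads if hamming(r, first) <= limit]
    return majority_consensus(kept) if kept else first


# Toy cluster: three noisy copies of the reference plus two unrelated
# strands that happen to agree with one copy's error at position 3.
reference = "ACGTACGTAC"
cluster = [
    "ACGTACGTAC",  # exact copy
    "ACGAACGTAC",  # substitution at position 3
    "ACGTACCTAC",  # substitution at position 6
    "TTTATTTTTT",  # contaminant
    "GGGAGGGGGG",  # contaminant
]

print(majority_consensus(cluster))  # ACGAACGTAC (wrong at position 3)
print(filtered_consensus(cluster))  # ACGTACGTAC (matches the reference)
```

The filter succeeds here only because the contaminants sit far from the first-pass consensus. A fixed threshold cannot adapt to varying contamination levels. Real reads also carry insertions and deletions, which break the column alignment this sketch relies on.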

Findings

1. Outperforms existing methods on sequencing datasets
2. Robust against high contamination levels
3. Effective in handling insertion, deletion, and substitution (IDS) errors
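IDS errors are insertions, deletions, and substitutions. For experimenting with reconstruction methods, synthetic clusters can be generated with a toy channel like the sketch below. It makes two simplifying assumptions that are not the paper's data model. Error rates are independent and uniform across positions, and contamination is modeled as unrelated random strands. `ids_channel` and `make_cluster` are hypothetical names.

```python
import random

BASES = "ACGT"


def ids_channel(ref: str, p_ins: float, p_del: float, p_sub: float,
                rng: random.Random) -> str:
    """Toy i.i.d. insertion/deletion/substitution channel (assumes p_del + p_sub <= 1)."""
    out = []
    for base in ref:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insert a random base before this one
        r = rng.random()
        if r < p_del:
            continue  # delete this base
        if r < p_del + p_sub:
            # substitute with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # copy correctly
    return "".join(out)


def make_cluster(ref: str, n_reads: int, n_outliers: int,
                 p_ins: float, p_del: float, p_sub: float,
                 seed: int = 0) -> list[str]:
    """Noisy copies of ref plus unrelated random strands standing in for contamination."""
    rng = random.Random(seed)
    reads = [ids_channel(ref, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]
    outliers = ["".join(rng.choice(BASES) for _ in ref) for _ in range(n_outliers)]
    cluster = reads + outliers
    rng.shuffle(cluster)  # cluster order carries no information
    return cluster


cluster = make_cluster("ACGTACGTACGTACGTACGT", n_reads=8, n_outliers=2,
                       p_ins=0.02, p_del=0.02, p_sub=0.02, seed=42)
for read in cluster:
    print(read)
```

Because insertions and deletions change read lengths, reads from this channel no longer line up column by column. This is what makes IDS errors harder to handle than substitutions alone.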

Abstract

DNA has immense potential as an emerging data storage medium. DNA storage works by converting digital information between a binary code stream, quaternary bases, and physical DNA fragments. This process inevitably introduces errors, which makes accurate data recovery challenging. Sequence reconstruction is the task of inferring the reference DNA strand from a cluster of erroneous copies. Existing methods commonly assume that all strands within a cluster are noisy copies of the same reference and therefore contribute equally to the reconstruction. This assumption does not always hold, because contaminated sequences can arise, for example, from DNA fragmentation and rearrangement during the storage process. This paper proposes a robust multi-read reconstruction model based on a deep neural network (DNN) that is resilient to contaminated clusters with outlier…
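The conversion between a binary code stream and quaternary bases can be pictured as a fixed two-bits-per-base mapping. The sketch below assumes A=00, C=01, G=10, T=11. The excerpt above does not state the paper's encoding. Practical DNA storage codecs usually also constrain homopolymer runs and GC content and add error-correcting redundancy, and this sketch omits all of that. `bytes_to_bases` and `bases_to_bytes` are hypothetical helpers.

```python
# Assumed direct mapping: each base carries two bits.
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BIT_PAIR = {base: bits for bits, base in BIT_PAIR_TO_BASE.items()}


def bytes_to_bases(data: bytes) -> str:
    """Encode bytes as a DNA strand, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_bases back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    bits = "".join(BASE_TO_BIT_PAIR[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_bases(b"Hi")
print(strand)                  # CAGACGGC
print(bases_to_bytes(strand))  # b'Hi'
```

Under this mapping, a single substitution in a read changes one or two bits of the decoded data. An insertion or deletion shifts every later bit. For this reason, the reference must be reconstructed accurately from the read cluster before decoding.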

Taxonomy

Topics: Advanced Data Storage Technologies · Environmental DNA in Biodiversity Studies · Algorithms and Data Compression