Batch Optimization for DNA Synthesis

Konstantin Makarychev; Miklos Z. Racz; Cyrus Rashtchian; Sergey Yekhanin

arXiv:2011.14532 · cs.DS · February 25, 2021

Open Access

TL;DR

This paper introduces batch optimization techniques that reduce the cost of large-scale DNA synthesis for data storage, with the largest savings for non-repetitive DNA sequences.

Contribution

It proposes two novel batch optimization methods and proves their asymptotic optimality, highlighting the impact of sequence constraints on synthesis cost savings.

Findings

1. Batch optimization reduces DNA synthesis costs.

2. Using reverse reference strands improves batching efficiency.

3. Cost savings are greater for non-repetitive DNA sequences.

Abstract

Large pools of synthetic DNA molecules have recently been used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited by the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cost of large-scale DNA synthesis, which translates to the following algorithmic task. Given a large pool $S$ of random quaternary strings of fixed length, partition $S$ into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches. We introduce two ideas for batch optimization that both improve (in different ways) upon a naive baseline: (1) using both $(ACGT)^{*}$ and its reverse $(TGCA)^{*}$ as reference strands, and batching…
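The role of the reference strand can be made concrete with a minimal Python sketch. It assumes that a strand's cost under a periodic reference is the length of the shortest prefix of the repeated reference that contains the strand as a subsequence, and that a batch sharing one reference costs as much as its most expensive strand. That prefix is a common supersequence of the batch, so the value is an upper bound on the shortest common supersequence length rather than the exact optimum. The names `cycles_needed`, `batch_cost`, and `partition_cost` are hypothetical and are not taken from the paper.

```python
def cycles_needed(strand: str, reference: str = "ACGT") -> int:
    """Length of the shortest prefix of reference* containing strand as a subsequence."""
    period = len(reference)
    pos = 0  # number of reference characters consumed so far
    for base in strand:
        if base not in reference:
            raise ValueError(f"base {base!r} does not occur in reference {reference!r}")
        # Advance along the periodic reference until it offers the needed base.
        while reference[pos % period] != base:
            pos += 1
        pos += 1  # consume the matching reference character
    return pos


def batch_cost(strands, reference: str = "ACGT") -> int:
    """Cost of one batch: the longest reference prefix needed by any strand in it."""
    return max((cycles_needed(s, reference) for s in strands), default=0)


def partition_cost(batches) -> int:
    """Total cost of a partition, given as (reference, strands) pairs."""
    return sum(batch_cost(strands, ref) for ref, strands in batches)


if __name__ == "__main__":
    # The same strand can be cheap under one reference and expensive under the other.
    print(cycles_needed("TGCA", "ACGT"))
    print(cycles_needed("TGCA", "TGCA"))
    # Two batches, each using the reference that suits its strands.
    print(partition_cost([("ACGT", ["ACGT"]), ("TGCA", ["TGCA"])]))
```

Under these assumptions, a strand that runs against the order of $(ACGT)^{*}$ needs many more reference characters than one that follows it, which is the intuition behind offering the reverse reference $(TGCA)^{*}$ as a second option.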

Taxonomy

Topics: Advanced biosensing and bioanalysis techniques · DNA and Biological Computing · Algorithms and Data Compression