Optimal Reference for DNA Synthesis
Ohad Elishco, Wasim Huleihel

TL;DR
This paper studies which reference DNA sequence is optimal for synthesis under homopolymer run-length constraints, and shows that a periodic sequence minimizes the synthesis cost.
Contribution
It provides a theoretical analysis showing that the optimal reference sequence is periodic for any homopolymer run length, extending previous work on batch optimization.
Findings
Optimal reference sequence is a periodic ACGT pattern.
Homopolymer constraints significantly influence synthesis cost.
Representation of synthesis as a constrained system enables new analysis.
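To make the findings above concrete, here is a minimal Python sketch of how synthesis cost against a periodic reference could be computed. It assumes a cost model that this summary does not spell out: the synthesizer steps through the reference one nucleotide at a time, a strand grows only when the current reference nucleotide matches its next required symbol, and the cost is the number of reference symbols consumed. The function names (`synthesis_cost`, `batch_cost`, `max_homopolymer_run`) are hypothetical and chosen only for illustration.

```python
from itertools import cycle, groupby


def max_homopolymer_run(strand: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    return max((sum(1 for _ in group) for _, group in groupby(strand)), default=0)


def synthesis_cost(strand: str, period: str = "ACGT") -> int:
    """Reference symbols consumed until `strand` is fully written.

    Assumed model: the reference is `period` repeated indefinitely, and the
    strand is extended whenever the current reference symbol equals the
    strand's next required symbol.
    """
    if not set(strand) <= set(period):
        raise ValueError("strand contains a symbol missing from the reference period")
    cost = 0
    pos = 0  # index of the next strand symbol still to be written
    for base in cycle(period):
        if pos == len(strand):
            break
        cost += 1
        if base == strand[pos]:
            pos += 1
    return cost


def batch_cost(strands, period: str = "ACGT") -> int:
    """Cost of a batch sharing one reference: the slowest strand decides."""
    return max((synthesis_cost(s, period) for s in strands), default=0)


if __name__ == "__main__":
    for s in ("ACGT", "AA", "TGCA"):
        print(s, max_homopolymer_run(s), synthesis_cost(s))
```

Under this assumed model, a strand that follows the reference order finishes in one pass, while repeated nucleotides force the synthesizer to wait almost a full period for each repeat. This is one way to see why the homopolymer constraint interacts with the cost.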
Abstract
In recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. Accordingly, the high cost and low throughput of available DNA synthesis technologies limit their practical use. It has been found that the homopolymer run (i.e., the repetition of the same nucleotide) is a major factor affecting synthesis and sequencing errors. Quite recently, [26] studied the role of batch optimization in reducing the cost of large-scale DNA synthesis, for a given pool of random quaternary strings of fixed length. Among other things, it was shown that the asymptotic cost savings of batch optimization are significantly greater when the strings in the pool contain repeats of the same character (homopolymer run of length one), as compared…
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · DNA and Nucleic Acid Chemistry
