Rates of DNA Sequence Profiles for Practical Values of Read Lengths
Zuling Chang, Johan Chrisnata, Martianus Frederic Ezerman, Han Mao Kiah

TL;DR
This paper analyzes the number of profile vectors in DNA data storage, providing exact values, bounds, and efficient algorithms for encoding and decoding, especially for practical read lengths and alphabet sizes.
Contribution
It offers new enumeration bounds and efficient algorithms for profile vectors in DNA storage, enhancing understanding of their practical application.
Findings
The number of profile vectors is at least $q^{\theta n}$ for certain parameters.
Provides exact enumeration and lower bounds on profile vectors.
Develops efficient encoding and decoding algorithms for specific profile vector families.
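To make the counting claim concrete, the following is a minimal brute-force sketch, not the paper's enumeration method. It assumes a profile vector counts each length-$\ell$ contiguous substring of a non-cyclic word, and it defines the rate as $\log_q(\text{count})/n$. The function names `count_profile_vectors` and `rate` are hypothetical, and the exhaustive loop is only feasible for very small $q$ and $n$.

```python
import math
from collections import Counter
from itertools import product


def count_profile_vectors(q, ell, n):
    """Count distinct ell-gram profiles over all q**n words of length n.

    Assumption: words are non-cyclic, so each word has n - ell + 1 reads.
    """
    seen = set()
    for word in product(range(q), repeat=n):
        # Multiset of all contiguous substrings of length ell.
        counts = Counter(word[i:i + ell] for i in range(n - ell + 1))
        # A frozenset of (ell-gram, count) pairs identifies the profile.
        seen.add(frozenset(counts.items()))
    return len(seen)


def rate(q, ell, n):
    """Illustrative rate: log base q of the number of profiles, divided by n."""
    return math.log(count_profile_vectors(q, ell, n), q) / n


if __name__ == "__main__":
    # Binary words of length 3 with reads of length 2:
    # 010 and 101 share a profile, so 8 words give 7 distinct profiles.
    print(count_profile_vectors(2, 2, 3))
    print(rate(2, 2, 3))
```

Under these assumptions, a lower bound of $q^{\theta n}$ on the count corresponds to a rate of at least $\theta$.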
Abstract
A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q \ge 2$ and $n \le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
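As an illustration of the object being counted, here is a short sketch of how a profile vector could be computed. It assumes the profile records, for every length-$\ell$ string over the alphabet, how many times it occurs as a contiguous substring of a non-cyclic word. The function name `profile_vector` and the lexicographic ordering of entries are choices made for this example, not taken from the paper.

```python
from collections import Counter
from itertools import product


def profile_vector(word, alphabet, ell):
    """Return the ell-gram profile of `word` as a tuple of q**ell counts.

    Entries are ordered lexicographically with respect to `alphabet`.
    Assumption: the word is non-cyclic, giving len(word) - ell + 1 reads.
    """
    reads = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    return tuple(reads["".join(gram)] for gram in product(alphabet, repeat=ell))


if __name__ == "__main__":
    # Reads of length 2 from ACGAC are AC, CG, GA, AC.
    vec = profile_vector("ACGAC", "ACGT", 2)
    grams = ["".join(g) for g in product("ACGT", repeat=2)]
    print({g: c for g, c in zip(grams, vec) if c})
```

Here the alphabet has size $q = 4$, the read length is $\ell = 2$, and the word length is $n = 5$, so the vector has $q^\ell = 16$ entries that sum to $n - \ell + 1 = 4$.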
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Algorithms and Data Compression
