A constrained Shannon-Fano entropy coder for image storage in synthetic DNA
Xavier Pic, Marc Antonini

TL;DR
This paper introduces a modified Shannon-Fano coding algorithm tailored to DNA data storage. The coder improves compression efficiency for long-term "cold" data while respecting biochemical constraints, and it is integrated into a JPEG-based compression pipeline for image storage.
Contribution
The paper presents a novel constrained Shannon-Fano coding method designed specifically for DNA storage, which improves compression ratios without degrading reconstruction quality.
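The summary does not spell out the construction, so the following is only a minimal sketch of one way to build a constrained Shannon-Fano coder, not the authors' method. It assumes that the constraint is "no nucleotide repeated back to back". Under that assumption it builds a ternary Shannon-Fano code (recursive splitting into three groups of roughly equal frequency) and then maps the ternary digits to nucleotides with a rotation rule: each digit picks one of the three nucleotides that differ from the previous one. The names `shannon_fano_ternary`, `trits_to_dna`, `encode` and `decode`, and the initial state `prev="A"`, are all hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def shannon_fano_ternary(freqs):
    """Ternary Shannon-Fano code: map each symbol to a list of digits in {0, 1, 2}.

    Symbols are sorted by decreasing frequency, then recursively split into
    (at most) three contiguous groups of roughly equal total frequency.
    """
    items = sorted(freqs.items(), key=lambda kv: (-kv[1], str(kv[0])))
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            # A lone symbol at the root still needs a non-empty codeword.
            codes[group[0][0]] = prefix or [0]
            return
        k = min(3, len(group))  # number of branches at this node
        total = sum(f for _, f in group)
        start = 0
        for g in range(k - 1):
            target = total * (g + 1) / k  # ideal cumulative weight at this cut
            end = start + 1  # every branch receives at least one symbol
            acc = sum(f for _, f in group[:end])
            # Grow the branch while that moves the cut closer to the target,
            # leaving at least one symbol for each remaining branch.
            while (end < len(group) - (k - 1 - g)
                   and abs(acc + group[end][1] - target) <= abs(acc - target)):
                acc += group[end][1]
                end += 1
            split(group[start:end], prefix + [g])
            start = end
        split(group[start:], prefix + [k - 1])

    split(items, [])
    return codes


def trits_to_dna(trits, prev="A"):
    """Rotation mapping: each digit selects one of the 3 nucleotides that
    differ from the previous one, so no nucleotide repeats back to back."""
    out = []
    for t in trits:
        prev = [n for n in NUCLEOTIDES if n != prev][t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, prev="A"):
    """Inverse of trits_to_dna (same hypothetical initial state)."""
    trits = []
    for n in dna:
        trits.append([c for c in NUCLEOTIDES if c != prev].index(n))
        prev = n
    return trits


def encode(symbols, codes):
    return trits_to_dna([t for s in symbols for t in codes[s]])


def decode(dna, codes):
    inverse = {tuple(c): s for s, c in codes.items()}
    out, buf = [], []
    for t in dna_to_trits(dna):
        buf.append(t)
        if tuple(buf) in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[tuple(buf)])
            buf = []
    return out


data = list("aaaabbbccd")  # hypothetical symbol stream
codes = shannon_fano_ternary(Counter(data))
dna = encode(data, codes)
```

Because the rotation rule applies to the whole concatenated stream, the no-repeat property also holds across codeword boundaries, not only inside each codeword. A real coder would need to handle any additional constraints the paper imposes.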
Findings
Improved the compression ratio by 0.5 to 2 bits per nucleotide.
Maintained image reconstruction quality with the new coding scheme.
Demonstrated suitability for long-term 'cold' data storage in DNA.
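The gains above are expressed in bits per nucleotide. This summary does not define the metric. The sketch below assumes it means bits of the original image per synthesized nucleotide and shows the arithmetic. For reference, an incompressible stream can carry at most log2 4 = 2 bits per nucleotide, and forbidding immediate repeats lowers the asymptotic rate to log2 3 ≈ 1.585. Compression is what allows the density measured against the raw image to exceed these bounds. The function names and the example figures are hypothetical.

```python
import math


def bits_per_nucleotide(source_bits, n_nucleotides):
    """Information density: bits of the original data stored per nucleotide."""
    return source_bits / n_nucleotides


def max_rate(choices_per_position):
    """Asymptotic bits per nucleotide for incompressible data when each
    position offers `choices_per_position` equally likely options."""
    return math.log2(choices_per_position)


# Hypothetical example: an 8-bit 256x256 grayscale image stored in 200,000 nt.
image_bits = 256 * 256 * 8
print(bits_per_nucleotide(image_bits, 200_000))  # 2.62144
print(max_rate(4))  # 2.0 (free choice among A, C, G, T)
print(max_rate(3))  # ~1.585 (no nucleotide may repeat its predecessor)
```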
Abstract
The exponentially increasing demand for data storage has faced growing challenges in recent years. The energy costs it entails are also rising, and the availability of storage hardware cannot keep pace with demand. The short lifespan of conventional storage media (10 to 20 years) forces hardware duplication and worsens the situation. Most of this storage demand concerns "cold" data: data that is rarely accessed but must be kept for long periods. The coding abilities of synthetic DNA and its long durability (several hundred years) make it a serious candidate as an alternative storage medium for "cold" data. In this paper, we propose a variable-length coding algorithm adapted to DNA data storage with improved performance. The proposed algorithm is based on a modified Shannon-Fano code that respects…
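The abstract is cut off before it names the constraints. DNA storage work often cites two synthesis constraints: long homopolymer runs and unbalanced GC content. The checker below assumes those two. Its thresholds (`max_run=3`, GC between 40% and 60%) are hypothetical and not taken from the paper, as are the function names.

```python
from itertools import groupby


def max_homopolymer_run(seq):
    """Length of the longest run of one repeated nucleotide (0 for '')."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(n in "GC" for n in seq) / len(seq) if seq else 0.0


def satisfies_constraints(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Check hypothetical synthesis constraints: bounded homopolymer runs and
    GC content within [gc_min, gc_max]. Thresholds are illustrative only."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


print(satisfies_constraints("ACGTACGT"))  # True
print(satisfies_constraints("AAAAGC"))    # False (run of four A's, GC 33%)
```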
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Error Correcting Code Techniques
