# Quantitative Assessment of Randomized DNA Base Sequences Using Multi‐Model Physical Analysis for High‐Fidelity Data Storage

**Authors:** Seongjun Seo, Thi Hong Nhung Vu, Anshula Tandon, Suyoun Park, Thi Bich Ngoc Nguyen, Shinsuke Kawai, Sung Ha Park

PMC · DOI: 10.1002/advs.202517208 · Advanced Science · 2025-11-21

## TL;DR

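This paper introduces a quantitative framework that uses three physics-inspired models to evaluate design rules for randomized DNA sequences in data storage. Stricter homopolymer constraints shorten homopolymers and balance GC content, and PCR plus Sanger sequencing confirm 95–98% decoding fidelity.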

## Contribution

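The paper contributes a quantitative framework that uses three physics-inspired models (translational and rotational active particle trajectories, the inverse Ising model, and a 3-input 1-output logic algorithm system) to evaluate and optimize DNA sequence design rules for high-fidelity data storage.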

## Key findings

- Strict randomization rules reduce homopolymer length and balance GC content.
- Encoding schemes with stricter homopolymer constraints improve sequence randomness and error resilience.
- PCR and Sanger sequencing confirm 95–98% decoding fidelity.
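
The two sequence-level metrics named in these findings, homopolymer length and GC content, can be computed directly from a base string. The sketch below is a minimal illustration and not the authors' analysis code; the function names `gc_content` and `max_homopolymer` are hypothetical.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    # groupby yields one group per run of equal adjacent characters.
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)
```

For example, `gc_content("ACGT")` evaluates to `0.5` and `max_homopolymer("AAACGT")` evaluates to `3`.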

## Abstract

DNA is emerging as a promising medium for ultra‐dense, long‐term digital data storage, yet sequence design remains hindered by homopolymer formation and compositional bias, which compromise synthesis, sequencing, and decoding accuracy. This study introduces a quantitative framework to evaluate and optimize randomized DNA base sequence design rules using three physics‐inspired models: translational and rotational active particle trajectories, the inverse Ising model, and a 3‐input 1‐output logic algorithm system. Encoding schemes with varying homopolymer constraints are systematically applied to binary image data. Rigorous analysis reveals that stringent randomization rules markedly reduce homopolymer length, balance GC content, and enhance sequence randomness. Experimental validation via polymerase chain reaction (PCR) amplification and Sanger sequencing confirms high decoding fidelity (95–98%). This multi‐model assessment establishes a robust strategy for designing DNA sequences with superior stability, reliability, and scalability for future molecular data storage systems.
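
To make the idea of a homopolymer constraint concrete, here is a minimal sketch of one well-known style of constrained encoding, a rotation code in which each bit selects a base that differs from the previous one. It is an assumption chosen for illustration and is not one of the encoding schemes evaluated in the paper, which are described only in the full text. The names `encode_bits`, `decode_bases`, and the virtual `start` base are hypothetical.

```python
BASES = "ACGT"


def encode_bits(bits: str, start: str = "A") -> str:
    """Map a bit string to DNA so that no two adjacent bases are equal."""
    prev = start  # virtual base preceding the sequence; not emitted
    out = []
    for bit in bits:
        # Candidate bases exclude the previous base, so runs cannot form.
        choices = [b for b in BASES if b != prev]
        prev = choices[int(bit)]
        out.append(prev)
    return "".join(out)


def decode_bases(seq: str, start: str = "A") -> str:
    """Invert encode_bits; raise ValueError on a base the encoder cannot emit."""
    prev = start
    bits = []
    for base in seq:
        choices = [b for b in BASES if b != prev]
        if base not in choices[:2]:
            raise ValueError(f"unexpected base {base!r} after {prev!r}")
        bits.append(str(choices.index(base)))
        prev = base
    return "".join(bits)
```

With these definitions, `encode_bits("0110")` returns `"CGCA"` and `decode_bases("CGCA")` recovers `"0110"`. This sketch stores one bit per base and does nothing to balance GC content, so it shows only the homopolymer side of the design problem.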

This study quantitatively evaluates randomized DNA sequence design for digital data storage using three physics‐based models. By applying encoding schemes with strict homopolymer constraints, the framework improves base randomness, GC balance, and error resilience. Experimental validation via PCR and Sanger sequencing confirms 95–98% decoding accuracy, demonstrating the feasibility of robust, long‐term molecular information preservation.
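
The summary reports decoding fidelity as a percentage but does not state how it is computed. One plausible reading, assumed here purely for illustration, is the fraction of positions at which the decoded sequence matches the reference. The function name `positional_fidelity` is hypothetical, and the paper's own definition may differ.

```python
def positional_fidelity(reference: str, decoded: str) -> float:
    """Fraction of positions where decoded matches reference.

    Assumption: a length mismatch counts against fidelity, so the
    denominator is the longer of the two sequences.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    matches = sum(r == d for r, d in zip(reference, decoded))
    return matches / max(len(reference), len(decoded))
```

For example, `positional_fidelity("ACGT", "ACGA")` evaluates to `0.75`. A position-wise comparison like this does not handle insertions or deletions well, where an alignment-based measure would be more appropriate.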

## Full-text entities

- **Chemicals:** C (MESH:D002244), T (MESH:D014316), water (MESH:D014867), agarose (MESH:D012685)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12866812/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12866812/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12866812/full.md

---
Source: https://tomesphere.com/paper/PMC12866812