From Pixels to Nucleotides: End-to-End Token-Based Video Compression for DNA Storage
Cihan Ruan, Lebin Zhou, Bingqing Zhao, Rongduo Han, Qiming Yuan, Chenchen Zhu, Linyi Han, Liang Yang, Wei Wang, Wei Jiang, Nam Ling

TL;DR
This paper introduces HELIX, an end-to-end neural network that jointly optimizes video compression and DNA encoding using token-based representations, enabling efficient, biochemically compatible storage of video in DNA.
Contribution
It presents the first integrated approach combining neural video compression with DNA encoding, leveraging token-based representations aligned with DNA's quaternary alphabet.
Findings
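HELIX achieves a storage density of 1.91 bits per nucleotide, close to the theoretical maximum of 2 bits per nucleotide for DNA's four-letter alphabet.
Token-based representations align naturally with DNA's quaternary alphabet of bases, enabling efficient encoding.
Joint optimization improves both visual quality and compliance with biochemical constraints compared with pipelines that treat compression and DNA encoding as separate stages.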
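To make the token-to-base alignment and the bits-per-nucleotide figure concrete, here is a minimal sketch. It assumes a fixed-length mapping in which each token index from a codebook of size 4^k is written as k base-4 digits, one nucleotide per digit. The digit-to-base assignment (`BASES`) and all function names are hypothetical illustrations, not HELIX's actual mapping, which the summary does not describe.

```python
import math

# Hypothetical digit-to-base assignment: 0 -> A, 1 -> C, 2 -> G, 3 -> T.
BASES = "ACGT"

# Ceiling for any quaternary alphabet: log2(4) = 2 bits per nucleotide.
MAX_BITS_PER_NT = math.log2(len(BASES))


def token_to_bases(token: int, k: int) -> str:
    """Write a token index from a codebook of size 4**k as k nucleotides."""
    if not 0 <= token < 4 ** k:
        raise ValueError("token outside codebook range")
    digits = []
    for _ in range(k):
        token, d = divmod(token, 4)  # peel off the least significant base-4 digit
        digits.append(BASES[d])
    return "".join(reversed(digits))  # most significant digit first


def bases_to_token(seq: str) -> int:
    """Inverse of token_to_bases: read nucleotides back as base-4 digits."""
    token = 0
    for b in seq:
        token = token * 4 + BASES.index(b)
    return token


def encode_tokens(tokens, k: int) -> str:
    """Concatenate the fixed-length codewords of a token sequence into one strand."""
    return "".join(token_to_bases(t, k) for t in tokens)


def decode_tokens(strand: str, k: int) -> list:
    """Split a strand into k-nucleotide codewords and decode each one."""
    return [bases_to_token(strand[i:i + k]) for i in range(0, len(strand), k)]


def bits_per_nucleotide(n_tokens: int, codebook_size: int, n_nucleotides: int) -> float:
    """Information density, counting log2(codebook_size) bits per token."""
    return n_tokens * math.log2(codebook_size) / n_nucleotides


if __name__ == "__main__":
    tokens = [0, 27, 255]              # three tokens from a 256-entry codebook (k = 4)
    strand = encode_tokens(tokens, 4)
    print(strand)                      # AAAAACGTTTTT
    print(decode_tokens(strand, 4))    # [0, 27, 255]
    print(bits_per_nucleotide(len(tokens), 256, len(strand)))  # 2.0
```

An unconstrained fixed-length mapping like this one sits exactly at the 2-bit ceiling. It can also emit sequences that are hard to synthesize or sequence, such as the run `TTTT` produced for token 255. Any practical encoder therefore gives up some density to respect biochemical constraints. The summary does not say how HELIX's reported 1.91 bits per nucleotide is accounted for.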
Abstract
DNA-based storage has emerged as a promising approach to the global data crisis, offering molecular-scale density and millennial-scale stability at low maintenance cost. Over the past decade, substantial progress has been made in storing text, images, and files in DNA, yet video remains an open challenge. The difficulty is not merely technical: effective video DNA storage requires co-designing compression and molecular encoding from the ground up, a challenge that sits at the intersection of two fields that have largely evolved independently. In this work, we present HELIX, the first end-to-end neural network jointly optimizing video compression and DNA encoding. Prior approaches treat the two stages independently, leaving biochemical constraints and compression objectives fundamentally misaligned. Our key insight: token-based representations naturally align with DNA's quaternary…
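The biochemical constraints mentioned in the abstract are not spelled out in this summary. Two constraints commonly used in DNA storage work are a bounded GC content and a cap on homopolymer runs (repeats of the same base). The sketch below checks both for a candidate strand. The thresholds (GC between 40% and 60%, runs of at most 3) and the penalty formula are hypothetical, chosen only for illustration. HELIX's actual constraint formulation is not given here.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(b in "GC" for b in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty strand)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def satisfies_constraints(seq: str, gc_range=(0.4, 0.6), max_run=3) -> bool:
    """Hard check against hypothetical GC-content and homopolymer limits."""
    lo, hi = gc_range
    return lo <= gc_content(seq) <= hi and max_homopolymer(seq) <= max_run


def constraint_penalty(seq: str, gc_target=0.5, max_run=3) -> float:
    """Illustrative non-negative penalty: GC deviation plus excess run length.

    This is a plain scoring function, not a differentiable loss; a jointly
    trained system would need a smooth surrogate over soft token choices.
    """
    gc_dev = abs(gc_content(seq) - gc_target)
    run_excess = max(0, max_homopolymer(seq) - max_run)
    return gc_dev + run_excess


if __name__ == "__main__":
    for strand in ["ACGTACGT", "AAAAACGTTTTT"]:
        print(strand, gc_content(strand), max_homopolymer(strand),
              satisfies_constraints(strand), constraint_penalty(strand))
```

Checking strands only after a compressor has already produced its tokens is the decoupled, two-stage setting the abstract criticizes. The point of joint optimization is to let terms like these influence which tokens the compressor produces in the first place.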
