Motif Caller: Sequence Reconstruction for Motif-Based DNA Storage
Parv Agarwal, Nimesh Pinnamaneni, Thomas Heinis

TL;DR
Motif Caller is a machine learning model that directly detects DNA motifs from raw nanopore signals, improving accuracy and efficiency in motif-based DNA data storage retrieval.
Contribution
The paper introduces a method that detects motifs directly from raw nanopore signals, bypassing basecalling, to improve the accuracy and speed of DNA storage decoding.
Findings
Significantly improved motif detection accuracy.
Faster data retrieval in motif-based DNA storage.
Reduced errors compared to traditional basecalling methods.
Abstract
DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by assembling long data-carrying DNA sequences from pre-synthesized DNA subsequences, known as motifs, drawn from a library. Reading back data from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic…
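To make the motif idea concrete, here is a minimal sketch of motif-based encoding and of the second, motif-agnostic step the abstract describes (identifying motifs in an already basecalled sequence). It is an illustration under stated assumptions, not the paper's method: the four-motif library, the 2-bits-per-motif mapping, and the function names `encode` and `decode` are all hypothetical, and the reads are assumed to be error-free. It does not touch raw nanopore signals or any machine learning model.

```python
# Hypothetical library of equal-length motifs (assumption for illustration only;
# real libraries use longer, carefully designed motifs).
MOTIF_LIBRARY = ["ACGTAC", "TGCATG", "GATCGA", "CTAGCT"]
BITS_PER_MOTIF = 2  # log2 of the library size (4 motifs -> 2 bits each)
MOTIF_LENGTH = len(MOTIF_LIBRARY[0])


def encode(bits: str) -> str:
    """Map each 2-bit chunk to a motif and concatenate the motifs."""
    if len(bits) % BITS_PER_MOTIF != 0:
        raise ValueError("bit string length must be a multiple of BITS_PER_MOTIF")
    motifs = [
        MOTIF_LIBRARY[int(bits[i:i + BITS_PER_MOTIF], 2)]
        for i in range(0, len(bits), BITS_PER_MOTIF)
    ]
    return "".join(motifs)


def decode(sequence: str) -> str:
    """Split an error-free base sequence into motifs and recover the bits.

    This stands in for the step that runs after basecalling; it fails on any
    chunk that is not an exact library motif, which is why read errors are a
    problem for a two-step pipeline.
    """
    index = {motif: i for i, motif in enumerate(MOTIF_LIBRARY)}
    bits = []
    for i in range(0, len(sequence), MOTIF_LENGTH):
        chunk = sequence[i:i + MOTIF_LENGTH]
        if chunk not in index:
            raise ValueError(f"unrecognized motif at position {i}: {chunk!r}")
        bits.append(format(index[chunk], f"0{BITS_PER_MOTIF}b"))
    return "".join(bits)


strand = encode("0110")
recovered = decode(strand)
```

Here `encode("0110")` selects the motifs at indices 1 and 2 and joins them, and `decode` inverts that mapping. A single substituted base in `strand` would make `decode` raise, which illustrates why identifying motifs only after per-base prediction is fragile.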
Taxonomy
Topics: DNA and Biological Computing · Environmental DNA in Biodiversity Studies · Genomics and Phylogenetic Studies
