# Motif caller for sequence reconstruction in motif-based DNA storage

**Authors:** Parv Agarwal, Nimesh Pinnamaneni, Thomas Heinis

PMC · DOI: 10.1038/s41598-025-22798-2 · Scientific Reports · 2025-11-10

## TL;DR

This paper introduces Motif Caller, a machine learning model that improves read-out in motif-based DNA data storage by detecting motifs directly from raw nanopore sequencing signals, bypassing intermediate basecalling.

## Contribution

Motif Caller is a novel machine learning model that detects motifs directly from raw nanopore signals, improving the accuracy and efficiency of data retrieval in motif-based DNA storage.

## Key findings

- Motif Caller achieves higher accuracy by directly detecting motifs from raw signals instead of using intermediate basecalling.
- The direct motif detection approach enhances the efficiency of data retrieval in motif-based DNA storage systems.
- Leveraging richer signal features associated with motifs leads to better performance compared to traditional methods.

## Abstract

DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by assembling long data-carrying DNA sequences from pre-synthesized DNA subsequences, known as motifs, drawn from a library. Reading back data from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic process is both imprecise and inefficient. In this paper we introduce Motif Caller, a machine learning model designed to directly detect entire motifs from raw nanopore signals, bypassing the need for intermediate basecalling. By targeting motifs directly, Motif Caller leverages richer signal features associated with each motif, resulting in significantly improved accuracy. This direct approach also enhances the efficiency of data retrieval in motif-based DNA storage systems.
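To make the motif-based encoding and the two-step read-out described above concrete, here is a minimal toy sketch. It is not Motif Caller and not the authors' encoding scheme. The four-motif library, the 2-bits-per-motif mapping, and the Hamming-distance matcher are all assumptions chosen for illustration. The sketch covers only the second step of the conventional pipeline (identifying motifs in an already basecalled read). Motif Caller instead predicts motifs straight from the raw signal.

```python
# Hypothetical motif library (an assumption for illustration, not from the paper):
# four equal-length motifs, each carrying 2 bits of payload.
MOTIFS = {
    "00": "ACGTAC",
    "01": "TGCATG",
    "10": "GATCGA",
    "11": "CTAGCT",
}
MOTIF_LEN = 6


def encode(data: bytes) -> str:
    """Map each pair of bits to a motif and concatenate the motifs into one strand."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(MOTIFS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(read: str) -> bytes:
    """Identify motifs in a basecalled read by nearest-motif matching per window.

    Assumes substitution errors only; insertions or deletions would shift
    the fixed windows and break this simple matcher.
    """
    bits = []
    for i in range(0, len(read), MOTIF_LEN):
        window = read[i:i + MOTIF_LEN]
        # Pick the bit pair whose motif is closest to the observed window.
        bits.append(min(MOTIFS, key=lambda k: hamming(window, MOTIFS[k])))
    bitstring = "".join(bits)
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))


strand = encode(b"Hi")
# Simulate a single basecalling substitution at position 3 of the strand.
corrupted = strand[:3] + ("C" if strand[3] != "C" else "A") + strand[4:]
recovered = decode(corrupted)
```

In this toy library every pair of motifs differs at all six positions, so a single substitution inside a window still resolves to the intended motif. Real nanopore reads also contain insertions and deletions, which this fixed-window matcher does not handle.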

## Full-text entities

- **Diseases:** CTC (MESH:D008310)
- **Chemicals:** phosphoramidite (MESH:C434331)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12603312/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12603312/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12603312/full.md

---
Source: https://tomesphere.com/paper/PMC12603312