Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction

Yilong Lu; Si Chen; Songyan Gao; Han Liu; Xin Dong; Wenfeng Shen; Guangtai Ding

arXiv:2510.25814 · q-bio.QM · February 9, 2026

TL;DR

This paper aims to improve the sequencing accuracy of mirror-image peptides used for biological data storage. Rather than changing the sequencing algorithm, it optimizes the design of the peptide sequences, guided by a deep learning model that predicts peptide bond cleavage.

Contribution

The study presents itself as the first to optimize mirror-image peptide sequence design for better sequencing accuracy. It introduces a new dataset, a labeling algorithm, and a dual prediction strategy.

Findings

01. MiPD513 dataset with 513 peptides created

02. 12.5 million labeled data points generated

03. Single-label classification outperformed other methods
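The summary above does not spell out how the prediction targets are framed. As a rough illustration only, the sketch below assumes that a peptide of n residues has n - 1 peptide bonds, each with a binary cleaved/not-cleaved label. It assumes that a multi-label framing predicts the whole label vector at once, while a single-label framing predicts one bond at a time. The function names, the example sequence, and the cleaved-bond indices are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def bond_positions(peptide: str) -> List[Tuple[int, str]]:
    """List each peptide bond as (index, flanking residue pair).

    A peptide with n residues has n - 1 bonds.
    """
    return [(i, peptide[i:i + 2]) for i in range(len(peptide) - 1)]


def multi_label_example(peptide: str, cleaved: Set[int]) -> Tuple[str, List[int]]:
    """Assumed multi-label framing: one example per peptide.

    The target is a 0/1 vector with one entry per bond.
    """
    n_bonds = len(peptide) - 1
    return peptide, [1 if i in cleaved else 0 for i in range(n_bonds)]


def single_label_examples(peptide: str, cleaved: Set[int]) -> List[Tuple[str, int, int]]:
    """Assumed single-label framing: one example per bond.

    Each example is (peptide, bond index, 0/1 label).
    """
    return [
        (peptide, i, 1 if i in cleaved else 0)
        for i in range(len(peptide) - 1)
    ]


# Hypothetical peptide and hypothetical cleaved-bond indices.
peptide = "ACDK"
cleaved = {0, 2}

bonds = bond_positions(peptide)
multi = multi_label_example(peptide, cleaved)
single = single_label_examples(peptide, cleaved)
```

Under this assumed framing, the single-label form yields one training example per bond. That would be one way for a few hundred peptides to expand into a much larger number of labeled points once spectra and experimental conditions are also taken into account.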

Abstract

Traditional non-biological storage media, such as hard drives, are limited in both storage density and lifespan, and these limits are increasingly strained by the rapid growth of data in the big data era. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium due to their high storage density, structural stability, and long lifespan. The sequencing of mirror-image peptides relies on de novo technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and the challenges that current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network-based model that integrates sequence features, precursor ion properties, and mass spectrometry…
