Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction
Yilong Lu, Si Chen, Songyan Gao, Han Liu, Xin Dong, Wenfeng Shen, Guangtai Ding

TL;DR
This paper introduces a novel approach to enhance mirror-image peptide sequencing accuracy for biological data storage by optimizing peptide sequences using a deep learning model that predicts peptide bond cleavage.
Contribution
It is the first to optimize mirror-image peptide sequences for better sequencing accuracy, introducing a new dataset, labeling algorithm, and dual prediction strategy.
Findings
MiPD513 dataset with 513 peptides created
12.5 million labeled data points generated
Single-label classification outperformed other methods
Abstract
Traditional non-biological storage media, such as hard drives, face limitations in both storage density and lifespan due to the rapid growth of data in the big data era. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium due to their high storage density, structural stability, and long lifespan. The sequencing of mirror-image peptides relies on \textit{de-novo} technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and the challenges that current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network based model that integrates sequence features, precursor ion properties, and mass spectrometry…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
