Diffusion Decoding for Peptide De Novo Sequencing
Chi-en Amy Tai, Alexander Wong

TL;DR
This paper explores using diffusion decoders for peptide de novo sequencing, demonstrating that they can significantly improve amino acid recall over traditional autoregressive models, despite some performance challenges.
Contribution
It introduces diffusion decoders adapted for peptide sequencing, showing their potential to improve sensitivity and accuracy over existing autoregressive methods.
Findings
Diffusion decoders can enhance amino acid recall in peptide sequencing.
Knapsack beam search did not improve performance metrics.
The best diffusion decoder with DINOISER loss significantly outperformed the baseline.
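To make the recall metric in these findings concrete, here is a minimal sketch of a position-wise amino acid recall. It is an illustrative simplification rather than the paper's evaluation code: published de novo sequencing benchmarks typically match amino acids by mass within a tolerance, which this sketch does not reproduce. The function name `amino_acid_recall` and the example peptides are hypothetical.

```python
def amino_acid_recall(predicted, reference):
    """Fraction of reference amino acids recovered at the same position.

    Simplified assumption: a predicted residue counts as correct only if it
    equals the reference residue at the same index (no mass tolerance).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    total = sum(len(ref) for ref in reference)
    if total == 0:
        return 0.0

    matched = 0
    for pred, ref in zip(predicted, reference):
        # zip stops at the shorter sequence, so missing residues count as misses
        matched += sum(p == r for p, r in zip(pred, ref))
    return matched / total


# Hypothetical predictions and ground-truth peptides
preds = ["PEPTIDE", "ACDK"]
refs = ["PEPTIDE", "ACDEK"]
recall = amino_acid_recall(preds, refs)
print(f"amino acid recall: {recall:.3f}")
```

Under this strict positional matching, a single wrong or missing residue in the middle of a sequence also shifts every later residue out of alignment, which is one reason real evaluations use mass-based matching instead.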
Abstract
Peptide de novo sequencing is a method used to reconstruct amino acid sequences from tandem mass spectrometry data without relying on existing protein sequence databases. Traditional deep learning approaches, such as Casanovo, mainly utilize autoregressive decoders and predict amino acids sequentially. Consequently, they encounter cascading errors and fail to leverage high-confidence regions effectively. To address these issues, this paper investigates using diffusion decoders adapted for the discrete data domain. These decoders provide a different approach, allowing sequence generation to start from any peptide segment, thereby enhancing prediction accuracy. We experiment with three different diffusion decoder designs, knapsack beam search, and various loss functions. We find knapsack beam search did not improve performance metrics and simply replacing the transformer decoder with a…
Taxonomy
Topics: Chemical Synthesis and Analysis
Methods: Diffusion
