# GA-Novo: De Novo Peptide Sequencing via Tandem Mass Spectrometry using Genetic Algorithm

**Authors:** Samaneh Azari, Bing Xue, Mengjie Zhang, Lifeng Peng

arXiv: 1902.00845 · 2019-02-05

## TL;DR

GA-Novo is a genetic algorithm-based method for de novo peptide sequencing from tandem mass spectrometry (MS/MS) data. On the authors' testing dataset it constructs 8% more fully matched peptide sequences than PEAKS and achieves 4% higher recall on partially matched sequences.

## Contribution

This paper introduces GA-Novo, a genetic algorithm approach that treats de novo peptide sequencing as an optimisation task, aiming to construct full-length peptide sequences without relying on a protein database.

## Key findings

- On the testing dataset, GA-Novo constructs 8% more fully matched peptide sequences than PEAKS.
- GA-Novo achieves 4% higher recall than PEAKS on partially matched sequences.
- Given an MS/MS spectrum, GA-Novo optimizes candidate amino acid sequences to best fit that spectrum.
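
The two metrics in these findings can be illustrated with a short sketch. This summary does not give the paper's exact definitions, so the code below assumes a common convention: a peptide is "fully matched" when the predicted sequence equals the true one, and recall is the fraction of true amino acids that are predicted correctly. The function names and the simple position-by-position comparison are illustrative assumptions, not the authors' evaluation code.

```python
def fully_matched(predicted: str, truth: str) -> bool:
    """True when the predicted peptide equals the ground-truth peptide."""
    return predicted == truth


def aa_recall(predictions: list[str], truths: list[str]) -> float:
    """Correctly predicted amino acids divided by all ground-truth amino acids.

    Assumption: residues are compared position by position, which is a
    simplification of the mass-aware matching used by real evaluations.
    """
    matched = sum(
        sum(p == t for p, t in zip(pred, true))
        for pred, true in zip(predictions, truths)
    )
    total = sum(len(true) for true in truths)
    return matched / total if total else 0.0
```

Under these assumptions a prediction can score high recall while still failing to be fully matched, which is why the two figures are reported separately.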

## Abstract

Proteomics is the large-scale analysis of proteins. The common method for identifying proteins and characterising their amino acid sequences is to digest the proteins into peptides, analyse the peptides using mass spectrometry, and assign the resulting tandem mass spectra (MS/MS) to peptides using database search tools. However, database search algorithms are highly dependent on a reference protein database and cannot identify peptides and proteins that are not included in the database. De novo sequencing algorithms have therefore been developed to overcome this problem by reconstructing the peptide sequence directly from an MS/MS spectrum without using any protein database. Current de novo sequencing algorithms often fail to construct completely matched sequences and instead produce partial matches. In this study, we propose a genetic algorithm based method, GA-Novo, to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. Given an MS/MS spectrum, GA-Novo optimises the amino acid sequences to best fit the input spectrum. On the testing dataset, GA-Novo outperforms PEAKS, the most commonly used software for this task, constructing 8% more fully matched peptide sequences and achieving 4% higher recall on partially matched sequences.
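
To make the idea of "optimising amino acid sequences to best fit the input spectrum" concrete, here is a minimal toy sketch of a genetic algorithm over peptide strings. It is not GA-Novo itself: the fitness function (a count of singly charged b- and y-ions that match observed peaks), the fixed peptide length, truncation selection, and all function names and parameters are assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
import random

# Standard monoisotopic residue masses in daltons (I and L share a mass,
# so mass alone cannot tell them apart).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728
ALPHABET = sorted(RESIDUE_MASS)


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: C-terminal suffix
    return ions


def fitness(peptide, peaks, tol=0.02):
    """Toy score: theoretical ions within tol of any observed peak."""
    return sum(any(abs(mz - p) <= tol for p in peaks) for mz in fragment_mz(peptide))


def mutate(peptide, rng):
    """Replace one randomly chosen residue."""
    i = rng.randrange(len(peptide))
    return peptide[:i] + rng.choice(ALPHABET) + peptide[i + 1:]


def crossover(a, b, rng):
    """One-point crossover of two equal-length peptides."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(peaks, length, pop_size=60, generations=80, seed=0):
    """Evolve fixed-length peptides (length >= 2) toward a better spectrum fit."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, peaks), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, peaks))
```

A quick way to try it is to generate an idealised, noise-free spectrum with `fragment_mz("PEPTIDE")` and call `evolve(peaks, length=7)`. Whether the toy search recovers the exact peptide depends on the seed and parameters; real spectra also contain noise and missing peaks, and a practical tool would constrain candidates by precursor mass rather than by a known length.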

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.00845/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.00845/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1902.00845/full.md

---
Source: https://tomesphere.com/paper/1902.00845