MoGAAAP: a modular Snakemake workflow for automated genome assembly and annotation with quality assessment
Dirk-Jan M van Workum, Kuntal K Dey, Alexander Kozik, Dean O Lavelle, Dick de Ridder, M Eric Schranz, Richard W Michelmore, Sandra Smit

TL;DR
MoGAAAP is a pipeline that automates genome assembly and annotation from HiFi reads (optionally with ultra-long ONT and/or Hi-C reads) and provides quality assessments to support comparative genomics.
Contribution
The novelty lies in a modular, automated pipeline for genome assembly and annotation with comprehensive quality assessment.
Findings
The pipeline is species-agnostic and takes HiFi reads as its core input, optionally supplemented with ultra-long ONT and/or Hi-C reads.
It generates detailed quality reports for assembly and annotation.
The pipeline is implemented in Snakemake and is publicly available under a GPLv3 licence.
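The quality reports are meant to support manual filtering of assemblies, and the abstract names completeness and contamination among the reported statistics. As an illustration only, the Python sketch below shows how per-individual metrics of that kind could be screened against thresholds before manual review. It is not MoGAAAP code: the `AssemblyQC` fields, the `flag_for_review` helper, the threshold defaults and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssemblyQC:
    """Hypothetical per-individual summary of report metrics (illustrative only)."""
    name: str
    completeness_pct: float   # e.g. share of expected genes found, in percent
    contamination_pct: float  # e.g. share of sequence assigned to foreign taxa, in percent


def flag_for_review(reports, min_completeness=95.0, max_contamination=1.0):
    """Return the names of assemblies that fail either threshold.

    The default thresholds are hypothetical and chosen for illustration;
    suitable values depend on the species and on the curator.
    """
    flagged = []
    for qc in reports:
        if qc.completeness_pct < min_completeness or qc.contamination_pct > max_contamination:
            flagged.append(qc.name)
    return flagged


if __name__ == "__main__":
    # Made-up values, not results from the paper.
    reports = [
        AssemblyQC("accession_A", 98.7, 0.2),
        AssemblyQC("accession_B", 91.3, 0.1),
        AssemblyQC("accession_C", 97.9, 3.5),
    ]
    print(flag_for_review(reports))  # ['accession_B', 'accession_C']
```

Flagging rather than discarding keeps the final decision with the curator, which matches the report's stated purpose of supporting manual filtering and refinement.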
Abstract
With the current speed of sequencing, there is a desire for standardized and automated genome assembly and annotation to produce high-quality genomes as input for comparative (pan)genomics. Therefore, we created a convenience pipeline using existing tools that creates annotated genome assemblies from HiFi (and optionally ultra-long ONT and/or Hi-C) reads for a set of related individuals as well as a related reference genome. Our pipeline is species-agnostic and generates an extensive quality assessment report that can be used for manual filtering and refinement of the assembly and annotation. It includes statistics for individual completeness and contamination assessments as well as a concise pangenome view. The pipeline is implemented in Snakemake and available with a GPLv3 licence at GitHub under github.com/dirkjanvw/MoGAAAP, at Zenodo under doi.org/10.5281/zenodo.14833021, and can be…
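The abstract describes HiFi reads as the required input per individual, with ultra-long ONT and Hi-C reads as optional extras. The plain-Python sketch below illustrates how such optional inputs could switch extra steps on or off for each individual. It is a hypothetical model, not MoGAAAP's configuration format or rule set: `SampleInputs`, `planned_steps` and every step name are illustrative labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleInputs:
    """Hypothetical description of the reads available for one individual."""
    name: str
    hifi: List[str]                     # HiFi read files (required)
    ont_ul: Optional[List[str]] = None  # ultra-long ONT read files (optional)
    hic: Optional[List[str]] = None     # Hi-C read files (optional)


def planned_steps(sample: SampleInputs) -> List[str]:
    """Return an illustrative, ordered list of step names for one individual.

    Step names are hypothetical labels, not MoGAAAP rule names.
    """
    if not sample.hifi:
        raise ValueError(f"{sample.name}: HiFi reads are required")
    # ONT reads, when present, are assumed to feed into the assembly step.
    steps = ["assemble_hifi_plus_ont" if sample.ont_ul else "assemble_hifi"]
    # Hi-C reads, when present, are assumed to add a scaffolding step.
    if sample.hic:
        steps.append("scaffold_with_hic")
    steps += ["annotate", "quality_report"]
    return steps


if __name__ == "__main__":
    full = SampleInputs("ind1", hifi=["ind1.hifi.fq.gz"],
                        ont_ul=["ind1.ont.fq.gz"],
                        hic=["ind1_R1.fq.gz", "ind1_R2.fq.gz"])
    print(planned_steps(full))
```

In Snakemake itself, this kind of branching would typically follow from which output files the workflow requests, since rules run only when their outputs are needed; the sketch stays in plain Python so that it runs on its own.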
Figure 1
