Mapache: a flexible pipeline to map ancient DNA

Samuel Neuenschwander (1, 2), Diana I. Cruz Dávalos (1, 3), Lucas Anchieri (1, 3), Bárbara Sousa da Mota (1, 3), Davide Bozzi (1, 3), Simone Rubinacci (1, 3), Olivier Delaneau (1, 3), Simon Rasmussen (4), and Anna-Sapfo Malaspinas (1, 3)

(1) Department of Computational Biology, University of Lausanne, Switzerland; (2) Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland; (3) Swiss Institute of Bioinformatics, Lausanne, Switzerland; (4) Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark

arXiv:2208.13283 · q-bio.GN · March 9, 2023 · Bioinform.

TL;DR

Mapache is a scalable, automated pipeline designed to efficiently map, quantify, and impute ancient and modern DNA, addressing the challenges of reproducibility and resource consumption in large-scale genomic studies.

Contribution

The paper introduces a flexible, robust, and scalable Snakemake-based pipeline specifically optimized for mapping and analyzing ancient DNA.

Findings

01 Efficient handling of large ancient DNA datasets.

02 Reproducible mapping and imputation workflows.

03 Optimized for low-space consumption.

Abstract

Summary: Mapping ancient DNA to a reference genome is challenging: it involves numerous steps, is time-consuming, and has to be repeated within a study to assess the quality of extracts and libraries. As a result, the mapping needs to be automated so that large amounts of data can be handled reproducibly. We present mapache, a flexible, robust, and scalable pipeline to map, quantify, and impute ancient and present-day DNA in a reproducible way. Mapache is implemented in the workflow manager Snakemake and is optimized for low-space consumption, which allows large data sets, such as reference panels and multiple extracts and libraries, to be (re)mapped efficiently.

Code & Models

Repositories

sneuensc/mapache (official)

Taxonomy

Topics: Genomics and Phylogenetic Studies · Forensic and Genetic Research · Algorithms and Data Compression