# Metapipeline-DNA: A comprehensive germline and somatic genomics Nextflow pipeline

**Authors:** Yash Patel, Chenghao Zhu, Takafumi N. Yamaguchi, Nicholas K. Wang, Nicholas Wiltsie, Nicole Zeltser, Alfredo E. Gonzalez, Helena K. Winata, Yu Pan, Mohammed Faizal Eeman Mootor, Timothy Sanders, Sorel T. Fitz-Gibbon, Cyriac Kandoth, Julie Livingstone, Lydia Y. Liu, Benjamin Carlin, Aaron Holmes, Jieun Oh, John Sahrmann, Shu Tao, Stefan Eng, Rupert Hugh-White, Kiarod Pashminehazar, Arpi Beshlikyan, Madison Jordan, Selina Wu, Mao Tian, Jaron Arbet, Beth Neilsen, Roni Haas, Yuan Zhe Bugh, Gina Kim, Joseph Salmingo, Wenshu Zhang, Aakarsh Anand, Edward Hwang, Anna Neiman-Golden, Philippa Steinberg, Wenyan Zhao, Prateek Anand, Raag Agrawal, Brandon L. Tsai, Paul C. Boutros

PMC · DOI: 10.1016/j.crmeth.2026.101340 · Cell Reports Methods · 2026-03-17

## TL;DR

Metapipeline-DNA is an automated, flexible pipeline for analyzing DNA sequencing data to extract genetic and evolutionary features from both germline and somatic genomes.

## Contribution

The paper introduces Metapipeline-DNA, a comprehensive and extensible Nextflow pipeline for scalable genomic analysis with robust automation and cloud compatibility.

## Key findings

- Metapipeline-DNA supports analysis of both germline and somatic DNA sequencing data.
- The pipeline is optimized for scalability, reproducibility, and consistent quality control.
- It integrates multiple algorithms for detecting nuclear, mitochondrial, and evolutionary features.

## Abstract

The price, quality, and throughput of DNA sequencing continue to improve. Algorithmic innovations have allowed inference of a growing range of features from DNA sequencing data, quantifying nuclear, mitochondrial, and evolutionary aspects of both germline and somatic genomes. To automate analyses of the full range of genomic characteristics, we created an extensible Nextflow metapipeline called metapipeline-DNA. It analyzes targeted and whole-genome sequencing data from raw reads through preprocessing, feature detection by multiple algorithms, quality control, and data visualization. Each step can be run independently and is supported by robust software engineering, including automated failure-recovery, granular testing, and consistent verification of inputs, outputs, and parameters. Metapipeline-DNA is cloud-compatible and highly configurable, with options to subset and optimize each analysis. Metapipeline-DNA facilitates large-scale, comprehensive analysis of DNA sequencing data, and is open-source under the GPLv2 license.
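The abstract's description of independently runnable steps with verified inputs, outputs, and parameters can be illustrated with a small Python sketch. This is not metapipeline-DNA's implementation, which is written in Nextflow; the `Stage` type, the `run_stages` function, the stage names, and artifact keys such as `fastq`, `bam`, and `vcf` are all hypothetical, and each toy stage only derives a file name instead of invoking a real tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical pipeline step declaring the artifacts it consumes and produces."""
    name: str
    requires: List[str]
    produces: List[str]
    run: Callable[[Dict[str, str]], Dict[str, str]]


def run_stages(stages: List[Stage], selected: List[str], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run only the selected stages, in pipeline order, checking inputs and outputs."""
    unknown = set(selected) - {s.name for s in stages}
    if unknown:
        # Validate parameters before any work is done.
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    artifacts = dict(inputs)
    for stage in stages:
        if stage.name not in selected:
            continue  # skipping stages lets a run start or stop at any step
        missing = [k for k in stage.requires if k not in artifacts]
        if missing:
            raise ValueError(f"{stage.name}: missing inputs {missing}")
        outputs = stage.run(artifacts)
        absent = [k for k in stage.produces if k not in outputs]
        if absent:
            raise RuntimeError(f"{stage.name}: did not produce {absent}")
        artifacts.update(outputs)
    return artifacts


# Hypothetical toy stages: each derives an output file name instead of calling a real tool.
PIPELINE = [
    Stage("preprocess", ["fastq"], ["bam"],
          lambda a: {"bam": a["fastq"].replace(".fastq", ".bam")}),
    Stage("detect_features", ["bam"], ["vcf"],
          lambda a: {"vcf": a["bam"].replace(".bam", ".vcf")}),
    Stage("quality_control", ["bam"], ["qc_report"],
          lambda a: {"qc_report": a["bam"] + ".qc.txt"}),
]

print(run_stages(PIPELINE, ["detect_features"], {"bam": "sample.bam"}))
```

Starting from an existing alignment, the call above skips preprocessing and returns `{'bam': 'sample.bam', 'vcf': 'sample.vcf'}`, while requesting the same stage with only a FASTQ input fails fast with a `ValueError`. Checking declared inputs and outputs at every stage boundary is what makes it safe to enter or leave the workflow at any step.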

- Metapipeline-DNA is a computational pipeline to analyze DNA sequencing data
- Metapipeline-DNA identifies genetic and evolutionary features from DNA
- Metapipeline-DNA is automated, highly customizable, and compute-agnostic
- Metapipeline-DNA is optimized for disk usage and allocation of available resources
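The highlight on resource allocation, together with the automated failure-recovery described in the abstract, is often implemented by retrying a failed task with a larger memory request. Nextflow supports this pattern through the `errorStrategy` and `maxRetries` process directives combined with dynamic directives that scale with `task.attempt`; this summary does not state whether or how metapipeline-DNA configures them. The Python sketch below only mimics the idea: `OutOfMemory`, `run_with_retries`, and `needs_12_gb` are hypothetical names, and the linear scaling rule is an assumption.

```python
from typing import Callable, Tuple


class OutOfMemory(Exception):
    """Hypothetical signal that a task exceeded its memory allocation."""


def run_with_retries(
    task: Callable[[int], str],
    base_mem_gb: int,
    max_gb: int,
    max_retries: int = 3,
) -> Tuple[str, int]:
    """Run `task`, scaling memory linearly with the attempt number (assumed policy).

    Returns the task result and the memory (GB) of the successful attempt.
    Gives up after `max_retries` retries or once the memory cap is reached.
    """
    attempt = 1
    while True:
        # Analogous to a Nextflow dynamic directive such as `memory { base * task.attempt }`,
        # capped at the largest allocation the environment can offer.
        mem_gb = min(base_mem_gb * attempt, max_gb)
        try:
            return task(mem_gb), mem_gb
        except OutOfMemory:
            if attempt > max_retries or mem_gb >= max_gb:
                raise  # retrying cannot help: out of attempts or already at the cap
            attempt += 1


def needs_12_gb(mem_gb: int) -> str:
    """Hypothetical task that only succeeds with at least 12 GB."""
    if mem_gb < 12:
        raise OutOfMemory(f"{mem_gb} GB is not enough")
    return "done"


print(run_with_retries(needs_12_gb, base_mem_gb=4, max_gb=64))  # ('done', 12)
```

With a 4 GB starting request the task fails at 4 and 8 GB and succeeds at 12 GB; with `max_gb=8` it stops after the second attempt instead of retrying at the same size. Starting small and escalating only on failure keeps typical tasks from reserving more memory than they need.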

Rapid improvements in DNA sequencing technologies have expanded the range of genomic features that can be elucidated from sequencing data, spanning nuclear, mitochondrial, and evolutionary variation in germline and somatic contexts. In parallel, the analytical workflows required to process data and identify these features have become increasingly complex, relying on specialized tools and algorithms with varying assumptions and computational requirements. Comprehensive analysis therefore requires significant integration effort, which limits scalability, reproducibility, and consistent quality control. To address the need for a flexible, robust framework that accommodates diverse sequencing methods and feature classes while remaining highly scalable and adaptable across computational environments, we created metapipeline-DNA to automate genomic analyses.

Patel et al. develop an automated, extensible, and cloud-compatible pipeline that transforms raw DNA sequencing reads into genetic characteristics and evolutionary features. They demonstrate and validate the pipeline using whole-genome and targeted sequencing data from normal and tumor samples.

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), biliary tract carcinoma (MESH:D001661), esophageal adenocarcinoma (MESH:D000230), soft tissue sarcoma (MESH:D012509), uterine corpus endometrial carcinoma (MESH:D016889)
- **Chemicals:** guanine (MESH:D006147), T (MESH:D014316), SNV (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030954/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030954/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030954/full.md

---
Source: https://tomesphere.com/paper/PMC13030954