# ANOMALY: a Snakemake pipeline for identifying NuMTs from long-read sequencing data

**Authors:** Nirmal S Mahar, Rachit Singh, Ishaan Gupta, Shweta Ramdas

PMC · DOI: 10.1093/nargab/lqag014 · NAR Genomics and Bioinformatics · 2026-02-04

## TL;DR

ANOMALY is a Snakemake pipeline that detects NuMTs (nuclear mitochondrial DNA segments) from long-read sequencing data, resolving complex NuMTs that short-read methods struggle to capture.

## Contribution

The paper introduces ANOMALY, which the authors describe as the first workflow for detecting NuMTs from long-read sequencing data. It accepts raw sequencing or aligned data, then calls and visualises the NuMTs in each sample.

## Key findings

- ANOMALY achieved a precision of 1.000, recall of 0.989, and F1-score of 0.994 on 50 simulated datasets.
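- Long-read data enable accurate identification of complex NuMTs that short-read data struggle to resolve and capture.
- The pipeline is written in Python, Bash and R, released under the open-source GNU GPL v3 license, and documented with setup and usage instructions at https://github.com/Nirmal2310/ANOMALY.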
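
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch of that arithmetic only. The function name `f1_score` is illustrative and is not taken from the ANOMALY source code, and the paper's underlying true-positive and false-positive counts are not used here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the 50 simulated datasets.
reported_precision = 1.000
reported_recall = 0.989

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.3f}")  # prints 0.994
```

The result agrees with the stated F1-score of 0.994 to three decimal places.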

## Abstract

Nuclear mitochondrial DNA segments (NuMTs) can contribute to cancer development and disease progression by disrupting protein-coding genes. Furthermore, their presence confounds mitochondrial variant detection, underscoring the critical need for robust NuMT detection. Current methods to call NuMTs rely on short-read sequencing data but struggle to resolve complex NuMTs. These limitations can be overcome by employing long-read sequencing data. However, no such workflow exists to capture NuMTs from long-read sequencing data. Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline takes raw sequencing or aligned data and calls and visualises sample NuMTs. On 50 simulated datasets, the pipeline demonstrated high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. The pipeline underscores the limitations of short-read data in resolving and capturing complex NuMTs while demonstrating that long-read data enables their accurate identification. The Snakemake pipeline employs Python, Bash and R and is published under an open-source GNU GPL v3 license. Detailed information on setting up and running the pipeline, along with the source code, is available at https://github.com/Nirmal2310/ANOMALY.

## Full-text entities

- **Diseases:** cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869244/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12869244/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869244/full.md

---
Source: https://tomesphere.com/paper/PMC12869244