# ParaMask: a new method to identify multicopy genomic regions, corrects major biases in whole-genome sequencing data

**Authors:** Bastiaan Tjeng, Male Arimond, Helene Bråten Grindeland, Andrea Dalla Libera, Andrea Fulgione

PMC · DOI: 10.1186/s13059-025-03836-8 · Genome Biology · 2025-10-24

## TL;DR

ParaMask is a new tool that identifies and filters repeated genomic regions to correct biases in whole-genome sequencing data.

## Contribution

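ParaMask introduces a flexible Expectation-Maximization (EM) framework that detects excess heterozygosity while simultaneously fitting inbreeding levels. This lets it identify multicopy regions in population-level genomic data from any species.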
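
The Python sketch below makes the excess-heterozygosity signal concrete. It uses EM to estimate an inbreeding coefficient F, then flags sites that carry more heterozygotes than expected. Under inbreeding, the expected heterozygosity at a biallelic site with allele frequency p is 2p(1 - p)(1 - F). This is not ParaMask's implementation. The single shared F, the fixed sample allele frequencies, the binomial test, and the function names (`allele_freq`, `estimate_inbreeding`, `excess_het_pvalue`) are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def allele_freq(genotypes: Sequence[int]) -> float:
    """Alt-allele frequency from diploid genotype calls coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def estimate_inbreeding(sites: List[Sequence[int]], f0: float = 0.1,
                        tol: float = 1e-10, max_iter: int = 1000) -> float:
    """EM estimate of one inbreeding coefficient F shared by all sites (a sketch).

    Latent variable: whether an individual's two alleles at a site are
    identical by descent (IBD), which happens with probability F.
    Allele frequencies are fixed at their sample values (a simplification).
    f0 must be > 0, because F = 0 is a fixed point of the update.
    """
    # Monomorphic sites carry no information about F, so skip them.
    informative = [(g, allele_freq(g)) for g in sites if 0 < allele_freq(g) < 1]
    if not informative:
        raise ValueError("no polymorphic sites to estimate F from")
    f = f0
    for _ in range(max_iter):
        total, count = 0.0, 0
        for genos, p in informative:
            for g in genos:
                if g == 1:    # heterozygote: cannot be IBD
                    post = 0.0
                elif g == 2:  # hom-alt: P(IBD | alt/alt) = F / (F + (1 - F) p)
                    post = f / (f + (1 - f) * p)
                else:         # hom-ref: P(IBD | ref/ref) = F / (F + (1 - F)(1 - p))
                    post = f / (f + (1 - f) * (1 - p))
                total += post
                count += 1
        f_new = total / count  # M-step: F is the mean posterior IBD probability
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


def excess_het_pvalue(genotypes: Sequence[int], f: float) -> float:
    """One-sided binomial P(X >= observed heterozygotes) given inbreeding F."""
    n = len(genotypes)
    p = allele_freq(genotypes)
    h_exp = 2 * p * (1 - p) * (1 - f)  # expected heterozygosity per individual
    k = sum(1 for g in genotypes if g == 1)
    return sum(math.comb(n, i) * h_exp**i * (1 - h_exp)**(n - i)
               for i in range(k, n + 1))


# Usage: 20 well-behaved sites whose genotype counts match F = 0.5 exactly,
# plus one site where every individual is called heterozygous.
normal = [0, 0, 0, 1, 1, 2, 2, 2]
f_hat = estimate_inbreeding([normal] * 20)
all_het = [1] * 8
print(round(f_hat, 3))                    # 0.5
print(excess_het_pvalue(all_het, f_hat))  # about 1.5e-05: far more hets than expected
```

For clarity, the sketch estimates F first and tests sites afterwards. The method described here fits inbreeding levels simultaneously with detecting excess heterozygosity. A closer approximation would therefore re-estimate F after excluding flagged sites and iterate.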

## Key findings

- Multicopy regions cause biases in evolutionary genomic analyses.
- ParaMask effectively identifies and filters these regions to correct the biases.
- The method combines excess heterozygosity, read-ratio deviations, excess sequencing depth, and a clustering technique to attain high recall.
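
An exact binomial test on allele balance can illustrate read-ratio deviations. At a heterozygous site in a single-copy region, roughly half of the reads should carry each allele. Now suppose reads from two collapsed copies map to one locus, and the variant sits on only one of the four chromosome copies. The expected alt-read fraction then drops to about 0.25. The Python sketch below pools (ref, alt) read counts across heterozygous samples at a site and tests them against 0.5. The pooling step and the function names (`binom_two_sided_half`, `read_ratio_pvalue`) are assumptions for illustration, not ParaMask's actual statistic.

```python
import math
from typing import Sequence, Tuple


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # At p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the smaller tail probability (capped at 1).
    p_tail = sum(math.comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p_tail)


def read_ratio_pvalue(het_reads: Sequence[Tuple[int, int]]) -> float:
    """Pool (ref, alt) read counts over heterozygous samples at one site
    and test whether the alt-read fraction departs from 0.5 (hypothetical helper)."""
    ref = sum(r for r, _ in het_reads)
    alt = sum(a for _, a in het_reads)
    return binom_two_sided_half(alt, ref + alt)


# Usage: (ref, alt) read counts for three heterozygous samples per site.
balanced = [(10, 10), (12, 8), (9, 11)]  # 31 ref vs 29 alt reads
skewed = [(15, 5), (16, 4), (14, 6)]     # 45 ref vs 15 alt reads
print(read_ratio_pvalue(balanced))  # large: consistent with a 0.5 allele balance
print(read_ratio_pvalue(skewed))    # small: allele balance far from 0.5
```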

## Abstract

Multicopy genomic regions are repeated sequences that can bias genomic analyses. Here, we present a method, ParaMask, to identify and filter multicopy regions in population-level genomic data of any species. The broad applicability of this method stems from a flexible Expectation-Maximization framework to detect excess heterozygosity while simultaneously fitting inbreeding levels. By combining this signature with read-ratio deviations, excess sequencing depth, and a clustering technique, our method attains high recall. We show that multicopy regions create biases that confound evolutionary genomic analyses and that by identifying these regions with our method and filtering them, we can correct these biases.
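
Excess sequencing depth and clustering can be sketched together. First, flag positions whose depth is well above the genome-wide median, since two collapsed copies should give roughly twice the usual coverage. Then merge nearby flagged positions into candidate regions. The Python below is a minimal sketch under stated assumptions. The 1.5-times-median threshold, the `max_gap` distance, and the function names (`depth_outliers`, `cluster_sites`) are hypothetical choices. The distance-based merge is a simple stand-in, not the clustering technique ParaMask actually uses.

```python
from statistics import median
from typing import Dict, List, Tuple


def depth_outliers(depth: Dict[int, float], factor: float = 1.5) -> List[int]:
    """Positions whose depth exceeds `factor` times the median depth.
    (Hypothetical threshold; a collapsed two-copy region should sit near 2x.)"""
    m = median(depth.values())
    return sorted(pos for pos, d in depth.items() if d > factor * m)


def cluster_sites(positions: List[int], max_gap: int = 1000) -> List[Tuple[int, int]]:
    """Merge flagged positions into (start, end) regions whenever consecutive
    flagged positions are at most `max_gap` bp apart (hypothetical rule)."""
    regions: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], pos)  # extend the current region
        else:
            regions.append((pos, pos))           # start a new region
    return regions


# Usage: background depth of 20 every 500 bp, plus a few high-depth positions.
depth = {pos: 20 for pos in range(0, 20000, 500)}
depth.update({300: 41, 450: 44, 1200: 39, 9000: 42})
flagged = depth_outliers(depth)
regions = cluster_sites(flagged)
print(flagged)  # [300, 450, 1200, 9000]
print(regions)  # [(300, 1200), (9000, 9000)]
```

Merging by distance shows one way clustering can raise recall. Positions inside a multicopy region that show no signal of their own (here 500 and 1000) still fall within a region seeded by neighbouring flagged positions.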

The online version contains supplementary material available at 10.1186/s13059-025-03836-8.

## Full-text entities

- **Chemicals:** acetaldehyde (MESH:D000079), ethanol (MESH:D000431)
- **Species:** Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Leptidea sinapis (species) [taxon 189913], Arabidopsis halleri (species) [taxon 81970], Pieris brassicae (cabbage butterfly, species) [taxon 7116], Drosophila melanogaster (fruit fly, species) [taxon 7227], Arabis alpina (alpine rockcress, species) [taxon 50452], Homo sapiens (human, species) [taxon 9606], Oncorhynchus gorbuscha (humpback salmon, species) [taxon 8017], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]
- **Cell lines:** ES03- — Homo sapiens (Human), Embryonic stem cell (CVCL_7158)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12551310/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12551310/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12551310/full.md

---
Source: https://tomesphere.com/paper/PMC12551310