# EMVC-2: an efficient single-nucleotide variant caller based on expectation maximization

**Authors:** Guillermo Dufort y Álvarez, Martí Xargay-Ferrer, Alba Pagès-Zamora, Idoia Ochoa

PMC · DOI: 10.1093/bioinformatics/btad681 · Bioinformatics · 2023-11-14

## TL;DR

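EMVC-2 is a single-nucleotide variant (SNV) caller for next-generation sequencing data. It combines expectation-maximization ensemble classification with a decision-tree filter and, on average, is faster and more accurate than state-of-the-art callers on real human data.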

## Contribution

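EMVC-2 treats SNV calling as a multi-class ensemble classification problem: an expectation-maximization step infers the most likely genotype at each locus from the labels provided by several learners, and a decision tree then filters out unlikely variants.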

## Key findings

- On average, EMVC-2 outperforms state-of-the-art variant callers in both accuracy and speed on publicly available real human NGS data with known SNV sets.
- The method uses a decision tree to filter out unlikely variants after genotype inference.
- EMVC-2 is implemented in C and Python and is publicly available for use.

## Abstract

Single-nucleotide variants (SNVs) are the most common type of genetic variation in the human genome. Accurate and efficient detection of SNVs from next-generation sequencing (NGS) data is essential for various applications in genomics and personalized medicine. However, SNV calling methods usually suffer from high computational complexity and limited accuracy. In this context, there is a need for new methods that overcome these limitations and provide fast, reliable results.

We present EMVC-2, a novel method for SNV calling from NGS data. EMVC-2 uses a multi-class ensemble classification approach based on the expectation–maximization algorithm that infers, at each locus, the most likely genotype from multiple labels provided by different learners. The inferred variants are then validated by a decision tree that filters out unlikely ones. We evaluate EMVC-2 on several publicly available real human NGS datasets for which the set of SNVs is available, and demonstrate that, on average, it outperforms state-of-the-art variant callers in terms of accuracy and speed.
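
To make the ensemble step more concrete, the following is a minimal sketch of EM-based label aggregation. It is not the EMVC-2 implementation. It assumes a Dawid-Skene-style model in which each learner has its own confusion matrix, and the genotype label set, the function names `em_aggregate` and `call_genotypes`, the `smoothing` parameter, and the example votes are all hypothetical. The decision-tree filtering stage is not shown.

```python
import math

# Hypothetical label set for a biallelic site (assumption, not from the paper).
GENOTYPES = ["0/0", "0/1", "1/1"]


def em_aggregate(votes, n_classes, n_iter=50, smoothing=0.01):
    """Dawid-Skene-style EM over categorical votes (illustrative assumption).

    votes[i][j] is the class index that learner j assigned to locus i.
    Returns (posteriors, priors, confusion); posteriors[i][k] is the
    estimated probability that locus i has true class k.
    """
    n_items = len(votes)
    n_learners = len(votes[0])

    # Initialise posteriors from smoothed vote fractions (soft majority vote).
    post = []
    for row in votes:
        counts = [smoothing] * n_classes
        for label in row:
            counts[label] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    priors, confusion = [], []
    for _ in range(n_iter):
        # M-step: re-estimate class priors and per-learner confusion matrices.
        priors = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        confusion = []
        for j in range(n_learners):
            mat = [[smoothing] * n_classes for _ in range(n_classes)]
            for i, row in enumerate(votes):
                for k in range(n_classes):
                    mat[k][row[j]] += post[i][k]
            confusion.append([[v / sum(r) for v in r] for r in mat])

        # E-step: posterior over the true class of each locus, in log space.
        new_post = []
        for row in votes:
            logp = []
            for k in range(n_classes):
                lp = math.log(priors[k] + 1e-300)  # guard against log(0)
                for j, label in enumerate(row):
                    lp += math.log(confusion[j][k][label])
                logp.append(lp)
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            z = sum(w)
            new_post.append([v / z for v in w])
        post = new_post

    return post, priors, confusion


def call_genotypes(votes):
    """Pick the maximum-posterior genotype label for every locus."""
    post, _, _ = em_aggregate(votes, len(GENOTYPES))
    return [GENOTYPES[max(range(len(p)), key=p.__getitem__)] for p in post]


# Made-up votes: learners 0 and 1 agree, learner 2 always reports class 0.
example_votes = [[0, 0, 0], [1, 1, 0], [2, 2, 0], [1, 1, 0], [0, 0, 0], [2, 2, 0]]

if __name__ == "__main__":
    print(call_genotypes(example_votes))
```

Unlike a plain majority vote, the confusion matrices let the model down-weight a learner whose labels carry little information, as with the third learner in the example, which is the usual motivation for aggregating labels with EM.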

EMVC-2 is coded in C and Python, and is freely available for download at: https://github.com/guilledufort/EMVC-2. EMVC-2 is also available in Bioconda.

## Full-text entities

- **Diseases:** cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10919945/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC10919945/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/PMC10919945/full.md

---
Source: https://tomesphere.com/paper/PMC10919945