# FM3VCF: A software library for accelerating the loading of large VCF files in genotype data analyses

**Authors:** Zhen Zuo, Mingliang Li, Qi Li, Zhuo Li, Defu Liu, Guanshi Ye, You Tang, Andrea Tangherloni

PMC · DOI: 10.1371/journal.pone.0324430 · PLOS One · 2025-06-04

## TL;DR

FM3VCF is a multi-threaded software library that speeds up loading large VCF files and compressing them to the M3VCF format for genomic analyses.

## Contribution

FM3VCF compresses VCF files to the M3VCF format approximately 36 times faster than m3VCFtools and uses multiple CPU threads to read and parse VCF files efficiently.

## Key findings

- FM3VCF compresses VCF files to the M3VCF format approximately 36 times faster than m3VCFtools.
- FM3VCF is approximately three times faster than HTSlib at reading compressed VCF files, including decompression and parsing.
- The tool reduces computational burden in genomic analyses by utilizing multiple CPU threads.
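As general background on why M3VCF can be more compact than VCF (this is not taken from the summary above): within a block of consecutive variants, many haplotypes are identical, so each distinct haplotype can be stored once together with a per-haplotype index pointing at it. The sketch below illustrates only that idea. The function name `collapse_block` and the string encoding of haplotypes are assumptions made for illustration; this is neither the M3VCF on-disk layout nor FM3VCF's implementation.

```python
from typing import Dict, List, Tuple


def collapse_block(haplotypes: List[str]) -> Tuple[List[str], List[int]]:
    """Collapse identical haplotypes within one block of variants.

    Each input string holds one haplotype's alleles across the block
    (for example "0101" for four biallelic variants). Returns the unique
    haplotypes in first-seen order and, for every input haplotype, the
    index of its unique representative.
    """
    index_of: Dict[str, int] = {}
    unique: List[str] = []
    mapping: List[int] = []
    for hap in haplotypes:
        if hap not in index_of:
            index_of[hap] = len(unique)
            unique.append(hap)
        mapping.append(index_of[hap])
    return unique, mapping


# Hypothetical toy block: six haplotypes over four variants.
haps = ["0101", "0000", "0101", "0101", "0000", "1101"]
unique, mapping = collapse_block(haps)

# The original haplotypes can be rebuilt from the two smaller structures.
restored = [unique[i] for i in mapping]
```

In this toy block, six haplotypes reduce to three stored patterns plus six small integers; the saving grows with the number of samples, since large panels tend to repeat haplotypes within short blocks.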

## Abstract

The increasing size of genotype data has led to the loading of VCF files becoming a computational bottleneck in various analyses, including imputation and genome-wide association studies (GWAS). To address this issue, we developed a software library, FM3VCF (fast M3VCF), that utilizes multiple CPU threads to accelerate this process and compress VCF files into the more compact M3VCF format. FM3VCF can convert VCF files into M3VCF, the exclusive data format of MINIMAC4, and can efficiently read and parse data from VCF files. Compared with m3VCFtools, FM3VCF exhibits a speed improvement of approximately 36-fold in the compression of VCF files to the M3VCF format. This acceleration addresses a limitation faced by MINIMAC4 when dealing with datasets containing millions of samples. Furthermore, FM3VCF is approximately 3 times faster than HTSlib, including decompressing and parsing, for reading compressed VCF files. FM3VCF is an effective tool for both compressing VCF files efficiently and accelerating the loading of large VCF files in genotype data analyses. By fully utilizing multiple CPU threads, FM3VCF can significantly reduce the computational burden of various genomic analyses.
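The multi-threaded loading described in the abstract can be pictured as splitting the record lines of a VCF file into chunks and parsing the chunks concurrently. The following is a minimal sketch of that pattern using only the Python standard library. It is not FM3VCF's API or implementation; the names `parse_record`, `parse_chunk`, and `parse_parallel`, the chunk size, and the toy input are all assumptions, and the parser assumes diploid genotypes with `GT` as the first FORMAT field.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, int, List[Tuple[int, int]]]


def parse_allele(allele: str) -> int:
    """Return the allele index, or -1 for a missing allele (".")."""
    return -1 if allele == "." else int(allele)


def parse_record(line: str) -> Record:
    """Parse one VCF data line into (chrom, pos, diploid genotypes)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    genotypes = []
    for sample in fields[9:]:            # sample columns follow FORMAT
        gt = sample.split(":", 1)[0]     # assumes GT is the first subfield
        a, b = gt.replace("/", "|").split("|")
        genotypes.append((parse_allele(a), parse_allele(b)))
    return chrom, pos, genotypes


def parse_chunk(lines: List[str]) -> List[Record]:
    return [parse_record(line) for line in lines]


def parse_parallel(lines: List[str], workers: int = 4,
                   chunk_size: int = 2) -> List[Record]:
    """Parse data lines in chunks on a thread pool, keeping file order."""
    data = [line for line in lines if not line.startswith("#")]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        return [rec for chunk in pool.map(parse_chunk, chunks) for rec in chunk]


# Hypothetical three-record VCF with two samples.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|0\t.|1",
    "1\t300\trs3\tG\tA\t.\tPASS\t.\tGT:DP\t1/0:12\t0/0:9",
]
records = parse_parallel(vcf_lines)
```

In CPython, the global interpreter lock usually prevents threads from speeding up pure-Python parsing, so this sketch shows the chunk-and-merge structure rather than a real speed-up; a native library or `ProcessPoolExecutor` would be needed for actual parallel gains.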

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12136311/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12136311/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12136311/full.md

---
Source: https://tomesphere.com/paper/PMC12136311