snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data
Christina Vasilopoulou, Benjamin Wingfield, Andrew P. Morris, William Duddy

TL;DR
snpQT is an automated, comprehensive, and user-friendly software pipeline for quality control and imputation of genomic data, integrating multiple steps into a scalable, reproducible workflow suitable for researchers without extensive bioinformatics expertise.
Contribution
It introduces a flexible, all-in-one pipeline that simplifies genomic data QC and imputation, reducing the need for multiple tools and expert knowledge.
Findings
Automates 36 quality control and correction steps.
Includes built-in population stratification and imputation.
Provides visualizations and thresholds for quality assessment.
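The findings above describe filters driven by user-adjustable thresholds. The Python below is a minimal sketch of what one such filter does. It assumes genotypes are coded as 0/1/2 alternate-allele counts, with `None` marking a missing call, and it drops variants based on missingness and minor allele frequency (MAF). The names `QCThresholds` and `filter_variants`, the default cutoffs, and the toy data are hypothetical illustrations. They are not snpQT's code or its default settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

# Genotype = alternate-allele count (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


@dataclass
class QCThresholds:
    # Hypothetical defaults chosen for illustration, not snpQT's settings.
    max_missing: float = 0.02  # drop variants missing in more than 2% of samples
    min_maf: float = 0.01      # drop variants with minor allele frequency below 1%


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at this variant."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_variants(
    variants: Dict[str, Sequence[Genotype]], t: QCThresholds
) -> Dict[str, Sequence[Genotype]]:
    """Keep only variants that pass both the missingness and MAF thresholds."""
    return {
        vid: calls
        for vid, calls in variants.items()
        if missing_rate(calls) <= t.max_missing
        and minor_allele_frequency(calls) >= t.min_maf
    }


# Toy data: ten samples per variant.
variants = {
    "rs1": [0, 1, 2, 1, 0, 0, 1, 2, 1, 0],        # complete, MAF 0.4
    "rs2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF 0
    "rs3": [0, 1, None, None, 1, 0, 1, 0, 2, 1],  # 20% missing
}

strict = filter_variants(variants, QCThresholds())
relaxed = filter_variants(variants, QCThresholds(max_missing=0.25, min_maf=0.0))
print(f"before: {len(variants)}, after strict: {len(strict)}, after relaxed: {len(relaxed)}")
```

With the strict defaults only `rs1` survives. Relaxing both cutoffs keeps all three variants. Comparing counts (or plots) before and after each cutoff is what lets a user judge whether a threshold suits their data.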
Abstract
Motivation: Quality control of genomic data is an essential but complicated multi-step procedure. It often requires separately installing, and being expertly familiar with, a combination of disparate bioinformatics tools. Results: To provide an automated solution that retains comprehensive quality checks and a flexible workflow architecture, we have developed snpQT. This scalable, stand-alone software pipeline offers some 36 discrete quality filters or correction steps, with before-and-after plots for user-modifiable thresholds. These include build conversion, population stratification against 1000 Genomes data, population outlier removal, and built-in imputation with its own pre- and post-imputation quality control. Common input formats are used, and users need neither be superusers nor have any prior coding experience. A comprehensive online tutorial and installation guide is provided through to GWAS…
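The abstract mentions population outlier removal against 1000 Genomes data. One common heuristic flags study samples whose principal-component (PC) scores lie more than a fixed number of standard deviations from a reference population's mean on any PC. The sketch below illustrates that heuristic. It assumes PC scores have already been computed for both the study and reference samples. The function name, the 6-SD cutoff and the toy numbers are hypothetical, and snpQT's actual outlier rule may differ.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def flag_population_outliers(
    sample_pcs: Dict[str, Sequence[float]],
    reference_pcs: Sequence[Sequence[float]],
    n_sd: float = 6.0,  # hypothetical cutoff, not a snpQT default
) -> List[str]:
    """Return IDs of samples lying more than n_sd reference standard deviations
    from the reference mean on at least one principal component."""
    n_pcs = len(reference_pcs[0])
    # Per-PC centre and spread of the reference population.
    centres = [mean([row[k] for row in reference_pcs]) for k in range(n_pcs)]
    spreads = [stdev([row[k] for row in reference_pcs]) for k in range(n_pcs)]
    return [
        sample_id
        for sample_id, pcs in sample_pcs.items()
        if any(abs(pcs[k] - centres[k]) > n_sd * spreads[k] for k in range(n_pcs))
    ]


# Toy PC1/PC2 scores for a reference population and two study samples.
reference = [(0.010, 0.020), (0.020, 0.010), (0.015, 0.018), (0.012, 0.022), (0.018, 0.015)]
study = {"S1": (0.016, 0.019), "S2": (0.090, -0.050)}

print(flag_population_outliers(study, reference))
```

Here `S2` sits far outside the reference cluster on PC1 and is flagged, while `S1` is retained. Lowering `n_sd` removes more samples, which trades a more ancestrally homogeneous cohort against a smaller sample size.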
