GEDI: Scalable Algorithms for Genotype Error Detection and Imputation
Justin Kennedy, Ion I. Mandoiu, and Bogdan Pasaniuc

TL;DR
GEDI is a scalable software package that performs genotype error detection and correction, imputation, and phasing on large genome-wide datasets, with a runtime that scales linearly with the number of markers and samples.
Contribution
This paper introduces GEDI, an open-source C++ software package that provides scalable algorithms for genotype error detection and correction, imputation, and phasing in large genomic datasets.
Findings
High accuracy in genotype error detection and correction
Runtime that scales linearly with the number of markers and samples
Open-source availability for community use
Abstract
Genome-wide association studies generate very large datasets that require scalable analysis algorithms. In this report we describe the GEDI software package, which implements efficient algorithms for performing several common tasks in the analysis of population genotype data, including genotype error detection and correction, imputation of both randomly missing and untyped genotypes, and genotype phasing. Experimental results show that GEDI achieves high accuracy with a runtime scaling linearly with the number of markers and samples. The open source C++ code of GEDI, released under the GNU General Public License, is available for download at http://dna.engr.uconn.edu/software/GEDI/
Taxonomy
Topics: Genetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
