TL;DR
This paper introduces matrix completion algorithms for haplotype assembly from NGS reads. One of them, HapOPT, achieves a lower SNP missing rate and longer haplotype blocks than HapCUT2, with comparable accuracy.
Contribution
The paper develops three algorithms, HapSVT, HapNuc, and HapOPT, that apply matrix completion to haplotype assembly from NGS reads.
Findings
HapOPT outperforms HapCUT2 in SNP missing rate and haplotype block length.
HapOPT achieves reconstruction and switch error rates comparable to those of HapCUT2.
A MATLAB implementation of the algorithms is freely available at https://github.com/smajidian/HapMC.
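Accuracy is compared using reconstruction rate and switch error rate. The sketch below shows how the two metrics can be computed for a diploid genome with complementary haplotypes at heterozygous SNPs. It assumes one common definition of each metric. The paper's exact normalization may differ, for example per block or over phased SNPs only. The function names are hypothetical.

```python
# Hypothetical helpers; one common definition of each metric, assumed here.
# Haplotypes are lists of +1/-1 alleles at heterozygous SNPs.


def reconstruction_rate(est, true):
    """1 - (Hamming distance to the closer of `true` and its complement) / n."""
    n = len(true)
    mismatches = sum(e != t for e, t in zip(est, true))
    # For +/-1 vectors, the mismatch count against the complement is n - mismatches.
    return 1 - min(mismatches, n - mismatches) / n


def switch_error_rate(est, true):
    """Fraction of adjacent SNP pairs whose phase relative to `true` flips."""
    rel = [e * t for e, t in zip(est, true)]  # +1 = same phase, -1 = opposite
    switches = sum(rel[j] != rel[j - 1] for j in range(1, len(rel)))
    return switches / (len(rel) - 1)


true = [1, -1, 1, 1, -1, 1]
est = [1, -1, 1, -1, 1, -1]  # correct, then a single phase switch after SNP 2
print(switch_error_rate(est, true))    # → 0.2
print(reconstruction_rate(est, true))  # → 0.5
```

In this example, one switch yields a switch error rate of only 0.2 but halves the reconstruction rate. The two metrics therefore capture different kinds of error.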
Abstract
We apply matrix completion methods for haplotype assembly from NGS reads to develop the new HapSVT, HapNuc, and HapOPT algorithms. This is performed by applying a mathematical model to convert the reads to an incomplete matrix and estimating unknown components. This process is followed by quantizing and decoding the completed matrix in order to estimate haplotypes. These algorithms are compared to the state-of-the-art algorithms using simulated data as well as the real fosmid data. It is shown that the SNP missing rate and the haplotype block length of the proposed HapOPT are better than those of HapCUT2 with comparable accuracy in terms of reconstruction rate and switch error rate. A program implementing the proposed algorithms in MATLAB is freely available at https://github.com/smajidian/HapMC.
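The abstract describes three steps: converting reads to an incomplete matrix, completing it, and then quantizing and decoding the result. The sketch below illustrates these steps under an assumed encoding:

- Each row is a read and each column a heterozygous SNP.
- Alleles are coded +1/-1, and uncovered SNPs are 0 (unknown).
- Error-free reads therefore form a rank-one matrix c hᵀ, where c_i = ±1 gives the read's haplotype of origin.

This is not HapSVT, HapNuc, or HapOPT. In place of singular value thresholding or nuclear-norm minimization, it uses a simpler rank-one alternating least squares, which has no convergence guarantee from the all-ones start used here. All function names are hypothetical.

```python
# Hypothetical sketch: rank-one matrix completion for haplotype assembly.
# Rows = reads, columns = heterozygous SNPs; +1/-1 = allele, 0 = not covered.


def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def complete_rank_one(R, iters=50):
    """Fit R ~ u v^T on the observed (nonzero) entries by alternating least squares."""
    m, n = len(R), len(R[0])
    u, v = [0.0] * m, [1.0] * n  # deterministic all-ones start for v
    for _ in range(iters):
        for i in range(m):  # least-squares update of each read factor u_i
            obs = [j for j in range(n) if R[i][j] != 0]
            den = sum(v[j] ** 2 for j in obs)
            u[i] = sum(R[i][j] * v[j] for j in obs) / den if den > 0 else 0.0
        for j in range(n):  # least-squares update of each SNP factor v_j
            obs = [i for i in range(m) if R[i][j] != 0]
            den = sum(u[i] ** 2 for i in obs)
            v[j] = sum(R[i][j] * u[i] for i in obs) / den if den > 0 else 0.0
    return u, v


def quantize(u, v):
    """Snap every entry of the completed matrix u v^T to a value in {-1, 0, +1}."""
    return [[sign(ui * vj) for vj in v] for ui in u]


def decode(X, u):
    """Majority vote per SNP, flipping rows attributed to haplotype 2 (u_i < 0).

    A result of 0 marks a SNP left unphased.
    """
    h1 = [sign(sum(sign(u[i]) * X[i][j] for i in range(len(X))))
          for j in range(len(X[0]))]
    return h1, [-a for a in h1]


# Toy example: true haplotypes (1, 1, -1) and (-1, -1, 1); four error-free reads.
R = [
    [1, 1, 0],    # read from haplotype 1, SNPs 0-1
    [-1, -1, 0],  # read from haplotype 2, SNPs 0-1
    [0, 1, -1],   # read from haplotype 1, SNPs 1-2
    [0, -1, 1],   # read from haplotype 2, SNPs 1-2
]
u, v = complete_rank_one(R)
h1, h2 = decode(quantize(u, v), u)
print(h1, h2)  # → [1, 1, -1] [-1, -1, 1]
```

With sequencing errors, the read matrix is only approximately rank one. The sign quantization then snaps the completed low-rank estimate back to valid allele values before decoding.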
