Merging High-Throughput, Amplicon-Based Second and Third Generation Sequencing Data: An Integrative and Modular Data Analysis Framework for Haplotype Prediction and Output Evaluation
Sylvia Mink, Christian Attenberger, Yannik Busch, Johanna Kiefer, Wolfgang Peter, Janne Cadamuro, Tim A. Steiert, Andre Franke, Christoph Gassner

TL;DR
This paper introduces a new framework that combines second and third generation sequencing data to improve haplotype prediction and simplify complex genomic analysis.
Contribution
The novel contribution is an integrative, modular framework that automates and streamlines haplotype prediction using both Illumina and ONT sequencing data.
Findings
The framework was successfully validated using synthetic and real-life data from 400 blood donors.
It combines the accuracy of second-generation sequencing with the long-read capability of third-generation sequencing.
Haplotypes are frequency-ranked and discrepancies are color-coded for easy evaluation.
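The two evaluation steps named in the findings, flagging discrepancies between platforms and ranking haplotypes by frequency, can be illustrated with a minimal Python sketch. This is not the framework's own code: the function names, the genotype and haplotype string formats, the status labels, and the color mapping are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical status-to-color mapping; the framework's real color scheme
# is not described in this summary.
STATUS_COLORS = {
    "concordant": "green",
    "discordant": "red",
    "illumina_only": "yellow",
    "ont_only": "yellow",
}


def compare_calls(illumina, ont):
    """Classify each variant position by agreement between the two platforms.

    Both arguments map a position (int) to a genotype string (assumed format).
    """
    statuses = {}
    for pos in sorted(set(illumina) | set(ont)):
        a = illumina.get(pos)
        b = ont.get(pos)
        if a is None:
            statuses[pos] = "ont_only"
        elif b is None:
            statuses[pos] = "illumina_only"
        elif a == b:
            statuses[pos] = "concordant"
        else:
            statuses[pos] = "discordant"
    return statuses


def rank_haplotypes(haplotypes):
    """Return (haplotype, count, frequency) tuples, most frequent first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(hap, n, n / total) for hap, n in ranked]


# Made-up example data, not taken from the paper.
illumina_calls = {101: "A/G", 250: "C/C", 400: "T/C"}
ont_calls = {101: "A/G", 250: "C/T", 512: "G/G"}
statuses = compare_calls(illumina_calls, ont_calls)
for pos, status in statuses.items():
    print(pos, status, STATUS_COLORS[status])

ranked = rank_haplotypes(["A-C-T", "G-C-T", "A-C-T", "A-T-T"])
for hap, n, freq in ranked:
    print(hap, n, f"{freq:.2f}")
```

Keeping the comparison and the ranking in separate functions mirrors the modular idea described above, so either step could be swapped out without touching the other.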
Abstract
Despite providing highly accurate results, the short reads generated by second generation sequencing have major limitations in mapping complex genomic regions. Longer reads can resolve these issues and additionally phase distant variants. The third generation sequencing platform ONT currently achieves the longest sequencing reads but falls short in sequencing accuracy. Additionally, deriving phased haplotypes from amplicon-based NGS data remains a complex and time-consuming task that requires extensive bioinformatic expertise. We constructed an integrative, open-access modular data-analysis framework that allows for automated processing of high-throughput sequencing data from both second (Illumina) and third generation (ONT) sequencing platforms, combining the strengths of both technologies. Variant information is automatically evaluated and color-coded for discrepancies. Haplotypes are…
Taxonomy
Topics: Genetic Associations and Epidemiology · Gene expression and cancer classification · RNA modifications and cancer
