DoBSeqWF: a framework for sensitive detection of individual genetic variation in pooled sequencing data
Mads Cort Nielsen, Christian Munch Hagen, Ulrik Kristoffer Stoltze, Thomas van Overeem Hansen, Mette Nyegaard, Henrik Hjalgrim, Marie Bækvad-Hansen, Anna Byrjalsen, Kjeld Schmiegelow, Karin Wadt, Jonas Bybjerg-Grauholm, Simon Rasmussen

TL;DR
DoBSeqWF is a pipeline for detecting rare genetic variants at the individual level in cost-effective pooled sequencing data, with the aim of supporting early diagnosis of rare genetic diseases.
Contribution
DoBSeqWF introduces a specialized Nextflow-based workflow with machine learning filters for accurate rare variant detection in double-batched sequencing data.
Findings
DoBSeqWF accurately detects rare variants in pooled sequencing data with high sensitivity.
Machine learning filters improve variant detection while maintaining scalability.
The pipeline was validated on a childhood cancer cohort of 200 individuals, using whole genome sequencing as the reference.
Abstract
Population screening for rare genetic diseases has the potential to increase early diagnosis and treatment, but the high cost of next-generation sequencing limits widespread implementation. Double-batched sequencing (DoBSeq) is a cost-effective method that uses two-dimensional overlapping pool sequencing to enable individual-level rare variant detection. However, the resulting high-depth, complex data require a specialized workflow for efficient, sensitive, and reproducible analysis. We developed DoBSeqWF (DoBSeq Workflow), a Nextflow-based pipeline that processes pooled sequencing data from alignment through variant calling, filtering, and final variant assignment. Using a childhood cancer cohort of 200 individuals with whole genome sequencing as a reference, we created training and validation datasets, benchmarked multiple variant callers, and implemented machine learning filters to…
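The idea behind two-dimensional overlapping pools can be illustrated with a minimal sketch. It assumes a simple matrix design in which every individual is sequenced in exactly one row pool and one column pool, and it assigns a variant only when that variant is called in exactly one row pool and exactly one column pool. This is an illustrative simplification, not DoBSeqWF's actual assignment logic; the function name, the variant key format, and the pool and sample identifiers are all hypothetical.

```python
from collections import defaultdict


def assign_variants(row_calls, col_calls, layout):
    """Assign pooled variant calls to individuals in a row/column matrix design.

    row_calls: dict mapping row pool id -> set of variant keys called in that pool
    col_calls: dict mapping column pool id -> set of variant keys called in that pool
    layout:    dict mapping (row pool id, column pool id) -> sample id
    Returns (assigned, unresolved): a dict variant -> sample id, and a set of
    variants that could not be placed on a single individual.
    """
    # Invert the calls: for each variant, which row and column pools carry it?
    rows_with = defaultdict(set)
    cols_with = defaultdict(set)
    for row_id, variants in row_calls.items():
        for variant in variants:
            rows_with[variant].add(row_id)
    for col_id, variants in col_calls.items():
        for variant in variants:
            cols_with[variant].add(col_id)

    assigned = {}
    unresolved = set()
    for variant in set(rows_with) | set(cols_with):
        rows = rows_with.get(variant, set())
        cols = cols_with.get(variant, set())
        # Assumption: only a variant seen in exactly one row pool and one
        # column pool is placed on the individual at their intersection.
        if len(rows) == 1 and len(cols) == 1:
            cell = (next(iter(rows)), next(iter(cols)))
            if cell in layout:
                assigned[variant] = layout[cell]
                continue
        unresolved.add(variant)
    return assigned, unresolved


# Hypothetical 2 x 2 design with four samples.
layout = {
    ("R1", "C1"): "S1", ("R1", "C2"): "S2",
    ("R2", "C1"): "S3", ("R2", "C2"): "S4",
}
row_calls = {"R1": {"chr1:100:A>G", "chr2:50:C>T"}, "R2": {"chr2:50:C>T"}}
col_calls = {"C1": {"chr1:100:A>G"}, "C2": {"chr2:50:C>T", "chr3:7:G>A"}}

assigned, unresolved = assign_variants(row_calls, col_calls, layout)
```

In this toy example only the variant found in a single row pool and a single column pool is assigned. A variant seen in several pools of one dimension, or in only one dimension, is left unresolved here; a real workflow would need explicit rules for those cases and for filtering false-positive calls.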
Taxonomy
Topics: Genomics and Rare Diseases · Cancer Genomics and Diagnostics · Genetic Associations and Epidemiology
