Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

Daniel Mas Montserrat; Ray Verma; Míriam Barrabés; Francisco M. de la Vega; Carlos D. Bustamante; Alexander G. Ioannidis

arXiv:2511.15977·cs.DC·November 21, 2025

TL;DR

This paper introduces adaptive, RAM-efficient parallelization techniques for large-scale genomic workflows, improving resource utilization and reducing memory errors in chromosome-level bioinformatics processing.

Contribution

It presents novel symbolic regression and scheduling methods for dynamic, memory-aware parallelization of genomic workflows, enhancing efficiency over static approaches.

Findings

01 Reduced memory overruns in genomic workflows

02 Faster execution times in real-world pipelines

03 Effective load balancing across threads

Abstract

Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs…
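The abstract's idea of treating task packing as a Knapsack problem can be illustrated with a minimal sketch. It is not the authors' scheduler. It assumes that each pending chromosome task already has a predicted peak RAM in whole gigabytes (in the paper this would come from a regression model, which is not reproduced here), and that the objective is simply to fill the RAM budget as fully as possible, so each item's value equals its weight. The function names `pack_batch` and `schedule` and the sample figures are hypothetical.

```python
from typing import Dict, List


def pack_batch(pred_ram_gb: Dict[str, int], budget_gb: int) -> List[str]:
    """Choose the subset of tasks whose predicted RAM best fills the budget.

    Assumption: 0/1 knapsack with value == weight (maximize RAM utilization).
    Predicted RAM values must be positive integers (whole GB).
    """
    # best[c]   = largest total predicted RAM achievable within capacity c
    # chosen[c] = task names that achieve best[c]
    best = [0] * (budget_gb + 1)
    chosen: List[List[str]] = [[] for _ in range(budget_gb + 1)]
    for name, ram in pred_ram_gb.items():
        # Walk capacities downward so each task is used at most once.
        for cap in range(budget_gb, ram - 1, -1):
            candidate = best[cap - ram] + ram
            if candidate > best[cap]:
                best[cap] = candidate
                chosen[cap] = chosen[cap - ram] + [name]
    return chosen[budget_gb]


def schedule(pred_ram_gb: Dict[str, int], budget_gb: int) -> List[List[str]]:
    """Repeatedly pack batches until every task has been assigned."""
    pending = dict(pred_ram_gb)
    batches: List[List[str]] = []
    while pending:
        batch = pack_batch(pending, budget_gb)
        if not batch:
            # No remaining task fits on its own within the budget.
            raise ValueError("a task's predicted RAM exceeds the budget")
        batches.append(batch)
        for name in batch:
            del pending[name]
    return batches


if __name__ == "__main__":
    # Hypothetical predictions (GB) for four chromosome-level tasks.
    predictions = {"chr1": 12, "chr2": 11, "chr21": 3, "chr22": 4}
    for i, batch in enumerate(schedule(predictions, budget_gb=16), start=1):
        print(f"batch {i}: {batch}")
```

A real scheduler would also have to react to tasks finishing at different times and to prediction error. This sketch only shows the packing step, run batch by batch.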

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Genomics and Phylogenetic Studies · Scientific Computing and Data Management