Adaptive Parallel Downloader for Large Genomic Datasets
Rasman Mubtasim Swargo, Engin Arslan, and Md Arifuzzaman

TL;DR
FastBioDL is an adaptive parallel downloader for large genomic datasets that adjusts its concurrency dynamically, achieving up to 4x faster downloads and better resource utilization than existing tools.
Contribution
We introduce FastBioDL, a novel adaptive downloader that frames the download process as an online optimization problem to improve throughput for large biological datasets.
Findings
Achieves up to 4x speedup over state-of-the-art tools.
Demonstrates 2.1x faster performance in high-speed networks.
Effectively optimizes HTTP/FTP downloads for large datasets.
Abstract
Modern next-generation sequencing (NGS) projects routinely generate terabytes of data, which researchers commonly download from public repositories such as SRA or ENA. Existing download tools often use static concurrency settings and cannot adapt to changing network conditions, which leads to inefficient bandwidth utilization and prolonged download times. We introduce FastBioDL, a parallel file downloader designed for large biological datasets, featuring an adaptive concurrency controller. FastBioDL frames the download process as an online optimization problem, using a utility function and gradient descent to adjust the number of concurrent socket streams dynamically in real time. This approach maximizes download throughput while minimizing resource overhead. Comprehensive evaluations on public genomic datasets demonstrate that FastBioDL achieves up to 4x speedup over…
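To make the idea of an adaptive concurrency controller concrete, here is a minimal sketch in Python. It is not FastBioDL's implementation: the utility form (throughput divided by an exponential penalty on the stream count), the `penalty` value, the function names, and the simulated link are all assumptions made for illustration. The update rule is a simple sign-of-finite-difference hill climb standing in for the gradient-based update the abstract mentions.

```python
from typing import Callable


def utility(throughput_mbps: float, concurrency: int, penalty: float = 1.02) -> float:
    """Hypothetical utility: reward throughput, penalize each extra stream."""
    return throughput_mbps / (penalty ** concurrency)


def next_concurrency(prev_n: int, prev_u: float, cur_n: int, cur_u: float,
                     max_n: int = 32) -> int:
    """Move one stream in the direction that increased utility."""
    if cur_n == prev_n:
        # No difference to estimate a slope from, so probe one stream higher.
        return min(cur_n + 1, max_n)
    # Finite-difference estimate of dU/dn from the last two probes.
    gradient = (cur_u - prev_u) / (cur_n - prev_n)
    step = 1 if gradient > 0 else -1
    return max(1, min(max_n, cur_n + step))


def tune(measure: Callable[[int], float], rounds: int = 20,
         start: int = 1, max_n: int = 32) -> int:
    """Probe repeatedly; `measure(n)` returns observed throughput with n streams."""
    prev_n, prev_u = start, utility(measure(start), start)
    cur_n = min(start + 1, max_n)
    for _ in range(rounds):
        cur_u = utility(measure(cur_n), cur_n)
        nxt = next_concurrency(prev_n, prev_u, cur_n, cur_u, max_n)
        prev_n, prev_u = cur_n, cur_u
        cur_n = nxt
    return cur_n


def simulated_link(n: int) -> float:
    """Assumed toy link: 100 Mbps per stream, saturating at 800 Mbps."""
    return min(100.0 * n, 800.0)


if __name__ == "__main__":
    print("chosen concurrency:", tune(simulated_link))
```

On this toy link, throughput stops growing at 8 streams, so the penalty term makes any further stream lower the utility and the controller ends up hovering around that point. In a real downloader, `measure` would be replaced by throughput observed over a short probing interval, and the step size and penalty would need tuning.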
