S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS
Steven Vasquez-Grinnell, Alex Poliakov

TL;DR
S3Mirror is an open source application built on the DBOS framework for fast, reliable, and observable transfers of large genomic datasets between S3 buckets. Running on DBOS Cloud Pro, it outperforms AWS DataSync in both speed and cost.
Contribution
The paper introduces S3Mirror, an application built on the DBOS Transact durable execution framework for efficient, cost-effective, and observable large-scale genomic data transfers.
Findings
Running on DBOS Cloud Pro, S3Mirror is up to 40x faster than AWS DataSync at a fraction of the cost.
S3Mirror is resilient to failures during data transfer.
S3Mirror provides real-time, per-file observability of ongoing and past transfers.
Abstract
To meet the needs of a large pharmaceutical organization, we set out to create S3Mirror, an application for transferring large genomic sequencing datasets between S3 buckets quickly, reliably, and observably. We used the DBOS Transact durable execution framework to achieve these goals and benchmarked the performance and cost of the application. S3Mirror is an open source DBOS Python application that can run in a variety of environments, including DBOS Cloud Pro, where it runs as much as 40x faster than AWS DataSync at a fraction of the cost. Moreover, S3Mirror is resilient to failures and provides real-time, per-file observability of ongoing and past transfers.
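The abstract attributes S3Mirror's failure resilience and per-file observability to durable execution. The sketch below illustrates that idea in plain Python under stated assumptions; it is not the DBOS Transact API and not S3Mirror's actual code. Two dictionaries stand in for the source and destination S3 buckets, and a SQLite table (the name `transfer_status` and the functions `init_transfer`, `run_transfer`, and `progress` are all hypothetical) stands in for the durable checkpoint store. Each file is checkpointed as `done` after it is copied, so a rerun after a crash skips finished files, and the same table doubles as a live per-file status view.

```python
# Hypothetical sketch of checkpointed, observable per-file transfers.
# NOT the DBOS API and NOT S3Mirror's code: dicts stand in for S3 buckets
# and a SQLite table stands in for durable workflow state.
import sqlite3


def init_transfer(conn: sqlite3.Connection, transfer_id: str, keys) -> None:
    """Create the status table and register every file as 'pending' (once)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfer_status ("
        " transfer_id TEXT, key TEXT, state TEXT,"
        " PRIMARY KEY (transfer_id, key))"
    )
    # INSERT OR IGNORE keeps existing rows, so re-initializing after a crash
    # does not reset files that were already marked 'done'.
    conn.executemany(
        "INSERT OR IGNORE INTO transfer_status VALUES (?, ?, 'pending')",
        [(transfer_id, k) for k in keys],
    )
    conn.commit()


def run_transfer(conn, transfer_id, src, dst, fail_on=None):
    """Copy each key of src into dst, skipping keys already checkpointed 'done'.

    fail_on is a hypothetical hook that simulates a crash just before copying
    that key. Returns the keys copied during this run.
    """
    init_transfer(conn, transfer_id, src)
    copied = []
    for key in sorted(src):
        (state,) = conn.execute(
            "SELECT state FROM transfer_status WHERE transfer_id = ? AND key = ?",
            (transfer_id, key),
        ).fetchone()
        if state == "done":
            continue  # finished before a crash: do not redo it
        if key == fail_on:
            raise RuntimeError(f"simulated crash before copying {key}")
        dst[key] = src[key]  # stand-in for one S3 object copy
        # Checkpoint after the copy. A crash between copy and commit leaves the
        # row 'pending', so the idempotent copy is simply repeated on recovery.
        conn.execute(
            "UPDATE transfer_status SET state = 'done'"
            " WHERE transfer_id = ? AND key = ?",
            (transfer_id, key),
        )
        conn.commit()
        copied.append(key)
    return copied


def progress(conn, transfer_id):
    """Observability: number of files in each state for one transfer."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM transfer_status"
        " WHERE transfer_id = ? GROUP BY state",
        (transfer_id,),
    ).fetchall()
    return dict(rows)
```

If a run over five files crashes before the fourth, `progress` reports three files `done` and two `pending`; rerunning the same `transfer_id` copies only the two remaining files. Checkpointing after each copy, rather than once per batch, is what keeps the redone work to at most one file and lets the status view stay accurate at file granularity.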
Taxonomy
Topics: Cancer Genomics and Diagnostics · Gene expression and cancer classification
