Optimizing STAR Aligner for High Throughput Computing in the Cloud
Piotr Kica, Sabina Lichołai, Michał Orzechowski, Maciej Malawski

TL;DR
This paper presents a scalable cloud-native architecture for RNA-seq data alignment using STAR, optimizing performance and cost-efficiency through resource management and software improvements.
Contribution
It introduces a cloud-based pipeline for high-throughput RNA-seq alignment with novel optimization techniques and resource strategies.
Findings
Significant computational savings achieved
Effective use of AWS cloud services demonstrated
Performance improvements through early stopping and resource tuning
Abstract
We propose a scalable, cloud-native architecture designed for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner and processes tens or hundreds of terabytes of RNA-seq data. We implement the pipeline using AWS cloud services, introduce performance optimizations, and perform an experimental evaluation in the cloud. Our optimization techniques result in computational savings thanks to the "early stopping" approach, selection of right-sized resources, and use of a newer version of the Ensembl genome.
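The abstract names "early stopping" as one source of savings but does not describe how it works. The sketch below shows one plausible form of such a rule, assuming the decision is based on the mapping rate observed after a minimum share of reads has been processed. The `ProgressSnapshot` structure, the `should_stop_early` function, and both threshold values are hypothetical illustrations, not the authors' implementation or part of STAR.

```python
from dataclasses import dataclass


@dataclass
class ProgressSnapshot:
    """One progress observation for a running alignment (hypothetical structure)."""
    reads_processed: int
    reads_mapped: int


def should_stop_early(
    snapshot: ProgressSnapshot,
    total_reads: int,
    min_fraction_processed: float = 0.10,  # assumed value, for illustration only
    min_mapping_rate: float = 0.30,        # assumed value, for illustration only
) -> bool:
    """Return True if the alignment looks unpromising and could be aborted.

    The rule waits until at least `min_fraction_processed` of the reads have
    been seen, then stops the job if the mapping rate so far is below
    `min_mapping_rate`.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if snapshot.reads_processed <= 0:
        return False  # nothing processed yet, so no basis for a decision

    fraction_processed = snapshot.reads_processed / total_reads
    if fraction_processed < min_fraction_processed:
        return False  # too early to judge

    mapping_rate = snapshot.reads_mapped / snapshot.reads_processed
    return mapping_rate < min_mapping_rate


if __name__ == "__main__":
    low_quality = ProgressSnapshot(reads_processed=200_000, reads_mapped=20_000)
    print(should_stop_early(low_quality, total_reads=1_000_000))
```

In a pipeline, a check like this would run periodically against whatever progress information the aligner exposes. An aborted job frees its instance for the next sample, which is where the compute savings would come from.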
Taxonomy
Topics: Optical Network Technologies · Optical Wireless Communication Technologies · Image Enhancement Techniques
