Optimizing STAR Aligner for High Throughput Computing in the Cloud

Piotr Kica; Sabina Lichołai; Michał Orzechowski; Maciej Malawski

arXiv:2409.05886·cs.DC·September 11, 2024

TL;DR

This paper presents a scalable cloud-native architecture for RNA-seq data alignment using STAR, optimizing performance and cost-efficiency through resource management and software improvements.

Contribution

It introduces a cloud-based pipeline for high-throughput RNA-seq alignment with STAR, optimized through early stopping, right-sized resource selection, and a newer version of the Ensembl genome.

Findings

01 Significant computational savings achieved
02 Effective use of AWS cloud services demonstrated
03 Performance improvements through early stopping and resource tuning

Abstract

We propose a scalable, cloud-native architecture designed for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner and processes tens or hundreds of terabytes of RNA-seq data. We implement the pipeline using AWS cloud services, introduce performance optimizations, and perform an experimental evaluation in the cloud. Our optimization techniques yield computational savings through the "early stopping" approach, the selection of right-sized resources, and the use of a newer version of the Ensembl genome.
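The abstract names "early stopping" without defining it. One plausible reading is that an alignment job is aborted once enough reads have been processed to show that the sample maps poorly, so no further compute is spent on it. The sketch below illustrates that reading only. The `AlignmentProgress` structure, the `should_stop_early` function, and both threshold values are hypothetical placeholders and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class AlignmentProgress:
    """Snapshot of an in-flight alignment job (hypothetical structure)."""
    reads_total: int      # reads in the input sample
    reads_processed: int  # reads the aligner has consumed so far
    reads_mapped: int     # processed reads that mapped to the genome


def should_stop_early(
    progress: AlignmentProgress,
    min_fraction: float = 0.10,      # assumed: wait until 10% of reads are processed
    min_mapping_rate: float = 0.30,  # assumed: abort below a 30% mapping rate
) -> bool:
    """Return True if the job should be aborted to save compute."""
    if progress.reads_total <= 0 or progress.reads_processed <= 0:
        return False  # nothing to judge yet

    fraction_done = progress.reads_processed / progress.reads_total
    if fraction_done < min_fraction:
        return False  # too early for the mapping rate to be representative

    mapping_rate = progress.reads_mapped / progress.reads_processed
    return mapping_rate < min_mapping_rate
```

In a pipeline, a check like this would run periodically against whatever progress information the aligner reports. The thresholds trade wasted compute on poor samples against the risk of discarding a sample whose mapping rate would have recovered.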
