From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis
Milena Wünsch, Christina Sauer, Patrick Callahan, Ludwig Christian Hinske, Anne-Laure Boulesteix

TL;DR
This paper is a practical guide to gene set analysis of RNA sequencing data. It covers method selection and data preprocessing, and the uncertainty each of these choices introduces, with the aim of improving reproducibility and transparency.
Contribution
It offers a detailed overview of gene set analysis methods, focusing on preprocessing steps and uncertainties, with illustrative R code and practical recommendations.
Findings
Highlights the importance of data preprocessing choices
Provides illustrative R code for analysis pipelines
Discusses uncertainties and best practices in gene set analysis
Abstract
Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In recent years, a multitude of methods and corresponding tools have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the procedure of preprocessing, from knowing which steps are required to selecting a corresponding approach from the plethora of valid options to create the accepted input object (data preprocessing uncertainty), with clear guidance again being scarce. Here, we provide a practical guide through all steps required to conduct gene set analysis, beginning with a concise overview of a…
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Molecular Biology Techniques and Applications
