Modelling Computational Resources for Next Generation Sequencing Bioinformatics Analysis of 16S rRNA Samples
Matthew J. Wade, Thomas P. Curtis, Russell J. Davenport

TL;DR
This paper presents a novel method using multiple linear regression to predict processing times for bioinformatics analysis of 16S rRNA sequencing data, aiding resource allocation in next-generation sequencing workflows. Predictions were accurate for a naturally assembled community dataset but not for an artificial community.
Contribution
It introduces a new predictive modeling approach for computational time in bioinformatics, applicable to various software and architectures, improving resource planning.
Findings
Models accurately predict run time for natural community data
Caution needed when parallelizing AmpliconNoise processing
Method can be extended to other bioinformatics pipelines
Abstract
In the rapidly evolving domain of next generation sequencing and bioinformatics analysis, the volume of data generated is increasing at a concomitant rate. The burden associated with processing large amounts of sequencing data has emphasised the need to allocate sufficient computing resources to complete analyses in the shortest possible time with manageable and predictable costs. A novel method for predicting time to completion for a popular bioinformatics software package (QIIME) was developed using key variables characteristic of the input data that were assumed to impact processing time. Multiple Linear Regression models were developed to determine run time for two denoising algorithms and a general bioinformatics pipeline. The models were able to accurately predict clock time for denoising sequences from a naturally assembled community dataset, but not an artificial community. Speedup and…
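As an illustration of the general idea of fitting a multiple linear regression to run times, the following is a minimal sketch, not the authors' model. The predictors (number of reads and mean read length), the coefficients, and all data values are hypothetical and synthetic; the abstract does not state which input variables the paper used.

```python
import numpy as np


def fit_runtime_model(features, run_times):
    """Fit run_time ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    x = np.asarray(features, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(x)), x])
    y = np.asarray(run_times, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict_runtime(coeffs, features):
    """Predict run times for new inputs from fitted coefficients."""
    x = np.asarray(features, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    return design @ coeffs


# Synthetic, noise-free data for illustration only (not from the paper).
# Hypothetical predictors: [number of reads, mean read length in bases].
features = np.array([
    [1000, 250],
    [5000, 300],
    [10000, 280],
    [20000, 320],
    [40000, 260],
])
# Hypothetical "true" relationship used to generate run times in seconds.
true_coeffs = np.array([30.0, 0.02, 0.5])
run_times = true_coeffs[0] + features @ true_coeffs[1:]

coeffs = fit_runtime_model(features, run_times)
predicted = predict_runtime(coeffs, [[15000, 290]])
print("fitted coefficients:", coeffs)
print("predicted run time (s):", predicted[0])
```

Because the synthetic run times are generated without noise, the fit recovers the generating coefficients; with real timing data one would also inspect residuals and validate on held-out datasets before relying on the predictions.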
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA modifications and cancer · Cancer Genomics and Diagnostics
