Systematically Examining Reproducibility: A Case Study for High Throughput Sequencing using the PRIMAD Model and BioCompute Object
Meznah Aloqalaa, Stian Soiland-Reyes, Carole Goble

TL;DR
This paper critically evaluates the BioCompute Object standard for reproducibility in high throughput sequencing pipelines using the PRIMAD model, identifying gaps and proposing extensions to improve reliability in biomedical research.
Contribution
It systematically assesses the reproducibility claims of the BioCompute Object framework using the PRIMAD model, revealing where improvements and extensions are needed.
Findings
Identified potential omissions in how the BCO documents pipelines
Mapped BCO elements onto PRIMAD revealing gaps
Proposed extensions to enhance reproducibility validation
Abstract
The reproducibility of computational pipelines is an expectation in biomedical science, particularly in critical domains like human health. In this context, reporting next-generation genome sequencing methods used in precision medicine spurred the development of the IEEE 2791-2020 standard for Bioinformatics Analyses Generated by High Throughput Sequencing (HTS), known as the BioCompute Object (BCO). Championed by the US Food and Drug Administration, the BCO is a pragmatic framework for documenting pipelines; however, it has not been systematically assessed for its reproducibility claims. This study uses the PRIMAD model, a conceptual framework for describing computational experiments for reproducibility purposes, to systematically review the BCO for depth and coverage. A meticulous mapping of BCO and PRIMAD elements onto a published BCO use case reveals potential omissions and…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Scientific Computing and Data Management
