Data provenance tracking as the basis for a biomedical virtual research environment
Richard McClatchey

TL;DR
This paper discusses the development of a Virtual Research Environment for biomedical data analysis that captures comprehensive provenance information, supporting reproducibility, collaboration, and validation in complex, distributed workflows.
Contribution
It introduces an extended provenance management system within a Virtual Laboratory framework, enabling full tracking of data, workflows, and results in biomedical research environments.
Findings
Supports reproducibility and validation of biomedical analyses.
Enables collaborative research through traceable provenance.
Integrates with CRISTAL software for comprehensive data and workflow management.
Abstract
In complex data analyses it is increasingly important to capture information about the usage of data sets, in addition to preserving them over time. This supports the reproducibility of results, makes it possible to verify the work of others, and helps ensure that data have been used under appropriate conditions for specific analyses. Scientific workflow-based studies are beginning to realize the benefit of capturing this provenance of data and of the activities used to process, transform and carry out studies on those data. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented…
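The abstract centres on capturing the provenance of data sets and of the activities that process and transform them. The following is a minimal sketch of that idea. It assumes a simple append-only log in which each workflow step records its inputs, outputs and parameters, and in which the steps behind any result can be traced by walking backwards from it. The classes `ActivityRecord` and `ProvenanceStore`, and every data identifier and parameter, are hypothetical illustrations. They are not the CRISTAL API or the paper's provenance model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActivityRecord:
    """One executed workflow step (hypothetical schema, not CRISTAL's)."""
    activity: str                  # name of the processing step
    inputs: tuple[str, ...]        # identifiers of data sets consumed
    outputs: tuple[str, ...]       # identifiers of data sets produced
    parameters: tuple[tuple[str, str], ...] = ()  # step settings, as strings
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProvenanceStore:
    """Append-only activity log with a backward lineage query (illustrative)."""

    def __init__(self) -> None:
        self._records: list[ActivityRecord] = []
        # Maps each data identifier to the activity that produced it.
        self._producer: dict[str, ActivityRecord] = {}

    def record(self, activity, inputs, outputs, **parameters) -> ActivityRecord:
        rec = ActivityRecord(
            activity=activity,
            inputs=tuple(inputs),
            outputs=tuple(outputs),
            parameters=tuple(sorted((k, str(v)) for k, v in parameters.items())),
        )
        self._records.append(rec)
        for data_id in rec.outputs:
            self._producer[data_id] = rec
        return rec

    def lineage(self, data_id: str) -> list[ActivityRecord]:
        """Return every activity that contributed to data_id, depth-first,
        starting with the activity that produced data_id itself."""
        seen: set[int] = set()
        result: list[ActivityRecord] = []
        stack = [data_id]
        while stack:
            rec = self._producer.get(stack.pop())
            if rec is None or id(rec) in seen:
                continue  # raw input, or a step already visited
            seen.add(id(rec))
            result.append(rec)
            stack.extend(rec.inputs)
        return result


if __name__ == "__main__":
    # Hypothetical three-step analysis; names and parameters are made up.
    store = ProvenanceStore()
    store.record("register", ["scan_raw"], ["scan_aligned"], template="atlas_v1")
    store.record("segment", ["scan_aligned"], ["mask"], threshold=0.5)
    store.record("measure", ["mask", "scan_aligned"], ["volume_csv"])
    print([r.activity for r in store.lineage("volume_csv")])
    # prints ['measure', 'register', 'segment']
```

Walking backwards from a result to the steps and parameters that produced it is what allows a collaborator to check how that result was derived. A real Virtual Research Environment would also need persistent storage, descriptions of workflow structure, and handling for distributed execution, all of which this sketch omits.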
