Applicability study of the PRIMAD model to LIGO gravitational wave search workflows
Dylan Chapp, Danny Rorabaugh, Duncan Brown, Ewa Deelman, Karan Vahi,, Von Welch, and Michela Taufer

TL;DR
This study evaluates how well the PRIMAD model, which standardizes experimental components, applies to LIGO gravitational wave search workflows, highlighting its strengths and limitations.
Contribution
It provides the first assessment of PRIMAD's applicability to complex scientific workflows like LIGO's, revealing its potential and constraints.
Findings
PRIMAD effectively models key components of LIGO workflows.
Limitations identified in the model's abstraction levels for certain workflow details.
Insights into how PRIMAD can be adapted or extended for gravitational wave research.
Abstract
The PRIMAD model with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data), provides an abstract taxonomy to represent computational experiments and enforce reproducibility by design. In this paper, we assess the model applicability to a set of Laser Interferometer Gravitational-Wave Observatory (LIGO) workflows from literature sources (i.e., published papers). Our work outlines potentials and limits of the model in terms of its abstraction levels and application process.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Applicability study of the PRIMAD model to LIGO gravitational wave search workflows
Dylan Chapp
University of Delaware
,
Danny Rorabaugh
University of Tennessee, Knoxville
,
Duncan A. Brown
Syracuse University
,
Ewa Deelman
University of Southern California
,
Karan Vahi
University of Southern California
,
Von Welch
Indiana University, Bloomington
and
Michela Taufer
University of Tennessee, Knoxville
Abstract.
The PRIMAD model with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data), provides an abstract taxonomy to represent computational experiments and enforce reproducibility by design. In this paper, we assess the model applicability to a set of Laser Interferometer Gravitational-Wave Observatory (LIGO) workflows from literature sources (i.e., published papers). Our work outlines potentials and limits of the model in terms of its abstraction levels and application process.
Reproducibility, LIGO, Workflows, PRIMAD
††conference: 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems; June 24, 2019; Phoenix, AZ
1. Introduction
The ability of the scientific community to incrementally build on experimental results depends strongly on the ability to trust that those results are not accidental or transient, but rather that they can be reproduced to an acceptably high degree of similarity by subsequent experiments. This notion of reproducibility is magnified both in importance and difficulty in the setting of computational science workflows (Stodden et al., 2016; Stodden and Krafczyk, 2018). An increasingly large fraction of scientific endeavors depend on or incorporate computational elements, which in turn opens the door to reproducibility challenges associated with the implementation of those computational elements.
In order to reason about and assess reproducibility in the computational context, the PRIMAD model (Freire et al., 2016) was proposed. PRIMAD breaks reproducibility into six named components (i.e., Platform, Research objective, Implementation, Methods, Actors, and Data), each of which represents an element of a computational experiment where reproducibility can be enforced by design, or conversely where a lack of such design can allow irreproducibility to seep in and corrode the overall integrity of the experiment.
To evaluate the efficacy of PRIMAD as a tool for characterizing the reproducibility of real-world computational science workflows, we examine computational workflows used to detect gravitational waves using data from the Laser Interferometer Gravitational-Wave Observatory (LIGO) (Aasi et al., 2015) and the Virgo Observatory (Acernese et al., 2015). These computational workflows are designed to detect various astronomical events, including binary black hole mergers (Abbott et al., 2016d, c, 2017a, 2017b, 2017c) and binary neutron star mergers (Abbott et al., 2017d).
There are three factors that make the gravitational-wave search workflows particularly appropriate for our post-hoc study through the lens of PRIMAD: (1) LIGO and Virgo have reached a mature status with findings that have been recognized by the scientific community at large; (2) gravitational-wave search workflows support high impact scientific findings (i.e., empirical confirmation of the existence of gravitational waves) that are subject to the highest levels of scrutiny from the broader scientific community; and (3) the workflows consume a large amount of data and have high internal complexity, leading to reproducibility challenges in terms of the implementation and data management components of PRIMAD.
We tackle the study of gravitational-wave search workflows in a post-hoc fashion using literature sources (i.e., published papers), rather than from the runtime point of view. The study mimics the effect of scientists in replicating the work done by others as a starting point for new research. Specifically, we study two gravitational-wave searches presented by the LIGO Scientific Collaboration and the Virgo Collaboration, and a third search presented by an independent group of scientists111We note that this independent group contained former LIGO Collaboration members. However, they only had access to publicly available data, code, and information in their analysis.. These are: (1) the O1 Binary Catalogue (Abbott et al., 2016a); (2) the O2 Binary Catalogue (LIGO Scientific Collaboration et al., 2018); and (3) the Open Gravitational Wave Catalogue (1-OGC) (Nitz et al., 2019a). The findings are reported in this paper and organized in terms of the six components of PRIMAD.
2. The PRIMAD Model
Large-scale scientific applications are composed of complex workflows, which in turn are composed of and execute a series of experimental, computational, and data manipulation steps in one or multiple scientific domains, with at least one computational element. Research teams executes such workflows to gain insights that (1) confirm or disprove a hypothesis or (2) discover novel behavior or phenomena.
The PRIMAD Model (Freire et al., 2016) is a powerful method to describe these workflows in terms of this 6-component space:
[TABLE]
The model integrates multiple degrees of abstraction and is moldable to fit the characteristics of different scientific studies across domains.
A platform can be a simple execution environment such as a virtual machine or container; a more sophisticated combination of high-end clusters for computing and data processing; or a combination of remote sensing devices plugged into systems for collecting data, processing the data in preparation for analysis, analyzing the data, and displaying the results possibly through data visualization.
A research objective includes a specification of acceptable result variability provided by the scientist. If the scientist requires bitwise identical results (e.g., for regulatory reasons or out of technical necessity as in some techniques from mathematical physics and experimental mathematics (Bailey et al., 2012)), then this specification must be incorporated into the research objectives. Alternatively, if the scientist requires looser tolerances (e.g., convergence to a certain energy threshold as in ensemble simulations of protein folding trajectories (Piana et al., 2014)), this specification should be expressed as part of the research objective.
The implementation can comprise a single executable or source-code (e.g., from benchmark applications such as the CORAL procurement benchmarks (Wu and Taylor, 2015), to production-grade applications such as ALE3D (McClelland et al., 2014)), or multiple software artifacts (e.g., solver packages such as HYPRE (Falgout and Yang, 2002)) that support one or more research objectives.
Methods are broadly defined by individual computational algorithms, heuristic techniques, step-by-step procedures, or combinations thereof.
Actors are those who perform the scientific study (e.g., scientists, users, technicians). They may have designed the overall workflow or individual parts of it, or simply use it.
Data refers to all the data that are ingested by the initial stage of a workflow and those data objects that intermediate stages of the workflow generate.
3. Theoretical Scenarios Exhibiting Reproducibility Concepts
Changes in the six components of the model generate different scenarios that exhibit different reproducibility behaviors (Claerbout and Karrenbach, 1992; Donoho, 2010). We envision six scenarios of interest (i.e., scenarios that are plausible for real-world scientific workflows). These scenarios are summarized in Table 1. Note that this table is not meant to be an exhaustive list.
The first scenario is one in which the six PRIMAD components are all the same. In other words, the workflow is replayed using the same data and the same computational (software and hardware) environment. The same team of actors is able to produce results that confirm or disprove the hypothesis previously tested with the same workflow. Actors can repeat a measure over and over again, within the expected precision or threshold of acceptable variability. Challenges associated to this scenario are: (1) the platform availability can be an issue when dealing with workflows running on, for example, leadership class computing clusters; (2) documentation deficiencies can be an issue when dealing with complex workflows, for example, those concerning multiscale modeling simulations; and (3) run-to-run variability can occur, even when all six factors of the PRIMAD model are held fixed, due to hardware characteristics, platform concurrency, nondeterministic application behavior, and other factors. Specifically, aspects of the chosen methods or implementation may be inherently nondeterministic, as in the case of dynamic multithreading (Utterback et al., 2017).
The second scenario is one whose actors change, (e.g., a different team in the same or a different institution) and while the platform may be different, such differences shall be limited (e.g., different operating system version, upgraded library). In other words, by using the same data and the same or similar computational (software and hardware) platform, a different team of actors is able to produce results that confirm or disprove the hypothesis previously tested with the same workflow. In addition to the challenges of Scenario 1, this scenario is vulnerable to improper documentation dissemination, for example, sharing documentation with incomplete or ambiguous descriptions. This scenario and its challenges match such efforts as the Student Cluster Competition (Garrett et al., 2018).
The third scenario explores the impact of varying, to different degrees, all PRIMAD components of the model except the research objectives and the data. By using the same data (e.g., the same initial conditions such as pressure, temperature, and volume in a molecular dynamic simulation) but allowing limited variability of implementations and platforms or both, along with published material and related artifacts, an independent team is able to produce result that confirm or disprove the same hypothesis presented in the published material. The challenges associated with this, and following scenarios, depend on the degree of variation in the individual components of the PRIMAD model that we allow.
The fourth scenario is a scenario that is less constrained than the previous one. It explores the impact of varying all PRIMAD components except the research objectives. This scenario incorporates the additional challenge of collecting new data according to the data generation description in one or multiple original publications. The study can be performed by using the same or similar implementations and platforms. In other words, minor variations in implementation and platform are acceptable in this scenario. Along with the published material and related artifacts, an independent team is able to produce results that confirm or disprove the same hypothesis.
The fifth scenario studies the ability of a team of actors to reproduce the research of another team using a different implementation (e.g., in molecular dynamics simulations using AMBER rather than CHARMM). They address the same research objectives, but the level of variability in workflow outcomes depends upon the changes in implementation.
The sixth scenario tackles the study of a scientific application for which the workflow has a substantial change of the platform and implementation. Specifically, this scenario deals with a major technological transformation (e.g., from multiprocessor to accelerated platforms). All components of the model space in which the workflow is executed, with the exception of the research objective, change sufficiently and thus these variations cannot be ignored.
4. Empirical Scenarios Exhibiting Reproducibility Concepts in the LIGO Project
Our post-hoc study of the PRIMAD applicability to gravitational-wave search workflows falls into Scenario V. When we use the term workflow, we refer to the analysis of a specific set of data using a specific search pipeline. In practice, this may involve running several identically-configured workflows on smaller data sets to produce a single search result. Specifically, we target the gravitational wave search in three literature sources: (1) the LIGO-Virgo O1 Binary Catalogue (Abbott et al., 2016a); (2) the LIGO-Virgo O2 Binary Catalogue GWTC-1 (LIGO Scientific Collaboration et al., 2018); and (3) the Open Gravitational Wave Catalogue (1-OGC) (Nitz et al., 2019a), produced independently of the LIGO-Virgo Collaborations. The two LIGO-Virgo Binary Catalogue papers (Abbott et al., 2016a; LIGO Scientific Collaboration et al., 2018) describe two matched filtering workflows built on two implementations or source codes (with PyCBC (Usman et al., 2016; Abbott et al., 2016b; Nitz et al., 2017a; Nitz et al., 2017b) and GstLAL (Abbott et al., 2016b; Messick et al., 2017; Mukherjee et al., 2018; Sachdev et al., 2019) data analysis suites, respectively). The LIGO-Virgo O1 Binary Catalogue paper (Abbott et al., 2016a) and its constituent workflows use the dataset from Observing Run 1 of the LIGO detectors (O1). The LIGO-Virgo O2 Binary Catalogue paper (LIGO Scientific Collaboration et al., 2018) and its constituent analyses use the dataset from Observing Run 2 of the LIGO detectors (O2) as well as data from the VIRGO detector. The 1-OGC paper (Nitz et al., 2019a, b) describes a run of the PyCBC matched filtering workflow on the public LIGO O1 data using an updated pipeline configuration compared to the analysis in the LIGO-Virgo O1 Binary Catalogue. The relationships between the workflows and their parent publications are summarized in the first two rows of Table 2 together with the six components of the PRIMAD model.
In the table, the artifacts are executed on proprietary or public platforms. We consider proprietary platforms to be those platforms that are only the actors of the workflow have access to. In contrast we consider platforms to be public when they are accessible to a broader set of potential actors (e.g., the Open Science Grid (OSG) (Pordes et al., 2007; Sfiligoi et al., 2009)).
The research objective of the workflows is to detect binary mergers using the data gathered by the LIGO and Virgo detectors. The workflows may also re-analyze previous data with improved methods and tuning to re-assess the significance of previous events, or discover previously undetected events; this is considered similar enough to the core research objective to be considered the same within the context of PRIMAD.
The descriptions of matched filtering workflows provide numerous details in terms of implementation choices (e.g., choices of signal-to-noise ratio (SNR) thresholds for detecting significant matches) and data management choices (e.g., choices for how detector instrument noise is filtered and mitigated). The abundance of details concerning implementation and data variations across the matched filtering studies are summarized in Table 3 and Table 4 respectively.
Descriptions of the search pipeline implementation are listed in the leftmost column of Table 3. These sub-components define the particular way that a workflow (e.g., the O1BC-PyCBC workflow) implements matched filtering (i.e., the method component). Specifically, we have highlighted the following implementation sub-components: the template bank used, the waveform model used to generate the template bank, the policy used to decompose the template bank into sub-banks, any additional noise reduction or filtering steps performed, the signal-to-noise ratio threshold for selecting candidate matches (i.e., events that could potentially be binary black holes), and the ranking statistic used rank candidate events.
For the template bank component, we abbreviate the banks used for each workflow. TWB-OGC refers to the template bank used in the 1-OGC search workflows, the properties of which are defined in (Brown et al., 2012; Owen, 1996; Owen and Sathyaprakash, 1999). TWB-O1BC refers to the common template bank used in both the O1BC-PyCBC and O1BC-GstLAL workflows, the properties of which are defined in (Taracchini et al., 2014; Pürrer, 2016; Capano et al., 2016). TWB-O2BC-PyCBC refers to the template bank used for the O2 Binary Catalogue PyCBC workflow, the properties of which are defined in (Dal Canton and Harry, 2017). TWB-O2BC-GstLAL refers to the template bank used for the O2 Binary Catalogue GstLAL workflow, the properties of which are defined in (Mukherjee et al., 2018).
We find that for some components of the workflow, details are predominantly delegated to one or more cited publications. In other cases, configuration of the workflow is only available by examining the workflow’s source code or the configuration files used in a specific analysis. We also find that some workflow configuration files are public (e.g. (Nitz et al., 2018; Nitz et al., 2019b)), but others require proprietary access. Even for configuration and software repositories that are public, it can be difficult to determine which versions were used for a particular analysis. Similarly for Table 4, we describe sub-components of the Data PRIMAD component in the leftmost column. The sub-components we describe include: the source of the raw input data (i.e., which observing runs the data comes from), the number of coincident days of data available (i.e., data from when both LIGO detectors were operating), the choice of metadata used to filter data prior to the matched filtering search, the number of days of observation data remaining after filtering for data quality, and the number of days of observation data remaining after filtering insufficiently long-lasting events (i.e., filtering based on event duration). These details are largely provided across the chosen workflows, unless those details are proprietary.
Across all of the workflows described in the LIGO papers used for our study, the high-level method that is implemented is matched filtering (Allen et al., 2012), that is, the procedure of matching a stream of data from the LIGO detectors against template waveforms derived from numerical general relativity. Matches against these templates may indicate the presence of astronomical events such as binary black hole mergers, and thus are evaluated for statistical significance.
5. Discussion and Conclusions
We conclude with a reflection on PRIMAD, beginning with general benefits and challenges of the model itself. Then we evaluate our post-hoc application of PRIMAD to the LIGO workflows, which informs best practices for future runtime applications of PRIMAD in the execution of scientific workflows.
PRIMAD is a general model to guide reproducibility. It helps meet an acute need in the scientific community to ground reproducibility, yet it is inherently abstract due to its applicability across all scientific domains, leading to challenges in establishing a useful level of specificity. When researchers want to share their findings, many research teams (or actors) make best efforts at including all relevant information to enable reproducibility. Without standardization, the decisions about what constitutes “relevant information” are inevitably ad-hoc, and may not be uniform from publication to publication or across multiple workflows within a single publication. Thus, PRIMAD offers a framework with which to build sustainable reproducibility in an uniform fashion across scientific domains.
Challenges arise in identifying how specific the components of PRIMAD need to be defined in order to guarantee both consistent applicability across workflows in a specific domain and desired levels of reproducibility. For example, the division between implementation and methods is disputable. Minor adjustments to an algorithm would generally fall into implementation, yet it is hard to determine when changes are substantial enough to call it a new algorithm and thus a change in methods. In other cases, the effects of the actors on reproducibility may be difficult to document. Even within the same research group with consistent leadership, research objectives, and computational environments, changes in team members and shifts in member responsibility can introduce unacknowledged sources of variability. It appears to be a difficult problem to appropriately document the knowledge and experience that is applied to the elements of a workflow. Finally, different datasets may play different roles within a workflow. As a partial remedy, we propose a differentiation between input data (including initial parameters), intermediary data (which depends heavily on methodology and implementation), and output artifacts (having direct relevance to the research objective). In order to meet these challenges, we recommend that each field of science develop its own domain-appropriate refinement to PRIMAD.
The LIGO workflows demonstrate scientific research performed with a clear desire for reproducibility. Therefore, even though the authors do not apply an explicit reproducibility framework, we were able perform a post-hoc evaluation of the workflows with the PRIMAD model. In particular, we find that out of the six PRIMAD components, the implementation and data components are given the most attention in terms of documented details, as summarized in Table 3 and Table 4. One PRIMAD component not described with the same level of details as the other components is the platform used in the workflows. This may be due to multiple factors such as: (1) use of proprietary platforms; (2) prioritization of implementation or data management details over platform details; and (3) the technical challenge of tracking all the platform components on which a workflow manager deploys jobs. Uncertainty in the reason for incomplete information is one of a few limitations we found in post-hoc application of PRIMAD. Non-uniform descriptions of PRIMAD components and the necessity of citation chasing to fill in details also arise, though are mitigated by documentation efforts of actors within a study and overlap of actors between studies.
Challenges listed above can be mitigated by moving from a post-hoc to a runtime application of PRIMAD from the early stages of scientific experimentation. The PRIMAD components should be periodically referred to as a checklist for assessing the reproducibility of workflows, both as they are developed and during the transition from experimentation to publication. Moreover, rather than hampering scientific progress by imposing requirements inconsistent with a particular scientific workflow, we demonstrated with the various scenarios in Section 4 how such a framework can be a guide to ensure the desired level of reproducibility.
A wide adoption of PRIMAD would be most achievable through involvement of all contributors to the scientific process. That is, tools that researchers use to do computational experiments must: (1) be aware of the PRIMAD components; and (2) provide users with access to details about their experiments, organized in terms of the PRIMAD components, so that those details can be disseminated in publications. In particular, we look at the role of workflow managers in the computational components of a scientific workflow. In the case of the PyCBC matched filtering pipelines, the actors used Pegasus (Deelman et al., 2005; Brown et al., 2007) to distribute constituent jobs to participating computing centers (Weitzel et al., 2017). In the long run, workflow managers such as Pegasus can play a key role in tracking, for example, which computational platforms their jobs run on so that future research teams can attempt to reproduce results on similar platforms, and therefore automatically, deploying the PRIMAD model at runtime.
As part of our study of gravitational-wave search pipelines, both from the LIGO collaboration and from others, we have identified the various implementation and data components that need to be identified in order for an outside actor to reproduce the results of these pipelines. We feel this axis of classification can serve as a foundation for describing all (or a large number of) gravitational-wave search pipelines and lowering the barriers for reproducing the results outside of the LIGO collaboration.
Acknowledgements.
The authors thank Almadena Chtchelkanova and the U.S. National Science Foundation for sponsoring the July 25, 2017 workshop that provided the foundation on the PRIMAD model. The authors also thank the workshop attendees: Michael A. Heroux, Sandia National Laboratories; Lorena A. Barba, Ronald Boisvert, National Institute of Standards and Technology; Bruce Childers, University of Pittsburgh; Juliana Freire, New York University; Carlos Maltzahn, University of California, Santa Cruz; Wilf Pinfold, Urban.Systems; Jeff Spies, Center for Open Science. The work was supported by NSF awards OAC-1841399, OAC-1823405, OAC-1823378, and CCF-1841552.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Aasi et al . (2015) J. Aasi et al . 2015. Advanced LIGO. Class. Quant. Grav. 32 (2015), 074001. https://doi.org/10.1088/0264-9381/32/7/074001 · doi ↗
- 3Abbott et al . (2016 b) B. P. Abbott et al . 2016 b. GW 150914: First results from the search for binary black hole coalescence with Advanced LIGO. Phys. Rev. D 93, 12 (2016), 122003. https://doi.org/10.1103/Phys Rev D.93.122003 · doi ↗
- 4Abbott et al . (2016 c) B. P. Abbott et al . 2016 c. GW 151226: Observation of Gravitational Waves from a 22-Solar-Mass Binary Black Hole Coalescence. Phys. Rev. Lett. 116, 24 (2016), 241103. https://doi.org/10.1103/Phys Rev Lett.116.241103 · doi ↗
- 5Abbott et al . (2016 d) B. P. Abbott et al . 2016 d. Observation of Gravitational Waves from a Binary Black Hole Merger. Phys. Rev. Lett. 116, 6 (2016), 061102. https://doi.org/10.1103/Phys Rev Lett.116.061102 · doi ↗
- 6Abbott et al . (2017 a) B. P. Abbott et al . 2017 a. GW 170104: Observation of a 50-Solar-Mass Binary Black Hole Coalescence at Redshift 0.2. Phys. Rev. Lett. 118, 22 (2017), 221101. https://doi.org/10.1103/Phys Rev Lett.118.221101 · doi ↗
- 7Abbott et al . (2017 b) B. P. Abbott et al . 2017 b. GW 170608: Observation of a 19-solar-mass Binary Black Hole Coalescence. Astrophys. J. 851, 2 (2017), L 35. https://doi.org/10.3847/2041-8213/aa 9f 0c · doi ↗
- 8Abbott et al . (2017 c) B. P. Abbott et al . 2017 c. GW 170814: A Three-Detector Observation of Gravitational Waves from a Binary Black Hole Coalescence. Phys. Rev. Lett. 119, 14 (2017), 141101. https://doi.org/10.1103/Phys Rev Lett.119.141101 · doi ↗
