A multifactorial evaluation framework for gene regulatory network   reconstruction

Laurent Mombaerts; Atte Aalto; Johan Markdahl; Jorge Goncalves

arXiv:1906.12243·q-bio.MN·July 1, 2019

A multifactorial evaluation framework for gene regulatory network reconstruction

Laurent Mombaerts, Atte Aalto, Johan Markdahl, Jorge Goncalves

PDF

TL;DR

This study evaluates gene regulatory network inference methods using a new multifactorial framework, revealing how data quantity and perturbations differently impact algorithm performance in rhythmic and non-rhythmic systems.

Contribution

Introduces a multifactorial evaluation framework for gene network inference, analyzing the effects of data size and perturbations on algorithm performance.

Findings

01

Long time-series improve rhythmic system inference.

02

More perturbations benefit non-rhythmic system inference.

03

Algorithms vary in data sensitivity and effectiveness.

Abstract

In the past years, many computational methods have been developed to infer the structure of gene regulatory networks from time-series data. However, the applicability and accuracy presumptions of such algorithms remain unclear due to experimental heterogeneity. This paper assesses the performance of recent and successful network inference strategies under a novel, multifactorial evaluation framework in order to highlight pragmatic tradeoffs in experimental design. The effects of data quantity and systems perturbations are addressed, thereby formulating guidelines for efficient resource management. Realistic data were generated from six widely used benchmark models of rhythmic and non-rhythmic gene regulatory systems with random perturbations mimicking the effect of gene knock-out or chemical treatments. Then, time-series data of increasing lengths were provided to five…

Tables2

Table 1. Table 1: Properties of the different methods.

¹If higher-order dynamics are used.
Method	Nonlinear dynamics	Continuous-time	Combinatorial effects	Hidden nodes	Computation time (s)	Performance Millar/DREAM
All-to-all		$✓$		${(✓)}^{1}$	49.4	2/5
GPDM	$✓$	$✓$	$✓$		333.4	1/1
dynGENIE3	$✓$		$✓$		0.7	4/2
ARNI	$✓$		${(✓)}^{2}$		1.0	3/3
iCheMA	$✓$		${(✓)}^{2}$		1999	5/4
²Discussed in the article, but not in the implementation.

Table 2. Table 2: Left: Numbers of measurements in the Millar 10 experiments. Sampling rate is always 4h. Right: Numbers of measurements in the DREAM experiments. Window length is always 20.

Window (h)	24	36	48	60	72
WT	7	10	13	16	19
WT + 1 Mut	14	20	26	32	38
WT + 2 Mut	21	30	39	48	57
WT + 3 Mut	28	40	52	64	76

Equations8

\frac{dy_{i}(t)}{dt}=f(\mbox{\boldmath$\pi$}_{i}(t))-\alpha_{i}y_{i}(t)

\frac{dy_{i}(t)}{dt}=f(\mbox{\boldmath$\pi$}_{i}(t))-\alpha_{i}y_{i}(t)

fit = 1 - \frac{\sum _{t \in T^{1}} ( y _{1} ( t ) - y ^ _{1} ( t ∣ θ ) ) ^{2} + ... + \sum _{t \in T^{n}} ( y _{n} ( t ) - y ^ _{n} ( t ∣ θ ) ) ^{2}}{\sum _{t \in T^{1}} ( y _{1} ( t ) - y ˉ _{1} ( t ) ) ^{2} + ... + \sum _{t \in T^{n}} ( y _{n} ( t ) - y ˉ _{n} ( t ) ) ^{2}}

fit = 1 - \frac{\sum _{t \in T^{1}} ( y _{1} ( t ) - y ^ _{1} ( t ∣ θ ) ) ^{2} + ... + \sum _{t \in T^{n}} ( y _{n} ( t ) - y ^ _{n} ( t ∣ θ ) ) ^{2}}{\sum _{t \in T^{1}} ( y _{1} ( t ) - y ˉ _{1} ( t ) ) ^{2} + ... + \sum _{t \in T^{n}} ( y _{n} ( t ) - y ˉ _{n} ( t ) ) ^{2}}

{y_{j} = x (t_{j}) + v_{j} d x = g (x, θ) d t + d u

{y_{j} = x (t_{j}) + v_{j} d x = g (x, θ) d t + d u

\frac{d y _{i} ( t )}{d t} = u \in π_{i} \sum v_{u, i} \frac{I _{u, i} y _{u} ( t ) + ( 1 - I _{u, i} ) k _{u, i}}{y _{u} ( t ) + k _{u, i}} - α_{i} y_{i} (t)

\frac{d y _{i} ( t )}{d t} = u \in π_{i} \sum v_{u, i} \frac{I _{u, i} y _{u} ( t ) + ( 1 - I _{u, i} ) k _{u, i}}{y _{u} ( t ) + k _{u, i}} - α_{i} y_{i} (t)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

A multifactorial evaluation framework for gene regulatory network reconstruction

Laurent Mombaerts

Atte Aalto

Johan Markdahl

Jorge Gonçalves

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 avenue du Swing, 4367 Belvaux, Luxembourg (e-mail: [email protected], [email protected], [email protected], [email protected]).

Abstract

In the past years, many computational methods have been developed to infer the structure of gene regulatory networks from time-series data. However, the applicability and accuracy presumptions of such algorithms remain unclear due to experimental heterogeneity. This paper assesses the performance of recent and successful network inference strategies under a novel, multifactorial evaluation framework in order to highlight pragmatic tradeoffs in experimental design. The effects of data quantity and systems perturbations are addressed, thereby formulating guidelines for efficient resource management.

Realistic data were generated from six widely used benchmark models of rhythmic and non-rhythmic gene regulatory systems with random perturbations mimicking the effect of gene knock-out or chemical treatments. Then, time-series data of increasing lengths were provided to five state-of-the-art network inference algorithms representing distinctive mathematical paradigms. The performances of such network reconstruction methodologies are uncovered under various experimental conditions. We report that the algorithms do not benefit equally from data increments. Furthermore, for rhythmic systems, it is more profitable for network inference strategies to be run on long time-series rather than short time-series with multiple perturbations. By contrast, for the non-rhythmic systems, increasing the number of perturbation experiments yielded better results than increasing the sampling frequency. We expect that future benchmark and algorithm design would integrate such multifactorial considerations to promote their widespread and conscientious usage.

keywords:

Network Inference; Modelling; Dynamics and Control; Systems medicine

††thanks: AA and JM are supported by University of Luxembourg internal researh projects PPPD (both) and OptBioSys (AA).

1 Introduction

Genes in living organisms do not function in isolation, but may activate or suppress the activity of other genes, forming intricate networks of regulatory interactions. The investigation of the struture of gene regulatory networks (GRN), which consists in unveiling the set of biochemical interactions that control gene expression, is a fundamental step in the understanding of disease mechanisms and the development of future drugs and therapies. Recovering the topology of such networks from gene expression profiles remains, however, a crucial challenge in systems biology due to their vast complexity, intrinsic stochasticity, and limited observability.

In the last few years, many computational methods based on a variety of mathematical paradigms have been developed to reconstruct GRNs from time-series data, and their performances assessed on relevant mathematical models (Casadiego et al., 2017; Aderhold et al., 2017; Marbach et al., 2012). However, experimentalists are still faced with difficult questions: Which method to use with an available dataset? On the other hand, with fixed amount of resources, what kind of experiments to carry out to ensure optimal use of resources? How much can be gained by investing into a few more measurements?

The purpose of this study is to provide guidelines for conscientious management of biological resources by unveiling the performances of state-of-the-art network inference strategies under various experimental designs. To this end, the effects of data quantity and multi-experiment availability are assessed simultaneously on the accuracy of the topological reconstruction.

Navigating experimental tradeoffs for GRN inference is not an entirely new concern, although literature on the topic is surprisingly scarce (Haque et al., 2019). Such a multifactorial approach has never been undertaken for systematic evaluation of network reconstruction algorithms. Sefer et al. (2016) studied the tradeoffs between dense and replicate sampling strategies. Their results showed that, under reasonable noise assumptions, gene expression profiles reconstructed from dense sampling are more accurate than those resulting from replicate sampling. Geier et al. (2007) showed that at equivalent data size, short-time, gene knock-out experiments contain more information about the GRN structure than single experiment, longer recordings of non-rhythmic systems. The GRN inference algorithms used in their study, however, are no longer state-of-the-art. Markdahl et al. (2017) studied the cell cycle of Saccharomyces cerevisiae as a case-study to analyze the effect of temporal resolution on the quality of the inferred network. The performance as a function of time-series length resulting from a LASSO methodology (Tibshirani, 1996) resembled a sigmoid shape with a plateauing effect at the end. Mombaerts et al. (2016) used a model of the circadian network of Arabidopsis thaliana, a rhythmic system with period of about 24 hours, to show that sampling from transient dynamics provides richer information for the identification of the underlying GRN topology. Muldoon et al. (2019) identified previously unrecognized factors that affect inference outcomes, such as stimulus-specific experimental design and network motifs in the vicinity of a stimulus. Following the DREAM3 competition, Marbach et al. (2010) investigated strengths and weaknesses of algorithms in recognizing types of motifs that appear in gene networks. Finally, it has also been shown that, for mutual-information based techniques, the accuracy reaches a saturation point after a specific data size (Chaitankar et al., 2010). Algorithms based on correlation or mutual information are, however, excluded from this research as they cannot detect causality between genes (Marbach et al., 2012).

Realistic time-series datasets were generated from one rhythmic and five non-rhythmic models of gene regulatory networks that have been widely used as benchmarks in the literature (Aderhold et al., 2017; Prill et al., 2010). External interventions, i.e., gene deletion (knock-out) and chemical treatments, are explicitly simulated to provide a comprehensive picture of the performance of each algorithm under a range of experimental conditions. Increasingly rich multi-experiment time series datasets are fed to five network inference algorithms. Performance is assessed using standard techniques for classification algorithms, studying the area under the receiver operating characteristics (ROC) and the precision-recall (PR) curves. We expect such pragmatic considerations to contribute to the development of conscientious experimental designs and appropriate mathematical benchmarks.

2 Methods

2.1 Data generation

The use of in silico networks is preferable to random graphs, as they account for realistic structural properties of biological networks (Schaffter et al., 2011). The oscillating gene regulatory system used as a benchmark in this paper is a model of circadian regulation of Arabidopsis thaliana, hereafter referred to as Millar 10 (Figure 1A) (Pokhilko et al., 2010). Conceptually, it is composed of 2 interconnected feedback loops and an input pathway that incorporates external light cues in order to synchronize the plant to the surrounding light conditions. When subjected to another external regime, e.g., constant light or constant darkness, the system displays transient dynamics and reaches a new limit cycle. It is expected that measurements of the transient period contain more information than measurements of a limit cycle. This hypothesis was confirmed by Mombaerts et al. (2016). To complement these results, the performances of the network inference strategies are first analyzed under such transient dynamics. For this purpose, the model has been simulated for 240 hours in light/dark cycles to remove initial system transients and then switched to constant light regime. Time windows of 48 hours are then extracted from the light/dark limit cycle, at the transition to constant light and 48 hours after transition to constant light for further comparison, emulating the computational framework of Mombaerts et al. (2016). In the following analysis, time windows starting from the transition to constant light and up to 3 days of observations (24-36-48-60-72 hours) are considered.

The simulations are based on Langevin equations (stochastic differential equations) to model the intrinsic stochasticity in the dynamics of gene regulatory networks (accounting for molecular noise in both transcription and translation processes) (Gillespie, 2000). The intrinsic noise is expected to have a significant impact on the behavior of the system (Guerriero et al., 2011). Then, data are downsampled to resemble realistic experimental design (every 4 hours in the case of circadian experiments). Finally, the protein concentrations are not made available in the provided datasets (only mRNA concentrations levels).

The Millar 10 model has been simulated to reproduce gene knock-out experiments. Knock-out experiments are very informative, more than knock-down experiments, as they provide network response to individual and large perturbations (genes are deleted) (Marbach et al., 2012). Knock-out experiments were simulated as in Aderhold et al. (2014), by replacing the transcription rates of the targeted genes by random noise drawn from a truncated normal distribution to ensure non-negativity of the concentrations. Genes that have been knocked out are, thefore, not influenced by their structural regulators anymore. The datasets in the numerical experiments consist of a wildtype (WT) time series, and up to three randomly chosen knock-out time series at a time. This selection has been randomized 6 times to account for the uneven informative potential of different genes in the network. Furthermore, such simulations being stochastic, each experiment was replicated 10 times to provide a representative view of the performance of each computational method. In total, $950=10\cdot 5+10\cdot 6\cdot 5\cdot 3$ simulations were performed.

The studied non-rhythmic models (Figure 1 B–F) originate from the DREAM4 in silico network inference challenge (Marbach et al., 2009, 2010; Prill et al., 2010). New time series data were generated by the GeneNetWeaver software (Schaffter et al., 2011) which simulates the system’s response to a perturbation of about a third of its nodes, followed by a relaxation back to steady state after half of the recording, when the perturbation was removed. The data characteristics are as in the challenge, with the exception that the perturbation targets were randomized, whereas in the challenge data they were preferentially carried out to cover the whole network (Marbach et al., 2012). Such data simulation offers a realistic representation of the undetermined effects of a chemical treatment on a system at rest (steady-state). In essence, the modeling specificities are similar to those of the Millar 10 model in the sense that both transcription and translation are modeled, only the mRNA concentration levels are made available and the simulations are based on Langevin stochastic equations. The resulting data are then resampled using 3 different sampling rates (11-21-41 datapoints) to further assess the effect of changes in experimental design. Here, 10 chemical perturbations were simulated for each of the 5 available networks and replicated 3 times. Then, increasing amount of perturbation time series (up to 4 at a time) are randomly selected from those 10 generated perturbations, and provided to the inference algorithms. This random selection is performed 5 times. The perturbation targets are not known to the methods. In total, $900=3\cdot 5\cdot 3\cdot 4\cdot 5$ simulations were performed.

2.2 Network inference methods

Formally, the reconstruction of the structure of gene regulatory networks corresponds to the identification of the regulators $\mbox{\boldmath$ \pi $}_{i}$ for all given genes $i\in\{1,..,n\}$ (where $n$ is the amount of genes in the system), so that

[TABLE]

where $y_{i}(t)$ is the mRNA concentration of gene $i$ at time $t$ and the term $\alpha_{i}y_{i}(t)$ corresponds to the degradation rate of $y_{i}(t)$ . The nonlinear function $f$ represents the influence of the transcription factors of the parents on the target genes. In practice, however, the time course data of the regulators $\mbox{\boldmath$ \pi $}_{i}$ consist of gene expression levels as the protein levels are typically not available to us. Most methods then consider a simplified model (1) where gene expression values of the parent genes are used as proxies to transcription factors, and then $\mbox{\boldmath$ \pi $}_{i}(t)\subset\{y_{1}(t),...,y_{n}(t)\}$ . The concatenation of the structural regulators $\mbox{\boldmath$ \pi $}_{i}$ of every gene, then, forms the network underlying gene regulation, which is mostly sparse. The problem is typically undetermined, since the number of available time points is small compared to the number of possible solutions and since not all species are observable. Moreover, regulatory interactions can either be additive or multiplicative, while proteins might undergo post-translational modifications, adding another layer of complexity. The difficulty for the development of inference algorithms is to decide upon the complexity of the strategy while being aware of the inherent overfitting danger. We aim to highlight the advantages and optimal applicability ranges for current state-of-the-art methodologies by estimating their performances on various experimental scenarios. While the results presented are by no means exhaustive in terms of strategies or experimental design scenarios, they support the efficient use of biological resources.

The network inference methods included in the comparison represent the function $f$ at various levels of complexity. They are All-to-all (ATA) (Mombaerts et al., 2019), Gaussian Process Dynamical Models (GPDM) (Aalto et al., 2018), dynamical GEne Network Inference with Ensemble of trees (dynGENIE3) (Huynh-Thu and Geurts, 2018), Algorithm for Revealing Network Interactions (ARNI) (Casadiego et al., 2017) and Improved Chemical Model Averaging (iCheMA) (Aderhold et al., 2017). For details, we refer to publications presenting the methods.

The All-to-all method is a parametric method based on fitting a linear model (of order one, by default, but higher order models are also possible) i.e., $f$ is linear in (1), between each pair of genes, one pair at a time. A fitness score is computed for each pair, which is regarded as a confidence level on the existence of the corresponding regulation. Here, the methodology presented by Mombaerts et al. (2019) has been extended to deal with multi-experiment datasets. The datasets are merged together so that the dynamics to be identified are identical through all experimental conditions, assuming that the signal-to-noise ratios are similar in different experiments. The fitness score over multiple experiments is

[TABLE]

where $n$ represents the amount of experimental conditions, $T$ the sampling times of experiments, $y$ the gene expression level, $\hat{y}$ the modeled gene expression, and $\bar{y}$ is the average value of the gene expression level.

GPDM, a non-parametric method, models gene expression as a nonlinear stochastic differential equation

[TABLE]

where the dynamics function $g$ is modeled as a Gaussian process with some hyperparameters $\theta$ , $v_{j}$ ’s correspond to measurement errors, and $u$ is a Brownian motion. This defines gene expression as a stochastic process whose realizations can be sampled using a Markov Chain Monte-Carlo (MCMC) strategy. Network inference is based on estimating the hyperparameters of the covariance function of the GP. Multi-experiments are taken into account by assuming that all time series are produced by the same dynamics function $g$ . Independent samplers are then constructed for trajectories $x$ corresponding to different experiments. Performance of GPDM was recently compared to the best performers of the DREAM4 challenge and consistently shown superior in dealing with short time series data (Aalto et al., 2018).

ARNI is a recently developed non-parametric method used for the estimation of network topologies that performed well in network inference from a large collection of short time series. The derivatives are estimated explicitly through a difference approximation and the relationships between nodes in the network are estimated by solving a nonlinear regression problem, with a user-selected library of nonlinear basis functions, using a greedy approach (Casadiego et al., 2017). In the experiments, we used polynomial basis functions of degree at most 3.

The semi-parametric method dynGENIE3 is an adaptation of the GENIE3 method for time series data. GENIE3 was the best performer in the DREAM4 Multifactorial Network Inference challenge and the DREAM5 Network Inference Challenge. The transcription function $f$ in (1) is represented by an ensemble of regression trees which is estimated from the gene expression data and their derivatives, estimated using a difference approximation (Huynh-Thu and Geurts, 2018).

Alternatively, iCheMA estimates derivatives from the data by fitting a smooth Gaussian process to the time-series. Then, gene expression profiles are modeled using the Michaelis-Menten formula for mass-action kinetics:

[TABLE]

where $k_{u,i}$ corresponds to the Michaelis-Menten parameters and $I_{u,i}$ indicates whether the regulation is an activation ( $I_{u,i}=1$ ) or an inhibition ( $I_{u,i}=0$ ). Network inference is then based on estimating the parameters using an MCMC approach. iCheMA goes exhaustively through all possible combinations of regulators (typically, up to 3 at a time), which makes it a computationally heavy algorithm that does not scale easily to large systems.

Table 1 summarizes the properties of the methods included in the comparison. A method is deemed a continuous-time method if it is based on continuous trajectory-fitting, or modeling from a continuous-time system. Methods estimating derivatives from the data and then solving input-output regression are deemed discrete-time methods. Combinatorial effects mean dynamics of the form $\dot{y}_{i}=f(y_{j},y_{k})$ where $f(y_{j},y_{k})$ cannot be represented as a sum $f(y_{j},y_{k})=g(y_{j})+h(y_{k})$ . The table shows whether the methods explicitly take into account combinatorial effects. The computational time is based on 48 hours recordings (13 datapoints) of the Millar 10 model. Performance ranking is based on average AUPREC values.

3 Results

The performances of each algorithm are here assessed in terms of the resulting Area Under the Receiver Operating Characteristic (AUROC) and the Area Under the Precision-Recall (AUPREC). Precision-Recall curves allow for a more accurate picture of algorithms performances for sparse GRNs and is commonly used for the comparison of inference algorithms. Auto-regulatory interactions are not considered.

Decomposing the time-series resulting from the rhythmic model into synchronized-desynchronized states showed that, on average, the accuracy of the network reconstruction is improved by considering transient dynamics (Figure 2). While GPDM, ATA, and dynGENIE3 benefit —to a varying degree— from the transition to the desynchronized state, change in performance of iCheMA was not statistically significant and ARNI’s performance was slightly impaired. It should be noted that a significant increase in accuracy is observed for the strategies that do not explicitly estimate derivatives.

Figures 3 and 4 display the performance of each algorithm resulting respectively from the simulations with data from the Millar 10 model and the steady-state systems under several combinations of data types.

On one hand, these graphs show that GPDM outperforms the other approaches for almost every system and experimental configuration considered. It is outperformed by ATA in only one case with 24h wildype only data from the Millar 10 model. This illustrates the importance of various experimental scenarios in benchmarking network inference strategies and motivates the work undertaken in this paper. Interestingly, the simple pairwise low order linear modeling (ATA) seems to outperform dynGENIE3, iCheMA and ARNI in terms of AUPREC for every observation length and system perturbation considered in the rhythmic model. Only the AUROC values of the non-parametric approach ARNI exceed those of ATA for the 3 mutations case, starting from 36 hours of observation.

It is further interesting to notice that not all algorithms benefit equally from data increments. The gain of accuracy resulting from increasing the amount of data in the rhythmic model is only mild for the linear modeling strategy and iCheMA while it is significant for GPDM, dynGENIE3, and ARNI. In this regard, in average, GPDM benefits from the largest increase in accuracy whereas dynGENIE3 and ARNI compete at a slightly lower level for the experimental conditions presented. A saturation effect, however, can be observed at AUPREC values of around 0.8 for GPDM, 0.63 for the ATA, and 0.58 for ARNI.

The analysis of the DREAM competition models delivers a different view on network reconstruction as not all nodes are stimulated in a given system perturbation. For those networks, the benefit of additional system perturbations is considerable as they allow investigation of novel, previously unstimulated segments of the network. In the experimental design cases presented, none of the algorithms seemed to approach a saturation point for the data combinations considered. While the GPDM succeeds in providing the best accuracy for the DREAM networks as well, dynGENIE3 ranks second, ARNI third, iCheMA fourth and ATA last. The reason why the linear modeling strategy is surpassing dynGENIE3 and ARNI for the 1 perturbation only case and does not improve for additional datasets is likely related to the partially stimulated nature of the whole dynamical system and has yet to be investigated.

On the other hand, Figures 3 and 4 allow for proper visualization of experimental tradeoffs. Doubling the amount of datapoints by performing another experiment does not provide the same level of information than doubling the amount of datapoints in a given experimental setup. Table 2 summarizes the amount of datapoints in each of the presented experimental setups. While the cost of each datapoint might not be equivalent whether it originates from a novel system perturbation or from longer recording, these tables provide insight on how to choose an appropriate experimental scenario regarding the performances of each of the algorithms presented in Figures 3 and 4. For instance, sequencing a gene in WT every 4 hours during 72 hours requires a similar amount of datapoints as the WT with 2 mutations for 24 hours or the WT with 1 mutation for 36 hours. In this case, the experimental design that provides the best results would be a single recording of 72 hours resulting in an AUPREC of 0.84 using GPDM, compared to 0.75 or 0.7. Regardless of the algorithm and assuming an equivalent cost per datapoint, it can be observed that, as a general rule of thumb and for top performing strategies, it is often preferable to observe the rhythmic system for a longer amount of time. By contrast, increasing the sampling frequency of the steady-state systems only resulted in a marginal improvement in the accuracy of network reconstruction. Surely, a lower bound on the sampling rate is required for a reliable construction of those systems but it was not reached in this analysis.

4 Discussion

Choosing between different experimental designs and network inference strategies depends on the research question and the resource constraints. For this purpose, performing a complete cycle study involving multiple inference strategies and specific benchmarks, such as in (Madhamshettiwar et al., 2012), is not uncommon. However, while such analysis provides a comprehensive idea of the relevance of the inferred network topology, it represents a significant investment of both time and money, and is sometimes not even possible.

In this paper, the general effects of data quantity and system perturbations on the accuracy of the GRN reconstruction were evaluated for one rhythmic model of gene regulation and five steady-state models. Our contribution is threefold. We showed the relevance of multifactorial benchmarks to assess the performances of network inference strategies, the importance of an appropriate choice of model complexity given data availability, and revealed pragmatic considerations for experimental designs. Depending on the cost of performing more experiments or increasing the amount of datapoints, one of those choices should be preferred.

The algorithms considered here showed consistent performances across the 6 investigated networks. All network inference strategies did not, however, benefit equally from the increasing amount of data. Nevertheless, the fact that the parameter free, Gaussian process strategy GPDM has been consistently outperforming all strategies presented is noticeable and of further interest. In addition, by looking at the data expense and the resulting reconstruction accuracy, it should be further noted that GRN inference algorithms should improve the way various time series experiments originating from the same biological system are taken into account.

Marbach et al. (2012) showed that, on average, a combination of network inference strategies leads to the best network reconstruction. As such, we noticed that the order in which the links were inferred by each algorithm, and experiment, was different. Further research, therefore, should learn the ranks, or confidence levels, of each link in the network reconstruction process and design a proper combination of the algorithms that optimizes their synergy, depending on the experimental conditions.

Multifactorial studies such as the one presented here require a considerable amount of simulations. Some algorithms, such as those involving MCMC sampling, took several days to run on a 24 core workstation. As such, a complete analysis of the experimental design space is not possible but other decisional factors exist and require further inspection. Among those, Sefer et al. (2016) showed that denser sampling is preferable to additional replicates. Such strategy could be particularly profitable for transient data and to algorithms that explicitly estimate derivatives. Muldoon et al. (2019) used time series data originating from the DREAM 4 challenge to show that using only half of the perturbation data (without the recovery to steady-state) might be beneficial to some algorithms. Furthermore, some methods are able to incorporate information on external inputs, such as perturbations (with the targets still unknown), which increases the average performance. In addition, in practice, gene regulatory networks are often of bigger dimension which is not always accessible to the most computationally expensive algorithms.

Finally, our study did not take into account prior knowledge of the system, which could potentially be iteratively integrated into each step of the network reconstruction. For example, a strategy such as the one presented by Ud-Dean and Gunawan (2015) actively optimizes the precision of the predictions by proposing the next most informative knock out. In such case, the aforementioned results would likely understimate the resulting accuracy of the reconstruction. In fact, doubling the amount of data points by doubling the observation time or by performing an additional experiment not only provides different levels of information, but can reveal different parts of the network as well. Such strategy might be necessary to cope with the most isolated genes.

Bibliography25

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Aalto et al. (2018) Aalto, A., Viitasaari, L., Ilmonen, P., and Gonçalves, J. (2018). Continuous time Gaussian process dynamical models in gene regulatory network inference. Ar Xiv:1808.08161 .
2Aderhold et al. (2017) Aderhold, A., Husmeier, D., and Grzegorczyk, M. (2017). Approximate Bayesian inference in semi-mechanistic models. Statistics and Computing , 27(4), 1003–1040.
3Aderhold et al. (2014) Aderhold, A., Husmeier, D., and Grzegorczyk, M. (2014). Statistical inference of regulatory networks for circadian regulation. Statistical Applications in Genetics and Molecular Biology , 13(3), 227–273. 10.1515/sagmb-2013-0051 . · doi ↗
4Casadiego et al. (2017) Casadiego, J., Nitzan, M., Hallerberg, S., and Timme, M. (2017). Model-free inference of direct network interactions from nonlinear collective dynamics. Nature Communications , 8(1), 2192.
5Chaitankar et al. (2010) Chaitankar, V., Ghosh, P., Perkins, E.J., Gong, P., and Zhang, C. (2010). Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks. BMC Bioinformatics , 11(6), S 19.
6Geier et al. (2007) Geier, F., Timmer, J., and Fleck, C. (2007). Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge. BMC systems biology , 1(1), 11.
7Gillespie (2000) Gillespie, D. (2000). The chemical Langevin equation. The Journal of Chemical Physics , 113(1), 297–306.
8Guerriero et al. (2011) Guerriero, M., Pokhilko, A., Fernández, A., Halliday, K., Millar, A., and Hillston, J. (2011). Stochastic properties of the plant circadian clock. Journal of the Royal Society Interface , 9(69), 744–756.