Gene Regulatory Network Inference from Pseudotime-Ordered scRNA-seq Data via Time-Lagged Divergence Measures

Lingling Zhang; Tong Si; Lucas Koch; Haijun Gong

PMC · DOI:10.1145/3774976.3774995·January 2, 2026

Gene Regulatory Network Inference from Pseudotime-Ordered scRNA-seq Data via Time-Lagged Divergence Measures

Lingling Zhang, Tong Si, Lucas Koch, Haijun Gong

PDF

Open Access

TL;DR

This paper introduces PseudoGRN, a new method for building gene regulatory networks from single-cell RNA sequencing data that tracks changes over time.

Contribution

PseudoGRN combines pseudotime inference, time-lagged divergence measures, and penalized network inference to improve GRN reconstruction.

Findings

01

PseudoGRN outperforms existing methods in reconstructing gene regulatory networks from scRNA-seq data.

02

The method provides robust and interpretable results for dynamic regulatory mechanisms in single-cell systems.

Abstract

Inferring cell type-specific gene regulatory networks (GRNs) from time-series single-cell RNA sequencing (scRNA-seq) data is challenging due to sparse temporal resolution, high dimensionality, and inherent cellular heterogeneity. We present a novel integrative framework, called PseudoGRN, that unifies multiple pseudotime inference methods, different time-lagged divergence measures, non-redundant penalized network inference, and partial correlation analysis to reconstruct directed GRNs from time-series scRNA-seq data. Applying our method to the real-world scRNA-seq dataset, we demonstrate its superior performance over existing approaches, offering a robust and interpretable tool for uncovering dynamic regulatory mechanisms in single-cell systems.

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Cell lines1

THP-1— Homo sapiens (Human) · Childhood acute monocytic leukemia · Cancer cell line

Diseases1

myeloid leukemia

Figures2

Click any figure to enlarge with its caption.

Keywords

Applied Computing → BioinformaticsGene Regulatory NetworkscRNA-seq Dataf-DivergencePseudotime AnalysisIntegral Probability MetricPartial Correlation

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSingle-cell and spatial transcriptomics · Gene Regulatory Network Analysis · Neural Networks and Reservoir Computing

Full text

Introduction

1

Single-cell RNA sequencing enables gene expression profiling at single-cell level, allowing researchers to identify distinct cell states, uncover differentiation trajectories, and investigate dynamic cellular processes [9, 13]. Analysis of scRNA-seq data has greatly deepened our understanding of complex biological processes and opened new avenues for the development of personalized medicine [27]. In particular, time-series scRNA-seq data further provides temporal information that can be used to reconstruct cell type-specific gene regulatory networks (GRNs) and investigate dynamic transcriptional regulation [5]. A variety of machine learning methods have been developed for GRN inference from omics data, including scRNA-seq, including Boolean networks [11], differential equation models [14], correlation networks [4], dynamic Bayesian networks [1, 18], regression-based frameworks [16, 28], and deep learning approaches [22, 24]. Most of these methods are based on canonical time, which reflects the actual chronological progression of molecular events. A major challenge in time-series scRNA-seq data analysis is the limited number of discrete time points, which reduces the temporal resolution and complicates the GRNs inference.

To address the challenges of network inference from time-series scRNA-seq data, particularly the sparse temporal sampling, and intrinsic cellular variability, pseudotime analysis [19] has emerged as a critical preprocessing step. Also known as trajectory inference, pseudotime methods estimate the relative progression of individual cells by ordering them along a continuous trajectory based on similarities in their gene expression profiles. This approach provides a high-resolution temporal framework that approximates dynamic biological processes without relying on explicitly measured time points. A variety of pseudotime inference methods have been developed to reconstruct cellular trajectories from single-cell transcriptomic data. These include graph-based approaches such as Monocle [26], Slingshot [23], PHATE [15], diffusion-based methods like Diffusion Pseudotime (DPT) [8] and Diffusion Maps [7], as well as probabilistic and velocity-based models such as scVelo [2].

Recently, Zeng et al.’s work [30] proposed a method called Normi, which integrates the Slingshot pseudotime algorithm with mutual information to infer gene regulatory networks. Normi has demonstrated promising performance, however, it also has several notable limitations. A major issue is the assumption that regulatory effects can have long time delays, suggesting that a gene’s expression may be influenced by distant past events, an idea often inconsistent with biological evidence. Most studies support the first-order Markov property [6], where the current state depends primarily on the immediate past. Thus, a time lag of one is typically sufficient and biologically more plausible for modeling regulatory dependencies [28]. Moreover, the estimation of the optimal time lag in Normi is computationally intensive, significantly limiting its scalability to larger datasets. Another limitation lies in its reliance on mutual information (MI) to infer regulatory relationships. Since MI is symmetric, it cannot capture directionality and is thus unsuitable for inferring causal interactions. Moreover, methods like Normi cannot distinguish between activation and inhibition, critical features of gene regulation, leading the inferred networks to resemble directed correlation networks rather than true regulatory networks.

To address these limitations, we propose a novel integrative framework, called PseudoGRN, that incorporates multiple pseudotime inference strategies, time-lagged f-divergence and integral probability metrics (IPMs), penalized non-redundant edge selection and partial correlation analysis within a unified platform for reconstructing directed gene regulatory networks from time-series scRNA-seq data. The remainder of this paper is organized as follows. Section 2 introduces the proposed methodology and algorithm. In Section 3, we evaluate the performance of our approach using real-world scRNA-seq datasets. Finally, Section 4 concludes the paper with a discussion of the results and directions for future research.

Methods

2

The time-series single-cell RNA sequencing dataset at any given time point $[eqn]$ can be represented by a matrix $[eqn]$ , where $[eqn]$ denotes the number of genes and $[eqn]$ denotes the number of cells. Our aim is to reconstruct the regulatory networks from the time-series scRNA-seq data. Since cells captured at the same experimental time point can be at different stages of biological progression, relying solely on discrete time points may obscure the true temporal dynamics. To more accurately estimate cellular progression, we apply different pseudotime inference methods to generate a continuous-valued vector representing the inferred temporal ordering of individual cells.

Pseudotime Inference

2.1

In this work, we implement several pseudotime inference methods, including Slingshot, PHATE, Diffusion Maps, PAGA, and PCA, to estimate cellular trajectories from scRNA-seq data. Below, we briefly describe two widely used approaches, Slingshot and PHATE. For details on Diffusion Maps and PAGA, readers are referred to the original publications [7, 29].

Slingshot [23] first constructs a minimum spanning tree (MST) over cell clusters to infer the global lineage structure, incorporating prior biological knowledge to guide the branching topology. Then, it fits simultaneous principal curves to model smooth, branching trajectories through the expression space. The pseudotime values are assigned to individual cells along each inferred lineage, providing a continuous and biologically meaningful temporal ordering that captures cellular progression.

PHATE [15] first computes local affinities between cells, models transitions using a diffusion process, and transforms the result into a heat potential representation to stabilize structure. Finally, non-metric multidimensional scaling embeds the data into a low-dimensional space, revealing smooth progression paths and branching trajectories. The pseudotime values are estimated by computing the Euclidean distances from each cell to a designated root cell along the PHATE manifold.

Since pseudotime inference can introduce noise and errors due to inaccuracies in temporal ordering, we adopt the approach of Zeng et al. [30], applying a sliding window (width $[eqn]$ , step size 1) and average smoothing to construct representative, smoothed cell profiles along the pseudotime trajectory.

Time-lagged Divergence Measures

2.2

Previous studies [30, 31] have applied mutual information to infer gene regulatory networks (GRNs). In particular, the Normi method [30] infers regulatory links by computing time-delayed mutual information. However, due to the limitations discussed in the Introduction, we extend this idea and propose a more general framework that employs time-lagged f-divergence and integral probability metrics (IPMs) to quantify the influence of gene $[eqn]$ on gene $[eqn]$ .

The time-lagged f-divergence quantifying the directional dependency between gene $[eqn]$ at time $[eqn]$ and gene $[eqn]$ at time $[eqn]$ is defined as

[eqn]

where, $[eqn]$ and $[eqn]$ are the probability distribution functions of genes $[eqn]$ and $[eqn]$ at time points $[eqn]$ and $[eqn]$ , respectively, with $[eqn]$ denoting the time lag. The function $[eqn]$ is a proper, lower semi-continuous, and convex function satisfying $[eqn]$ . This formulation generalizes several well-known divergence measures, including Kull–back-Leibler, Jensen-Shannon, and Pearson divergence.

Similar to the time-lagged f-divergence, the time-lagged integral probability metric (IPM) is defined as:

[eqn]

where $[eqn]$ is a class of bounded, measurable functions. The choice of $[eqn]$ determines the specific form of the divergence, such as the Wasserstein, Cramér and Energy distance.

In the previous study [30], the time lag was determined by maximizing the distance correlation between two genes, allowing the lag to be greater than one. This implies that gene expression could be influenced by events in a very distant past. However, biological studies support the Markov property in gene regulatory network modeling [6], which suggests that the current state of a gene is primarily influenced by its immediate past. Therefore, a time lag of one is typically sufficient and more biologically plausible for capturing regulatory dependencies [28]. Moreover, estimating the optimal lag in Normi [30] introduces substantial computational overhead. In our method, we adopt a first-order time lag assumption, $[eqn]$ , to enhance both biological relevance and computational efficiency. A time-lagged divergence-based score between a transcription factor $[eqn]$ and a target gene $[eqn]$ is defined as $[eqn]$ , where $[eqn]$ denotes either an f-divergence or an integral probability metric (IPM), which quantifies how changes in the expression of $[eqn]$ at time $[eqn]$ influence the expression of $[eqn]$ at a later time point $[eqn]$ .

Directed Network Inference Algorithm

2.3

Algorithm 1 outlines the pseudocode for the proposed divergence-based method PseudoGRN for inferring directed gene regulatory networks from scRNA-seq data. The procedure consists of four key steps: (1) pseudotime analysis, (2) computation of time-lagged divergence, (3) inference of non-redundant edges, and (4) identification of regulatory relationship. Algorithm 1Divergence-Based Directed Network InferenceRequire: scRNA-seq data, Pseudotime methods, Divergence measures, Parameter λ.Ensure: Signed gene regulatory network 1:Step 1: Pseudotime Analysis 2:Estimate pseudotime using trajectory inference methods (Slingshot, PHATE...) 3:Sort the single cells according to pseudotime 4:Step 2: Computation of Time-lagged Divergence 5:for all gene pairs $[eqn]$ do 6: Compute time-lagged divergence: $[eqn]$ using f-divergence or IPM. 7:end for 8:Step 3: Inference of Non-redundant Edges 9:for each target gene $[eqn]$ do10: Initialize candidate TF set $[eqn]$ (all TFs)11: Initialize selected TF set $[eqn]$ 12: Rank all $[eqn]$ by descending $[eqn]$ 13: $[eqn]$ 14: $[eqn]$ 15: while $[eqn]$ do16: for all $[eqn]$ do17: Compute $[eqn]$ 18: end for19: $[eqn]$ 20: $[eqn]$ 21: end while22: for all $[eqn]$ do23: if $[eqn]$ then24: Add edge $[eqn]$ to GRN25: end if26: end for27:end for28:Step 4: Identification of Regulatory Relationship29:for all inferred edges $[eqn]$ do30: Compute partial correlation $[eqn]$ 31: Label activation if $[eqn]$ , inhibition if $[eqn]$ 32:end for

After computing the first-order time-lagged divergence-based scores $[eqn]$ for all gene pairs using either f-divergence or IPM, next, we proposed a penalized variant of the max-relevance and min-redundancy (mRMR) strategy, originally introduced in [17] and later adapted for mutual information-based network inference in [30]. In our framework, we replace mutual information with divergence-based scores and incorporate a tunable penalization term, allowing for more flexible and accurate modeling of regulatory relationships. Step 3 in Algorithm 1 details this inference procedure. To infer regulatory links for a target gene $[eqn]$ , we first construct a candidate set of transcription factors $[eqn]$ and initialize an empty selected set $[eqn]$ . Genes in $[eqn]$ are ranked in descending order based on their time-lagged divergence with $[eqn]$ , and the top-ranked gene is added to $[eqn]$ . For each remaining candidate gene $[eqn]$ , we compute an adjusted divergence score which is defined as:

[eqn]

where the first term captures relevance to the target gene $[eqn]$ , and the second term penalizes redundancy with previously selected regulators, and $[eqn]$ is a turning parameter. This selection process is repeated iteratively to identify the most relevant and least redundant regulators, retaining only those with non-negative divergence scores as edges in the network.

The final step of Algorithm 1 determines the type of regulatory interaction, activation or inhibition. Following our recent work [28], we compute the partial correlation $[eqn]$ between each gene pair ( $[eqn]$ ) along the inferred edges from Step 3. A positive $[eqn]$ indicates activation, while a negative value suggests inhibition.

Results

3

In this section, we apply Algorithm 1 to reconstruct gene regulatory networks from time-series scRNA-seq data.

Data and Evaluation Metrics

3.1

To evaluate the performance of PseudoGRN, we analyze both simulated and real-world single-cell RNA-seq datasets. Due to space limitations, we present only the results on the real-world THP-1 dataset, which profiles the differentiation of THP-1 human myeloid leukemia cells into macrophages across eight time points, with 120 cells sampled at each time point [10]. The THP-1 dataset has been widely used as a benchmark for network inference [30].

To evaluate the performance of our method, we compute the Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC), two standard metrics commonly used to assess the accuracy of predicted regulatory edges. The tuning parameter $[eqn]$ , which controls redundancy and sparsity of the inferred network in our algorithm, is selected via 5-fold cross-validation using Slingshot pseudotime and the Cramér divergence. The cross-validation results identify $[eqn]$ as the optimal value, which is used in all subsequent analyses for GRN reconstruction.

Network Inference Analysis

3.2

We implement Algorithm 1 using five pseudotime inference methods, including Slingshot, PHATE, Diffusion Maps (DiffMap), PCA, and PAGA, and nine divergence measures: forward KL (F-KL), symmetric KL (S-KL), Jensen-Shannon (JS), Pearson, symmetric Pearson (S-Pearson), Neyman, Wasserstein (Wass), Energy, and Cramér divergence. In all experiments, each trial is repeated five times, and we report the average AUROC and AUPRC scores. Performance is assessed by comparing the inferred network against a known gene regulatory network composed of 20 genes [25].

Table 1 summarizes the mean AUROC and AUPRC scores for gene regulatory networks inferred using nine different divergence measures in combination with five pseudotime inference methods, all evaluated at the optimal regularization parameter $[eqn]$ . Standard errors are not reported, as their magnitudes are on the order of 10^−3^ and thus negligible. Our results indicate that, for a given divergence measure, the overall performance is relatively robust to the choice of pseudotime inference method. However, the performance is substantially affected by the choice of divergence measure, highlighting its critical role in network reconstruction.

Among all the divergence measures, Cramér, a special case of the integral probability metric (IPM), consistently achieves the best performance, with a mean AUROC exceeding 0.6 and an AUPRC around 0.3 across nearly all pseudotime inference methods. This significantly outperforms baseline methods reported in [30], including DeepSEM, GRNBOOST2, SCODE, SCRIBE, GRNVBEM, LEAP, and SINCERITIES, whose mean AUROC scores typically hover around 0.5 and AUPRC values remain below 0.2. Under the Slingshot pseudotime, the AUROC of the Cramér-based method reaches approximately 0.65, while the computation time is only one-fifth of that required by Normi, which involves estimating an optimal time lag, a step that may not be biologically meaningful. Additionally, symmetric Pearson, Neyman, and standard Pearson divergences demonstrate competitive performance across various pseudotime strategies, further highlighting the robustness and effectiveness of divergence-based approaches. These results are consistent with Fig. 2, which presents boxplots of AUROC and AUPRC scores for each divergence measure, aggregated across five pseudotime inference methods. The boxplots illustrate the distribution of scores and their associated standard errors, highlighting the relative stability and performance of each measure.

To illustrate the inferred directed regulatory networks using our PseudoGRN method, Fig. 2 presents representative structures obtained using Slingshot and Cramér divergence under varying regularization parameters $[eqn]$ . The results show that both network structure and sparsity depend on $[eqn]$ ; larger values of $[eqn]$ yield sparser networks with higher AUROC scores.

Conclusion

4

In this study, we introduce PseudoGRN, an integrative framework for inferring directed gene regulatory networks from scRNA-seq data. PseudoGRN incorporates several methodological innovations. First, multiple pseudotime inference methods are employed to estimate cellular trajectories. Next, we define a first-order time-lagged divergence-based score to quantify directional dependencies between gene pairs. A penalized, non-redundant edge selection strategy is then used to infer sparse, biologically meaningful network structures. Finally, partial correlation analysis is applied to classify each regulatory interaction as either activation or inhibition.

We evaluate the performance of our framework using real-world scRNA-seq data from THP-1 cells. Our experimental results demonstrate that the proposed approach outperforms all existing methods, including DeepSEM, GRNBOOST2, SCODE, SINCERITIES, and Normi. Notably, our results suggest that the overall performance is not highly sensitive to the choice of pseudotime analysis methods, but is significantly influenced by the choice of divergence function. Among the divergence measures evaluated, the Cramér distance achieved the best performance, substantially higher than those obtained by other divergence measures and all existing baseline methods. Compared to the Normi method, PseudoGRN demonstrates superior biological interpretability and computational efficiency.

Since scRNA-seq data often contain a large proportion of missing values that our current method cannot handle directly, one direction of future work is to incorporate our missing value imputation techniques [3, 12, 20] into this framework to enhance the accuracy of network inference. In addition, gene regulatory networks may exhibit non-stationary structures across different stages of cellular processes. Another future direction is to integrate changepoint detection algorithms [21] with pseudotime analysis to infer time-varying gene regulatory networks across different biological stages.

Bibliography31

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Ajmal Hamda B. and Madden Michael G.. 2022. Dynamic Bayesian Network Learning to Infer Sparse Models From Time Series Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 19, 5 (Sept. 2022), 2794–2805.34181549 10.1109/TCBB.2021.3092879 · doi ↗ · pubmed ↗
2Bergen Volker, Lange Marius, Peidli Stefan, Wolf F Alexander, and Theis Fabian J. 2020. Generalizing RNA velocity to transient cell states through dynamical modeling. Nature biotechnology 38, 12 (2020), 1408–1414.
3Bishop Graham, Si Tong, Luebbert Isabelle, Al-Hammadi Noor, and Gong Haijun. 2025. t BN-CSDI: a time-varying blue noise-based diffusion model for time-series imputation. Bioinformatics Advances 5, 1 (2025), vbaf 225.41127882 10.1093/bioadv/vbaf 225PMC 12538563 · doi ↗ · pubmed ↗
4Chan Thalia E, Stumpf Michael PH, and Babtie Ann C. 2017. Gene regulatory network inference from single-cell data using multivariate information measures. Cell systems 5, 3 (2017), 251–267.28957658 10.1016/j.cels.2017.08.014PMC 5624513 · doi ↗ · pubmed ↗
5Ding Jun, Sharon Nadav, and Bar-Joseph Ziv. 2022. Temporal modelling using single-cell transcriptomics. Nature Reviews Genetics 23, 6 (2022), 355–368.
6Grzegorczyk Marco and Husmeier Dirk. 2009. Non-stationary continuous dynamic Bayesian networks. Advances in neural information processing systems 22 (2009).
7Haghverdi Laleh, Buettner Florian, and Theis Fabian J. 2015. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 18 (2015), 2989–2998.26002886 10.1093/bioinformatics/btv 325 · doi ↗ · pubmed ↗
8Haghverdi Laleh, Büttner Maren, Wolf F Alexander, Buettner Florian, and Theis Fabian J. 2016. Diffusion pseudotime robustly reconstructs lineage branching. Nature methods 13, 10 (2016), 845–848.27571553 10.1038/nmeth.3971 · doi ↗ · pubmed ↗