# Dynamic Factor Analysis for Sparse and Irregular Longitudinal Data: An Application to Metabolite Measurements in a COVID‐19 Study

**Authors:** Jiachen Cai, Robert J. B. Goudie, Brian D. M. Tom

PMC · DOI: 10.1002/sim.70499 · Statistics in Medicine · 2026-03-16

## TL;DR

This paper proposes a dynamic factor analysis method for sparse, irregularly sampled longitudinal metabolite data from a COVID-19 study. Applied to that study, it identifies a kynurenine pathway that affects clinical severity and uncovers the role of the biomarker taurine.

## Contribution

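The contribution is a dynamic factor analysis model that places a multi-output Gaussian process (MOGP) prior on the factor trajectories to capture cross-correlations between pathways, combined with a roughness penalty on the MOGP hyperparameters and a scalable stochastic expectation maximization (StEM) algorithm for sparse and irregular longitudinal data.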
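As an illustration of what a cross-correlated prior on factor trajectories can look like, the following minimal sketch simulates one subject's data at irregular time points. It is not the authors' model specification: the intrinsic coregionalization form (a between-factor covariance matrix `B` multiplied by a squared-exponential time kernel), the zero mean function, and all names (`rbf`, `sample_subject`, `loadings`) are assumptions made for this example. The paper's roughness penalty and non-zero mean functions are not shown.

```python
import numpy as np


def rbf(t1, t2, length_scale):
    """Squared-exponential kernel between two vectors of time points."""
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def sample_subject(times, loadings, B, length_scale, noise_sd, rng):
    """Draw cross-correlated factor trajectories and noisy observations.

    times    : (n_t,) observation times for this subject (may be irregular)
    loadings : (p, k) factor loading matrix (hypothetical values)
    B        : (k, k) between-factor covariance (assumed coregionalization matrix)
    """
    k, n_t = B.shape[0], len(times)
    # Joint covariance of all factors at all times, ordered factor by factor:
    # Cov(f_a(t), f_b(t')) = B[a, b] * rbf(t, t').
    K = np.kron(B, rbf(times, times, length_scale))
    K += 1e-8 * np.eye(k * n_t)  # small jitter for numerical stability
    f = rng.multivariate_normal(np.zeros(k * n_t), K).reshape(k, n_t)
    # Observed variables are a linear mix of the factors plus independent noise.
    y = loadings @ f + noise_sd * rng.standard_normal((loadings.shape[0], n_t))
    return f, y


rng = np.random.default_rng(0)
B = np.array([[1.0, 0.6], [0.6, 1.0]])  # factors correlated at 0.6
loadings = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])

# Each subject gets its own small, irregular set of visit times.
for subject in range(3):
    n_visits = rng.integers(2, 6)
    times = np.sort(rng.uniform(0.0, 30.0, size=n_visits))
    f, y = sample_subject(times, loadings, B, 5.0, 0.1, rng)
    print(subject, times.round(1), y.shape)
```

Setting the off-diagonal entries of `B` to zero recovers independent factor trajectories, which corresponds to the classical independence assumption that the paper questions.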

## Key findings

- The proposed model identifies a kynurenine pathway linked to clinical severity in COVID-19 patients.
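- The analysis uncovers the role of the biomarker taurine.
- In simulations, the StEM algorithm is 20 times faster than a previously proposed Monte Carlo Expectation Maximization (MCEM) algorithm and gives more accurate and stable multi-output Gaussian process hyperparameter estimates.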

## Abstract

Factor analysis (FA) can be used to identify key biomarkers in biological processes by assuming that latent biological pathways (statistically, “latent factors”) drive the activity of measurable biomarkers (“observed variables”). However, biological pathways often interact, meaning that the classical FA assumption of independence between factors is questionable. Motivated by sparsely and irregularly collected longitudinal measurements of metabolites in a COVID‐19 study, we propose a dynamic factor analysis model that accounts for cross‐correlations between pathways via a multi‐output Gaussian process (MOGP) prior on the factor trajectories. To mitigate overfitting caused by the sparsity of longitudinal measurements, we introduce a roughness penalty on the MOGP hyperparameters and allow for non‐zero mean functions. We also propose a scalable stochastic expectation maximization (StEM) algorithm that, in simulations, is 20 times faster than a previously proposed Monte Carlo Expectation Maximization (MCEM) algorithm and provides more accurate and stable MOGP hyperparameter estimates. In the motivating COVID‐19 study, our methodology identifies a kynurenine pathway that affects the clinical severity of patients with COVID‐19 disease and uncovers the role of the biomarker taurine. Our R package DFA4SIL implements the proposed method.
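The general idea behind a stochastic EM step can be sketched on a much simpler problem. The example below is a toy, not the paper's algorithm and not the DFA4SIL implementation: it assumes a static one-factor model with zero mean and independent noise, and the function names (`simulate_one_factor`, `stem_one_factor`) are hypothetical. At each iteration it draws the latent factor scores once from their conditional distribution given the current parameters, then updates the parameters as if those scores had been observed. An MCEM scheme would instead average over many such draws per iteration.

```python
import numpy as np


def simulate_one_factor(n, loadings, noise_var, rng):
    """Simulate y_i = loadings * f_i + e_i with f_i ~ N(0, 1)."""
    f = rng.standard_normal(n)
    noise = rng.standard_normal((n, len(loadings))) * np.sqrt(noise_var)
    return np.outer(f, loadings) + noise


def stem_one_factor(Y, n_iter=300, burn_in=100, seed=0):
    """Toy stochastic EM for a one-factor model (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    lam = np.full(p, 0.5)   # initial loadings
    psi = Y.var(axis=0)     # initial noise variances
    lam_trace, psi_trace = [], []
    for it in range(n_iter):
        # S-step: one draw of each factor score from its Gaussian conditional
        # distribution given the data and the current parameters.
        post_var = 1.0 / (1.0 + np.sum(lam ** 2 / psi))
        post_mean = post_var * (Y @ (lam / psi))
        f = post_mean + np.sqrt(post_var) * rng.standard_normal(n)
        # M-step: complete-data maximum likelihood updates given the draw.
        lam = Y.T @ f / (f @ f)
        psi = ((Y - np.outer(f, lam)) ** 2).mean(axis=0)
        if it >= burn_in:
            lam_trace.append(lam)
            psi_trace.append(psi)
    # StEM iterates fluctuate around the estimate, so average after burn-in.
    return np.mean(lam_trace, axis=0), np.mean(psi_trace, axis=0)


rng = np.random.default_rng(42)
true_lam = np.array([0.9, 0.7, 0.5, 0.8])
true_psi = np.array([0.3, 0.4, 0.5, 0.2])
Y = simulate_one_factor(2000, true_lam, true_psi, rng)
lam_hat, psi_hat = stem_one_factor(Y)
print(np.round(lam_hat, 2), np.round(psi_hat, 2))
```

Because each iteration needs only a single draw of the latent variables, the per-iteration cost is lower than that of a Monte Carlo E-step, at the price of noisy iterates that have to be averaged.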

## Linked entities

- **Chemicals:** taurine (PubChem CID 1123)
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), viral infection (MESH:D014777), dementia (MESH:D003704), COVID-19 (MESH:D000086382), death (MESH:D003643), infection (MESH:D007239)
- **Chemicals:** quinolinic acid (MESH:D017378), kynurenine (MESH:D007737), 3-hydroxykynurenine (MESH:C005045), taurine (MESH:D013654), oxygen (MESH:D010100), tryptophan (MESH:D014364)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12992701/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12992701/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12992701/full.md

---
Source: https://tomesphere.com/paper/PMC12992701