# A supervised Bayesian method for time (re)annotation of transcriptomics data

**Authors:** Elio Nushi, François P Douillard, Katja Selby, Benjamin A Blount, Oliver J Pennington, Nigel P Minton, Miia Lindström, Antti Honkela

PMC · DOI: 10.1093/nargab/lqaf203 · NAR Genomics and Bioinformatics · 2025-12-31

## TL;DR

This paper introduces a Bayesian method to realign transcriptomics experiments using a reference time course, improving the accuracy of gene expression analysis over time.

## Contribution

A novel Bayesian approach using Gaussian process regression for time realignment of transcriptomics data is proposed.
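This summary does not spell out the model, so the block below is only a minimal sketch of how GP-based time assignment can work in general. It is not the authors' implementation. It assumes (hypothetically) that each gene follows its own GP over time with a shared squared-exponential kernel and fixed hyperparameters, that genes are independent, and that the prior over a grid of candidate times is flat. The function names (`gp_predict`, `time_posterior`) and the toy gene profiles are illustrative inventions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale, variance):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(t_ref, Y_ref, t_new, lengthscale, variance, noise):
    """Zero-mean GP posterior with shared (assumed, fixed) hyperparameters for
    every gene column of Y_ref. Returns the predictive mean (n_new, n_genes)
    and the predictive variance (n_new,), which includes observation noise."""
    K = rbf_kernel(t_ref, t_ref, lengthscale, variance) + noise * np.eye(len(t_ref))
    Ks = rbf_kernel(t_new, t_ref, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y_ref))  # K^{-1} Y
    v = np.linalg.solve(L, Ks.T)
    latent_var = np.maximum(variance - np.sum(v ** 2, axis=0), 0.0)
    return Ks @ alpha, latent_var + noise


def time_posterior(y_query, t_ref, Y_ref, grid, lengthscale=2.0, variance=1.0, noise=0.01):
    """Hypothetical posterior over candidate times `grid` for one query sample."""
    offset = Y_ref.mean(axis=0)  # centre each gene so a zero-mean GP is sensible
    mean, pred_var = gp_predict(t_ref, Y_ref - offset, grid, lengthscale, variance, noise)
    resid = y_query[None, :] - (mean + offset)
    # Independent Gaussian likelihood per gene, summed in log space; flat prior on the grid.
    loglik = -0.5 * np.sum(resid ** 2 / pred_var[:, None] + np.log(2 * np.pi * pred_var)[:, None], axis=1)
    post = np.exp(loglik - loglik.max())
    return post / post.sum()


# Toy reference time course: 4 synthetic genes sampled every 0.5 h from 0 to 10 h.
def profiles(t):
    return np.column_stack([np.sin(0.6 * t), np.cos(0.4 * t), 0.3 * t, np.sin(0.3 * t + 1.0)])


t_ref = np.arange(0.0, 10.01, 0.5)
Y_ref = profiles(t_ref)
grid = np.linspace(0.0, 10.0, 201)

post = time_posterior(profiles(np.array([4.3]))[0], t_ref, Y_ref, grid)
t_map = grid[np.argmax(post)]
print(f"MAP time: {t_map:.2f} h")
```

Replicates can be handled in this sketch simply by repeating their sampling times in `t_ref`; the noise term then absorbs replicate scatter. A full treatment would also fit the kernel hyperparameters per gene rather than fixing them.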

## Key findings

- The Bayesian method improved growth phase descriptions in microarray data compared to original annotations.
- More differentially expressed genes were detected between successive growth phases using the new method.
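- The method offered higher resolution than a k-nearest neighbor baseline, exposing smaller time changes between samples.
- On sparse RNA-Seq time series (sampled every second hour), all predicted times were within 30 min of the true times.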
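The k-nearest neighbor baseline is not described in detail in this summary. A plausible minimal version (an assumption; `knn_time` and the toy data are hypothetical) averages the sampling times of the k reference samples whose expression profiles are closest to the query:

```python
import numpy as np


def knn_time(y_query, t_ref, Y_ref, k=3):
    """Hypothetical k-NN baseline: mean sampling time of the k reference samples
    whose expression profiles (rows of Y_ref) are closest to y_query (Euclidean)."""
    dist = np.linalg.norm(Y_ref - y_query[None, :], axis=1)
    nearest = np.argsort(dist)[:k]
    return float(np.mean(t_ref[nearest]))


# Toy reference sampled every 2 h with two monotone synthetic genes.
t_ref = np.arange(0.0, 10.01, 2.0)
Y_ref = np.column_stack([0.3 * t_ref, 0.1 * t_ref ** 2])
t_true = 4.3
y_query = np.array([0.3 * t_true, 0.1 * t_true ** 2])
print(knn_time(y_query, t_ref, Y_ref, k=1))  # prints 4.0
```

With k = 1 the estimate can only be one of the reference sampling times (here 4.0 h for a sample taken at 4.3 h). This is one plausible reason a nearest-neighbor baseline resolves small time changes less finely than a model that interpolates between time points.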

## Abstract

Transcriptomics experiments are often conducted to capture changes in gene expression over time. However, time annotations may be missing, imprecise, or not reflect the same physiological state of the bacterial culture between different experiments. Assigning accurate time points to these experiments using a reference time course is therefore crucial for identifying differentially expressed genes and for understanding gene regulatory networks, which in turn helps elucidate the studied organism’s physiology and life cycle. This important task, which could enhance the biological interpretation of transcriptomics experiments, has not been previously addressed. In this work, we propose a novel method to solve the challenge of realigning transcriptomics experiments based on a reference time course. Our method is based on a Bayesian approach that uses Gaussian process regression modeling. We show a use case of applying our method to assign time annotations to legacy microarray samples of the bacterium Clostridium botulinum, which were annotated solely by the growth phase at which the culture aliquots were sampled, using recently collected RNA-Seq time series data comprising multiple replicates as a reference. The method significantly improved the description of the growth phases of the microarray data compared to the original annotations by clearly delineating the microarray samples belonging to different growth phases, as demonstrated by principal component analysis. Consequently, a larger number of differentially expressed genes was detected when comparing experiments belonging to successive growth phases. We compare this innovative approach with a baseline method that uses a k-nearest neighbor algorithm and show that our method offers a higher resolution in the description of the data by exposing smaller time changes between samples. We also test the performance of the method on sparse RNA-Seq time series (i.e. sampled every second hour). All the predictions for the samples were within a 30-min margin of their true time.
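The abstract uses principal component analysis to show that the realigned growth phases separate cleanly. A minimal numpy sketch of PCA via singular value decomposition is below. The two-group toy dataset standing in for samples from two growth phases, and the function name `pca_scores`, are hypothetical; whatever normalization or gene filtering the authors applied before PCA is not stated here and is omitted.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X, genes in columns) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                          # centre each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals Xc @ Vt[:n_components].T
    explained = S ** 2 / np.sum(S ** 2)              # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical toy data: 5 samples per "phase", 6 genes; the second phase shifts genes 0-2.
rng = np.random.default_rng(0)
phase_a = rng.normal(0.0, 0.1, size=(5, 6))
phase_b = rng.normal(0.0, 0.1, size=(5, 6)) + np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0])
scores, explained = pca_scores(np.vstack([phase_a, phase_b]))
print("PC1 scores:", np.round(scores[:, 0], 2))
print("explained variance ratio:", np.round(explained, 3))
```

In this toy case the two groups fall on opposite sides of PC1; in real data, how cleanly samples separate depends on how well their time or phase annotations reflect the culture's physiological state.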

## Linked entities

- **Species:** Clostridium botulinum (taxon 1491)

## Full-text entities

- **Species:** Clostridium botulinum (species) [taxon 1491]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12754789/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12754789/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12754789/full.md

---
Source: https://tomesphere.com/paper/PMC12754789