# Variability in drought gene expression datasets highlights the need for paired physiology and community standardization

**Authors:** Robert VanBuren, Annie Nguyen, Rose A Marks, Catherine Mercado, Anna Pardo, Jeremy Pardo, Jenny Schuster, Brian St. Aubin, Mckena Lipham Wilson, Seung Y Rhee

PMC · DOI: 10.1093/plphys/kiaf653 · 2025-12-16

## TL;DR

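Drought gene expression studies in plants are difficult to compare because experimental design, growth conditions, and sampling schemes vary widely; pairing expression data with physiological measurements of stress severity would improve reproducibility and interpretation.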

## Contribution

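The study reanalyzes hundreds of drought gene expression experiments across model and crop species to quantify between-study variability, introduces supervised learning classifiers that flag RNAseq samples that likely experienced drought stress, and emphasizes the need for paired physiological data.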

## Key findings

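- Drought gene expression studies are highly variable and difficult to compare, even after accounting for differences in genotype, environment, drought severity, and method of drying.
- Many studies, including most Arabidopsis (Arabidopsis thaliana) work, lack high-quality phenotypic and physiological data, which makes it hard to assess the severity or consistency of the water deficit applied.
- Supervised learning classifiers can help identify RNAseq samples that likely experienced drought stress when physiological metadata are missing, although they are not a substitute for direct measurements.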
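
To make the classifier idea in the last finding concrete, here is a minimal sketch of a supervised classifier that labels expression profiles as drought or control. It is not the authors' method: the summary does not say which algorithm, features, or training data they used. The sketch swaps in a simple nearest-centroid classifier, and the synthetic data, the six hypothetical genes, the 8-fold induction, and all function names are assumptions made for illustration only.

```python
import math
import random


def log_transform(counts):
    """Apply log2(x + 1) to a list of normalized expression values."""
    return [math.log2(c + 1.0) for c in counts]


def simulate_sample(label, rng, n_genes=6, n_induced=3):
    """Create one synthetic profile (assumption: not real RNAseq data).

    Every gene has a baseline near 100 with +/-30% noise. In "drought"
    samples the first n_induced genes are induced 8-fold.
    """
    counts = []
    for gene in range(n_genes):
        value = 100.0 * rng.uniform(0.7, 1.3)
        if label == "drought" and gene < n_induced:
            value *= 8.0
        counts.append(value)
    return log_transform(counts)


def fit_centroids(samples, labels):
    """Compute the per-gene mean profile (centroid) of each class."""
    grouped = {}
    for vec, lab in zip(samples, labels):
        grouped.setdefault(lab, []).append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in grouped.items()
    }


def predict(model, vec):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda lab: math.dist(model[lab], vec))


# Train on labeled synthetic samples, then score held-out samples.
rng = random.Random(0)
train_labels = ["control", "drought"] * 10
train = [simulate_sample(lab, rng) for lab in train_labels]
model = fit_centroids(train, train_labels)

test_labels = ["control", "drought"] * 5
test = [simulate_sample(lab, rng) for lab in test_labels]
predictions = [predict(model, vec) for vec in test]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic classes are well separated by construction, so the held-out accuracy here says nothing about real data. With real datasets, the labels used for training would ideally come from samples with paired physiological measurements, which is the gap the paper highlights.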

## Abstract

Physiologically relevant drought stress is difficult to apply consistently, and the heterogeneity in experimental design, growth conditions, and sampling schemes makes it challenging to compare water deficit studies in plants. Here, we reanalyzed hundreds of drought gene expression experiments across diverse model and crop species and quantified the variability across studies. We found that drought studies are surprisingly incomparable, even when accounting for differences in genotype, environment, drought severity, and method of drying. Many studies, including most Arabidopsis (Arabidopsis thaliana) work, lack high-quality phenotypic and physiological datasets to accompany gene expression, making it challenging to assess the severity or consistency of water deficit stress events. To help address this, we developed supervised learning classifiers that can distinguish RNAseq samples that likely experienced drought stress. While not a substitute for direct measurements, these classifiers may aid in interpreting existing datasets and assessing drought severity in studies lacking physiological metadata. Together, our analyses highlight the importance of paired physiological data to quantify stress severity for reproducibility and future data analyses.

A comparative analysis of plant drought studies reveals widespread variability and highlights the need for paired physiological data to improve consistency and interpretability.

## Linked entities

- **Species:** Arabidopsis thaliana (taxon 3702)

## Full-text entities

- **Diseases:** drought (MESH:C536747), water deficit (MESH:D000069578)
- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12854406/full.md

---
Source: https://tomesphere.com/paper/PMC12854406