The data synergy effects of time-series deep learning models in hydrology
Kuai Fang, Daniel Kifer, Kathryn Lawson, Dapeng Feng, Chaopeng Shen

TL;DR
This paper demonstrates that deep learning models in hydrology perform better when trained on pooled, diverse datasets across regions rather than on regional data alone, highlighting the importance of data sharing.
Contribution
It introduces the concept of data synergy in deep learning for hydrology, showing that models trained on data pooled from different regions outperform models trained on each region separately.
Findings
Deep learning models trained on pooled data outperform models trained on regional data alone.
Diverse training data improves model accuracy more than homogeneous data.
Pooling data from different regions benefits model generalization.
Abstract
When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to regionalize - to divide a large spatial domain into multiple regions and study each region separately - instead of fitting a single model on the entire data (also known as unification). Traditional wisdom in these fields suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, by partitioning the training data, each model has access to fewer data points and cannot learn from commonalities between regions. Here, through two hydrologic examples (soil moisture and streamflow), we argue that unification can often significantly outperform regionalization in the era of big data and deep learning (DL). Common DL architectures, even without bespoke customization, can automatically build models…
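The distinction between regionalization and unification described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic data, the function names (`make_synthetic_regions`, `regionalized_fit`, `unified_fit`), and the use of a least-squares linear model as a stand-in for a deep learning model. It only shows how the two training setups differ in the data each model sees. It does not reproduce the paper's experiments or results.

```python
import numpy as np


def make_synthetic_regions(n_regions=3, n_per_region=50, seed=0):
    """Hypothetical toy data: one (inputs, target) pair per region."""
    rng = np.random.default_rng(seed)
    regions = {}
    for r in range(n_regions):
        x = rng.normal(size=(n_per_region, 2))
        # Assumed: a relationship shared by all regions plus a small regional offset.
        noise = rng.normal(scale=0.1, size=n_per_region)
        y = x @ np.array([1.5, -0.5]) + 0.1 * r + noise
        regions[f"region_{r}"] = (x, y)
    return regions


def fit_linear(x, y):
    """Least-squares fit with an intercept (a stand-in for a DL model)."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def regionalized_fit(regions):
    """Regionalization: one model per region, each seeing only its own data."""
    return {name: fit_linear(x, y) for name, (x, y) in regions.items()}


def unified_fit(regions):
    """Unification: a single model trained on all regions pooled together."""
    x_all = np.concatenate([x for x, _ in regions.values()])
    y_all = np.concatenate([y for _, y in regions.values()])
    return fit_linear(x_all, y_all), len(y_all)


regions = make_synthetic_regions()
regional_models = regionalized_fit(regions)
unified_model, n_pooled = unified_fit(regions)

print(len(regional_models), "regional models, each trained on one region's samples")
print("1 unified model trained on", n_pooled, "pooled samples")
```

In this sketch each regional model is fit on 50 samples, while the unified model is fit on all 150, which mirrors the abstract's point that partitioning the training data leaves each model with fewer data points.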
