# ARCHI: A New R Package for Automated Imputation of Regionally Correlated Hydrologic Records

**Authors:** Zeno F. Levy, Robin L. Glas, Timothy J. Stagnitta, Neil Terry

PMC · DOI: 10.1111/gwat.13474 · Ground Water · 2025-02-28

## TL;DR

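ARCHI is a new, open-source R package that fills gaps in hydrologic records by linear regression on regionally correlated reference records, using an iterative algorithm in which each imputed record can serve as a reference for later imputations.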

## Contribution

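ARCHI introduces an iterative imputation algorithm in which any site can be treated as a target or a reference for regression. Each imputed record joins the pool of complete references, and iteration continues until no further viable gap-filling is possible.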

## Key findings

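- Each imputed record is added to the pool of complete references, so the set of usable predictors grows until viable gap-filling ceases.
- ARCHI is evaluated alongside widely used multivariate imputation software to contextualize its computational efficiency, imputation accuracy, and model transparency on large groundwater-level datasets.
- Benchmarking uses two regional groundwater-level datasets, from the Central Valley, CA and Long Island, NY.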

## Abstract

Missing data in hydrological records can limit resource assessment, process understanding, and predictive modeling. Here, we present ARCHI (Automated Regional Correlation Analysis for Hydrologic Record Imputation), a new, open‐source software package in R designed to aggregate, impute, cluster, and visualize regionally correlated hydrologic records. ARCHI imputes missing data in “target” records by linear regression using more complete “reference” records as predictors. Automated imputation is implemented using a novel, iterative algorithm that allows each site to be considered a target or reference for regression, growing the pool of complete references with each imputed record until viable gap‐filling ceases. Users can limit artifacts from spurious correlations by specifying model‐acceptance criteria and applying geospatial, correlation, and group‐based filters to control reference selection. ARCHI provides additional functions for visualizing results, clustering records with similar correlation structures, evaluating holdout data, and interactive parameterization with an accessible and intuitive graphical user interface (GUI). This methods brief provides an overview of the ARCHI package, modeling guidelines, and benchmarking on two regional groundwater‐level datasets from the Central Valley, CA and Long Island, NY. We evaluate ARCHI alongside widely used multivariate imputation software to highlight and contextualize its computational efficiency, imputation accuracy, and model transparency when applied to large, groundwater‐level datasets.
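
The iterative scheme described in the abstract can be illustrated with a short sketch. The Python code below is not ARCHI's code or API (ARCHI is an R package). It is a minimal illustration that assumes single-predictor least-squares regression, an R² threshold as the only model-acceptance criterion, and a minimum number of overlapping observations. The function name `iterative_impute` and the parameters `min_overlap` and `min_r2` are hypothetical, and the geospatial, correlation, and group-based filters mentioned above are omitted.

```python
import numpy as np


def iterative_impute(records, min_overlap=5, min_r2=0.8):
    """Fill NaN gaps in equally long site records by simple linear regression.

    Hypothetical sketch: every site may act as target or reference, and
    values imputed in one pass can serve as predictors in later passes.
    """
    data = {s: np.array(v, dtype=float) for s, v in records.items()}
    # Remember which target values were truly observed; only these are fitted.
    observed = {s: ~np.isnan(v) for s, v in data.items()}
    log = []  # (target, reference, number of values filled)
    progress = True
    while progress:  # stop once a full pass fills nothing
        progress = False
        for target, y in data.items():
            gaps = np.isnan(y)
            if not gaps.any():
                continue
            best = None
            for ref, x in data.items():
                if ref == target:
                    continue
                have_x = ~np.isnan(x)
                fit = observed[target] & have_x   # points used for the fit
                fillable = gaps & have_x          # gaps this reference covers
                if fit.sum() < min_overlap or not fillable.any():
                    continue
                if x[fit].max() == x[fit].min():
                    continue  # constant predictor, regression undefined
                slope, intercept = np.polyfit(x[fit], y[fit], 1)
                ss_res = np.sum((y[fit] - (slope * x[fit] + intercept)) ** 2)
                ss_tot = np.sum((y[fit] - y[fit].mean()) ** 2)
                if ss_tot == 0:
                    continue
                r2 = 1.0 - ss_res / ss_tot
                # Assumed acceptance criterion: keep the best model above min_r2.
                if r2 >= min_r2 and (best is None or r2 > best[0]):
                    best = (r2, ref, slope, intercept, fillable)
            if best is not None:
                _, ref, slope, intercept, fillable = best
                y[fillable] = slope * data[ref][fillable] + intercept
                log.append((target, ref, int(fillable.sum())))
                progress = True
    return data, log


# Toy example: B is a linear function of A; C is poorly correlated with both.
t = np.arange(10.0)
records = {
    "A": t,
    "B": np.where(t < 7, 2 * t + 1, np.nan),
    "C": [5, 1, 4, 1, 5, 9, 2, 6, np.nan, np.nan],
}
filled, log = iterative_impute(records)
print(log)
```

In this toy case the gaps in `B` are filled from `A`, while `C` is left incomplete because no candidate model meets the assumed R² threshold. Leaving a gap unfilled rather than accepting a weak model mirrors the abstract's point about limiting artifacts from spurious correlations.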

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12272003/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12272003/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12272003/full.md

---
Source: https://tomesphere.com/paper/PMC12272003