# Information-Theoretic Modeling of Categorical Spatiotemporal GIS Data

**Authors:** David Percy, Martin Zwick

PMC · DOI: 10.3390/e26090784 · 2024-09-13

## TL;DR

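This paper applies Reconstructability Analysis, an information-theoretic modeling method, to National Land Cover Database (NLCD) land use data, predicting the presence or absence of Evergreen Forest in 2021 from neighboring cells in earlier years.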

## Contribution

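The study applies Reconstructability Analysis (RA), a maximum-entropy-based modeling methodology for discrete data, to categorical spatiotemporal GIS data. This is a new approach to analyzing such data and a new application area for RA.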
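
For orientation, one common way to quantify how much a set of predicting variables tells us about a categorical dependent variable is the fractional reduction in Shannon uncertainty. The notation here is an assumption of this note rather than the paper's: $Z$ stands for the dependent variable (EFO present or absent in 2021) and $X$ for the joint state of the predicting cells.

$$
H(Z) = -\sum_{z} p(z)\,\log_2 p(z), \qquad
H(Z \mid X) = -\sum_{x} p(x) \sum_{z} p(z \mid x)\,\log_2 p(z \mid x)
$$

$$
\%\Delta H = 100 \cdot \frac{H(Z) - H(Z \mid X)}{H(Z)}
$$

A value of 0% means the predictors carry no information about $Z$; 100% means they determine it completely.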

## Key findings

- RA predicts the presence or absence of Evergreen Forest with approximately 80% accuracy using a sparse set of neighboring cells.
- When the predicting cells are all Shrubs and Grasses, there is a high probability that the 2021 state is Evergreen Forest; when the predicting cells are all Evergreen Forest, there is a high probability that the 2021 state is not Evergreen Forest.
- The authors interpret these findings as evidence of forest clear-cut cycles, which would explain why Evergreen Forest is the most dynamic class in the study area.

## Abstract

An information-theoretic data mining method is employed to analyze categorical spatiotemporal Geographic Information System land use data. Reconstructability Analysis (RA) is a maximum-entropy-based data modeling methodology that works exclusively with discrete data such as those in the National Land Cover Database (NLCD). The NLCD is organized into a spatial (raster) grid and data are available in a consistent format for every five years from 2001 to 2021. An NLCD tool reports how much change occurred for each category of land use; for the study area examined, the most dynamic class is Evergreen Forest (EFO), so the presence or absence of EFO in 2021 was chosen as the dependent variable that our data modeling attempts to predict. RA predicts the outcome with approximately 80% accuracy using a sparse set of cells from a spacetime data cube consisting of neighboring lagged-time cells. When the predicting cells are all Shrubs and Grasses, there is a high probability for a 2021 state of EFO, while when the predicting cells are all EFO, there is a high probability that the 2021 state will not be EFO. These findings are interpreted as detecting forest clear-cut cycles that show up in the data and explain why this class is so dynamic. This study introduces a new approach to analyzing GIS categorical data and expands the range of applications that this entropy-based methodology can successfully model.
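
The prediction setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' RA implementation: RA searches over a family of candidate models, whereas the sketch fits only a single frequency table over a hand-picked set of predicting cells and predicts by majority rule. The class codes, the function names, the cell offsets, and the synthetic data (including the rule that Shrub or Grass cells tend to become EFO) are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2

import numpy as np

# Hypothetical class codes for illustration (not the NLCD numeric codes).
EFO, SHRUB, GRASS, OTHER = 0, 1, 2, 3


def make_synthetic_cube(size=60, seed=0):
    """Synthetic (years, rows, cols) categorical cube; NOT real NLCD data.

    The final year is EFO where the previous year was Shrub or Grass,
    with 10% of cells flipped at random. This rule is imposed by us.
    """
    rng = np.random.default_rng(seed)
    cube = rng.integers(0, 4, size=(5, size, size))
    regrow = (cube[3] == SHRUB) | (cube[3] == GRASS)
    noise = rng.random((size, size)) < 0.1
    cube[4] = np.where(regrow ^ noise, EFO, OTHER)
    return cube


def build_samples(cube, offsets):
    """Pair each interior final-year cell with its predicting-cell states.

    offsets: list of (dt, dr, dc) relative to the final-year cell, dt < 0.
    Returns a list of (state_tuple, dv) with dv = 1 if the cell is EFO.
    """
    t_final = cube.shape[0] - 1
    n_rows, n_cols = cube.shape[1:]
    margin = max(max(abs(dr), abs(dc)) for _, dr, dc in offsets)
    samples = []
    for r in range(margin, n_rows - margin):
        for c in range(margin, n_cols - margin):
            state = tuple(
                int(cube[t_final + dt, r + dr, c + dc]) for dt, dr, dc in offsets
            )
            samples.append((state, int(cube[t_final, r, c] == EFO)))
    return samples


def fit_table(samples):
    """Count dependent-variable outcomes for each predictor state."""
    counts = defaultdict(Counter)
    for state, dv in samples:
        counts[state][dv] += 1
    return counts


def predict(counts, state, default=0):
    """Majority-rule prediction; unseen states fall back to `default`."""
    if state not in counts:
        return default
    return counts[state].most_common(1)[0][0]


def entropy(counter):
    """Shannon entropy in bits of the distribution given by the counts."""
    total = sum(counter.values())
    return -sum(n / total * log2(n / total) for n in counter.values() if n)


def uncertainty_reduction(counts):
    """Fractional reduction (H(DV) - H(DV|state)) / H(DV), estimated in-sample."""
    marginal = Counter()
    for c in counts.values():
        marginal.update(c)
    total = sum(marginal.values())
    h_dv = entropy(marginal)
    h_cond = sum(sum(c.values()) / total * entropy(c) for c in counts.values())
    return (h_dv - h_cond) / h_dv if h_dv > 0 else 0.0


# Predicting cells: same cell and its east neighbor, one time step earlier.
cube = make_synthetic_cube()
samples = build_samples(cube, [(-1, 0, 0), (-1, 0, 1)])
train, test = samples[::2], samples[1::2]
table = fit_table(train)
accuracy = sum(predict(table, s) == dv for s, dv in test) / len(test)
print(f"held-out accuracy: {accuracy:.3f}")
print(f"uncertainty reduction: {uncertainty_reduction(fit_table(samples)):.3f}")
```

With real data, the synthetic cube would be replaced by a stack of co-registered NLCD rasters, and the choice of predicting cells would come from a model search rather than being fixed by hand.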

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11431066/full.md

---
Source: https://tomesphere.com/paper/PMC11431066