Spatio-Temporal Change of Support Modeling with R

Andrew M. Raim; Scott H. Holan; Jonathan R. Bradley; and Christopher; K. Wikle

arXiv:1904.12092·stat.CO·January 19, 2024·Comput. Stat.

Spatio-Temporal Change of Support Modeling with R

Andrew M. Raim, Scott H. Holan, Jonathan R. Bradley, and Christopher, K. Wikle

PDF

TL;DR

This paper introduces the R package stcos for Bayesian spatio-temporal change of support modeling, demonstrating its application to U.S. Census data to improve estimates on custom geographies and time periods.

Contribution

It provides a practical guide for implementing STCOS models in R, making advanced spatio-temporal analysis accessible to a broader audience.

Findings

01

The stcos package facilitates efficient computation of change of support models in R.

02

Application to ACS data demonstrates improved estimation accuracy.

03

The methodology enables flexible analysis on custom spatial and temporal scales.

Abstract

Spatio-temporal change of support methods are designed for statistical analysis on spatial and temporal domains which can differ from those of the observed data. Previous work introduced a parsimonious class of Bayesian hierarchical spatio-temporal models, which we refer to as STCOS, for the case of Gaussian outcomes. Application of STCOS methodology from this literature requires a level of proficiency with spatio-temporal methods and statistical computing which may be a hurdle for potential users. The present work seeks to bridge this gap by guiding readers through STCOS computations. We focus on the R computing environment because of its popularity, free availability, and high quality contributed packages. The stcos package is introduced to facilitate computations for the STCOS model. A motivating application is the American Community Survey (ACS), an ongoing survey administered by…

Tables1

Table 1. (a) Estimates based on STCOS model.

Region	Mean	SD	CI Lo	CI Hi	Median	MOE
Central	26,931.74	1,921.27	23,693.43	29,925.61	26,963.39	3,160.21
East	44,199.97	2,449.92	40,217.61	48,043.27	44,256.71	4,029.76
North	44,329.41	2,861.68	39,679.15	49,037.54	44,202.85	4,707.04
Paris	20,822.12	3,636.90	14,965.75	26,665.93	20,772.98	5,982.17

Equations98

Z_{t}^{(ℓ)} (A) = Y_{t}^{(ℓ)} (A) + ε_{t}^{(ℓ)} (A), ε_{t}^{(ℓ)} (A) \sim ind N (0, V_{t}^{(ℓ)} (A)),

Z_{t}^{(ℓ)} (A) = Y_{t}^{(ℓ)} (A) + ε_{t}^{(ℓ)} (A), ε_{t}^{(ℓ)} (A) \sim ind N (0, V_{t}^{(ℓ)} (A)),

Y_{t}^{(ℓ)} (A)

Y_{t}^{(ℓ)} (A)

[η ∣ σ_{K}^{2}]

[ξ_{t}^{(ℓ)} (A) ∣ σ_{ξ}^{2}]

μ_{B} \sim N (0, σ_{μ}^{2} I), σ_{μ}^{2} \sim IG (a_{μ}, b_{μ}), σ_{K}^{2} \sim IG (a_{K}, b_{K}), σ_{ξ}^{2} \sim IG (a_{ξ}, b_{ξ}) .

μ_{B} \sim N (0, σ_{μ}^{2} I), σ_{μ}^{2} \sim IG (a_{μ}, b_{μ}), σ_{K}^{2} \sim IG (a_{K}, b_{K}), σ_{ξ}^{2} \sim IG (a_{ξ}, b_{ξ}) .

Y (u, v) = δ (u) + j = 1 \sum \infty ψ_{j} (u, v) \cdot η_{j}, for u \in ⋃_{i = 1}^{n_{B}} B_{i} and v \in T,

Y (u, v) = δ (u) + j = 1 \sum \infty ψ_{j} (u, v) \cdot η_{j}, for u \in ⋃_{i = 1}^{n_{B}} B_{i} and v \in T,

Y_{t}^{(ℓ)} (A)

Y_{t}^{(ℓ)} (A)

+ \frac{1}{ℓ ∣ A ∣} k = t - ℓ + 1 \sum t j = r + 1 \sum \infty \int_{A} ψ_{j} (u, v_{k}) \cdot η_{j} d u

= μ (A) + s_{t}^{(ℓ)} (A)^{⊤} η + ξ_{t}^{(ℓ)} (A) .

μ (A) = \frac{1}{∣ A ∣} \int_{A} δ (u) d u,

μ (A) = \frac{1}{∣ A ∣} \int_{A} δ (u) d u,

s_{t}^{(ℓ)} (A)^{⊤} η = \frac{1}{ℓ ∣ A ∣} k = t - ℓ + 1 \sum t j = 1 \sum r \int_{A} ψ_{j} (u, v_{k}) \cdot η_{j} d u,

ξ_{t}^{(ℓ)} (A) = \frac{1}{ℓ ∣ A ∣} k = t - ℓ + 1 \sum t j = r + 1 \sum \infty \int_{A} ψ_{j} (u, v_{k}) \cdot η_{j} d u,

ψ_{j} (u, v) = [2 - \frac{∥ u - c _{j} ∥ ^{2}}{w _{s}^{2}} - \frac{∣ v - g _{j} ∣ ^{2}}{w _{t}^{2}}]^{2} \cdot I (∥ u - c_{j} ∥ \leq w_{s}) \cdot I (∣ v - g_{j} ∣ \leq w_{t}),

ψ_{j} (u, v) = [2 - \frac{∥ u - c _{j} ∥ ^{2}}{w _{s}^{2}} - \frac{∣ v - g _{j} ∣ ^{2}}{w _{t}^{2}}]^{2} \cdot I (∥ u - c_{j} ∥ \leq w_{s}) \cdot I (∣ v - g_{j} ∣ \leq w_{t}),

ψ_{j}^{(ℓ)} (A, v) = \frac{1}{ℓ} k = t - ℓ + 1 \sum t \frac{1}{∣ A ∣} \int_{A} ψ_{j} (u, v_{k}) d u,

ψ_{j}^{(ℓ)} (A, v) = \frac{1}{ℓ} k = t - ℓ + 1 \sum t \frac{1}{∣ A ∣} \int_{A} ψ_{j} (u, v_{k}) d u,

ψ_{j}^{(ℓ)} (A, v) \approx \frac{1}{ℓ Q} k = t - ℓ + 1 \sum t q = 1 \sum Q ψ_{j} (u_{q}, v_{k}),

ψ_{j}^{(ℓ)} (A, v) \approx \frac{1}{ℓ Q} k = t - ℓ + 1 \sum t q = 1 \sum Q ψ_{j} (u_{q}, v_{k}),

s_{t}^{(ℓ)} (A)^{⊤} = (ψ_{1}^{(ℓ)} (A, v), \dots, ψ_{r}^{(ℓ)} (A, v)) .

s_{t}^{(ℓ)} (A)^{⊤} = (ψ_{1}^{(ℓ)} (A, v), \dots, ψ_{r}^{(ℓ)} (A, v)) .

δ (u) = i = 1 \sum n_{B} μ_{i} I (u \in A \cap B_{i}),

δ (u) = i = 1 \sum n_{B} μ_{i} I (u \in A \cap B_{i}),

h (A) = (\frac{∣ A \cap B _{1} ∣}{∣ A ∣}, \dots, \frac{∣ A \cap B _{n_{B}} ∣}{∣ A ∣})^{⊤}

h (A) = (\frac{∣ A \cap B _{1} ∣}{∣ A ∣}, \dots, \frac{∣ A \cap B _{n_{B}} ∣}{∣ A ∣})^{⊤}

μ (A)

μ (A)

Z

Z

H

S

ξ

ε

V

Z = H μ_{B} + S η + ξ + ε,

Z = H μ_{B} + S η + ξ + ε,

ε \sim N (0, V), [η ∣ σ_{K}^{2}] \sim N (0, σ_{K}^{2} K), [ξ ∣ σ_{ξ}^{2}] \sim N (0, σ_{ξ}^{2} I),

[μ_{B} ∣ σ_{μ}^{2}] \sim N (0, σ_{μ}^{2} I), σ_{μ}^{2} \sim IG (a_{μ}, b_{μ}), σ_{K}^{2} \sim IG (a_{K}, b_{K}), σ_{ξ}^{2} \sim IG (a_{ξ}, b_{ξ}) .

Y_{t}^{*} = μ_{B} + ζ_{t}, ζ_{t} = M ζ_{t - 1} + b_{t}, [b_{t} ∣ σ_{K}^{2}] \sim iid N (0, σ_{K}^{2} Q^{- 1}),

Y_{t}^{*} = μ_{B} + ζ_{t}, ζ_{t} = M ζ_{t - 1} + b_{t}, [b_{t} ∣ σ_{K}^{2}] \sim iid N (0, σ_{K}^{2} Q^{- 1}),

∥ Σ_{Y^{*}} - S^{*} C S^{* ⊤} ∥_{F}, such that C is an r \times r positive semidefinite matrix,

∥ Σ_{Y^{*}} - S^{*} C S^{* ⊤} ∥_{F}, such that C is an r \times r positive semidefinite matrix,

Var (Y ∣ μ_{B}, σ_{μ}^{2}, σ_{ξ}^{2}, σ_{K}^{2}) = σ_{K}^{2} S K S^{⊤} + σ_{ξ}^{2} I,

Var (Y ∣ μ_{B}, σ_{μ}^{2}, σ_{ξ}^{2}, σ_{K}^{2}) = σ_{K}^{2} S K S^{⊤} + σ_{ξ}^{2} I,

C^{*} = (S^{* ⊤} S^{*})^{- 1} S^{* ⊤} Σ_{Y^{*}} S^{*} (S^{* ⊤} S^{*})^{- 1},

C^{*} = (S^{* ⊤} S^{*})^{- 1} S^{* ⊤} Σ_{Y^{*}} S^{*} (S^{* ⊤} S^{*})^{- 1},

C^{*} = σ_{K}^{2} K, K = (S^{* ⊤} S^{*})^{- 1} S^{* ⊤} \tilde{Σ}_{Y^{*}} S^{*} (S^{* ⊤} S^{*})^{- 1} .

C^{*} = σ_{K}^{2} K, K = (S^{* ⊤} S^{*})^{- 1} S^{* ⊤} \tilde{Σ}_{Y^{*}} S^{*} (S^{* ⊤} S^{*})^{- 1} .

Γ (s, t) = Cov (Y_{s}^{*}, Y_{t}^{*}) = min (s, t) σ_{K}^{2} Q^{- 1},

Γ (s, t) = Cov (Y_{s}^{*}, Y_{t}^{*}) = min (s, t) σ_{K}^{2} Q^{- 1},

\tilde{Σ}_{Y^{*}} = \tilde{Γ} (1, 1) ⋮ \tilde{Γ} (T, 1) \dots ⋱ \dots \tilde{Γ} (1, T) ⋮ \tilde{Γ} (T, T)

\tilde{Σ}_{Y^{*}} = \tilde{Γ} (1, 1) ⋮ \tilde{Γ} (T, 1) \dots ⋱ \dots \tilde{Γ} (1, T) ⋮ \tilde{Γ} (T, T)

K

K

K

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Spatio-Temporal Change of Support Modeling with R

Andrew M. Raima, Scott H. Holanb,c, Jonathan R. Bradleyd, & Christopher K. Wikleb

Abstract

Spatio-temporal change of support methods are designed for statistical analysis on spatial and temporal domains which can differ from those of the observed data. Previous work introduced a parsimonious class of Bayesian hierarchical spatio-temporal models, which we refer to as STCOS, for the case of Gaussian outcomes. Application of STCOS methodology from this literature requires a level of proficiency with spatio-temporal methods and statistical computing which may be a hurdle for potential users. The present work seeks to bridge this gap by guiding readers through STCOS computations. We focus on the R computing environment because of its popularity, free availability, and high quality contributed packages. The stcos package is introduced to facilitate computations for the STCOS model. A motivating application is the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that measures key socioeconomic and demographic variables for various populations in the United States. The STCOS methodology offers a principled approach to compute model-based estimates and associated measures of uncertainty for ACS variables on customized geographies and/or time periods. We present a detailed case study with ACS data as a guide for change of support analysis in R, and as a foundation which can be customized to other applications.

Keywords: American Community Survey, Areal Data, Basis Functions, Bayesian Statistics, Model-Based Estimates, Official Statistics

00footnotetext:

aCenter for Statistical Research and Methodology, U.S. Census Bureau

bDepartment of Statistics, University of Missouri

cOffice of the Associate Director for Research and Methodology

dDepartment of Statistics, Florida State University

∗Emails: [email protected], [email protected], [email protected], [email protected]

1 Introduction

In the course of an analysis where data are inherently spatio-temporal, an investigator may desire estimates on spatial and/or temporal domains not coinciding exactly with domains of the observations. This can include customized geographies and time periods conceived long after the data have been collected. Spatio-temporal change of support methods aim to provide this capability. A methodology recently proposed by Bradley et al. (2015b) captures spatio-temporal dependencies in areal data by constructing several key matrices which become the foundation of a Bayesian hierarchical model. Model fitting is done via Markov chain Monte Carlo (MCMC); in particular, the model permits a Gibbs sampler which is conveniently composed of draws from standard distributions. Estimates, predictions, and appropriate measures of uncertainty are provided by the fitted model. This methodology, hereafter referred to as the STCOS model or STCOS methodology, is the focus of the present paper. Although STCOS methodology has been fully specified by Bradley et al. (2015b), potential users—such as subject-domain scientists who may not be experts in spatio-temporal statistics—may find proceeding from the previous literature to their own applications to be a difficult hurdle. A successful implementation requires managing datasets containing estimates, geospatial data, operations on sparse matrices, Bayesian computing, plotting, as well as carrying out computations tailored to the STCOS model.

In this paper, we demonstrate an assortment of tools to perform STCOS modeling through a detailed case study, with the objective of making the methodology more accessible to potential users. The required tasks can be accomplished with a variety of modern computing platforms, but we will focus on R, the popular open source environment for statistical computing (R Core Team, 2020). R is supported by an active community of academic, corporate, and individual users. A large and diverse collection of packages has been contributed by its community and published to repositories such as the Comprehensive R Archive Network (CRAN). Much of R, including the base platform and CRAN packages, is freely available on the internet. The high-level R programming language facilitates data analysis, fast prototyping of new methods, and simulation, and can be augmented with C, C++, and FORTRAN when speed or efficient use of memory are crucial. In addition to highlighting some established R packages, we introduce the stcos package to handle some of the more intricate STCOS computations in an efficient and user-friendly way. Familiarity with R will be assumed throughout the remainder of the paper.

Bradley et al. (2015b) developed STCOS with a motivating application to the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau for the purpose of measuring key socioeconomic and demographic variables for the U.S. population. ACS is again showcased in the present paper as it remains an important application for change of support methods (Weinberg et al., 2018). However, STCOS methodology is not limited to applications involving the ACS or the U.S. Census Bureau. The problem of spatial change of support has arisen in atmospheric science and oceanography (Wikle and Berliner, 2005), water quality modeling (Rode et al., 2010), environmental health (Fuentes et al., 2006), and remote sensing (Nguyen et al., 2012), among others. See Gotway and Young (2002), Bradley et al. (2015b), and the references therein for a review of the change of support literature.

Existing software tools for change of support appear to originate from geographic information systems (GIS) literature, emphasizing methods such as pycnophylactic interpolation (Tobler, 1979), areal-weighted interpolation (Lam, 1983), and dasymetric mapping (e.g. Eicher and Brewer, 2001). In R, such tools include the pycno package (Brunsdon, 2014), the st_interpolate_aw function in the sf package (Pebesma, 2018), and the areal package (Prener and Revord, 2019). The Tobler package was developed for Python by Cortes et al. (2019). Qiu et al. (2012) and Mileu and Queirós (2018) describe adding change of support capabilities to the ArcGIS and QGIS platforms, respectively. From the perspective of software tools, the present work offers two major contributions: (1) measures of uncertainty expressed via a statistical model, and (2) the capability to carry out change of support in both space and time.

The remainder of the article proceeds as follows. Section 2 discusses STCOS concepts in the context of the ACS. Section 3 reviews STCOS methodology. Here some additional details are provided—and some small modifications are made—from the original formulation of Bradley et al. (2015b). Section 4 discusses the set of R tools to be demonstrated, including basic functionality of the stcos package. Section 5 presents our case study to demonstrate STCOS programming; we produce model-based estimates of median household income for several neighborhoods in the City of Columbia in Boone County, Missouri. Section 6 concludes the article. This article is intended to be largely self-contained for a wide range of readers; those eager to begin programming can focus primarily on Sections 4 and 5. The stcos package is available on the CRAN at https://CRAN.R-project.org/package=stcos. The complete code for the City of Columbia data analysis is provided as a supplement to this article.

2 Change of support concepts and the ACS

To facilitate our discussion of the change of support problem and STCOS methodology, we now give a brief overview of the ACS. Public-use ACS data are available through the Census Bureau’s ACS website (https://www.census.gov/programs-surveys/acs) dating back to the year 2005. Estimates have historically been released for 1-year, 3-year, or 5-year periods; 3-year period estimates were discontinued after 2013. The Census Bureau releases annual ACS period estimates for a variety of geographies including states, counties, census tracts, and school districts. At their finest geography, data are released at the census block-group level; however, estimates for an area are suppressed unless the area meets certain criteria. An area typically must have a population of at least 65,000 for 1-year estimates to be released, but there is no population requirement for 5-year estimates (U.S. Census Bureau, 2016). ACS estimates consist of point estimates and associated measures of uncertainty such as margins of error (MOEs) corresponding to 90% confidence intervals, or variance estimates; we will refer to them collectively as direct estimates. Because statistical agencies like the Census Bureau have direct access to the confidential microdata, special tabulations for new geographies or period lengths can be prepared internally as needed. However, data users outside of the Census Bureau may be interested in custom geographies and/or nonstandard time periods which are not provided by the agency. Providing ACS data users tools for change of support has recently been identified as an important problem by a National Academy of Sciences panel (National Academy of Sciences, 2015). STCOS methodology enables model-based estimates to be computed with public-use ACS releases.

The change of support problem can be illustrated by a concrete example, taking median household income as the variable of interest here and for the remainder of the article. Suppose we would like to produce 3-year model-based estimates in Missouri congressional districts for the year 2015. Congressional districts are geographic regions which receive representation by an elected official in the U.S. House of Representatives and are determined by a redistricting process which is based on data from each decennial census. The Census Bureau does release ACS estimates on congressional districts, but releases of 3-year estimates for all geographies were discontinued after 2013; therefore, model-based estimates may be of interest to data users. Geographies on which we want to produce estimates and predictions are referred to as target supports. Figure 1 displays the eight designated congressional districts in Missouri for the year 2015. Geographies on which direct estimates are available are used to fit the STCOS model and are referred to as source supports. For this illustration, we could take the source supports to be all 1-year, 3-year, and 5-year ACS releases for the counties within Missouri. Including available periods over a number of years allows the STCOS model to find trends in both time and space, and make use of estimates which represent varying levels of granularity and sparseness. Figure 2 shows direct estimates for Missouri in the year 2013. We notice that 1-year and 3-year period estimates have been suppressed for many counties. We emphasize that counties and congressional districts do not necessarily align, and the crux of the STCOS problem is to “translate” between the county-level observations and the congressional districts. The third type of support which must be discussed is the fine-level support. For this example, we could take the fine-level support to be the 2015 definition of counties in Missouri, shown in Figure 1. The STCOS methodology works by translating each of the source supports to the fine-level support during the model fitting process. Once the model has been fit, estimates and predictions on target supports of interest are obtained by translating from the fine-level support. Raim et al. (2017) presents a model selection study with counties in the continental U.S. as target and source supports, congressional districts as target supports, and median household income as the ACS variable of interest. Section 5 will demonstrate a smaller-scale problem which requires less computing time.

3 The STCOS model

Let $\mathcal{T}=\{v_{1},\ldots,v_{T}\}$ represent the set of times for which direct estimates are available, indexed by $t=1,\ldots T$ . Let $\mathcal{L}$ denote the set of possible lookback periods for which these estimates have been constructed. We will take $\mathcal{T}$ to consist of the years 2005 through 2017, corresponding to the ACS releases available during the preparation of this article, and $\mathcal{L}=\{1,3,5\}$ to denote 1-year, 3-year, and 5-year period releases. Therefore, $\ell$ -year direct estimates for year $v_{t}$ are based on the time period $(v_{t-\ell+1},\ldots,v_{t})$ . Data may not be released for all $(v_{t},\ell)\in\mathcal{T}\times\mathcal{L}$ ; for example, ACS 3-year estimates were discontinued after 2013. Let $(\mathcal{T}\times\mathcal{L})^{*}$ denote the subset of $\mathcal{T}\times\mathcal{L}$ that corresponds to a data release. For each $(v_{t},\ell)\in(\mathcal{T}\times\mathcal{L})^{*}$ , the associated source support $D_{t\ell}$ is a collection of areal units whose estimates are included in the release. For each areal unit $A\in D_{t\ell}$ , $Z_{t}^{(\ell)}(A)$ is the direct point estimate for one ACS variable of interest and $V_{t}^{(\ell)}(A)$ is the corresponding variance estimate. The fine level support will be denoted $D_{B}=\{B_{1},\ldots,B_{n_{B}}\}$ . The total surface area of a given areal unit $A$ will be denoted $|A|$ .

The STCOS model is a Bayesian hierarchical model (Cressie and Wikle, 2011, Section 2.1) which will first state before describing components in detail. Let $\text{N}(\bm{\mu},\bm{\Sigma})$ denote the multivariate normal distribution with density $\phi(\bm{x}\mid\bm{\mu},\bm{\Sigma})=(2\pi)^{-k/2}|\bm{\Sigma}|^{-1/2}\exp\{-\frac{1}{2}(\bm{x}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\}$ for $\bm{x}\in\mathbb{R}^{k}$ , where the dimension $k$ depends on the context. Let $\text{IG}(a,b)$ denote the Inverse Gamma distribution with density $f_{\text{IG}}(x\mid a,b)=b^{a}x^{-a-1}e^{-b/x}/\Gamma(a)\cdot I(x>0)$ , where $I(\cdot)$ is the indicator function. First, the data model is

[TABLE]

for $A\in D_{t\ell}$ and $(v_{t},\ell)\in(\mathcal{T}\times\mathcal{L})^{*}$ . Second, the process model is

[TABLE]

for $A\in D_{t\ell}$ and $(v_{t},\ell)\in(\mathcal{T}\times\mathcal{L})^{*}$ . Finally, the parameter model is

[TABLE]

The STCOS model assumes that direct estimates $Z_{t}^{(\ell)}(A)$ constitute a noisy observation of an underlying latent process $Y_{t}^{(\ell)}(A)$ . The variance of the noise $\varepsilon_{t}^{(\ell)}(A)$ is assumed to be the direct variance estimate $V_{t}^{(\ell)}(A)$ . The mean of the latent process $Y_{t}^{(\ell)}(A)$ consists of a coarse spatial trend $\bm{h}(A)^{\top}\bm{\mu}_{B}$ and a spatio-temporal random process $\bm{s}_{t}^{(\ell)}(A)^{\top}\bm{\eta}$ . Conjugate priors are assumed for the coefficients and variance parameters from the previous two stages. The matrix $\bm{K}$ , which provides the covariance structure for the random coefficient of $\bm{\eta}$ , is assumed to be known and is computable from the fine-level support.

The latent process model is motivated by the following construction. Define a continuous-space discrete-time process,

[TABLE]

where $\delta(\bm{u})$ is a large-scale spatial trend process and $\{\psi_{j}(\bm{u},v)\}_{j=1}^{\infty}$ is a prespecified set of spatio-temporal basis functions. Integrating $Y(\bm{u},v)$ uniformly over $\bm{u}\in A$ and an $\ell$ -year period $\bm{v}=(v_{t-\ell+1},\ldots,v_{t})$ ,

[TABLE]

In (3.1), we have used the notation

[TABLE]

so that (3.2) represents a large-scale spatial trend, (3.3) is a spatio-temporal random process, and (3.4) is the remainder. We assume that $[\xi_{t}^{(\ell)}(A)\mid\sigma_{\xi}^{2}]\stackrel{{\scriptstyle\text{iid}}}{{\sim}}\text{N}(0,\sigma_{\xi}^{2})$ , and make use of local bisquare basis functions for the small-scale spatio-temporal trend, which are of the form

[TABLE]

for $j=1,\ldots,r$ , with specified knots $\{(\bm{c}_{j},g_{j}):j=1,\ldots,r\}$ . Knots may be taken as the Cartesian product of a set of spatial knot points $\{\bm{c}_{a}:a=1,\ldots,r_{\text{space}}\}$ and a set of temporal knot points $\{g_{b}:b=1,\ldots,r_{\text{time}}\}$ ; however, this is not required in general. The basis functions also require specification of a spatial radius $w_{s}$ and temporal radius $w_{t}$ . We will select evenly spaced temporal knot points and spatial knot points according to a space-filling design (Nychka and Saltzman, 1998). It can be difficult to specify $w_{s}$ directly, as the influence of $w_{s}$ depends on the coordinate system used in the supports. We therefore take $w_{s}=\tilde{w}_{s}\cdot Q_{0.05}$ , where $Q_{0.05}$ is the 0.05 quantile of all nonzero pairwise distances between spatial cutpoints and $\tilde{w}_{s}$ is a parameter to be selected by the user. See Raim et al. (2017) for a model selection study varying several factors in this model such as the number of knot points and the selection of $\tilde{w}_{s}$ and $w_{t}$ .

Basis functions at the area level may be obtained from bases (3.5) defined at the point level; for area $A$ and an $\ell$ -year period specified by years $\bm{v}=(v_{t-\ell+1},\ldots,v_{t})$ , let

[TABLE]

which can be computed by a Monte Carlo approximation via

[TABLE]

based on a random sample of locations $\bm{u}_{1},\ldots,\bm{u}_{Q}$ from a uniform distribution on the region $A$ . Therefore, the basis expansion for an $\ell$ -year lookback period $\bm{v}=(v_{t-\ell+1},\ldots,v_{t})$ and area $A$ is

[TABLE]

Next, for the large-scale spatial trend process, we make the simplifying assumption that

[TABLE]

for an area $A$ . Then $\delta(\bm{u})$ takes on a constant value on each overlap $A\cap B_{i}$ for $B_{i}\in D_{B}$ . Define

[TABLE]

as the vector of proportions in which $A$ overlaps with each area $B_{i}$ in the fine-level support; this is based on the geography, and is therefore a known quantity in the analysis. Manipulation of geographical data in R will be discussed in Sections 4 and 5. Integrating over $\bm{u}\in A$ yields

[TABLE]

The coefficient $\bm{\mu}_{B}=(\mu_{1},\ldots,\mu_{n_{B}})^{\top}$ represents the change of support coefficient between the fine-level support and all other supports, and is the primary quantity of interest in the model.

To simplify the remaining presentation, we now write the model in vector form. Suppose there are $N$ total observations, indexed $i=1,\ldots,N$ , in all of the source supports combined. Let $\mathcal{H}$ be the mapping from each index $i$ to a triple $(A,t,\ell)$ consisting of the area $A$ , time $v_{t}$ , and lookback $\ell$ for the $i$ th observation. Let $\text{vec}(\mathcal{S})$ denote a vector constructed from the elements of an ordered collection $\mathcal{S}$ , $\operatorname{Diag}(\mathcal{S})$ represent a diagonal matrix with the elements of $\mathcal{S}$ , and $\text{rbind}(\mathcal{S})$ represent a matrix with the elements of $\mathcal{S}$ as rows. We may then write

[TABLE]

where $\bm{h}_{t}^{(\ell)}(A)=\bm{h}(A)$ does not vary with $t$ or $\ell$ . The STCOS model can now be written

[TABLE]

We will also define $\bm{Y}=\bm{H}\bm{\mu}_{B}+\bm{S}\bm{\eta}+\bm{\xi}$ as the latent process for the observations.

We now discuss specification of the matrix $\bm{K}$ . Let $A\sim B$ be the predicate that area $A$ is adjacent to area $B$ with $A\sim A$ taken to be false by definition. Denote $\bm{W}=(w_{ij})$ as the $n_{B}\times n_{B}$ adjacency matrix with $w_{ij}=I(B_{i}\sim B_{j})$ for $i,j\in\{1,\ldots,n_{B}\}$ , and $\bm{D}=\operatorname{Diag}(w_{1+},\ldots,w_{n+})$ with $i$ th diagonal entry $w_{i+}=\sum_{\ell=1}^{n_{B}}w_{ij}$ . The matrix $\bm{Q}=\bm{I}-\tau\bm{D}^{-1}\bm{W}$ corresponds to the precision matrix of a particular class of conditional autoregressive (CAR) process. We take $\tau\in(0,1)$ to be known, for simplicity, to ensure that $\bm{Q}$ is nonsingular provided that areas $B_{1},\ldots,B_{n_{B}}$ form a connected graph. Other choices of $\bm{Q}$ can be considered to obtain other classes of CAR precision matrices (see Cressie and Wikle, 2011; Banerjee et al., 2014, and the references therein).

For the purpose of specifying a spatio-temporal variance, suppose the fine-level support is distributed according to the process

[TABLE]

for $v_{t}\in\mathcal{T}$ and assume $\bm{b}_{0}=\bm{0}$ . That is, $\{\bm{Y}_{t}^{*}\}$ is a vector autoregressive (VAR) process in time and a CAR process in space. Let $\bm{\Sigma}_{Y^{*}}$ denote the covariance matrix of $(\bm{Y}_{t}^{*}:v_{t}\in\mathcal{T})$ under model (3.9). We take $\bm{K}$ to be the minimizer of

[TABLE]

under the Frobenius norm $\lVert\cdot\rVert_{\text{F}}$ , where $\bm{S}^{*}=\text{rbind}\left(\bm{s}_{t}^{(\ell)}(A)^{\top}:A\in D_{B},v_{t}\in\mathcal{T},\ell\in\mathcal{L}\right)$ is the basis function expansion on the fine-level geography. In (3.10), $\bm{\Sigma}_{Y^{*}}$ represents the desired covariance structure under model (3.9), while $\bm{S}^{*}\bm{C}\bm{S}^{*\top}$ represents the realized covariance contribution of $\bm{S}\bm{\eta}$ in the model (3.8), where

[TABLE]

conditionally on the random variables in the parameter model. The solution to (3.10),

[TABLE]

provides the best positive approximant to $\bm{\Sigma}_{Y^{*}}$ ; details are given in Appendix A. For the remainder of the article, we will take $\bm{\Sigma}_{Y^{*}}$ to be positive definite and $\bm{S}^{*}$ to be full rank so that $\bm{K}$ is positive definite. Bradley et al. (2015b) and Bradley et al. (2015a) further discuss this approach within the context spatio-temporal models, and Higham (1988) discusses the positive approximant problem in the general setting. We may write $\bm{\Sigma}_{Y^{*}}=\sigma_{K}^{2}\tilde{\bm{\Sigma}}_{Y^{*}}$ so that

[TABLE]

Notice that $\tilde{\bm{\Sigma}}_{Y^{*}}$ and $\bm{K}$ are free of unknown parameters so that the solution of (3.10) does not need to be recomputed within MCMC iterations as parameter values are updated.

We consider several possible structures for $\bm{K}$ . First, assume that $\bm{M}=\bm{I}$ so that the fine-level process defined in (3.9) is a vector random walk with nonstationary autocovariance function

[TABLE]

conditioning on $\sigma_{K}^{2}$ . Letting $\tilde{\bm{\Gamma}}(s,t)=\sigma_{K}^{-2}\bm{\Gamma}(s,t)$ , which is free of $\sigma_{K}^{2}$ , and choosing

[TABLE]

as the covariance of $\{\bm{Y}_{t}^{*}\}$ , $\bm{K}$ is obtained from (3.11) to be

[TABLE]

where we define $\bm{S}_{t}^{*}=\text{rbind}\left(\bm{s}_{t}^{(\ell)}(A)^{\top}:A\in D_{B},\ell\in\mathcal{L}\right)$ for each $v_{t}\in\mathcal{T}$ . Another useful covariance structure arises if we assume that $\bm{M}=\bm{0}$ . This yields autocovariance function $\bm{\Gamma}(s,t)=I(s=t)\sigma_{K}^{2}\bm{Q}^{-1}$ and $\bm{\Sigma}_{Y^{*}}=\sigma_{K}^{2}\bm{Q}^{-1}\otimes\bm{I}_{T}$ , conditioning on $\sigma_{K}^{2}$ , where $\otimes$ represents the Kronecker product. This structure supports nonzero covariance among areas at common times but independence between areas across times. The approximant (3.11) with $\tilde{\bm{\Sigma}}_{Y^{*}}=\bm{Q}^{-1}\otimes\bm{I}_{T}$ is

[TABLE]

One more useful covariance structure assumes no spatial or temporal covariance;

[TABLE]

It is worth emphasizing that $\bm{K}$ describes the covariance structure for $\bm{\eta}$ , but the covariance contribution to the model occurs through $\bm{Y}$ via $\bm{S}\bm{K}\bm{S}^{\top}$ . For example, an independence assumption for $\bm{\eta}$ yields $\bm{S}\bm{K}\bm{S}^{\top}=\bm{S}\bm{S}^{\top}$ , which is not necessarily a diagonal matrix using the basis functions (3.6) or the dimension-reduced version discussed in Section 5. The covariance structures we consider in this work—namely (3.12), (3.13), and (3.14)—are a departure from Bradley et al. (2015b), who recommend computing $\bm{M}$ itself by further basis function decomposition.

We can obtain a Gibbs sampler by considering the joint distribution of the random quantities in (3.8),

[TABLE]

and deriving the full conditional distribution of each unknown parameter (e.g., Banerjee et al., 2014, Section 5.3). Here, the derivation is routine and details have been omitted for brevity. The steps of the Gibbs sampler which result from the full conditionals of $\bm{\mu}_{B}$ , $\bm{\eta}$ , $\bm{\xi}$ , $\sigma_{\mu}^{2}$ , $\sigma_{K}^{2}$ , and $\sigma_{\xi}^{2}$ are stated as Algorithm 3.1. The notation $[\bm{X}\mid\operatorname{---}]$ is used to denote the distribution of a given random variable $\bm{X}$ conditioned on all other random quantities.

4 Implementing STCOS in R

STCOS modeling can be roughly separated into three phases: assembling published estimates and geospatial data into a usable form, preparing matrices and vectors needed to fit the model, and finally fitting the model and producing results. To read and manipulate geospatial data, we will highlight the sf package (Pebesma, 2018), which we find to be intuitive and comprehensive. For general data manipulation, such as filtering records and selecting columns from a table, we will make use of the dplyr package (Wickham et al., 2020). To produce high quality graphics, we use the ggplot2 package (Wickham, 2016). The dplyr and ggplot2 packages are especially convenient because of their compatibility with sf objects. The tigris package (Walker, 2018) provides a convenient way to request geographical data from the Census Bureau Tiger/Line database within R. The fields package (Nychka et al., 2017) can be used to select spatial knot points by a space-filling design. General purpose platforms for Bayesian computing, including Stan (Carpenter et al., 2017), JAGS (Depaoli et al., 2016), BUGS (Lunn et al., 2009), and Nimble (de Valpine et al., 2017), are accessible through an R interface. A major advantage of such platforms is that samplers can be programmed simply by specifying a model and providing the data. In contrast, the traditional Gibbs sampler approach may require derivation, programming, and testing for each new model. However, general purpose platforms may not be well-suited to certain classes of models or to very large datasets. We will illustrate the use of Stan via the rstan package (Stan Development Team, 2020) in addition to the Gibbs sampler from Section 3.

Some aspects of implementing STCOS analysis in R can be laborious and prone to error. To reduce the burden, we introduce the stcos package. The stcos package provides several major capabilities including: functions to compute overlap matrix $\bm{H}$ and adjacency matrix $\bm{W}$ , basis functions to compute $\bm{S}$ , construction of covariance $\bm{K}$ , maximum likelihood estimation for the STCOS model, and an STCOS Gibbs sampler. Basis functions discussed in Section 3 will be demonstrated shortly. Internal basis function calculations are carried out in C++, for efficiency, via the Rcpp and RcppArmadillo packages (Eddelbuettel, 2013; Eddelbuettel and Sanderson, 2014). Matrices such as $\bm{H}$ and $\bm{S}$ are likely to be sparse in many STCOS applications; we use the Matrix package (Bates and Maechler, 2019) to support operations on sparse matrices.

We will now give an overview of the major STCOS computations which will be needed in R. Section 5 will provide a demonstration connecting these pieces into a complete analysis. The following packages are assumed to be loaded in all coding examples.

1R> library("sf")2R> library("dplyr")3R> library("stcos")A natural way to encode geographical features in source, fine-level, and target supports is via sf objects. Data associated with the supports can be embedded into sf objects to facilitate model preparation and graphical display. Therefore, our preferred workflow will be to produce sf objects with direct and model-based estimates. An example of a prepared source support object is as follows.

1R> head(acs5_2013, 3)2Simple feature collection with 3 features and 8 fields3geometry type: POLYGON4dimension: XY5bbox: xmin: -10280140 ymin: 4712766 xmax: -10277220 ymax: 47147506CRS: EPSG:38577 geoid state county tract blockgroup DirectEst DirectMOE DirectVar81 290190005001 29 019 000500 1 9970 3157 368378892 290190005002 29 019 000500 2 12083 7048 18360194103 290190006001 29 019 000600 1 105156 16979 10655398711 geometry121 POLYGON ((-10278231 4713772...132 POLYGON ((-10279369 4713339...143 POLYGON ((-10280135 4712926...Note that we have manipulated this output and some subsequent outputs to ensure that they fit on the page. The CRS descriptor specifies a geographical coordinate system for the data. A number of standard coordinate systems are used to express geographical data, each having its own benefits and drawbacks. Coordinates such as latitude and longitude used in the global positioning system (GPS) describe points on the surface of the globe. Map projections provide two-dimensional representations of a region, which are convenient in many applications but necessarily distort the geography in some way. Conformal projections are designed to preserve local shape and are considered suitable for smaller domains, but distort areas when applied to large regions. On the other hand, equal-area projections are designed to preserve areas over large regions. To apply STCOS and other spatial-temporal methods, the analyst must select an appropriate coordinate system. We have chosen the Web Mercator projection (EPSG:3857) which is utilized in a number of online mapping services and is a variation of the conformal Mercator projection (Battersby et al., 2014). Further discussion and references on coordinate systems can be found in Bivand et al. (2013, Chapter 4) and Waller and Gotway (2004, Chapter 3).

All source, target, and fine-level supports in an analysis should use a common coordinate system so that they are compatible; this is not a limitation, as the analyst may transform an sf object from its original coordinates using the sf::st_transform function. Furthermore, methods described in this article are suited toward coordinates in a map projection rather than a globe representation. For example, the Euclidean distance utilized in (3.6) does not take into account the curvature of the Earth. An analysis using spherical coordinates—which may be appropriate for a larger-scale domain—might instead consider a great circle distance. Now that the importance of the coordinate system has been emphasized, coordinates will be considered as raw numerical values for the remainder of the article.

The last four lines of the previous display show a table with nine fields, where each row corresponds to an area (county) in the file. The geometry field contains details about the county’s geography, which we typically will not want to manipulate directly. The fields STATE and COUNTY represent Federal Information Processing Standards (FIPS) codes for the state and county respectively, and GEO_ID is an identifier which combines the two. The fields DirectEst, DirectMOE, and DirectVar represent direct ACS estimates of median household income and an associated estimate of margin of error and variance. Preparation of such an sf object from geographical data and direct estimates will be discussed in Section 5.1.

A function overlap_matrix is provided to compute the $\bm{H}$ matrix.

1R> H = overlap_matrix(dom1, dom2, proportion = TRUE)Here, dom1 and dom2 are sf objects which describe domains of areal units. The result is an nrow(dom1) by nrow(dom2) matrix. If proportion = FALSE, the entries represent the amount of area for each overlap; otherwise rows are normalized to proportions which sum to 1.

The stcos package provides several variations of the local bisquare basis functions discussed in Section 3. The following functions operate on data where space is represented at the point-level.

1R> S = spatial_bisquare(dom, knots, w_s)2R> S = spacetime_bisquare(dom, knots, w_s, w_t)The function spacetime_bisquare implements (3.5) which uses information both in space and time, while spatial_bisquare implements a space-only version

[TABLE]

The object dom may either be a numerical matrix or an object of type sf or sfc containing points. In both cases, the first two columns/coordinates represent the spatial coordinates and the third represents time, if applicable. The object knots provides knot points, and may similarly be specified as either a numerical matrix or a sf or sfc object containing points. Coordinates systems for the points in knots are expected to be compatible with those in dom. Two-dimensional points are expected in spatial_bisquare, where each represents a $\bm{c}_{j}$ . Similarly, three-dimensional points are expected in spacetime_bisquare, so that each represents a $(\bm{c}_{j},g_{j})$ . The variables w_s and w_t correspond to the spatial and temporal radius, respectively.

The following functions operate on data where space is represented at an area-level.

1R> S = areal_spatial_bisquare(dom, knots, w_s, control = NULL)2R> S = areal_spacetime_bisquare(dom, period, knots, w_s, w_t, control = NULL)The function areal_spacetime_bisquare implements (3.6), while areal_spatial_bisquare computes a space-only version

[TABLE]

based on (4.1). Here, the object dom is of type sf or sfc and provides the geography for one or more areal units. The variable period is a numeric vector which represents time period $\bm{v}=(v_{t-\ell+1},\ldots,v_{t})$ used to evaluate (3.6). For example, if dom represents ACS 5-year estimates for 2017, we will take period = 2013:2017. The arguments knots, w_s, and w_t are interpreted similarly as in the point-level functions. The optional control argument is a list in which some additional factors can be adjusted, such as the number of Monte Carlo repetitions to used in the approximation. The remainder of the demonstration focuses on the STCOS analysis; further details and examples for the basis functions can be found in the stcos manual.

Several options were described in Section 3 to compute the covariance matrix $\bm{K}$ ; the stcos package provides functions to assist with the computations.

1R> K = cov_approx_blockdiag(Qinv, S_fine)2R> K = cov_approx_randwalk(Qinv, S_fine)Both calls produce an $r\times r$ matrix. The call to cov_approx_randwalk corresponds to the random walk structure in (3.12), while cov_approx_blockdiag corresponds to (3.13) which assumes independence across time. The structure in (3.14) which represents independent and identically distributed elements of $\bm{\eta}$ can be achieved with K = Diagonal(n = r). The arguments Qinv, and S_fine correspond to the matrices $\bm{Q}^{-1}$ and $\bm{S}^{*}$ described in Section 3. Note that the function car_precision in stcos can be used to compute $\bm{Q}$ from adjacency matrix $\bm{W}$ .

1R> Q = car_precision(W, tau = 0.9, scale = TRUE)The matrix $\bm{I}-\tau\bm{D}^{-1}\bm{W}$ is returned if scale = TRUE; otherwise $\bm{D}-\tau\bm{W}$ is returned.

Although we focus on Bayesian analysis, a function to compute maximum likelihood estimates (MLEs) is provided.

1R> out = mle_stcos(z, v, H, S, K, init = list(sig2K = 1, sig2xi = 1))2R> sig2K_hat = out $sig2K_hat,3R> sig2xi_hat = out$ sig2xi_hat,4R> mu_hat = out $mu_hatSome details on MLE computation are given in Appendix [A](#A1). MLE computation is often much quicker than Bayesian computation, and may provide good starting values for an MCMC sampler. Here, H, S, and K are the matrices$ \bm{H} $,$ \bm{S} $, and$ \bm{K} $described in ([3.7](#S3.E7)), while z represents the vector$ \bm{Z} $and v is the diagonal of the matrix$ \bm{V}$. The Gibbs sampler described in Section 3 can be invoked using the gibbs_stcos function.

1R> out = gibbs_stcos(z, v, H, S, Kinv = solve(K),2+ R = 10000, report_period = 1000, burn = 1000, thin = 10,3+ init = init)4R> muB_mcmc = out $muB_hist5R> eta_mcmc = out$ eta_hist6R> xi_mcmc = out $xi_hist7R> sig2mu_mcmc = out$ sig2mu_hist8R> sig2xi_mcmc = out $sig2xi_hist9R> sig2K_mcmc = out$ sig2K_histSome helper functions are provided to process the output from the Gibbs sampler.

1print(out)2logLik(out)3DIC(out)4E_mcmc = fitted(out, H_new, S_new)5Y_mcmc = predict(out, H_new, S_new)The print function displays a brief summary of results from the sampler, while logLik computes the log-likelihood for each saved draw and DIC computes the Deviance information criterion (Spiegelhalter et al., 2002) using saved draws. Let $\tilde{\bm{H}}$ be an $\tilde{N}\times n$ overlap matrix and $\tilde{\bm{S}}$ be an $\tilde{N}\times r$ basis matrix computed from target supports of interest, and H_new and S_new denote their representations in code. Let $\tilde{\bm{Y}}$ denote the vector composed of the $\tilde{N}$ latent process variables

[TABLE]

associated with matrices $\tilde{\bm{H}}$ and $\tilde{\bm{S}}$ . The fitted function produces draws from the posterior distribution of the mean

[TABLE]

so that E_mcmc is a matrix with $\tilde{N}$ columns where each row corresponds to a saved draw. Alternatively, the predict function produces draws from the posterior distribution of

[TABLE]

5 Demonstration: City of Columbia neighborhoods

We now demonstrate an STCOS analysis on a small-scale but complete example using real data. Our target support consists of four neighborhoods in the City of Columbia in Boone County, Missouri. Geospatial data of the four neighborhoods has been provided by staff from the GIS Office for the City of Columbia. We would like to produce model-based estimates of median household income using observed ACS estimates from recent years. Specifically, we will consider 5-year ACS estimates at the block-group level for years 2013–2017 as our source supports, and will produce 5-year ACS estimates for year 2017 on the four neighborhoods as our target support.

The demonstration is split into several subsections. Section 5.1 considers raw inputs—ACS direct estimates and geographical features—and discusses how they can be assembled into a useful form for the analysis. Section 5.2 then prepares the inputs to the STCOS model: namely, $\bm{Z}$ , $\bm{V}$ , $\bm{H}$ , $\bm{S}$ , and $\bm{K}$ . Section 5.3 uses the Gibbs sampler in the stcos package to produce draws from the posterior distribution of STCOS parameters and consequently obtain the desired results from the analysis. Section 5.4 uses the Stan platform via the rstan package as an alternative method to obtain results. Finally, Section 5.5 compares the MLE to Bayesian results.

5.1 Assembling the data

We now briefly discuss how to gather geospatial data and ACS estimates and assemble them into sf objects for convenience. This is not intended to be an extensive guide, as numerous options to gather data (e.g. portals, APIs, and R packages) are available and continue to evolve. In Section 5.2, we will make use of datasets which have been constructed for the demonstration.

Geospatial data representing the target support were provided in the shapefile format (ESRI, 1998). We now read the file and transform it to a projection of choice.

1R> neighbs = st_read("neighborhoods.shp") %>% st_transform(crs = 3857)To prepare the source supports, we must gather ACS estimates and corresponding geographical features. For this example, ACS estimates can be requested from the Census Bureau’s Data API. The interface and data availability of the API are subject to change in the future, and examples shown next may need to be modified accordingly. Breakstone and Anderson (2019) provide a user guide with current specifications, including URL query format, available datasets, and codes for variable names. Note that limits are placed on the frequency and size of queries for unregistered users; higher-volume users may register for an API key to reduce restrictions. Estimates for our source supports can be requested from the API by constructing URLs with the following formats.

1R> est_url = paste(’https://api.census.gov/data/’, year,2+ ’/acs/acs5?get=NAME,B19013_001E&for=block%20group:&in=state:29+county:019’,3+ sep = ’’)4R> moe_url = paste(’https://api.census.gov/data/’, year,5+ ’/acs/acs5?get=NAME,B19013_001M&for=block%20group:&in=state:29+county:019’,6+ sep = ’’)Data for the direct point estimates and corresponding MOEs have been gathered using two separate calls to the API. The FIPS code for Missouri is 29 and the code for Boone County is 019. The variable B19013_001E represents point estimates for “Median household income in the past 12 months”, and B19013_001M represents corresponding MOEs. We can request the years of interest by taking year to be values 2013 through 2017. We use the jsonlite package (Ooms, 2014) to call the API and load the results into an R data.frame.

1R> json_data = jsonlite::fromJSON(est_url)2R> est_dat = data.frame(json_data[-1,])3R> colnames(est_dat) = json_data[1,]45R> json_data = jsonlite::fromJSON(moe_url)6R> moe_dat = data.frame(json_data[-1,])7R> colnames(moe_dat) = json_data[1,]We now merge est_dat and moe_dat together into a single data.frame.

1my_dat = est_dat %>%2inner_join(moe_dat, by = c(’state’ = ’state’, ’county’ = ’county’,3’tract’ = ’tract’, ’block group’ = ’block group’)) %>%4select(state, county, tract, blockgroup = ‘block group‘,5DirectEst = B19013_001E, DirectMOE = B19013_001M) %>%6mutate(DirectEst = as.numeric(DirectEst)) %>%7mutate(DirectMOE = as.numeric(DirectMOE)) %>%8mutate(DirectEst = replace(DirectEst, DirectEst < 0, NA)) %>%9mutate(DirectMOE = replace(DirectMOE, DirectMOE < 0, NA)) %>%10mutate(DirectVar = (DirectMOE / qnorm(0.95))^2) %>%11arrange(tract, blockgroup)There are a few details to mention in this data manipulation. We have taken some care because there is a space in the variable name block group, and because variables in the ACS data are interpreted as strings by default. We have transformed the MOE to a variance estimate, noting that published MOEs are to be interpreted as margins of error from $\alpha=0.90$ confidence intervals (U.S. Census Bureau, 2018); i.e.,

[TABLE]

where $z_{\alpha/2}\approx 1.645$ . We have also taken care to handle special values coded in the data; namely, large negative numbers for estimates and MOEs are returned by the API when estimates are not available,111https://census.gov/data/developers/data-sets/acs-1year/notes-on-acs-estimate-and-annotation-values.html which we convert to NA. We sort the entries by tract and block group for readability. The resulting data.frame appears as follows.

1R> head(my_dat)2 state county tract blockgroup DirectEst DirectMOE DirectVar31 29 019 000200 1 41063 6512 1567379942 29 019 000200 2 31250 6978 1799730353 29 019 000300 1 19420 7643 2159102264 29 019 000300 2 NA NA NA75 29 019 000300 3 21369 14558 7833375086 29 019 000500 1 10995 5563 11438356The presence of NA values in direct estimates—such as in tract 000300, blockgroup 2—can vary over area, year, and period. NA values will be addressed in Section 5.2, before the analysis. The tigris package (Walker, 2018) provides a convenient way to request shapefiles from the Census Bureau Tiger/Line database. It is necessary that all supports are converted to a common coordinate system for the analysis, so use the function st::transform to match the projection we used earlier in the target support.

1my_shp = tigris::block_groups(state = ’29’, county = ’019’, year = 2017) %>%2st_as_sf() %>%3st_transform(crs = 3857)Now we augment the geospatial data with direct point estimates, MOEs, and variance estimates obtained earlier.

1acs5_2017 = my_shp %>%2inner_join(my_dat, by = c(’STATEFP’ = ’state’, ’COUNTYFP’ = ’county’,3’TRACTCE’ = ’tract’, ’BLKGRPCE’ = ’blockgroup’)) %>%4select(geoid = GEOID, state = STATEFP, county = COUNTYFP,5tract = TRACTCE, blockgroup = BLKGRPCE,6DirectEst, DirectMOE, DirectVar)The resulting acs5_2017 is an object of type sf, whose first few entries are as follows (geometry column is not shown).

1R> head(acs5_2017)2Simple feature collection with 6 features and 8 fields3geometry type: POLYGON4dimension: XY5bbox: xmin: -10280690 ymin: 4712766 xmax: -10256290 ymax: 47521096CRS: EPSG:38577 geoid state county tract blockgroup DirectEst DirectMOE DirectVar81 290190005001 29 019 000500 1 10995 5563 1143835692 290190005002 29 019 000500 2 13872 9503 33378510103 290190006001 29 019 000600 1 45208 39073 564285643114 290190006002 29 019 000600 2 107500 19868 145899495125 290190020002 29 019 002000 2 62237 13529 67651414136 290190020003 29 019 002000 3 51019 11166 46082999

5.2 Preparing the analysis

The steps in Section 5.1 can be repeated so that all target, source, and fine-level supports are assembled as sf objects. The stcos package includes the following pre-constructed datasets to facilitate our demonstration.

1R> data("acs_sf")2R> ls(pattern = "acs5_.*")3[1] "acs5_2013" "acs5_2014" "acs5_2015" "acs5_2016" "acs5_2017"4R> data("columbia_neighbs")5R> ls(pattern = "columbia")6[1] "columbia_neighbs"Before we begin to prepare the terms in (3.7) for the STCOS model, let us create a version of the source supports with NA estimates removed. This will help to avoid complications in model fitting.

1source_2013 = acs5_2013 %>% filter(!is.na(DirectEst) & !is.na(DirectVar))2source_2014 = acs5_2014 %>% filter(!is.na(DirectEst) & !is.na(DirectVar))3source_2015 = acs5_2015 %>% filter(!is.na(DirectEst) & !is.na(DirectVar))4source_2016 = acs5_2016 %>% filter(!is.na(DirectEst) & !is.na(DirectVar))5source_2017 = acs5_2017 %>% filter(!is.na(DirectEst) & !is.na(DirectVar))We will choose our fine-level support based on the acs5_2017 geography; i.e. the block group level geography for Boone County in 2017. However, because we have dropped some areas from the source supports, we should check for areas in acs5_2017 which have zero or very little overlap with any areas in the source supports. If we identify such areas, we will drop them from the analysis to avoid rank-deficiency of the $\bm{H}$ matrix.

1U = rbind(2overlap_matrix(source_2013, acs5_2017, proportion = FALSE),3overlap_matrix(source_2014, acs5_2017, proportion = FALSE),4overlap_matrix(source_2015, acs5_2017, proportion = FALSE),5overlap_matrix(source_2016, acs5_2017, proportion = FALSE),6overlap_matrix(source_2017, acs5_2017, proportion = FALSE)7)8dom_fine = acs5_2017 %>%9mutate(keep = (colSums(U) >= 10)) %>%10filter(keep == TRUE) %>%11select(-c("DirectEst", "DirectMOE", "DirectVar", "keep"))12n = nrow(dom_fine)This creates dom_fine as a version of acs5_2017, excluding two block-groups having very little overlap (less than 10 square meters) with any of the source support areas, and ignoring the columns for the direct estimates, MOEs, and variance estimates.

The overlap matrix $\bm{H}$ for the analysis can now be created as follows.

1H = rbind(2overlap_matrix(source_2013, dom_fine),3overlap_matrix(source_2014, dom_fine),4overlap_matrix(source_2015, dom_fine),5overlap_matrix(source_2016, dom_fine),6overlap_matrix(source_2017, dom_fine)7)8N = nrow(H)To construct a bisquare basis, we must select spatio-temporal knot points. To select spatial knot points, we first draw a large number of points uniformly over the fine-level domain using the st_sample function. We then use the cover.design function in the fields package, which finds a subset of these points to fill the space.

1u = st_sample(dom_fine, size = 2000)2P = st_coordinates(u)3out = fields::cover.design(P, 200)4knots_sp = out $designTo select the spatial radius$ w_{s} $, we compute the$ 0.05$ quantile of the pairwise distances among the rows of knots_sp, as discussed in Section 3.

1ws_tilde = 12D = dist(knots_sp)3w_s = ws_tilde * quantile(D[D > 0], prob = 0.05, type = 1)Alternatively, evenly spaced points can be achieved with the hexagonal sampling method in the sf::st_sample function. This is quicker than fields::cover.design.

1u = st_sample(dom_fine, 200, type = "hexagonal")2knots_sp_alt = st_coordinates(u)3D = dist(knots_sp_alt)4w_s_alt = ws_tilde * quantile(D[D > 0], prob = 0.05, type = 1)Figure 3 illustrates the selected spatial knot points and radius using both the space-filling method and hexagonal sampling. Both methods succeed in creating a grid of evenly-spaced points, although the latter follow a more strict pattern. More evenly-spaced points can also be obtained with the space-filling method by taking an initial sample size larger than our selection of 2,000.

We choose the temporal knot points to be $(2009,2009.5,\ldots,2016.5,2017)$ , covering the years relevant to the 5-year ACS estimates for years 2013–2017.

1knots_t = seq(2009, 2017, by = 0.5)2w_t = 1More sophisticated date/time functions can assist in constructing temporal knots, though a numerical representation is ultimately needed. An alternative choice for temporal knots created with Date objects is given next. When treated as numerical, such objects represent days elapsed since January 1, 1970. Here we may again use the quantile approach to determine a radius which is suitable for this unit of time.

1dates = seq(as.Date("2009-01-01"), as.Date("2017-01-01"), by = "6 months")2knots_t_alt = as.numeric(dates)3wt_tilde_alt = 14D = dist(knots_t_alt)5w_t_alt = wt_tilde_alt * quantile(D[D > 0], prob = 0.05, type = 1)Now we use the merge function in the base package (R Core Team, 2020) to perform a Cartesian join between the spatial knots knots_sp and temporal knots knots_t, which yields the set of spatio-temporal knots.

1knots = merge(knots_sp, knots_t)Now, we use the basis functions to compute the design matrix $\bm{S}$ .

1bs_ctrl = list(mc_reps = 500)2S_full = rbind(3areal_spacetime_bisquare(source_2013, 2009:2013, knots, w_s, w_t, bs_ctrl),4areal_spacetime_bisquare(source_2014, 2010:2014, knots, w_s, w_t, bs_ctrl),5areal_spacetime_bisquare(source_2015, 2011:2015, knots, w_s, w_t, bs_ctrl),6areal_spacetime_bisquare(source_2016, 2012:2016, knots, w_s, w_t, bs_ctrl),7areal_spacetime_bisquare(source_2017, 2013:2017, knots, w_s, w_t, bs_ctrl)8)We can also compute the design matrix $\bm{S}^{*}$ on the fine-level support, which is needed to compute $\bm{K}$ under some of the possible structures.

1S_fine_full = rbind(2areal_spacetime_bisquare(dom_fine, 2009, knots, w_s, w_t, bs_ctrl),3areal_spacetime_bisquare(dom_fine, 2010, knots, w_s, w_t, bs_ctrl),4areal_spacetime_bisquare(dom_fine, 2011, knots, w_s, w_t, bs_ctrl),5areal_spacetime_bisquare(dom_fine, 2012, knots, w_s, w_t, bs_ctrl),6areal_spacetime_bisquare(dom_fine, 2013, knots, w_s, w_t, bs_ctrl),7areal_spacetime_bisquare(dom_fine, 2014, knots, w_s, w_t, bs_ctrl),8areal_spacetime_bisquare(dom_fine, 2015, knots, w_s, w_t, bs_ctrl),9areal_spacetime_bisquare(dom_fine, 2016, knots, w_s, w_t, bs_ctrl),10areal_spacetime_bisquare(dom_fine, 2017, knots, w_s, w_t, bs_ctrl)11)Next we need vectors z and v to represent the direct point estimates and associated variance estimates.

1z = c(source_2013 $DirectEst, source_2014$ DirectEst, source_2015 $DirectEst,2source_2016$ DirectEst, source_2017 $DirectEst)3v = c(source_2013$ DirectVar, source_2014 $DirectVar, source_2015$ DirectVar,4source_2016 $DirectVar, source_2017$ DirectVar)Because z and v contain rather large numbers, we standardize z for the analysis and make a corresponding transformation to v.

1z_scaled = scale(z)2v_scaled = v / var(z)The expression for v_scaled arises from considering $\operatorname{Var}[a^{-1/2}(Z_{i}-b)]=a^{-1}\operatorname{Var}(Z_{i})$ for constants $a>0$ and $b\in\mathbb{R}$ , which is estimated by $a^{-1}\bm{e}_{i}^{\top}\bm{V}\bm{e}_{i}$ with $\bm{e}_{i}$ the $i$ th column of an $N\times N$ identity matrix. The design matrix $\bm{S}$ with our choice of basis function can have a large number of columns and a high degree of multicollinearity; if not addressed, this can lead to poor mixing in the MCMC sampler. A simple workaround is to reduce the dimension of $\bm{S}$ using principal components analysis (PCA). First we compute the reduction, using 65% of the variability, as expressed as a proportion of the eigenvalues.

1eig = eigen(t(S_full) %*% S_full)2idx_S = which(cumsum(eig $values) / sum(eig$ values) < 0.65)Figure 4 shows that this can be accomplished by projecting from the original 3,400 columns to $r=19$ columns. Now we apply the reduction to $\bm{S}$ as well as $\bm{S}^{*}$ .

1Tx_S = eig $vectors[,idx_S]2S = S_full %*% Tx_S3S_fine = S_fine_full %*% Tx_S4r = ncol(S)The last ingredient needed to run the analysis is the matrix$ \bm{K} $. We will use the random walk structure in ([3.12](#S3.E12)) to express both spatial and temporal dependence. First, let us compute the covariance matrix$ \bm{Q}^{-1}$ of a CAR process for the fine-level support.

1W = adjacency_matrix(dom_fine)2Q = car_precision(W, tau = 0.9, scale = TRUE)3Qinv = solve(Q)Now compute $\bm{K}$ using $\bm{Q}^{-1}$ and $\bm{S}^{*}$ .

1K = cov_approx_randwalk(Qinv, S_fine)

5.3 Fitting with Gibbs sampler

We now proceed to run the Gibbs sampler. We will produce a chain of 10,000 iterations, discard the first 2,000 draws, and keep one of every 10th remaining draw. We will use hyperparameters $a_{\mu}=1$ , $b_{\mu}=1$ , $a_{K}=1$ , $b_{K}=2$ , $a_{\xi}=1$ , and $b_{\xi}=2$ .

1R> hyper = list(a_sig2K = 1, b_sig2K = 2, a_sig2xi = 1, b_sig2xi = 2,2+ a_sig2mu = 1, b_sig2mu = 2)3R> gibbs_out = gibbs_stcos(z = z_scaled, v = v_scaled, H = H, S = S,4+ Kinv = Kinv, R = 10000, report_period = 2000, burn = 2000,5+ thin = 10, hyper = hyper)62020-05-17 17:11:15 - Begin Gibbs sampler72020-05-17 17:11:52 - Begin iteration 200082020-05-17 17:12:29 - Begin iteration 400092020-05-17 17:12:50 - Begin iteration 6000102020-05-17 17:13:09 - Begin iteration 8000112020-05-17 17:13:28 - Begin iteration 10000122020-05-17 17:13:28 - Finished Gibbs sampler13R> print(gibbs_out)14Fit for STCOS model15--16 Mean SD 2.5% 25% 75% 97.5%17sig2mu 0.52574414 0.094603198 0.36964018 0.45835116 0.57962109 0.7425270118sig2K 1.12632689 0.829211900 0.34238233 0.62142495 1.34628662 3.3518350919sig2xi 0.04368451 0.005022067 0.03485603 0.04021765 0.04669926 0.0553360320--21Saved 800 draws22DIC: 210.79898123Elapsed time: 00:02:08The mcmc class in the coda package (Plummer et al., 2006) helps to manage and plot the draws.

1library("coda")2varcomps_mcmc = mcmc(data.frame(3sig2mu = gibbs_out $sig2mu_hist,4sig2xi = gibbs_out$ sig2xi_hist,5sig2K = gibbs_out $sig2K_hist6))7plot(varcomps_mcmc)Figure [5](#Sx1.F5) displays trace and density plots of the variance components$ \sigma_{\mu}^{2} $,$ \sigma_{\xi}^{2} $, and$ \sigma_{K}^{2}$.

Using the fitted model, we can produce model-based estimates on target supports of interest. In this example, we would like to produce 5-year 2017 estimates for our four neighborhoods in Boone County: Central, East, North, and Paris. The following code computes model-based estimates for these areas and embeds them into an sf object for plotting.

1nb_out = neighbs2H_new = overlap_matrix(nb_out, dom_fine) # New overlap3S_new_full = areal_spacetime_bisquare(nb_out,42013:2017, knots, w_s, w_t, bs_ctrl) # New basis fn5S_new = S_new_full %*% Tx_S # Reduce dimension67EY_scaled = fitted(gibbs_out, H_new, S_new) # Get draws of E(Y)8EY = sd(z) * EY_scaled + mean(z) # Uncenter and unscale910alpha = 0.1011nb_out $E_mean = colMeans(EY) # Point estimates12nb_out$ E_sd = apply(EY, 2, sd) # SDs13nb_out $E_lo = apply(EY, 2, quantile, prob = alpha/2) # Credible interval lo14nb_out$ E_hi = apply(EY, 2, quantile, prob = 1-alpha/2) # Credible interval hi15nb_out $E_median = apply(EY, 2, median) # Median16nb_out$ E_moe = apply(EY, 2, sd) * qnorm(1-alpha/2) # MOE

The objects H_new and S_new represent design matrices $\tilde{\bm{H}}$ and $\tilde{\bm{S}}$ , respectively, based on the geography of neighborhoods. The fitted function was then used to produce draws from the posterior distribution of $\operatorname{E}(\tilde{\bm{Y}})$ . We then transformed the resulting estimates back to the original scale, having previously centered and scaled them before model fitting. The remainder of the code display summarizes draws of the posterior mean in several ways, obtaining a model-based estimate of its median, mean, standard deviation, MOE ( $z_{\alpha/2}\times\text{standard deviation}$ ), and a 90% credible interval. The resulting sf object is displayed below (geometry column is not shown).

1R> print(nb_out)2Simple feature collection with 4 features and 7 fields3geometry type: POLYGON4dimension: XY5bbox: xmin: -10280270 ymin: 4715036 xmax: -10269750 ymax: 47238606CRS: EPSG:38577 Region E_mean E_sd E_lo E_hi E_median E_moe81 Central 26705.85 1921.623 23456.96 29709.53 26720.70 3160.78892 East 44127.78 2450.811 40155.21 47983.46 44186.03 4031.225103 North 44171.24 2863.373 39519.40 48933.11 44040.79 4709.829114 Paris63Corridor 20386.72 3663.098 14448.61 26325.73 20309.78 6025.261We are now ready to plot our estimates. The code to reproduce our plots is somewhat lengthy and can be found in the supplemental materials. First we compare direct and model-based estimates for 2017 source supports to assess whether the model fit is reasonable. Figures 6(a) and 6(b) show maps of the two sets of estimates. Figures 6(c) and 6(d) compare the two sets of estimates via scatter plots; year 2014 and year 2017 estimates are shown for comparison. Variation between direct and model-based estimates appears to be smaller for year 2014, with the exception of the block group with the largest direct estimate that year. Finally, Figure 7 shows the four neighborhoods of our target support in the context of the 2017 5-year direct estimates. This provides a visual aid to assess plausibility of the target support estimates. The North and East neighborhoods appear to be in the immediate vicinity of block groups with higher median household income than the West and Paris neighborhoods.

5.4 Fitting with Stan

We will now refit the model from Section 5.3 using Stan instead of the Gibbs sampler. First, we will need a Stan model specification. We will create a file named stcos.stan with the following contents.

1data {2int<lower=0> N; int<lower=0> n; int<lower=0> r;3vector[N] z; vector[N] v; matrix[N,n] H;4matrix[N,r] S; matrix[r,r] K; real alpha_K;5real beta_K; real alpha_xi; real beta_xi;6real alpha_mu; real beta_mu;7}8parameters {9vector[n] mu; real<lower=0> sig2K;10vector[r] eta; real<lower=0> sig2xi;11vector[N] xi; real<lower=0> sig2mu;12}13model {14sig2K ~ inv_gamma(alpha_K, beta_K);15sig2xi ~ inv_gamma(alpha_xi, beta_xi);16sig2mu ~ inv_gamma(alpha_mu, beta_mu);17eta ~ multi_normal(rep_vector(0,r), sig2K * K);18mu ~ normal(0, sqrt(sig2mu));19xi ~ normal(0, sqrt(sig2xi));20z ~ normal(Hmu + Seta + xi, sqrt(v));21}Now, in R, pass the data and model specification to stan to initiate fitting.

1library("rstan")2stan_dat = list(3N = N, n = n, r = r, z = z_scaled, v = v_scaled, H = as.matrix(H),4S = as.matrix(S), K = as.matrix(K),5alpha_K = 1, beta_K = 2, alpha_xi = 1, beta_xi = 2, alpha_mu = 1, beta_mu = 26)

1R> stan_out = stan(file = "stcos.stan", data = stan_dat, iter = 2000, chains = 2)2SAMPLING FOR MODEL ’stcos’ NOW (CHAIN 1).3...4Chain 1: Elapsed Time: 11.8561 seconds (Warm-up)5Chain 1: 10.5266 seconds (Sampling)6Chain 1: 22.3827 seconds (Total)7...8SAMPLING FOR MODEL ’stcos’ NOW (CHAIN 2).9...10Chain 2: Elapsed Time: 10.9951 seconds (Warm-up)11Chain 2: 9.87796 seconds (Sampling)12Chain 2: 20.873 seconds (Total)Here we have requested two chains of length 2,000 each. In addition to the time needed for sampling, Stan may require time to compile the model specification. Upon successful completion of sampling, the following R code can be used to extract draws and produce results.

1stan_draws = extract(stan_out, pars = c("mu", "eta"), permuted = TRUE)23nb_out = neighbs4H_new = overlap_matrix(nb_out, dom_fine) # New overlap5S_new_full = areal_spacetime_bisquare(nb_out,62013:2017, knots, w_s, w_t, bs_ctrl) # New basis fn7S_new = S_new_full %% Tx_S # Reduce dimension89EY_scaled = stan_draws $mu %*% t(H_new) +10 stan_draws$ eta %% t(S_new) # Draws of E(Y)11EY = sd(z) * EY_scaled + mean(z) # Uncenter and unscale1213alpha = 0.1014nb_out $E_mean = colMeans(EY) # Point estimates15nb_out$ E_sd = apply(EY, 2, sd) # SDs16nb_out $E_lo = apply(EY, 2, quantile, prob = alpha/2) # Credible interval lo17nb_out$ E_hi = apply(EY, 2, quantile, prob = 1-alpha/2) # Credible interval hi18nb_out $E_median = apply(EY, 2, median) # Median19nb_out$ E_moe = apply(EY, 2, sd) * qnorm(1-alpha/2) # MOEThe result of print(nb_out) can be compared to the corresponding output from the Gibbs sampler in Section 5.3.

5.5 Fitting with Maximum Likelihood

Finally, we compute maximum likelihood estimates to compare to our Bayesian results.

1R> mle_out = mle_stcos(z_scaled, v_scaled, H, S, K)2R> print(mle_out $sig2K_hat)3[1] 1.310004e-114R> print(mle_out$ sig2xi_hat)5[1] 1.225813e-116R> print(mle_out $loglik)7[1] 67.29006Estimates for both$ \sigma_{\xi}^{2} $and$ \sigma_{K}^{2} $are very small, which indicates that the direct variance estimates$ \bm{V} $are capturing much of the variability among$ \bm{Z} $. This can be contrasted with the Bayesian approach, which finds a non-zero effect of$ \sigma_{\xi}^{2} $and$ \sigma_{K}^{2} $through the addition of prior information. The following code extracts the MLE$ \hat{\bm{\mu}}_{B} $, computes estimates$ \tilde{\bm{H}}\hat{\bm{\mu}}$ of the four neighborhoods, and transforms those estimates to the original scale of the direct estimates.

1H_new = overlap_matrix(neighbs, dom_fine)2mu_hat = mle_out $mu_hat3z_hat_scaled = as.numeric(H_new %*% mu_hat)4z_hat = sd(z) * z_hat_scaled + mean(z)Figure [8](#Sx1.F8) plots MLEs with box plots of corresponding saved draws from the Gibbs sampler. Figure [8(a)](#Sx1.F8.sf1) plots the first 10 components of$ \hat{\bm{\mu}}{B} $, while Figure [8(b)](#Sx1.F8.sf2) plots the four components of$ \tilde{\bm{H}}\hat{\bm{\mu}} $corresponding to neighborhoods. Code to reproduce Figure [8](#Sx1.F8) is provided in the supplemental materials. Bayesian and maximum likelihood estimates are not seen to be vastly different, and we anticipate that they would become closer as the total number of observations$ N $becomes large relative to$ n{B} $. In the current setting where$ N=421 $and$ n_{B}=85$, we would recommend the Bayesian approach.

6 Conclusions

In this article, we have demonstrated a complete implementation of STCOS methodology for R users. We worked through a small example to estimate median household income in several neighborhoods in the City of Columbia in Boone County, MO. Established R packages such as sf, dplyr, Matrix, and rstan were instrumental in the process, from initially gathering the data, to carrying out the MCMC, to placing results into a usable form. The stcos package was introduced to assist with some intricate programming steps not covered by other packages, especially computing areal spatio-temporal basis functions. Use of the highlighted tools significantly reduces the learning curve to program an analysis; however, some technical experience and effort are still required for a successful implementation. Future work may involve additional improvements to the stcos package for efficiency and usability, as well as software support for other spatial and spatio-temporal methodologies.

Acknowledgements

This research was partially supported by the U.S. National Science Foundation (NSF) and the U.S. Census Bureau under NSF grant SES-1132031, funded through the NSF-Census Research Network (NCRN) program, and NSF Awards SES-1853096 and SES-1853099. This article is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical issues are those of the authors and not the NSF or U.S. Census Bureau. The authors thank Taylor Bowen and Toni Messina from the Office of Information Technology/GIS, City of Columbia, Missouri for supplying the shapefile used in the case study and for useful discussion.

Appendix A Computational details and proofs

We will make use of the following well-known property in several places.

Property A.1.

If $\bm{A}\in\mathbb{R}^{m\times k}$ , $\bm{B}\in\mathbb{R}^{k\times l}$ , $\bm{C}\in\mathbb{R}^{l\times n}$ , then $\text{vec}(\bm{A}\bm{B}\bm{C})=(\bm{C}^{\top}\otimes\bm{A})\text{vec}(\bm{B})$ .

The following proposition gives the explicit solution to the minimization problem stated in (3.10). Bradley et al. (2015a) considers a similar problem featuring a more general objective function but assuming that the columns of $\bm{S}$ are orthonormal. Higham (1988) gives a general discussion of problems involving Frobenius and 2-norm distance minimization.

Proposition A.2 (Frobenius Norm Minimization).

Suppose $\bm{S}\in\mathbb{R}^{n\times r}$ has rank $r$ and $\bm{\Sigma}\in\mathbb{R}^{n\times n}$ is positive definite. The minimizer $\bm{X}\in\mathbb{R}^{r\times r}$ of $\lVert\bm{\Sigma}-\bm{S}\bm{X}\bm{S}^{\top}\rVert_{\text{F}}$ is $\bm{X}=(\bm{S}^{\top}\bm{S})^{-1}\bm{S}^{\top}\bm{\Sigma}\bm{S}(\bm{S}^{\top}\bm{S})^{-1}$ .

Proof.

Using Property A.1, we have

[TABLE]

where the norm on the last line is the usual 2-norm on $\mathbb{R}^{n^{2}}$ . We recognize the expression in (A.1) as a standard least squares minimization whose solution is

[TABLE]

Therefore, the minimizer is $\bm{X}=(\bm{S}^{\top}\bm{S})^{-1}\bm{S}^{\top}\bm{\Sigma}\bm{S}(\bm{S}^{\top}\bm{S})^{-1}$ , as desired. ∎

Remark A.3 (MLE Computation).

To compute the MLE for the STCOS model, we first note that the likelihood, excluding the parameter model, is

[TABLE]

where $\bm{\Delta}=\sigma_{\xi}^{2}\bm{I}+\bm{V}+\sigma_{K}^{2}\bm{S}\bm{K}\bm{S}^{\top}$ . Given $\sigma_{K}^{2}$ and $\sigma_{\xi}^{2}$ , the likelihood is maximized by the weighted least squares estimator $\hat{\bm{\mu}}_{B}=(\bm{H}^{\top}\bm{\Delta}^{-1}\bm{H})^{-1}\bm{H}^{\top}\bm{\Delta}^{-1}\bm{Z}$ . To estimate the unknown $\sigma_{K}^{2}$ and $\sigma_{\xi}^{2}$ , we carry out numerical maximization on the partially maximized log-likelihood

[TABLE]

To enforce the constraints that $\sigma_{K}^{2}>0$ and $\sigma_{\xi}^{2}>0$ , we optimize over $(\vartheta_{1},\vartheta_{2})\in\mathbb{R}^{2}$ and take $\sigma_{K}^{2}=\exp(\vartheta_{1})$ , $\sigma_{\xi}^{2}=\exp(\vartheta_{2})$ .

Appendix B Supplementary: Scaling to Large Datasets

The demonstration in Section 5 was carried out with a relatively small dataset based on $N=421$ total observations in the source supports and $n_{B}=85$ areas in the fine-level support. Here we will list some challenges that may be encountered when scaling to larger datasets.

Work to compute the basis function (3.6) is proportional to the number of areas in the supports, the length of their lookback periods, and the number of Monte Carlo repetitions requested. Basis computations are independent across supports, or across areas within the same support, and therefore can be computed in parallel. 2. 2.

Time to compute the overlap matrix $\bm{H}$ also increases proportionally with the number of areas in supports. This is less substantial than basis function computation, but can also be parallelized across supports. 3. 3.

The Monte Carlo approximation for (3.6) utilizes a rejection sampling method to draw a sample from each area $A$ , first drawing a point from the bounding box that surrounds $A$ , then accepting the point only if it belongs to $A$ itself. This method has difficulty accepting samples when $|A|$ is very small relative to the bounding box; for example, if $A$ is a thin rectangle. 4. 4.

We used PCA to reduce the dimension of $\bm{S}$ based on a spectral decomposition of $\bm{S}^{\top}\bm{S}$ . Before dimension reduction, the matrix $\bm{S}^{\top}\bm{S}$ is typically sparse, but the dimension can be quite large depending on the number of knot points used. Here it can be helpful to request a limited number of eigenvalue/eigenvector pairs; for example, this can be done using sparse matrices with the RSpectra package (Qiu and Mei, 2019). After dimension reduction, neither $\bm{S}$ nor $\bm{S}^{\top}\bm{S}$ are typically sparse. 5. 5.

Although the CAR precision matrix $\bm{Q}$ is typically sparse, its inverse is dense when it exists. However, $\bm{Q}^{-1}$ is currently used only in the construction of $\bm{K}$ . 6. 6.

The Gibbs sampler in Algorithm 3.1 involves repeated operations with $n_{B}\times n_{B}$ and $r\times r$ matrices: $r$ can be kept to a manageable size using the suggested dimension reduction, but $n_{B}$ depends on the choice of fine-level support. 7. 7.

For both the Gibbs and Stan sampling, storing all of $\bm{\mu}_{B}\in\mathbb{R}^{n_{B}}$ , $\bm{\xi}\in\mathbb{R}^{N}$ , and $\bm{\eta}\in\mathbb{R}^{r}$ for every iteration of the sampler can become a memory/storage burden. The vector $\bm{\xi}$ has not been used in post-processing and need not be stored. However, $\bm{\mu}_{B}$ and $\bm{\eta}$ are both utilized in post-processing. For very large $n_{B}$ , it may be more efficient to compute desired functions of the draws within the sampler rather than saving the draws for later use. 8. 8.

To support a sparse representation for a large $\bm{H}$ matrix in Stan, we can make use of Stan’s csr_matrix_times_vector function to compute $\bm{H}\bm{\mu}_{B}$ . 9. 9.

The MLE computation in Remark A.3 utilizes the $N\times N$ covariance matrix $\bm{\Delta}=\sigma_{K}^{2}\bm{S}\bm{K}\bm{S}^{\top}+\sigma_{\xi}^{2}\bm{I}+\bm{V}$ of the marginal distribution of $\bm{Z}$ . This matrix is also utilized in a Stan sampler with the quantities $\bm{\xi}$ and $\bm{\eta}$ integrated out. We may not be able to explicitly construct $\bm{\Delta}$ if $N$ is very large. As previously noted, the matrix $\bm{S}$ will be dense if its dimension has been reduced via PCA, so that $\bm{\Delta}$ in turn will also be dense. Some additional matrix algebra may assist in computing the likelihood. For example, the Sherman-Morrison-Woodbury identity yields

[TABLE]

where $\bm{U}=\sigma_{\xi}^{2}\bm{I}+\bm{V}$ . Using (B.1), the quadratic form $\bm{Z}^{\top}\bm{\Delta}^{-1}\bm{Z}$ can be computed without forming (or inverting) any $N\times N$ matrices.

To illustrate the methods on a larger dataset, code for a larger scale data analysis is also provided in the supplemental materials for this article. Here the analysis is structured similarly to Raim et al. (2017). There are 17 source supports with counties in the continental United States and median household income as the ACS variable of interest. We use 1-year estimates for years 2011–2017, 3-year estimates for years 2011–2013, and 5 year estimates for years 2011–2017. The 2017 county geography was taken to be the fine-level support, and the target support was taken to be 2017 congressional districts. This yields $N=32943$ observations in the source supports and $n_{B}=3105$ fine-level support areas. For the dimension reduction of $\bm{S}$ , we took $r=56$ . The code was run on an Intel Core i7-2600 3.40GHz with 4 cores and 8 GB of memory using stcos version 0.3.0. The following highlights were noted.

Overall run time was about 5.5 hours. This consisted of: 1 minute to download and assemble data using Census API and tigris package as in Section 5.1, 167 minutes to prepare the analysis, similarly to Section 5.2, 75 minutes to run the Gibbs sampler and produce plots of estimates for congressional districts, and 88 minutes to run the Stan sampler and produce equivalent plots. 2. 2.

Overlap matrix $\bm{H}$ took 9 minutes to construct. 3. 3.

Design matrix $\bm{S}$ took 83 minutes to construct from basis functions. Each 1-year, 3-year, and 5-year source support took about 1 minute, 4 minutes, and 9 minutes, respectively. 4. 4.

Design matrix $\bm{S}^{*}$ took about 70 minutes to construct. 5. 5.

Constructing $\bm{K}$ via (3.12) and (3.13) took 2 minutes and 7 seconds, respectively. 6. 6.

Gibbs sampler with 10000 iterations, burn-in of 2000, and thinning to save 1 of every remaining 10 draws took 71 minutes. 7. 7.

Stan sampling with 2000 iterations, burn-in of 1200, and no thinning took 88 minutes.

Note that these results are intended to give a rough idea of run times, and may vary depending on hardware, installed libraries, compilers, and many other factors. Improvements may be possible in future versions of stcos, or in the analysis code itself, to improve scalability.

Bibliography47

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Banerjee et al. (2014) Sudipto Banerjee, Bradley P. Carlin, and Alan E. Gelfand. Hierarchical modeling and analysis for spatial data . Chapman and Hall/CRC, 2nd edition, 2014.
2Bates and Maechler (2019) Douglas Bates and Martin Maechler. Matrix : Sparse and Dense Matrix Classes and Methods , 2019. URL https://CRAN.R-project.org/package=Matrix . R package version 1.2-18.
3Battersby et al. (2014) Sarah E. Battersby, Michael P. Finn, E. Lynn Usery, and Kristina H. Yamamoto. Implications of Web Mercator and its use in online mapping. Cartographica: The International Journal for Geographic Information and Geovisualization , 49(2):85–101, 2014.
4Bivand et al. (2013) Roger S. Bivand, Edzer Pebesma, and Virgilio Gómez-Rubio. Applied Spatial Data Analysis with R . Springer, 2nd edition, 2013.
5Bradley et al. (2015 a) Jonathan R. Bradley, Scott H. Holan, and Christopher K. Wikle. Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics. Annals of Applied Statistics , 9(4):1761–1791, 2015 a.
6Bradley et al. (2015 b) Jonathan R. Bradley, Christopher K. Wikle, and Scott H. Holan. Spatio-temporal change of support with application to American Community Survey multi-year period estimates. Stat , 4(1):255–270, 2015 b.
7Breakstone and Anderson (2019) Carole D. Breakstone and Tammy S. Anderson. Census Data API User Guide , July 2019. URL https://www.census.gov/data/developers/guidance/api-user-guide.html . Version 1.6.
8Brunsdon (2014) Chris Brunsdon. pycno: Pycnophylactic Interpolation , 2014. URL https://CRAN.R-project.org/package=pycno . R package version 1.2.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Spatio-Temporal Change of Support Modeling with R

Abstract

1 Introduction

2 Change of support concepts and the ACS

3 The STCOS model

4 Implementing STCOS in R

5 Demonstration: City of Columbia neighborhoods

5.1 Assembling the data

5.2 Preparing the analysis

5.3 Fitting with Gibbs sampler

5.4 Fitting with Stan

5.5 Fitting with Maximum Likelihood

6 Conclusions

Acknowledgements

Appendix A Computational details and proofs

Property A.1**.**

Proposition A.2** (Frobenius Norm Minimization).**

Proof.

Remark A.3** (MLE Computation).**

Appendix B Supplementary: Scaling to Large Datasets

Property A.1.

Proposition A.2 (Frobenius Norm Minimization).

Remark A.3 (MLE Computation).