# Probabilistic Predictive Principal Component Analysis for Spatially-Misaligned and High-Dimensional Air Pollution Data with Missing Observations

**Authors:** Phuong T. Vu, Timothy V. Larson, Adam A. Szpiro

arXiv: 1905.00393 · 2020-05-19

## TL;DR

This paper introduces probabilistic predictive PCA methods tailored for high-dimensional, spatially-misaligned air pollution data with missing observations, enhancing prediction accuracy through model-based imputation.

## Contribution

It develops a novel probabilistic predictive PCA framework that effectively handles missing data and spatial misalignment in multi-pollutant air quality datasets.

## Key findings

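- Produces lower-dimensional scores of multi-pollutant PM$_{2.5}$ data whose spatial structure allows them to be predicted well at unmeasured locations.
- Handles complex missing-data patterns through model-based imputation that accounts for spatial information, instead of requiring complete data.
- Improves overall predictive performance relative to predictive PCA approaches that require complete data.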

## Abstract

Accurate predictions of pollutant concentrations at new locations are often of interest in air pollution studies on fine particulate matter (PM$_{2.5}$), in which data is usually not measured at all study locations. PM$_{2.5}$ is also a mixture of many different chemical components. Principal component analysis (PCA) can be incorporated to obtain lower-dimensional representative scores of such multi-pollutant data. Spatial prediction can then be used to estimate these scores at new locations. Recently developed predictive PCA modifies the traditional PCA algorithm to obtain scores with spatial structures that can be well predicted at unmeasured locations. However, these approaches require complete data, whereas multi-pollutant data tends to have complex missing patterns in practice. We propose probabilistic versions of predictive PCA which allow for flexible model-based imputation that can account for spatial information and subsequently improve the overall predictive performance.
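The two-stage workflow the abstract describes (reduce multi-pollutant data to scores, then predict those scores at new locations) can be illustrated with a minimal sketch. This is not the paper's probabilistic predictive PCA. It swaps in a simpler, generic technique: iterative low-rank (SVD) imputation of the missing entries, followed by ordinary least-squares regression of the scores on spatial covariates. The function names `impute_and_pca` and `predict_scores`, the synthetic data, and the two "geographic covariates" are all assumptions made for illustration.

```python
import numpy as np


def impute_and_pca(X, n_components=1, n_iter=50):
    """Iterative low-rank imputation followed by PCA (illustrative only).

    X is an (n_sites, n_pollutants) array with np.nan marking missing cells.
    Returns the filled matrix, the PCA scores, and the loadings.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each missing cell with its column's observed mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        # Overwrite only the missing cells; observed values stay fixed.
        filled = np.where(missing, low_rank, X)
    mu = filled.mean(axis=0)
    U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return filled, scores, loadings


def predict_scores(covars_obs, scores, covars_new):
    """Regress scores on covariates (with intercept) and predict at new sites."""
    A = np.column_stack([np.ones(len(covars_obs)), covars_obs])
    beta, *_ = np.linalg.lstsq(A, scores, rcond=None)
    A_new = np.column_stack([np.ones(len(covars_new)), covars_new])
    return A_new @ beta


# Synthetic example: 60 sites, 5 pollutants, one latent spatial factor.
rng = np.random.default_rng(0)
n_sites, n_pollutants = 60, 5
covars = rng.normal(size=(n_sites, 2))  # hypothetical geographic covariates
latent = covars @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=n_sites)
weights = rng.normal(size=n_pollutants)
X = np.outer(latent, weights) + 0.05 * rng.normal(size=(n_sites, n_pollutants))
X[rng.random(X.shape) < 0.2] = np.nan  # knock out roughly 20% of the cells

# Treat the first 50 sites as monitored and the last 10 as new locations.
filled, scores, loadings = impute_and_pca(X[:50], n_components=1)
pred = predict_scores(covars[:50], scores, covars[50:])
```

In this sketch the imputation step ignores spatial information entirely, and the scores are not constructed to be predictable from the covariates. Those are the two gaps that, per the abstract, the proposed probabilistic predictive PCA is meant to address.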

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00393/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00393/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/1905.00393/full.md

---
Source: https://tomesphere.com/paper/1905.00393