# Kernel generalized least squares regression for network-structured data

**Authors:** Edward Antonian, Gareth W. Peters, Michael Chantler

PMC · DOI: 10.1371/journal.pone.0324087 · PLOS One · 2025-05-30

## TL;DR

This paper extends kernel-based regression methods for predicting network-structured (graph) signals, showing improved performance in time-series applications such as air quality monitoring.

## Contribution

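The paper extends Kernel Graph Regression (KGR) and Gaussian Processes over Graphs (GPoG) to handle partially observed graph signals and correlated prediction errors. It introduces a Generalized Least Squares variant (GLS-KGR) and derives a scalable expression for the marginal variance of each prediction.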
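A minimal sketch of KGR/GPoG-style prediction with partially observed signals is given below. It assumes a separable prior covariance $K \otimes H^2$ over a $T \times N$ signal matrix (row-major vectorization), an RBF kernel $K$ on the explanatory variables, and a low-pass graph filter $H = (I + \beta L)^{-1}$ built from the graph Laplacian $L$. These modelling choices, the hyperparameter values, and the function names (`rbf_kernel`, `graph_filter`, `kgr_predict`) are illustrative assumptions, not the paper's implementation. Missing entries (NaN) are handled by conditioning only on the observed elements.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Assumed squared-exponential kernel over the explanatory variables x_t (T x T)
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(sq, 0) / (2 * lengthscale**2))

def graph_filter(A, beta=1.0):
    # Hypothetical low-pass filter H = (I + beta * L)^{-1}, L = D - A (combinatorial Laplacian)
    L = np.diag(A.sum(1)) - A
    return np.linalg.inv(np.eye(len(A)) + beta * L)

def kgr_predict(X, Y, A, gamma=0.1, lengthscale=1.0, beta=1.0):
    """Posterior mean of the full T x N signal given a partially observed Y (NaN = missing).

    Assumes prior cov kron(K, H @ H) on row-major vec(F) and i.i.d. noise of variance gamma.
    Dense O((TN)^3) solve: only suitable for small toy problems.
    """
    T, N = Y.shape
    K = rbf_kernel(X, lengthscale)
    H = graph_filter(A, beta)
    C = np.kron(K, H @ H)                 # entry [t*N+n, s*N+m] = K[t, s] * (H^2)[n, m]
    y = Y.ravel()
    obs = ~np.isnan(y)                    # boolean mask selecting observed entries
    C_oo = C[np.ix_(obs, obs)]            # prior covariance among observed entries
    alpha = np.linalg.solve(C_oo + gamma * np.eye(obs.sum()), y[obs])
    return (C[:, obs] @ alpha).reshape(T, N)  # predicts observed and missing entries alike

# Usage on synthetic data (hypothetical sizes; complete graph on N nodes)
rng = np.random.default_rng(0)
T, N = 20, 5
X = rng.normal(size=(T, 2))
A = np.ones((N, N)) - np.eye(N)
Y_full = rng.normal(size=(T, N))
Y = Y_full.copy()
Y[rng.random((T, N)) < 0.2] = np.nan      # drop roughly 20% of entries
F_hat = kgr_predict(X, Y, A)
```

Because the prior covariance couples every node and time step, each missing entry is inferred from neighbouring nodes and from time steps with similar explanatory inputs. The dense solve is chosen for readability; a scalable method would instead exploit the Kronecker structure, for example with an iterative solver.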

## Key findings

- The GLS-KGR algorithm outperforms several standard techniques on a time-series dataset of network-structured air quality measurements.
- The Laplace approximation is used to derive a lower bound on the out-of-sample prediction error, together with a scalable expression for the marginal variance of each prediction.
- The proposed methods are validated on both synthetic and real-world air quality data from California.
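The paper's own marginal-variance expression is not reproduced in this summary. As an assumed illustration of why a scalable expression matters, the sketch below computes posterior marginal variances for the fully observed case with i.i.d. Gaussian noise of variance $\gamma$ and a Kronecker prior covariance $C = K \otimes G$. It uses the standard trick of eigendecomposing $K$ and $G$ separately, so the $TN \times TN$ matrix is never formed, and checks the result against the naive formula $\operatorname{diag}(C - C(C + \gamma I)^{-1}C)$. The function names are hypothetical, and this need not match the expression derived by the authors.

```python
import numpy as np

def kron_marginal_variance(K, G, gamma):
    """Posterior marginal variances (T x N) under prior cov kron(K, G) and
    i.i.d. noise of variance gamma, assuming every entry is observed."""
    lam, U = np.linalg.eigh(K)            # K = U diag(lam) U^T   (T x T)
    mu, V = np.linalg.eigh(G)             # G = V diag(mu) V^T    (N x N)
    # Eigenvalues of kron(K, G); clip tiny negative round-off to zero
    s = np.outer(np.clip(lam, 0, None), np.clip(mu, 0, None))
    D = gamma * s / (s + gamma)           # eigenvalues of the posterior covariance
    # diag of (U kron V) diag(D) (U kron V)^T, reshaped to T x N
    return (U**2) @ D @ (V**2).T

def dense_marginal_variance(K, G, gamma):
    # Naive O((TN)^3) reference: diag(C - C (C + gamma I)^{-1} C)
    C = np.kron(K, G)
    P = C - C @ np.linalg.solve(C + gamma * np.eye(len(C)), C)
    return np.diag(P).reshape(K.shape[0], G.shape[0])

# Usage with a hypothetical RBF kernel and graph-filter covariance G = H^2
rng = np.random.default_rng(1)
T, N = 30, 6
X = rng.normal(size=(T, 2))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
A = np.triu((rng.random((N, N)) < 0.5).astype(float), 1)
A = A + A.T                               # random undirected graph
Lap = np.diag(A.sum(1)) - A
H = np.linalg.inv(np.eye(N) + Lap)
G = H @ H
var_fast = kron_marginal_variance(K, G, gamma=0.1)
var_dense = dense_marginal_variance(K, G, gamma=0.1)
```

The eigendecomposition route costs $O(T^3 + N^3)$ rather than $O((TN)^3)$. Missing entries or correlated noise break this simple diagonalization, which is where a more specialized derivation becomes necessary.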

## Abstract

In this paper, we study a class of non-parametric regression models for predicting graph signals $\{\mathbf{y}_t\}$ as a function of explanatory variables $\{\mathbf{x}_t\}$. Recently, Kernel Graph Regression (KGR) and Gaussian Processes over Graphs (GPoG) have emerged as promising techniques for this task. The goal of this paper is to examine several extensions to KGR/GPoG, with the aim of generalising them to a wider variety of data scenarios. The first extension we consider is the case of graph signals that have only been partially recorded, meaning a subset of their elements is missing at observation time. Next, we examine the statistical effect of correlated prediction error and propose a method for Generalized Least Squares (GLS) on graphs. In particular, we examine Autoregressive AR(1) vector autoregressive processes, which are commonly found in time-series applications. Finally, we use the Laplace approximation to determine a lower bound for the out-of-sample prediction error and derive a scalable expression for the marginal variance of each prediction. These methods are tested on both real and synthetic data, with the former taken from a network of air quality monitoring stations across California. We find evidence that the generalised GLS-KGR algorithm is well-suited to such time-series applications, outperforming several standard techniques on this dataset.
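A minimal sketch of GLS-style prediction under correlated errors follows. It assumes the simplest instance of the autoregressive setting: a scalar AR(1) coefficient $\rho$ shared by all nodes, with errors independent across nodes, so the noise covariance is $\sigma^2 R \otimes I_N$ with $R_{ts} = \rho^{|t-s|}/(1-\rho^2)$. This corresponds to a VAR(1) process with coefficient matrix $\rho I$. The posterior mean then replaces the i.i.d. noise term with this covariance, and setting $\rho = 0$ recovers ordinary i.i.d. noise. The function names and the node covariance `G` in the usage lines are hypothetical; the paper's GLS-KGR may parameterize the error process differently.

```python
import numpy as np

def ar1_covariance(T, rho, sigma2=1.0):
    # Stationary AR(1) covariance: sigma2 * rho^|t-s| / (1 - rho^2), requires |rho| < 1
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)

def ar1_precision(T, rho, sigma2=1.0):
    # Tridiagonal inverse of ar1_covariance (classical Prais-Winsten structure)
    Q = np.diag(np.full(T, 1 + rho**2))
    Q[0, 0] = Q[-1, -1] = 1.0
    Q += np.diag(np.full(T - 1, -rho), 1) + np.diag(np.full(T - 1, -rho), -1)
    return Q / sigma2

def gls_gp_mean(K, G, Y, rho, sigma2):
    """Posterior mean of a T x N signal with prior cov kron(K, G) and noise that is
    AR(1) in time (coefficient rho) and independent across nodes (assumed model)."""
    T, N = Y.shape
    C = np.kron(K, G)                                   # row-major vec(F)
    Sigma = np.kron(ar1_covariance(T, rho, sigma2), np.eye(N))
    return (C @ np.linalg.solve(C + Sigma, Y.ravel())).reshape(T, N)

# Usage on synthetic data (hypothetical kernel, node covariance, and hyperparameters)
rng = np.random.default_rng(2)
T, N = 25, 4
X = rng.normal(size=(T, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)                       # RBF kernel for 1-D inputs
G = np.eye(N) + 0.5 * (np.ones((N, N)) - np.eye(N))     # positive definite node covariance
Y = rng.normal(size=(T, N))
F_gls = gls_gp_mean(K, G, Y, rho=0.7, sigma2=0.2)
```

The tridiagonal precision from `ar1_precision` is what makes AR(1) errors cheap in principle: applying it to a length-$T$ series costs $O(T)$ per node. The dense solve above does not exploit that structure and is meant only to show the model.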


## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12124555/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12124555/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC12124555/full.md

---
Source: https://tomesphere.com/paper/PMC12124555