# Multivariate Functional Data Modeling with Time-varying Clustering

**Authors:** Philip A. White, Alan E. Gelfand

arXiv: 1904.11518 · 2021-01-06

## TL;DR

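This paper develops model-based clustering for multivariate functional data collected over time at multiple sites. A multivariate Gaussian process models the functions, a Dirichlet process provides shared cluster labels across sites, and a partition of the time scale lets cluster membership vary over time. The approach is illustrated with hourly ozone and PM$_{10}$ data from 24 monitoring sites in Mexico City in 2017.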

## Contribution

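The paper introduces a framework for time-varying clustering of multivariate functional data that combines a multivariate Gaussian process with a Dirichlet process. Because clustering in continuous time is extremely computationally demanding, it partitions the time scale and lets cluster membership change across the pieces of that partition.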
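The partitioning of the time scale can be pictured with a small sketch. This summary does not say how the authors choose their partition, so the equal-width blocks, the choice of 12 blocks, and the helper names `partition_hours` and `block_index` below are illustrative assumptions only.

```python
from datetime import datetime, timedelta


def partition_hours(start, end, n_blocks):
    """Split [start, end) into n_blocks contiguous blocks of whole hours.

    Assumption: near-equal block widths. The paper may partition differently.
    """
    total_hours = int((end - start).total_seconds() // 3600)
    base, extra = divmod(total_hours, n_blocks)
    blocks = []
    cursor = start
    for b in range(n_blocks):
        # The first `extra` blocks absorb one leftover hour each.
        length = base + (1 if b < extra else 0)
        nxt = cursor + timedelta(hours=length)
        blocks.append((cursor, nxt))
        cursor = nxt
    return blocks


def block_index(blocks, t):
    """Return the index of the block whose half-open interval contains t."""
    for i, (lo, hi) in enumerate(blocks):
        if lo <= t < hi:
            return i
    raise ValueError("timestamp outside the partition")


# Hourly data for 2017 spans 365 * 24 = 8760 hours; 12 blocks is a
# hypothetical choice, giving 730 hours per block.
blocks = partition_hours(datetime(2017, 1, 1), datetime(2018, 1, 1), 12)
```

Under this sketch, each hourly observation maps to exactly one block via `block_index`, and cluster labels would then be allowed to differ from block to block rather than from hour to hour.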

## Key findings

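- Clustering can be carried out for ozone and PM$_{10}$ individually or jointly, and it may differ between the two pollutants.
- The method is illustrated on hourly 2017 data from 24 monitoring sites in Mexico City, giving 48 functions in total.
- Functions are clustered by their response to exogenous variables, and cluster membership is allowed to vary over time.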

## Abstract

We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM$_{10}$ levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM$_{10}$ levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM$_{10}$ individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time.

We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the, say, $n$ sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process, which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.
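The shared labeling that a Dirichlet process provides can be illustrated through the random partition it induces, the Chinese restaurant process. The sketch below only draws prior cluster labels for 24 sites; it is not the paper's model or its posterior inference, and the function name `crp_labels`, the concentration value `alpha = 1.0`, and the seed are assumptions made for illustration.

```python
import random


def crp_labels(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to the
    cluster's current size, or opens a new cluster with probability
    proportional to alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        weights = counts + [alpha]  # last slot stands for a new cluster
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        labels.append(choice)
    return labels


rng = random.Random(0)  # fixed seed so the draw is reproducible
# One label per monitoring site; sites with equal labels share a cluster.
labels = crp_labels(24, 1.0, rng)
```

Sites that receive the same label share a cluster, and the number of clusters is not fixed in advance. Larger values of `alpha` tend to produce more clusters.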

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.11518/full.md

## Figures

40 figures with captions in the complete paper: https://tomesphere.com/paper/1904.11518/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/1904.11518/full.md

---
Source: https://tomesphere.com/paper/1904.11518