On Misspecified Error Distributions in Bayesian Functional Clustering: Consequences and Remedies
Fumiya Iwashige, Tomoya Wakayama, Shonosuke Sugasawa, Shintaro Hashimoto

TL;DR
This paper shows that misspecifying the error structure in Bayesian functional clustering leads to overestimating the number of clusters, and proposes modeling error dependence with Gaussian processes to improve clustering accuracy.
Contribution
The paper identifies misspecification of the error structure as a fundamental cause of overestimating the number of clusters, and introduces a Gaussian process-based approach, with a scalable approximation, for incorporating error correlation into Bayesian functional models.
Findings
Modeling error dependence reduces overestimation of the number of clusters.
The Gaussian process approach improves clustering performance.
Even simple Dirichlet process clustering benefits from modeling error correlation.
Abstract
Nonparametric Bayesian approaches provide a flexible framework for clustering without pre-specifying the number of groups, yet they are well known to overestimate the number of clusters, especially for functional data. We show that a fundamental cause of this phenomenon lies in misspecification of the error structure: errors are conventionally assumed to be independent across observed points in Bayesian functional models. Through high-dimensional clustering theory, we demonstrate that ignoring the underlying correlation leads to excess clusters regardless of the flexibility of prior distributions. Guided by this theory, we propose incorporating the underlying correlation structures via Gaussian processes and also present its scalable approximation with principled hyperparameter selection. Numerical experiments illustrate that even simple clustering based on Dirichlet processes performs…
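To make the independence assumption concrete, the following minimal sketch simulates curves from a single group whose errors are correlated across observation points, then scores the same residuals under an independent-error covariance and under the correlated covariance. It is an illustration only, not the authors' model or code: the squared-exponential kernel, the sine mean function, and every parameter value (`lengthscale`, `nugget`, grid size, number of curves) are assumptions chosen for the example, and it performs no clustering.

```python
import numpy as np


def se_kernel(t, variance=1.0, lengthscale=0.3, nugget=0.01):
    """Squared-exponential covariance on grid t (an assumed kernel choice)."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2) + nugget * np.eye(len(t))


def gaussian_loglik(resid, cov):
    """Sum of zero-mean multivariate normal log-densities over the rows of resid."""
    n, n_points = resid.shape
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, resid.T)  # whitened residuals, shape (n_points, n)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * n_points * np.log(2.0 * np.pi) + n * logdet + np.sum(z ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)      # observation points shared by all curves
K = se_kernel(t)                   # correlated error covariance
mean = np.sin(2.0 * np.pi * t)     # one common mean function: a single true cluster

# 20 curves from the same cluster, each with its own correlated error path.
curves = mean + rng.multivariate_normal(np.zeros(len(t)), K, size=20)
resid = curves - mean

ll_gp = gaussian_loglik(resid, K)                       # correlation modeled
ll_iid = gaussian_loglik(resid, np.diag(np.diag(K)))    # same variances, independence assumed

print(f"log-likelihood, correlated errors:  {ll_gp:.1f}")
print(f"log-likelihood, independent errors: {ll_iid:.1f}")
```

In this toy setting the independent-error covariance has the same marginal variances as the correlated one but fits the residuals far worse, because each curve's smooth deviation from the mean is implausible as independent noise. The abstract's argument is that a clustering model left with this kind of misfit tends to explain it by creating extra clusters.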
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Statistical Methods and Bayesian Inference
