Similarity-based Random Partition Distribution for Clustering Functional Data
Tomoya Wakayama, Shonosuke Sugasawa, Genya Kobayashi

TL;DR
This paper introduces a similarity-based extension of the generalized Dirichlet process for clustering functional spatial data. The resulting random partition distribution prevents over-clustering and uses pairwise similarity information, and it is applied to hourly population flow data.
Contribution
The paper extends the generalized Dirichlet process with pairwise similarity information, with the aim of producing more accurate and meaningful clusters for functional spatial data.
Findings
The SGDP-type distribution prevents excess clusters.
The method accurately detects meaningful spatiotemporal patterns.
Application to Tokyo population data demonstrates practical utility.
Abstract
Random partition distributions are a crucial tool for model-based clustering. This study advances the field of random partitions in the context of functional spatial data, focusing on the challenges posed by hourly population data across various regions and dates. We propose an extension of the generalized Dirichlet process, named the similarity-based generalized Dirichlet process (SGDP)-type distribution, to address the limitations of simple random partition distributions (e.g., those induced by the Dirichlet process), such as an overabundance of clusters. This model prevents excess cluster production and incorporates pairwise similarity information to ensure accurate and meaningful clustering. The theoretical properties of the SGDP-type distribution are studied. Then, the SGDP-type random partition is applied to a real-world dataset of hourly population flow in meshes in the…
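To make the "overabundance of clusters" limitation concrete, here is a minimal sketch of the partition induced by the ordinary Dirichlet process, the Chinese restaurant process (CRP). It is not the SGDP-type distribution from the paper, whose definition is not given on this page. The function names and the concentration parameter `alpha` are illustrative assumptions. Under the CRP, the expected number of clusters for n items is the sum of alpha / (alpha + i) for i = 0, ..., n - 1, which keeps growing (roughly like alpha * log n) as more items arrive.

```python
import random


def expected_crp_clusters(n, alpha):
    """Expected number of clusters among n items under a CRP(alpha) prior."""
    # Item i (0-based) opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def sample_crp_partition(n, alpha, seed=0):
    """Draw one partition of n items from a CRP(alpha); illustrative only."""
    rng = random.Random(seed)
    sizes = []   # sizes[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for i in range(n):
        # Existing cluster k is chosen with probability sizes[k] / (alpha + i),
        # a new cluster with probability alpha / (alpha + i).
        u = rng.random() * (alpha + i)
        acc = 0.0
        chosen = len(sizes)  # default: open a new cluster
        for k, size in enumerate(sizes):
            acc += size
            if u < acc:
                chosen = k
                break
        if chosen == len(sizes):
            sizes.append(1)
        else:
            sizes[chosen] += 1
        labels.append(chosen)
    return labels, sizes


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        _, sizes = sample_crp_partition(n, alpha=1.0, seed=1)
        print(n, len(sizes), round(expected_crp_clusters(n, 1.0), 2))
```

Running the script prints, for each sample size, the number of clusters in one simulated partition next to the expected value. The point of the sketch is only that the cluster count increases without bound in n, which is the behavior the abstract says the SGDP-type distribution is designed to avoid.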
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Bayesian Methods and Mixture Models · Land Use and Ecosystem Services
