Inference for Dependent Data with Learned Clusters
Jianfei Cao, Christian Hansen, Damian Kozbur, Lucciano Villacorta

TL;DR
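This paper introduces a cluster-based inference method for dependent spatial data. Clusters are learned by applying an unsupervised clustering algorithm to a known dissimilarity measure, so that hypothesis tests attain approximately correct size, and the number of clusters can be chosen from the data.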
Contribution
It combines unsupervised clustering with cluster-based hypothesis testing for dependent data, supported by theoretical guarantees (asymptotic size control) and simulation evidence.
Findings
Attains asymptotically correct size under conditions stated in the paper.
Attains near-nominal size in finite samples in simulations.
Provides a data-driven method for choosing the number of clusters.
Abstract
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized by a known, observed dissimilarity measure over spatial indices. Observations are partitioned into clusters with the use of an unsupervised clustering algorithm applied to the dissimilarity measure. Once the partition into clusters is learned, a cluster-based inference procedure is applied to a statistical hypothesis testing procedure. The procedure proposed in the paper allows the number of clusters to depend on the data, which gives researchers a principled method for choosing an appropriate clustering level. The paper gives conditions under which the proposed procedure asymptotically attains correct size. A simulation study shows that the proposed…
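The two-step procedure described in the abstract (learn clusters from the dissimilarity measure, then run a cluster-based test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The clustering step uses a simple alternating k-medoids routine on a precomputed dissimilarity matrix (the helper `k_medoids` is hypothetical). The inference step uses a generic cluster-level t-test for a mean (the helper `cluster_t_test` is hypothetical): it estimates the mean within each of the G clusters, treats those G estimates as roughly independent, and compares the resulting t-statistic with a Student-t critical value with G-1 degrees of freedom. The number of clusters k is fixed by hand here; the paper's data-driven rule for choosing it is not reproduced.

```python
import numpy as np
from scipy import stats


def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed n x n dissimilarity matrix D.

    Hypothetical helper for illustration: returns an integer cluster label
    for each of the n observations.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(n_iter):
        # Assign each observation to its least-dissimilar medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster empties
            # New medoid: the member with smallest total dissimilarity to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)


def cluster_t_test(y, labels, mu0=0.0, alpha=0.05):
    """Two-sided test of H0: E[y] = mu0 using one mean estimate per cluster.

    Hypothetical helper; assumes the cluster-level means are approximately
    independent across clusters. Returns (t_stat, critical_value, reject).
    """
    groups = np.unique(labels)
    G = groups.size
    if G < 2:
        raise ValueError("need at least two clusters")
    theta = np.array([y[labels == g].mean() for g in groups])  # per-cluster means
    se = theta.std(ddof=1) / np.sqrt(G)
    t_stat = (theta.mean() - mu0) / se
    crit = stats.t.ppf(1 - alpha / 2, df=G - 1)  # t critical value, G-1 df
    return t_stat, crit, bool(abs(t_stat) > crit)


if __name__ == "__main__":
    # Toy example: random spatial locations, Euclidean distance as dissimilarity.
    rng = np.random.default_rng(1)
    coords = rng.uniform(size=(200, 2))
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    y = rng.normal(size=200)
    labels = k_medoids(D, k=8)
    print(cluster_t_test(y, labels))
```

With few clusters the t critical value is much larger than the normal value of 1.96, which is one common way such tests guard against over-rejection when G is small; in this sketch, choosing k trades off more clusters (more degrees of freedom) against weaker independence between cluster estimates.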
Taxonomy
Topics: Spatial and Panel Data Analysis · Data-Driven Disease Surveillance · Statistical Methods and Bayesian Inference
