K-medoids Clustering of Data Sequences with Composite Distributions
Tiexing Wang, Qunwei Li, Donald J. Bucci, Yingbin Liang, Biao Chen, Pramod K. Varshney

TL;DR
This paper introduces a clustering method for data sequences generated from unknown distributions using k-medoids with distribution distance metrics, providing theoretical error bounds and validation through simulations.
Contribution
It proposes distribution-based k-medoids algorithms for unknown and known cluster counts, with theoretical analysis of error decay and convergence.
Findings
Error probability decays exponentially with sample size
Error exponent is simple and metric-independent under certain conditions
Simulation results validate theoretical analysis
Abstract
This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from unknown continuous distributions, which form clusters, with each cluster containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intra-cluster distance is assumed to be smaller than the minimum inter-cluster distance, and both values are assumed to be known. The goal is to group the data sequences together if their underlying generative distributions (which are unknown) belong to one cluster. K-medoids algorithms based on distribution distance metrics are proposed for both known and unknown numbers of distribution clusters. Upper bounds on the error probability and convergence results in the large sample regime are also provided. It is shown that the error probability decays…
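To make the setup in the abstract concrete, the following is a minimal sketch of k-medoids applied to data sequences for the known-cluster-count case. It is not the authors' algorithm. Several choices here are assumptions made for illustration only: the two-sample Kolmogorov-Smirnov distance stands in for the "distribution distance metric", the initialization is a simple farthest-point rule, and the function names (ks_distance, pairwise_distances, k_medoids) and the Gaussian test data are hypothetical. The sketch also does not use the known intra-cluster and inter-cluster distance values that the paper assumes are available.

```python
import numpy as np


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov distance between two empirical CDFs.

    Assumption: KS is used here only as one example of a distance
    between distributions estimated from samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    # Empirical CDF values of each sample evaluated at all observed points.
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))


def pairwise_distances(sequences):
    """Symmetric matrix of KS distances between all pairs of sequences."""
    n = len(sequences)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = ks_distance(sequences[i], sequences[j])
    return dist


def k_medoids(dist, k, max_iter=100):
    """Basic alternating k-medoids on a precomputed distance matrix.

    Assumption: farthest-point initialization starting from sequence 0;
    this is an illustrative choice, not taken from the paper.
    """
    medoids = [0]
    while len(medoids) < k:
        # Pick the sequence farthest from its nearest current medoid.
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        # Assignment step: each sequence joins its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Hypothetical data: two composite clusters of Gaussians whose means are
# close within a cluster and far apart across clusters.
rng = np.random.default_rng(0)
means = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2]
sequences = [rng.normal(m, 1.0, size=500) for m in means]

medoids, labels = k_medoids(pairwise_distances(sequences), k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

In this toy example the within-cluster distributions differ only slightly, so their empirical distances stay well below the across-cluster distances once each sequence is reasonably long, which mirrors the separation assumption stated in the abstract. Shorter sequences make the estimated distances noisier and clustering errors more likely.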
