K-medoids Clustering of Data Sequences with Composite Distributions

Tiexing Wang; Qunwei Li; Donald J. Bucci; Yingbin Liang; Biao Chen; Pramod K. Varshney

arXiv:1807.11620 · cs.LG · March 27, 2019

TL;DR

This paper clusters data sequences generated from unknown continuous distributions by running k-medoids on distances between distributions. It provides upper bounds on the error probability and validates them through simulations.

Contribution

The paper proposes k-medoids algorithms based on distribution distance metrics, for both known and unknown numbers of clusters, and analyzes their error probability and convergence in the large sample regime.

Findings

01. Error probability decays exponentially with sample size

02. Error exponent is simple and metric-independent under certain conditions

03. Simulation results validate theoretical analysis

Abstract

This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from unknown continuous distributions. These distributions form clusters, each containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intra-cluster distance is assumed to be smaller than the minimum inter-cluster distance, and both values are assumed to be known. The goal is to group data sequences together if their underlying generative distributions (which are unknown) belong to the same cluster. K-medoids algorithms based on distribution distance metrics are proposed for both known and unknown numbers of distribution clusters. Upper bounds on the error probability and convergence results in the large sample regime are also provided. It is shown that the error probability decays…
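The setting described in the abstract can be illustrated with a minimal sketch for the case where the number of clusters is known. This is not the authors' algorithm: it assumes the two-sample Kolmogorov-Smirnov statistic as one possible distance between distributions, uses a farthest-point initialization chosen here for simplicity, and all function names (`ks_distance`, `pairwise_distances`, `k_medoids`) are hypothetical.

```python
import numpy as np


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    x = np.sort(x)
    y = np.sort(y)
    grid = np.concatenate([x, y])  # the maximum gap is attained at a sample point
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def pairwise_distances(sequences):
    """Symmetric matrix of KS distances between all pairs of data sequences."""
    m = len(sequences)
    dist = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            dist[i, j] = dist[j, i] = ks_distance(sequences[i], sequences[j])
    return dist


def k_medoids(dist, k, max_iter=100):
    """Plain k-medoids on a precomputed distance matrix (known k)."""
    # Assumed initialization: start from sequence 0, then repeatedly add the
    # sequence farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))

    for _ in range(max_iter):
        # Assignment step: each sequence joins its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids


# Toy data (an assumption for illustration): two composite clusters of
# Gaussians whose means are close within a cluster and far apart across clusters.
rng = np.random.default_rng(0)
means = [0.0, 0.1, 0.05, 3.0, 3.1, 2.95]
sequences = [rng.normal(mu, 1.0, size=500) for mu in means]

dist = pairwise_distances(sequences)
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)
```

Working from a precomputed distance matrix keeps the clustering step independent of the chosen metric, so another distance between empirical distributions could be substituted for `ks_distance` without changing `k_medoids`.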
