Information Theoretical Importance Sampling Clustering
Jiangshe Zhang, Lizhen Ji, Meng Wang

TL;DR
This paper introduces an information theoretical importance sampling clustering method (ITISC) that addresses distribution deviation between training data and future data. The method is validated on synthetic and real-world datasets and is shown to include fuzzy c-means as a special case.
Contribution
It proposes a novel importance sampling based clustering approach that minimizes worst-case expected distortions under distribution deviation constraints.
Findings
ITISC is effective on synthetic datasets and on a real-world load forecasting task.
Fuzzy c-means is revealed to be a special case of ITISC.
ITISC provides a physical interpretation for the fuzzy exponent m.
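For readers unfamiliar with the fuzzy exponent m mentioned above, the following is a minimal sketch of standard fuzzy c-means on 1-D data. It is not the ITISC algorithm, whose update rules are not given on this page; the function name `fuzzy_c_means_1d`, its parameters, and the toy data in the usage note are illustrative assumptions. It only shows where m enters the textbook updates: memberships are proportional to the squared distance raised to the power -1/(m-1), and centers are averages weighted by memberships raised to the power m.

```python
def fuzzy_c_means_1d(points, centers, m=2.0, iters=50, eps=1e-12):
    """Textbook fuzzy c-means on 1-D data (illustrative; not ITISC itself).

    points  : list of floats to cluster
    centers : initial center guesses (hypothetical starting values)
    m       : fuzzy exponent, must be > 1; larger m gives softer memberships
    """
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        # Membership update: u[j][i] is proportional to d2^(-1/(m-1)),
        # where d2 is the squared distance from point j to center i.
        memberships = []
        for x in points:
            d2 = [(x - c) ** 2 + eps for c in centers]  # eps avoids division by zero
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: average of the points weighted by u^m.
        for i in range(len(centers)):
            w = [u[i] ** m for u in memberships]
            centers[i] = sum(wj * x for wj, x in zip(w, points)) / sum(w)
    # memberships are those computed in the final iteration, before the last center update
    return centers, memberships
```

As a usage example, calling `fuzzy_c_means_1d([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], [1.0, 4.0])` on two well-separated groups should move the centers close to the two group means, with each row of memberships summing to 1.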
Abstract
A common assumption of most clustering methods is that the training data and future data are drawn from the same distribution. However, this assumption may not hold in most real-world scenarios. In this paper, we propose an information theoretical importance sampling based approach for clustering problems (ITISC), which minimizes the worst case of expected distortions under the constraint of distribution deviation. The distribution deviation constraint can be converted to a constraint over a set of weight distributions centered on the uniform distribution derived from importance sampling. The objective of the proposed approach is to minimize the loss under maximum degradation; hence, the resulting problem is a constrained minimax optimization problem, which can be reformulated as an unconstrained problem using the Lagrange method. The optimization problem can be solved by both an…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications · Geoscience and Mining Technology
