Stochastic estimations of a total number of classes for the clusterings with too enormous samples to be accommodate into a clustering engine
Keishu Utimula, Genki I. Prayogo, Kousuke Nakano, Kenta Hongo, Ryo Maezono

TL;DR
This paper introduces a stochastic method to estimate the total number of irreducible classes in clustering problems with extremely large sample spaces, such as atomic substitutions in alloys, where direct enumeration is infeasible.
Contribution
A novel stochastic framework is developed to estimate the total number of classes in large-scale clustering problems, overcoming input capacity limitations.
Findings
Statistical variation of class counts serves as an effective measure for estimation.
The method successfully estimates class counts in trillion-scale possibility spaces.
The approach provides a practical solution for large-scale clustering challenges.
Abstract
We considered the problem of how to handle the exploding number of possibilities to be sorted into irreducible classes with a clustering tool when its input capacity cannot accommodate the total number of possibilities. Concrete situations are explained using the example of atomic substitutions in the supercell modeling of alloys. The number of possibilities sometimes reaches the trillions, which is too large to be accommodated. It is hence not practically feasible to identify how many irreducible classes exist in a straightforward manner, even though several tools are available to perform the clustering. We have developed a stochastic framework to circumvent this shortage of capacity, providing a method to estimate the total number of irreducible classes (the order of the classes) as a statistical estimate. A prominent conclusion derived here is that the statistical variation of the…
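The setting described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' estimator (the abstract is truncated before it states how the statistical variation is used); it only shows the ingredients: random sampling from a possibility space, reduction of each sample to a symmetry-irreducible class, and the run-to-run variation of the class count. The ring of sites, the rotation/reflection symmetry, and all function names are assumptions made for illustration.

```python
import random
import statistics
from itertools import combinations


def canonical(config):
    """Return a canonical representative of a ring configuration.

    Two configurations are treated as the same class if they are related
    by a rotation or a reflection of the ring (an assumed toy symmetry).
    """
    n = len(config)
    candidates = []
    for seq in (config, config[::-1]):
        for r in range(n):
            candidates.append(seq[r:] + seq[:r])
    return min(candidates)


def random_config(n_sites, n_sub, rng):
    """Place n_sub substituted atoms (1) on n_sites ring sites at random."""
    sites = set(rng.sample(range(n_sites), n_sub))
    return tuple(1 if i in sites else 0 for i in range(n_sites))


def classes_in_sample(n_sites, n_sub, sample_size, rng):
    """Count distinct classes among sample_size random configurations."""
    return len({canonical(random_config(n_sites, n_sub, rng))
                for _ in range(sample_size)})


def exact_class_count(n_sites, n_sub):
    """Brute-force class count; feasible only for small toy systems."""
    seen = set()
    for sites in combinations(range(n_sites), n_sub):
        chosen = set(sites)
        seen.add(canonical(tuple(1 if i in chosen else 0
                                 for i in range(n_sites))))
    return len(seen)


if __name__ == "__main__":
    rng = random.Random(0)
    n_sites, n_sub = 12, 4  # hypothetical toy system, small enough to enumerate
    print("exact number of classes:", exact_class_count(n_sites, n_sub))
    for sample_size in (10, 50, 200, 1000):
        counts = [classes_in_sample(n_sites, n_sub, sample_size, rng)
                  for _ in range(20)]
        print(sample_size, statistics.mean(counts), statistics.stdev(counts))
```

In this toy, the number of classes seen in a sample never exceeds the exact count, and the spread across repeated samples changes with the sample size. For the trillion-scale spaces discussed in the paper, the brute-force reference count would not be available, which is the situation the stochastic framework is meant to address.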
