Clustering of Modal Valued Symbolic Data
Vladimir Batagelj, Nataša Kejžar, Simona Korenjak-Černe

TL;DR
This paper introduces new clustering methods for symbolic data with modal values, allowing variables of all measurement types to be considered simultaneously and providing efficient solutions for large datasets.
Contribution
It proposes a novel clustering criterion and compatible hierarchical and non-hierarchical methods specifically for symbolic objects described by distributions.
Findings
Effective clustering of symbolic data is demonstrated on real datasets.
Hierarchical and non-hierarchical methods are compatible and solve the same optimization problem.
Methods facilitate determining the optimal number of clusters using dendrograms.
Abstract
Symbolic Data Analysis is based on special descriptions of data - symbolic objects (SO). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special kind of symbolic object is a representation with frequency or probability distributions (modal values). This representation enables us to consider in the clustering process the variables of all measurement types at the same time. In the paper a clustering criterion function for SOs is proposed such that the representative of each cluster is again composed of distributions of variables' values over the cluster. The corresponding leaders clustering method is based on this result. It is also shown that for the corresponding agglomerative hierarchical method a generalized Ward's formula holds. Both methods are compatible - they are solving the same clustering…
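The leaders idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared Euclidean dissimilarity between distributions, the unweighted sum over variables, the function name `leaders_clustering`, and the toy data are all assumptions made for illustration. Under that assumed dissimilarity, the best representative of a cluster is the mean of its members' distributions, so each leader is itself a set of distributions, one per variable.

```python
import numpy as np


def leaders_clustering(X, k, n_iter=100, seed=0):
    """Leaders-style clustering of modal-valued units (illustrative sketch).

    X has shape (n_units, n_vars, n_bins); each X[i, j] is a probability
    distribution over the bins of variable j. The dissimilarity used here
    (squared Euclidean, summed over variables) is an assumption.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from k distinct units chosen at random as initial leaders.
    leaders = X[rng.choice(n, size=k, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Dissimilarity of every unit to every leader: shape (n, k).
        diff = X[:, None, :, :] - leaders[None, :, :, :]
        d = (diff ** 2).sum(axis=(2, 3))
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the leaders are too
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                # The mean of distributions is again a distribution.
                leaders[c] = members.mean(axis=0)
    return labels, leaders


# Toy data (hypothetical): 4 units, 2 variables, 3 bins per variable.
X = np.array([
    [[0.8, 0.1, 0.1], [0.7, 0.3, 0.0]],
    [[0.7, 0.2, 0.1], [0.6, 0.4, 0.0]],
    [[0.1, 0.1, 0.8], [0.0, 0.2, 0.8]],
    [[0.2, 0.1, 0.7], [0.1, 0.2, 0.7]],
])
labels, leaders = leaders_clustering(X, k=2)
print(labels)
print(leaders.sum(axis=2))  # each leader distribution sums to 1
```

Because the result depends on the random initial leaders, a practical run would typically repeat the procedure from several starting points and keep the partition with the smallest total dissimilarity.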
