Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds
Fionn Murtagh

TL;DR
This paper explores clustering in moderate-dimensional spaces by leveraging the duality between observation and attribute data clouds, proposing an efficient pipeline for both partitioning and hierarchical clustering.
Contribution
It uses the duality between the cloud of observations (in attribute space) and the cloud of attributes (in observation space) to cluster massive data sets whose dimensionality is moderate to small.
Findings
An efficient clustering processing pipeline is established
The pipeline covers both partitioning and hierarchical clustering
The setting is a massive set of observations with moderate to small dimensionality
Abstract
Cluster analysis of very high dimensional data can benefit from the properties of such high dimensionality. Informally expressed, in this work, our focus is on the analogous situation when the dimensionality is moderate to small, relative to a massively sized set of observations. Mathematically expressed, these are the dual spaces of observations and attributes. The point cloud of observations is in attribute space, and the point cloud of attributes is in observation space. In this paper, we begin by summarizing various perspectives related to methodologies that are used in multivariate analytics. We draw on these to establish an efficient clustering processing pipeline, both partitioning and hierarchical clustering.
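The dual point clouds described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's pipeline: the matrix `X`, its sizes, and the random data are assumptions made purely for illustration. It shows the two views of one data matrix, and the standard linear-algebra fact that the small attribute-by-attribute cross-product matrix carries the same non-zero eigenvalues as the much larger observation-by-observation one.

```python
import numpy as np

# Hypothetical data: many observations, few attributes (sizes are illustrative).
rng = np.random.default_rng(0)
n_obs, n_attr = 200, 5
X = rng.normal(size=(n_obs, n_attr))

# Cloud of observations: n_obs points, each living in attribute space R^n_attr.
obs_cloud = X
# Cloud of attributes: n_attr points, each living in observation space R^n_obs.
attr_cloud = X.T

# Cross-product matrices in the two spaces.
small_gram = X.T @ X   # shape (n_attr, n_attr), cheap to decompose
large_gram = X @ X.T   # shape (n_obs, n_obs), costly when n_obs is massive

# eigvalsh returns eigenvalues in ascending order for symmetric matrices.
small_eigs = np.linalg.eigvalsh(small_gram)
large_eigs = np.linalg.eigvalsh(large_gram)

# The non-zero eigenvalues of the large matrix are its n_attr largest ones,
# and they match the eigenvalues of the small matrix.
eigs_match = np.allclose(small_eigs, large_eigs[-n_attr:])
print(obs_cloud.shape, attr_cloud.shape, eigs_match)
```

When the number of observations is very large, working with the small matrix avoids ever forming the large one, which is one reason the moderate-dimensional side of the duality is computationally attractive.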
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Management and Algorithms
