Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds

Fionn Murtagh

arXiv:1704.01871 · stat.ML · April 7, 2017 · 1 citation


Open Access

TL;DR

This paper explores clustering in moderate-dimensional spaces by leveraging the duality between observation and attribute data clouds, proposing an efficient pipeline for both partitioning and hierarchical clustering.

Contribution

It introduces an approach that exploits the duality between the observation space and the attribute space to cluster data efficiently when the number of observations is massive and the dimensionality is moderate.
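The duality can be made concrete with a small numerical sketch. It assumes plain Euclidean geometry on column-centered data (a principal-components-style setting, not, for instance, the chi-squared metric of correspondence analysis), and the function name `dual_clouds` is hypothetical. With n observations and m attributes, the observation cloud lives in R^m and the attribute cloud in R^n; both clouds share the same nonzero eigenvalues, so when n is massive and m is moderate, only the m × m cross-product matrix has to be decomposed.

```python
import numpy as np

def dual_clouds(X):
    """Sketch: principal coordinates of the observation cloud (rows, in
    attribute space) and the attribute cloud (columns, in observation space)."""
    Xc = X - X.mean(axis=0)                  # center each attribute
    # Eigen-decompose only the small m x m cross-product matrix (m << n)
    evals, V = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1]          # sort axes by decreasing inertia
    evals, V = evals[order], V[:, order]
    F = Xc @ V                               # observation coordinates, n x m
    # Transition formula: attribute coordinates obtained from the observation
    # side, G = Xc^T F / sqrt(lambda), without forming the n x n matrix Xc Xc^T
    G = (Xc.T @ F) / np.sqrt(evals)
    return Xc, evals, F, G

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                # n = 200 observations, m = 4 attributes
Xc, evals, F, G = dual_clouds(X)

# The n x n Gram matrix (feasible here only because n is small) has the same
# nonzero eigenvalues as the m x m cross-product matrix
big = np.sort(np.linalg.eigvalsh(Xc @ Xc.T))[::-1][:4]
print(np.allclose(big, evals))               # True
# Each cloud reproduces the inner products of its own space
print(np.allclose(F @ F.T, Xc @ Xc.T), np.allclose(G @ G.T, Xc.T @ Xc))
```

The transition formula is the design point: the attribute cloud is recovered from quantities of size m × m and n × m, so the cost stays linear in the number of observations.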

Findings

1. Effective clustering pipeline established
2. Applicable to both partitioning and hierarchical methods
3. Improves clustering efficiency in moderate dimensions

Abstract

Cluster analysis of very high-dimensional data can benefit from the properties of such high dimensionality. Informally expressed, our focus in this work is on the analogous situation in which the dimensionality is moderate to small relative to a massively sized set of observations. Mathematically expressed, these are the dual spaces of observations and attributes: the point cloud of observations lies in attribute space, and the point cloud of attributes lies in observation space. In this paper, we begin by summarizing various perspectives on methodologies used in multivariate analytics. We draw on these to establish an efficient clustering processing pipeline covering both partitioning and hierarchical clustering.
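The abstract names a pipeline covering both partitioning and hierarchical clustering but does not detail it here. One common way to combine the two when n is massive and m is moderate is to partition first (k-means, the only step that touches all n observations) and then build a Ward hierarchy on the k weighted centroids. The sketch below assumes that two-stage design; it is not claimed to be the paper's exact method, and the helper names `kmeans` and `ward_on_centroids` are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means (hypothetical helper): returns non-empty centroids and sizes."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(C):
        # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2 (n x k)
        d2 = (X ** 2).sum(axis=1)[:, None] - 2 * X @ C.T + (C ** 2).sum(axis=1)[None, :]
        return d2.argmin(axis=1)

    for _ in range(iters):
        labels = assign(C)
        # Recompute each centroid; keep the old one if a cluster becomes empty
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    labels = assign(C)
    sizes = np.bincount(labels, minlength=k)
    keep = np.flatnonzero(sizes)                 # drop empty clusters before stage 2
    C = np.array([X[labels == j].mean(axis=0) for j in keep])
    return C, sizes[keep]

def ward_on_centroids(C, sizes):
    """Ward agglomeration of weighted centroids (hypothetical helper).
    Returns merges as (id_a, id_b, cost); new clusters get ids len(C), len(C)+1, ..."""
    active = {i: (C[i], float(sizes[i])) for i in range(len(C))}
    merges, next_id = [], len(C)
    while len(active) > 1:
        ids = list(active)
        best = None
        for i, a in enumerate(ids):
            ca, na = active[a]
            for b in ids[i + 1:]:
                cb, nb = active[b]
                # Ward cost: increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * float(np.sum((ca - cb) ** 2))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ca, na), (cb, nb) = active.pop(a), active.pop(b)
        active[next_id] = ((na * ca + nb * cb) / (na + nb), na + nb)
        merges.append((a, b, cost))
        next_id += 1
    return merges

rng = np.random.default_rng(1)
# Three well-separated groups of 5000 points each in m = 3 dimensions
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(5000, 3)) for c in (0.0, 5.0, 10.0)])
C, sizes = kmeans(X, k=20)                       # stage 1: the only pass over all n points
merges = ward_on_centroids(C, sizes)             # stage 2: hierarchy on at most 20 centroids
print(len(C), int(sizes.sum()), len(merges))
```

Ward's criterion fits the second stage because the merge cost n_a n_b / (n_a + n_b) ||c_a − c_b||² is exactly the increase in within-cluster sum of squares, so the hierarchy on the centroids stays consistent with the k-means objective. The O(k³) merge search is acceptable only because k is small.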


Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Management and Algorithms