Variable Selection for Clustering and Classification

Jeffrey L. Andrews; Paul D. McNicholas

arXiv:1303.5294 · stat.CO · March 22, 2013 · J. Classif.

TL;DR

This paper introduces a computationally efficient variable selection method for clustering and classification that is suited to high-dimensional data, illustrated on simulated and real data.

Contribution

A novel variable selection technique that is intuitive and computationally efficient, developed mainly for mixture model-based clustering and classification but adaptable to other clustering/classification methods.

Findings

01 Outperforms existing methods in computational efficiency.

02 Effective in high-dimensional data scenarios.

03 Demonstrated success on real and simulated datasets.

Abstract

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering algorithms are based upon determining the best variable subspace according to model fitting in a stepwise manner. These techniques are often computationally intensive and can require extended periods of time to run; in fact, some are prohibitively computationally expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering/classification methods. Our approach is illustrated…

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Gene expression and cancer classification