Parameterized Complexity of Feature Selection for Categorical Data Clustering
Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach, Kirill Simonov

TL;DR
This paper introduces a fixed-parameter tractable algorithm for feature selection in categorical data clustering. It uses the framework of parameterized complexity to handle irrelevant features that inflate the clustering cost.
Contribution
It develops the first fixed-parameter algorithm for feature selection in categorical clustering, parameterized by the number of clusters, the budget, and the alphabet size.
Findings
The algorithm's running time is quadratic in the number of data points $n$ when $k$, $B$, and $|\Sigma|$ are fixed.
The problem is fixed-parameter tractable parameterized by the budget $B$ when the number of clusters $k$ and the alphabet size $|\Sigma|$ are constants.
Provides complexity lower bounds complementing the algorithmic results.
Abstract
We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers $\ell$ (the number of irrelevant features) and $k$ (the number of clusters), budget $B$, and a set of $n$ categorical data points (represented by $m$-dimensional vectors whose elements belong to a finite set of values $\Sigma$), we want to select…
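To make the model concrete, here is a minimal brute-force sketch for tiny instances. The abstract above is cut off before the cost is defined, so this sketch assumes that a cluster's cost is the sum of Hamming distances from its points, restricted to the kept features, to a best center, and that the clustering cost is the sum over all clusters. Under that assumption the best center takes a most frequent symbol in each kept column. The function names (`cluster_cost`, `best_k_clustering_cost`, `select_features`) are hypothetical. The search is exponential in both $n$ and $m$ and only illustrates the problem statement. It is not the paper's fixed-parameter algorithm.

```python
from collections import Counter
from itertools import combinations, product


def cluster_cost(cluster, columns):
    """Cost of one cluster, assuming Hamming distance to its best center.

    For each kept column a most frequent symbol is an optimal center entry,
    so that column contributes (cluster size - highest symbol frequency).
    """
    if not cluster:
        return 0
    total = 0
    for c in columns:
        counts = Counter(point[c] for point in cluster)
        total += len(cluster) - max(counts.values())
    return total


def best_k_clustering_cost(points, k, columns):
    """Exact optimal k-clustering cost by trying all k**n label assignments."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        clusters = [[] for _ in range(k)]
        for point, label in zip(points, labels):
            clusters[label].append(point)
        cost = sum(cluster_cost(cl, columns) for cl in clusters)
        if best is None or cost < best:
            best = cost
    return best


def select_features(points, ell, k, budget):
    """Return a tuple of m - ell kept columns whose optimal cost is <= budget.

    Returns None if no choice of m - ell columns fits within the budget.
    """
    m = len(points[0])
    for kept in combinations(range(m), m - ell):
        if best_k_clustering_cost(points, k, kept) <= budget:
            return kept
    return None


if __name__ == "__main__":
    # Toy data: columns 0 and 1 separate two groups, column 2 is "noise".
    data = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
    print(select_features(data, ell=1, k=2, budget=0))  # prints (0, 1)
```

On this toy data, keeping all three columns forces a best 2-clustering cost of 2, because the noisy column splits both natural groups. Dropping $\ell = 1$ feature (column 2) brings the cost down to 0.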
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Graph theory and applications
