Regularized Greedy Column Subset Selection

Bruno Ordozgoiti; Alberto Mozo; Jesús García López de Lacalle

arXiv:1804.04421 · cs.LG · April 13, 2018

TL;DR

This paper introduces a regularized version of the Column Subset Selection Problem, along with a greedy algorithm that enhances robustness and stability in feature selection, especially with noisy or scarce data.

Contribution

It proposes a novel regularized formulation and an efficient greedy algorithm for feature selection, improving robustness and stability over existing methods.

Findings

01. Enhanced robustness to noise and scarce data

02. Improved conditioning of selected features

03. Maintains efficiency comparable to existing greedy algorithms

Abstract

The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well…
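
To make the idea concrete, here is a minimal sketch of greedy selection under a regularized objective. The abstract does not state the exact formulation, so this assumes a ridge-type penalty: for a candidate column set S with C = A[:, S], the score is the minimum over X of ||A - C X||_F^2 + lam * ||X||_F^2. The function names (ridge_residual, greedy_regularized_css) and the parameter lam are hypothetical. The sketch refits from scratch for every candidate, so it illustrates the selection rule only and is not the efficient algorithm derived in the paper.

```python
import numpy as np


def ridge_residual(A, cols, lam):
    """Assumed objective: min_X ||A - C X||_F^2 + lam * ||X||_F^2, C = A[:, cols]."""
    C = A[:, cols]
    # Regularized normal equations; G is positive definite whenever lam > 0.
    G = C.T @ C + lam * np.eye(len(cols))
    X = np.linalg.solve(G, C.T @ A)
    fit = np.linalg.norm(A - C @ X, "fro") ** 2
    penalty = lam * np.linalg.norm(X, "fro") ** 2
    return fit + penalty


def greedy_regularized_css(A, k, lam):
    """Pick k column indices, adding at each step the column that lowers the objective most."""
    selected = []
    remaining = list(range(A.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: ridge_residual(A, selected + [j], lam))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 8))  # 20 samples, 8 features (synthetic)
    print(greedy_regularized_css(A, k=3, lam=0.1))
```

With lam = 0 the score reduces to the usual unregularized reconstruction error, and the linear solve can fail when the selected columns are linearly dependent; a positive lam keeps the system well conditioned, which is in the spirit of the improved conditioning listed in the findings.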
