A Distance-Based Branch and Bound Feature Selection Algorithm

Ari Frank; Dan Geiger; Zohar Yakhini

arXiv:1212.2488 · cs.LG · December 12, 2012 · 2 cites

Open Access

TL;DR

This paper introduces a Branch and Bound algorithm that selects k of n independent Gaussian features so as to minimize the naive Bayesian classification error. Unlike greedy methods, which can return suboptimal subsets, it returns an optimal subset.

Contribution

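It presents a distance-based Branch and Bound approach that uses additive monotonic distance measures to bound the Bayesian classification error. The bounds let it exclude many feature subsets from evaluation while still guaranteeing an optimal subset of independent Gaussian features, a guarantee that greedy algorithms do not provide.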

Findings

1. Tested on synthetic data

2. Tested on gene expression profiling data

3. Returns an optimal feature subset while excluding many subsets from evaluation

Abstract

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling.
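
The pruning idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the Bhattacharyya distance serves as the additive monotonic distance measure, there are two classes with prior `p0`, and the caller supplies a hypothetical `error_fn` that returns the classification error of a feature subset. The function names are illustrative only. The sketch relies on two standard facts: for independent features the Bhattacharyya distance of a subset is the sum of the per-feature distances, and the Bayes error is bounded below and above by decreasing functions of that distance.

```python
import math

def bhattacharyya_1d(mu0, var0, mu1, var1):
    """Bhattacharyya distance between two univariate Gaussians."""
    avg = 0.5 * (var0 + var1)
    return (0.125 * (mu0 - mu1) ** 2 / avg
            + 0.5 * math.log(avg / math.sqrt(var0 * var1)))

def error_bounds(dist, p0=0.5):
    """Lower and upper bounds on the two-class Bayes error at a given distance."""
    p1 = 1.0 - p0
    rho = math.exp(-dist)  # Bhattacharyya coefficient
    lower = 0.5 * (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * p0 * p1 * rho * rho)))
    upper = math.sqrt(p0 * p1) * rho
    return lower, upper

def branch_and_bound(dists, k, error_fn, p0=0.5):
    """Search for a size-k subset minimizing error_fn, pruning by distance.

    dists[j] is the per-feature distance; error_fn (hypothetical, supplied by
    the caller) must return an error that respects error_bounds.
    Returns (best_subset, best_error, number_of_subsets_evaluated).
    """
    # Visit features in order of decreasing distance so the most any
    # completion can add is the sum of the next `need` distances.
    order = sorted(range(len(dists)), key=lambda j: -dists[j])
    best = {"subset": None, "error": float("inf"), "evaluated": 0}

    def search(start, chosen, dist_so_far):
        need = k - len(chosen)
        if need == 0:
            subset = tuple(sorted(chosen))
            best["evaluated"] += 1
            err = error_fn(subset)
            if err < best["error"]:
                best["subset"], best["error"] = subset, err
            return
        if len(order) - start < need:
            return
        # Largest distance reachable from this node gives the smallest
        # possible lower bound on the error of any completion.
        max_dist = dist_so_far + sum(dists[order[j]]
                                     for j in range(start, start + need))
        lower, _ = error_bounds(max_dist, p0)
        if lower >= best["error"]:
            return  # no completion can beat the incumbent
        for pos in range(start, len(order) - need + 1):
            j = order[pos]
            search(pos + 1, chosen + [j], dist_so_far + dists[j])

    search(0, [], 0.0)
    return best["subset"], best["error"], best["evaluated"]
```

To use the sketch, compute `dists` with `bhattacharyya_1d` from per-class means and variances, then call `branch_and_bound(dists, k, error_fn)`. The result is optimal only if `error_fn` never falls below the lower bound for its subset, which is the property the pruning test depends on.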

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Machine Learning and Data Classification