Robust Feature Selection by Mutual Information Distributions
Marco Zaffalon, Marcus Hutter

TL;DR
This paper derives the Bayesian distribution of mutual information, giving analytical formulas and asymptotic approximations, and applies it to feature selection, where it outperforms the traditional approach based on empirical mutual information in classification tasks.
Contribution
It introduces a Bayesian framework for the distribution of mutual information, with analytical expressions and asymptotic approximations, and uses it to improve feature selection for incremental learning and classification.
Findings
The Bayesian distribution of mutual information is derived, with an exact expression for its mean and an analytical approximation of its variance.
The proposed method outperforms the traditional approach based on empirical mutual information on several real data sets.
The methods are extended efficiently to incomplete samples.
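The exact mean and approximate variance listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The formulas are assumed rather than confirmed by this summary, so check them against the full text. The mean is taken to be E[I] = (1/n) Σ_ij n_ij [ψ(n_ij+1) − ψ(n_i+ + 1) − ψ(n_+j + 1) + ψ(n+1)]. The variance is taken to be Var[I] ≈ (K − J²)/(n+1). Here n_ij are the posterior Dirichlet parameters (observed counts plus prior pseudo-counts), J is the empirical mutual information, and K = Σ_ij (n_ij/n) log²(n_ij n / (n_i+ n_+j)). Natural logarithms are used, so all values are in nats. The names `digamma` and `mi_moments` are hypothetical. `digamma` is implemented by hand because the Python standard library does not provide one.

```python
import math


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("digamma is only implemented for x > 0")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def mi_moments(counts):
    """Return (empirical MI J, posterior mean E[I], approximate Var[I]) in nats.

    counts[i][j] are the Dirichlet posterior parameters n_ij
    (observed counts plus any prior pseudo-counts); assumed formulas.
    """
    rows = [sum(row) for row in counts]
    cols = [sum(col) for col in zip(*counts)]
    n = sum(rows)
    J = K = mean = 0.0
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij <= 0:
                continue  # empty cells are weighted by n_ij = 0 and contribute nothing
            log_ratio = math.log(nij * n / (rows[i] * cols[j]))
            J += nij / n * log_ratio
            K += nij / n * log_ratio ** 2
            mean += nij / n * (digamma(nij + 1) - digamma(rows[i] + 1)
                               - digamma(cols[j] + 1) + digamma(n + 1))
    var = (K - J * J) / (n + 1)  # leading-order approximation only
    return J, mean, var
```

For example, `mi_moments([[250, 250], [250, 250]])` gives J = 0 and a zero variance term. The posterior mean is still about 0.0005. This matches the leading-order gap (r−1)(s−1)/(2(n+1)) between the posterior mean and the empirical value, which here equals 1/2002.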
Abstract
Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a…
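The abstract contrasts the descriptive use of empirical mutual information with a sample-to-population view. The sketch below shows that contrast for feature selection. It is not the paper's "fast, newly defined method", whose details this summary does not give.

The sketch rests on several assumptions. The posterior of I is approximated by a Gaussian. Its mean is the leading-order value J + (r−1)(s−1)/(2(n+1)) and its variance is (K − J²)/(n+1), where J is the empirical mutual information and K = Σ_ij (n_ij/n) log²(n_ij n / (n_i+ n_+j)). A feature is kept when an approximate lower credible bound exceeds a threshold `eps`. The names `mi_normal_approx` and `select_features`, the value of `eps` and the 95% credibility level are all hypothetical choices.

```python
import math
from statistics import NormalDist


def mi_normal_approx(counts):
    """Empirical MI J plus assumed leading-order posterior mean and sd (nats)."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    J = K = 0.0
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij > 0:  # nij > 0 implies nonzero row and column totals
                lr = math.log(nij * n / (rows[i] * cols[j]))
                J += nij / n * lr
                K += nij / n * lr * lr
    r, s = len(rows), len(cols)
    mean = J + (r - 1) * (s - 1) / (2 * (n + 1))
    sd = math.sqrt(max(K - J * J, 0.0) / (n + 1))
    return J, mean, sd


def select_features(tables, eps=0.02, credibility=0.95):
    """Hypothetical filters: empirical (J > eps) vs. robust (lower bound > eps).

    tables maps a feature name to its feature-by-class contingency table.
    """
    z = NormalDist().inv_cdf(credibility)  # one-sided Gaussian quantile
    empirical, robust = [], []
    for name, counts in tables.items():
        J, mean, sd = mi_normal_approx(counts)
        if J > eps:
            empirical.append(name)
        if mean - z * sd > eps:
            robust.append(name)
    return empirical, robust
```

Usage: `select_features({"a": [[100, 20], [30, 90]], "b": [[3, 1], [1, 3]]})` returns `(["a", "b"], ["a"])`. Both tables have an empirical mutual information well above `eps`. Only the larger sample "a" passes the robust filter, because the 8-observation table "b" has a wide approximate posterior.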
