Information-theoretic Feature Selection via Tensor Decomposition and Submodularity
Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos

TL;DR
This paper introduces a feature selection method that combines low-rank tensor decomposition with submodularity to maximize high-order mutual information efficiently, improving prediction performance while reducing computational complexity.
Contribution
The paper proposes a low-rank tensor model of the joint probability mass function (PMF) of all variables and formulates feature selection as a submodular maximization problem with theoretical guarantees.
Findings
Outperforms state-of-the-art methods on standard datasets
Reduces complexity of high-order mutual information estimation
Provides a greedy algorithm with performance guarantees
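The greedy selection idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a naive Bayes model in which the features are conditionally independent given a latent variable H, and it uses I(X_S; H) as the objective, a choice made here for illustration. Under that conditional-independence assumption the objective is known to be monotone and submodular, so the classical result for greedy maximization under a cardinality constraint gives a (1 - 1/e) approximation factor; whether the paper's guarantee takes exactly this form is not stated in this summary. The names `greedy_select` and `mi_with_latent` and the toy probabilities are hypothetical.

```python
import numpy as np

def mi_with_latent(prior, factors, subset):
    """I(X_S; H) in nats for a naive Bayes model (illustrative objective).

    prior:   shape (F,), P(H = h)
    factors: list of arrays of shape (I_n, F), column h holds P(X_n | H = h)
    subset:  list of feature indices S
    """
    joint = prior.copy()                       # last axis always indexes h
    for n in subset:
        joint = joint[..., None, :] * factors[n]   # append an axis for X_n
    p_x = joint.sum(axis=-1, keepdims=True)    # P(x_S), latent summed out
    mask = joint > 0                           # skip zero-probability cells
    ratio = joint[mask] / (p_x * prior)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

def greedy_select(objective, n_features, k):
    """Pick k features, each time adding the one with the largest objective."""
    selected = []
    for _ in range(k):
        candidates = [n for n in range(n_features) if n not in selected]
        scores = [objective(selected + [n]) for n in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Toy model (assumed values): binary latent variable, three binary features.
prior = np.array([0.5, 0.5])
factors = [
    np.array([[0.9, 0.1], [0.1, 0.9]]),   # feature 0: strongly informative
    np.array([[0.5, 0.5], [0.5, 0.5]]),   # feature 1: uninformative
    np.array([[0.7, 0.3], [0.3, 0.7]]),   # feature 2: moderately informative
]

chosen = greedy_select(lambda S: mi_with_latent(prior, factors, S), 3, 2)
print(chosen)
```

In this toy setup the uninformative feature is never preferred over an informative one, and each greedy step only evaluates the objective on small candidate sets rather than searching over all subsets.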
Abstract
Feature selection by maximizing high-order mutual information between the selected feature vector and a target variable is the gold standard in terms of selecting the best subset of relevant features that maximizes the performance of prediction models. However, such an approach typically requires knowledge of the multivariate probability distribution of all features and the target, and involves a challenging combinatorial optimization problem. Recent work has shown that any joint Probability Mass Function (PMF) can be represented as a naive Bayes model, via Canonical Polyadic (tensor rank) Decomposition. In this paper, we introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features. Through low-rank modeling of the joint PMF, it is possible to…
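To make the naive Bayes (Canonical Polyadic) representation mentioned in the abstract concrete, here is a small sketch assuming a latent variable H with F states, so that P(x_1, ..., x_N) = sum_h P(h) prod_n P(x_n | h). The rank, the variable cardinalities, the random factors, and the helper name `marginal_pmf` are all hypothetical and chosen only for illustration. The sketch shows one practical consequence of this structure: the joint PMF of any subset of variables is obtained by simply leaving the other factors out, with no summation over the full tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
rank = 3                       # number of latent states F (assumed)
cardinalities = [3, 4, 2]      # alphabet sizes of three discrete variables (assumed)

# P(H = h): a length-F probability vector.
prior = rng.dirichlet(np.ones(rank))
# One factor per variable, shape (I_n, F); column h is P(X_n | H = h).
factors = [rng.dirichlet(np.ones(c), size=rank).T for c in cardinalities]

def marginal_pmf(prior, factors, subset):
    """Joint PMF of the variables in `subset` under the naive Bayes model."""
    joint = prior.copy()                       # last axis always indexes h
    for n in subset:
        joint = joint[..., None, :] * factors[n]   # append an axis for X_n
    return joint.sum(axis=-1)                  # sum out the latent variable

full = marginal_pmf(prior, factors, [0, 1, 2])   # shape (3, 4, 2)
pair = marginal_pmf(prior, factors, [0, 1])      # shape (3, 4)

# Dropping the factor of X_2 matches summing X_2 out of the full tensor.
print(np.allclose(full.sum(axis=2), pair))
```

The model stores F + F * (I_1 + ... + I_N) numbers instead of the I_1 * ... * I_N entries of the full tensor, which is what makes low-rank modeling attractive when many variables are involved.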
Taxonomy
Methods: Feature Selection
