Learning to Maximize Mutual Information for Dynamic Feature Selection
Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee

TL;DR
This paper introduces a simple, greedy approach to dynamic feature selection based on conditional mutual information. The policy is trained via amortized optimization to approximate the greedy policy, and it outperforms existing methods.
Contribution
It proposes a novel learning method for dynamic feature selection based on conditional mutual information. This makes a theoretically appealing greedy policy practical without requiring oracle access to the data distribution.
Findings
Outperforms existing feature selection methods in experiments
Recovers the greedy policy when trained optimally
Validates the approach as simple yet powerful
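The following is a short, informal sketch of why optimal training could recover the greedy policy. It is not the paper's formal statement. It assumes a cross-entropy loss and a predictor that has reached the Bayes-optimal p(Y | X_S, X_i). The notation is ours: X_S denotes the currently observed features with values x_S, and X_i denotes a candidate feature. The expected loss after querying X_i splits into a term that does not depend on i, minus the conditional mutual information. Minimizing that loss over i is therefore the same as maximizing the CMI.

```latex
\begin{aligned}
\mathbb{E}_{Y,\,X_i \mid X_S = x_S}\big[-\log p(Y \mid x_S, X_i)\big]
  &= H(Y \mid X_i, X_S = x_S) \\
  &= H(Y \mid X_S = x_S) - I(Y; X_i \mid X_S = x_S), \\
\arg\min_{i \notin S}\; \mathbb{E}_{Y,\,X_i \mid X_S = x_S}\big[-\log p(Y \mid x_S, X_i)\big]
  &= \arg\max_{i \notin S}\; I(Y; X_i \mid X_S = x_S).
\end{aligned}
```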
Abstract
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
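The sketch below shows the oracle greedy policy that the abstract describes. It is not the authors' implementation, which learns the policy from data instead of needing the distribution. It assumes we can enumerate a small discrete joint distribution exactly, which is the oracle access that the learned approach avoids. The toy distribution, the function names (`make_toy_joint`, `conditional_mi`, `greedy_select`) and the budget are all hypothetical. At each step, the code queries the unobserved feature with the largest conditional mutual information with the label, given the values observed so far.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy joint table: keys are (x0, x1, x2, y), values are probabilities.
Joint = Dict[Tuple[int, ...], float]


def make_toy_joint() -> Joint:
    """Hypothetical distribution: x0 is a 10%-noisy copy of y, x1 a 30%-noisy copy, x2 pure noise."""
    joint: Joint = {}
    for x0, x1, x2, y in itertools.product([0, 1], repeat=4):
        p_y = 0.5
        p_x0 = 0.9 if x0 == y else 0.1
        p_x1 = 0.7 if x1 == y else 0.3
        p_x2 = 0.5
        joint[(x0, x1, x2, y)] = p_y * p_x0 * p_x1 * p_x2
    return joint


def conditional_mi(joint: Joint, i: int, observed: Dict[int, int]) -> float:
    """I(y; x_i | x_S = observed values), in nats, computed exactly from the table."""
    # Keep only outcomes consistent with the feature values observed so far.
    consistent = {k: p for k, p in joint.items()
                  if all(k[j] == v for j, v in observed.items())}
    z = sum(consistent.values())
    if z == 0:
        raise ValueError("observed values have zero probability")
    # Conditional joint p(x_i, y | x_S = observed); the label is the last key entry.
    pxy: Dict[Tuple[int, int], float] = {}
    for k, p in consistent.items():
        key = (k[i], k[-1])
        pxy[key] = pxy.get(key, 0.0) + p / z
    # Conditional marginals p(x_i | x_S) and p(y | x_S).
    px: Dict[int, float] = {}
    py: Dict[int, float] = {}
    for (a, b), p in pxy.items():
        px[a] = px.get(a, 0.0) + p
        py[b] = py.get(b, 0.0) + p
    return sum(p * math.log(p / (px[a] * py[b]))
               for (a, b), p in pxy.items() if p > 0)


def greedy_select(joint: Joint, x: Sequence[int], num_features: int, budget: int) -> List[int]:
    """Greedily query features of sample x by conditional mutual information."""
    observed: Dict[int, int] = {}
    order: List[int] = []
    for _ in range(budget):
        candidates = [i for i in range(num_features) if i not in observed]
        best = max(candidates, key=lambda i: conditional_mi(joint, i, observed))
        observed[best] = x[best]  # "query" the chosen feature's value
        order.append(best)
    return order


joint = make_toy_joint()
print(greedy_select(joint, x=(1, 1, 0), num_features=3, budget=3))
```

In this toy example, the policy first picks the less noisy copy x0. Given x0, it picks x1, which still carries some information about y. It queries the pure-noise feature x2 last, because that feature's conditional mutual information is zero.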
Taxonomy
Topics: Machine Learning and Data Classification · Fuzzy Logic and Control Systems · Machine Learning and Algorithms
Methods: Feature Selection
