MISFEAT: Feature Selection for Subgroups with Systematic Missing Data
Bar Genossar, Thinh On, Md. Mouinul Islam, Ben Eliav, Senjuti Basu Roy, and Avigdor Gal

TL;DR
This paper introduces MISFEAT, a novel feature selection method for datasets with systematic missing data across subgroups, using a graph neural network to infer missing mutual information and improve feature selection.
Contribution
It proposes a heterogeneous graph neural network model to infer mutual information in subgroup-specific missing data scenarios, enabling effective feature selection.
Findings
The model accurately infers missing mutual information between features and targets.
It scales efficiently to large datasets with systematic missing data.
Empirical results show improved feature selection performance over baseline methods.
Abstract
We investigate the problem of selecting features for datasets that can be naturally partitioned into subgroups (e.g., according to socio-demographic groups and age), each with its own dominant set of features. Within this subgroup-oriented framework, we address the challenge of systematic missing data, a scenario in which some feature values are missing for all tuples of a subgroup, due to flawed data integration, regulatory constraints, or privacy concerns. Feature selection is governed by finding mutual information, a popular quantification of correlation, between features and a target variable. Our goal is to identify top-K feature subsets of some fixed size with the highest joint mutual information with a target variable. In the presence of systematic missing data, the closed form of mutual information cannot simply be applied. We argue that in such a setting, leveraging…
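To make the selection objective in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes discrete features, uses a plain plug-in (frequency-count) estimate of joint mutual information, and enumerates subsets by brute force. The function names `joint_mutual_information` and `top_k_subsets`, the toy columns `a`, `b`, `c`, `y`, and the use of `None` to mark a systematically missing feature are all illustrative assumptions. It also shows why the closed form breaks down: any subset containing a feature that is missing for the whole subgroup cannot be scored at all.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_mutual_information(rows, feature_subset, target):
    """Plug-in estimate of I(X_S; Y) in bits for discrete columns.

    Returns None when any feature of the subset is missing (None) in some
    row, because the frequency-based formula cannot be evaluated then.
    """
    xs, ys = [], []
    for row in rows:
        values = tuple(row[f] for f in feature_subset)
        if any(v is None for v in values):
            return None
        xs.append(values)
        ys.append(row[target])

    n = len(rows)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))

    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    mi = 0.0
    for (x, y), c in count_xy.items():
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def top_k_subsets(rows, features, target, size, k):
    """Brute-force top-k subsets of a fixed size, ranked by joint MI.

    Subsets that cannot be scored (a feature is missing) are skipped.
    """
    scored = []
    for subset in combinations(features, size):
        mi = joint_mutual_information(rows, subset, target)
        if mi is not None:
            scored.append((mi, subset))
    # Highest MI first; ties broken by subset name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy subgroup (hypothetical data): y = a XOR b, and feature c is
# systematically missing for every tuple of this subgroup.
subgroup = [
    {"a": 0, "b": 0, "c": None, "y": 0},
    {"a": 0, "b": 1, "c": None, "y": 1},
    {"a": 1, "b": 0, "c": None, "y": 1},
    {"a": 1, "b": 1, "c": None, "y": 0},
]

best = top_k_subsets(subgroup, ["a", "b", "c"], "y", size=2, k=1)
```

In this toy subgroup neither `a` nor `b` alone carries information about `y`, yet the pair does, which is why the objective is stated over joint mutual information of subsets rather than per-feature scores. Subsets involving `c` are simply dropped here; according to the TL;DR, the paper instead infers the missing mutual information with a graph neural network.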
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Sparse Evolutionary Training · Feature Selection · Graph Neural Network
