Distributionally Robust Feature Selection

Maitreyi Swaroop; Tamar Krishnamurti; Bryan Wilder

arXiv:2510.21113·cs.LG·October 27, 2025

TL;DR

This paper introduces a distributionally robust feature selection method: it chooses a limited set of features to observe so that models trained on them perform well across multiple subpopulations, targeting settings where collecting each feature is costly.

Contribution

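It proposes a model-agnostic framework that relaxes variable selection into a continuous problem via a noising mechanism and optimizes over the variance of a Bayes-optimal predictor, selecting features that perform well across diverse groups without backpropagating through model training.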

Findings

01 Effective on synthetic and real-world datasets

02 Balances performance across multiple subpopulations

03 Does not require backpropagation through model training

Abstract

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is costly, e.g. requiring adding survey questions or physical sensors, and we must be able to use the selected features to create high-quality downstream models for different populations. Our method frames the problem as a continuous relaxation of traditional variable selection using a noising mechanism, without requiring backpropagation through model training processes. By optimizing over the variance of a Bayes-optimal predictor, we develop a model-agnostic framework that balances overall performance of downstream prediction across populations. We validate our approach through experiments on both synthetic datasets and real-world data.
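
To make the idea of noising features and scoring a selection by the variance of a Bayes-optimal predictor concrete, here is a minimal toy sketch. It is not the paper's algorithm: the abstract does not give the objective or the optimizer, so everything here is an assumption chosen for illustration. Specifically, it assumes a linear-Gaussian model per group (where the Bayes-optimal predictor has a closed form), represents "dropping" a feature by adding very large noise to it, and compares a worst-group criterion against an average criterion. The names `explained_variance`, `worst_group_score`, `average_score`, and `BIG` are hypothetical.

```python
import numpy as np


def explained_variance(sigma, w, noise_var):
    """Variance of E[y | Z] for noised features Z = X + N(0, diag(noise_var)).

    Toy assumption (not from the paper): X ~ N(0, sigma) and
    y = X @ w + independent noise, so the Bayes-optimal predictor is linear in Z.
    """
    cov_xy = sigma @ w                    # Cov(X, y), which equals Cov(Z, y)
    cov_z = sigma + np.diag(noise_var)    # Cov(Z): noise inflates the diagonal
    return float(cov_xy @ np.linalg.solve(cov_z, cov_xy))


def worst_group_score(groups, noise_var):
    """Smallest explained variance over groups (a robust criterion)."""
    return min(explained_variance(s, w, noise_var) for s, w in groups)


def average_score(groups, noise_var):
    """Mean explained variance over groups (a non-robust baseline)."""
    return float(np.mean([explained_variance(s, w, noise_var) for s, w in groups]))


BIG = 1e12  # hypothetical stand-in for "feature not collected"
eye = np.eye(3)

# Two illustrative subpopulations: feature 0 matters only for group A,
# feature 1 only for group B, and feature 2 is moderately useful for both.
groups = [
    (eye, np.array([1.0, 0.0, 0.5])),  # group A
    (eye, np.array([0.0, 1.0, 0.5])),  # group B
]

keep_feature_0 = np.array([0.0, BIG, BIG])  # observe only feature 0
keep_feature_2 = np.array([BIG, BIG, 0.0])  # observe only feature 2

for name, noise in [("feature 0", keep_feature_0), ("feature 2", keep_feature_2)]:
    print(name, average_score(groups, noise), worst_group_score(groups, noise))
```

In this toy setup, with a budget of one feature, the average criterion favors feature 0 (it serves group A very well and group B not at all), while the worst-group criterion favors feature 2, which is useful to both groups. Because `noise_var` is continuous, intermediate noise levels interpolate between keeping and dropping a feature, which is one way to read the "continuous relaxation" described in the abstract.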
