Robust Grouped Variable Selection Using Distributionally Robust Optimization
Ruidi Chen, Ioannis Ch. Paschalidis

TL;DR
This paper introduces a distributionally robust optimization framework for grouped variable selection that enhances model interpretability, sparsity, and prediction accuracy, especially under data perturbations and outliers.
Contribution
It develops a Wasserstein-based DRO model for grouped variable selection, linking robustness with regularization, and proposes a spectral clustering method for unknown group structures.
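To make the link between robustness and regularization concrete, here is a minimal sketch of a group-LASSO-style objective of the kind such a formulation is said to explain. The summary does not state the exact objective, so the absolute-residual loss, the square-root group-size weights, and the names `group_penalty`, `regularized_objective`, and `eps` are illustrative assumptions, not the paper's definitions.

```python
import math

def group_penalty(beta, groups):
    """Sum over groups of sqrt(group size) * Euclidean norm of that group's coefficients.

    The sqrt(size) weighting is a common group-LASSO convention, assumed here.
    """
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        total += math.sqrt(len(idx)) * norm
    return total

def regularized_objective(X, y, beta, groups, eps):
    """Mean absolute residual plus eps times the group penalty (assumed form)."""
    residuals = [
        abs(y_i - sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta)))
        for x_i, y_i in zip(X, y)
    ]
    return sum(residuals) / len(y) + eps * group_penalty(beta, groups)

# Toy data: four predictors in two groups; the second group is entirely zero.
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
y = [3.0, 5.0]
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]

objective = regularized_objective(X, y, beta, groups, eps=0.1)
print(objective)
```

In this sketch `eps` plays the role of the regularization strength; in a Wasserstein DRO reading it would correspond to the radius of the uncertainty set, so a larger radius means heavier shrinkage of whole groups. A group whose coefficients are all zero contributes nothing to the penalty, which is what produces group-level sparsity.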
Findings
Improves prediction and estimation in the presence of outliers
Provides probabilistic bounds on out-of-sample loss and bias
Demonstrates effectiveness on synthetic and real datasets
Abstract
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations on the data for both linear regression and classification problems. The resulting model offers robustness explanations for Grouped Least Absolute Shrinkage and Selection Operator (GLASSO) algorithms and highlights the connection between robustness and regularization. We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator, showing that coefficients in the same group converge to the same value as the sample correlation between covariates approaches 1. Based on this result, we propose to use the spectral clustering algorithm with the Gaussian similarity function to perform grouping on the predictors, which makes our approach applicable without knowing the…
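The grouping step described in the abstract can be sketched as follows: build a Gaussian similarity matrix between predictors, then split them using the spectrum of the graph Laplacian. This is a simplified two-group illustration that relies on NumPy; the function names, the bandwidth `sigma`, the column standardization, and the sign-based split (in place of a full k-means step on several eigenvectors) are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Gaussian similarity between the columns (predictors) of X.

    Columns are standardized and scaled to unit norm, so the squared distance
    between two columns equals 2 * (1 - sample correlation).
    Assumes no column is constant.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = Z / np.sqrt(X.shape[0])
    sq_dist = ((Z[:, :, None] - Z[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def two_group_spectral_split(W):
    """Split nodes into two groups by the sign of the second eigenvector
    of the symmetric normalized Laplacian built from similarity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

# Synthetic predictors: columns 0-2 follow one latent factor, columns 3-5 another.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(3)]
)

labels = two_group_spectral_split(gaussian_similarity(X, sigma=0.5))
print(labels)
```

Which group receives label 0 or 1 is arbitrary, because an eigenvector's sign is not determined. With more than two groups, one would typically run k-means on several leading eigenvectors instead of taking a sign. The resulting labels could then serve as the group structure passed to a grouped penalty.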
Taxonomy
Topics: Risk and Portfolio Optimization · Health Systems, Economic Evaluations, Quality of Life · Statistical Methods and Inference
Methods: Spectral Clustering · Linear Regression
