Submodularity in Statistics: Comparing the Success of Model Selection Methods
Kory D. Johnson, Robert A. Stine, Dean P. Foster

TL;DR
This paper explores how submodularity can characterize the difficulty of feature selection in statistics, linking it to combinatorial optimization and common statistical assumptions, thereby providing insights into model selection challenges.
Contribution
It introduces submodularity as a key concept for understanding feature selection difficulty and connects it to existing statistical methods like Lasso and Dantzig selector.
Findings
Submodularity differentiates routine from difficult feature selection scenarios.
It provides a measure to quantify the difficulty of feature selection tasks.
Connections are established between submodularity and popular statistical assumptions.
Abstract
We demonstrate the usefulness of submodularity in statistics as a characterization of the difficulty of the search problem of feature selection. The search problem is the ability of a procedure to identify an informative set of features as opposed to the performance of the optimal set of features. Submodularity arises naturally in this setting due to its connection to combinatorial optimization. In statistics, submodularity isolates cases in which collinearity makes the choice of model features difficult from those in which this task is routine. Researchers often report the signal-to-noise ratio to measure the difficulty of simulated data examples. A measure of submodularity should also be provided as it characterizes an independent component of difficulty. Furthermore, it is closely related to other statistical assumptions used in the development of the Lasso, Dantzig selector, and…
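To make the role of collinearity concrete, the following minimal sketch checks the submodularity inequality for R² on a ground set of two standardized features. It is an illustration under stated assumptions, not the measure of submodularity proposed in the paper: the function names `r2_two` and `submodularity_gap` are hypothetical, and the correlation values are made up for the example. It uses the standard closed form for the R² of a two-predictor regression in terms of pairwise correlations.

```python
def r2_two(r_y1, r_y2, r_12):
    """R^2 from regressing y on two standardized features.

    r_y1, r_y2: correlations of each feature with the response.
    r_12: correlation between the two features (must satisfy |r_12| < 1).
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def submodularity_gap(r_y1, r_y2, r_12):
    """f({1}) + f({2}) - f({1,2}) - f(empty) with f = R^2 and f(empty) = 0.

    A non-negative gap means the submodular (diminishing returns)
    inequality holds on this two-feature ground set.
    """
    return r_y1**2 + r_y2**2 - r2_two(r_y1, r_y2, r_12)


# Hypothetical correlation settings, chosen only for illustration.
orthogonal = submodularity_gap(0.5, 0.5, 0.0)   # uncorrelated features
redundant = submodularity_gap(0.5, 0.5, 0.5)    # overlapping information
suppressor = submodularity_gap(0.5, 0.0, 0.8)   # feature 2 is useless alone

print("orthogonal gap:", orthogonal)
print("redundant gap:", redundant)
print("suppressor gap:", suppressor)
```

In the suppressor setting the gap is negative: the second feature explains nothing alone yet raises R² substantially once the first is in the model, so a greedy search that scores features one at a time would tend to overlook it. This is the kind of collinearity-driven difficulty that the abstract separates from the routine case.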
Taxonomy
Topics: Advanced Statistical Methods and Models · Probabilistic and Robust Engineering Design · Statistical Methods and Inference
