A Comparison of Hamming Errors of Representative Variable Selection Methods
Zheng Tracy Ke, Longlin Wang

TL;DR
This paper compares the expected Hamming errors of six variable selection methods (Lasso, Elastic net, SCAD, forward selection, thresholded Lasso, and forward backward selection) in a theoretical setting where the Gram matrix is block-wise diagonal and the regression coefficients are drawn iid from a two-point mixture.
Contribution
The paper compares the six methods theoretically by their expected Hamming errors, deriving rates of convergence and phase diagrams that expose the pros and cons of each method.
Findings
Elastic net and SCAD outperform Lasso in correlated settings.
Thresholded Lasso and forward backward selection show competitive Hamming errors.
Theoretical phase diagrams illustrate method advantages under different conditions.
Abstract
Lasso is a celebrated method for variable selection in linear models, but it faces challenges when the variables are moderately or strongly correlated. This motivates alternative approaches such as using a non-convex penalty, adding a ridge regularization, or conducting a post-Lasso thresholding. In this paper, we compare Lasso with 5 other methods: Elastic net, SCAD, forward selection, thresholded Lasso, and forward backward selection. We measure their performances theoretically by the expected Hamming error, assuming that the regression coefficients are iid drawn from a two-point mixture and that the Gram matrix is block-wise diagonal. By deriving the rates of convergence of Hamming errors and the phase diagrams, we obtain useful conclusions about the pros and cons of different methods.
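The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes 2x2 diagonal blocks with off-diagonal entry `rho`, a two-point mixture that puts mass `eps` on a single value `tau`, and a plain coordinate-descent Lasso. The calibration `eps = p**(-theta)`, `tau = sqrt(2*r*log(p))` and every tuning constant (`0.6`, `0.3`, `0.5`) are hypothetical choices made only for illustration.

```python
import numpy as np

def hamming_error(beta_hat, beta):
    """Count coordinates where the estimated and true supports disagree."""
    return int(np.sum((beta_hat != 0) != (beta != 0)))

def block_gram(p, rho):
    """Block-wise diagonal Gram matrix built from 2x2 blocks [[1, rho], [rho, 1]]."""
    if p % 2:
        raise ValueError("p must be even")
    return np.kron(np.eye(p // 2), np.array([[1.0, rho], [rho, 1.0]]))

def two_point_beta(p, eps, tau, rng):
    """Coefficients drawn iid: tau with probability eps, 0 otherwise."""
    return tau * (rng.random(p) < eps).astype(float)

def lasso_cd(G, xty, lam, n_iter=100):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1, given G = X'X and X'y."""
    b = np.zeros(len(xty))
    for _ in range(n_iter):
        for j in range(len(b)):
            # Partial residual correlation with coordinate j excluded.
            r = xty[j] - G[j] @ b + G[j, j] * b[j]
            # Soft-thresholding update.
            b[j] = np.sign(r) * max(abs(r) - lam, 0.0) / G[j, j]
    return b

def simulate(p=200, rho=0.5, theta=0.5, r=3.0, seed=0):
    """Return (Hamming error of Lasso, Hamming error of thresholded Lasso)."""
    rng = np.random.default_rng(seed)
    eps = p ** (-theta)                 # assumed sparsity calibration
    tau = np.sqrt(2 * r * np.log(p))    # assumed signal-strength calibration
    G = block_gram(p, rho)
    X = np.linalg.cholesky(G).T         # design with X'X = G exactly
    beta = two_point_beta(p, eps, tau, rng)
    y = X @ beta + rng.standard_normal(p)
    xty = X.T @ y
    lasso = lasso_cd(G, xty, lam=0.6 * tau)
    # Thresholded Lasso: a lighter penalty followed by hard thresholding.
    pre = lasso_cd(G, xty, lam=0.3 * tau)
    thresholded = np.where(np.abs(pre) > 0.5 * tau, pre, 0.0)
    return hamming_error(lasso, beta), hamming_error(thresholded, beta)

if __name__ == "__main__":
    print(simulate())
```

Averaging `simulate` over many seeds gives a Monte Carlo estimate of the expected Hamming error for each method. A single run says little about the rates of convergence the paper derives.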
Taxonomy
Topics: Statistical Methods and Inference · Markov Chains and Monte Carlo Methods · Bayesian Methods and Mixture Models
