Model Selection in High-Dimensional Misspecified Models
Pallavi Basu, Yang Feng, Jinchi Lv

TL;DR
This paper explores model selection in high-dimensional settings with potential misspecification, proposing generalized criteria that account for misspecification effects and demonstrating their consistency and effectiveness through theoretical analysis and simulations.
Contribution
The paper introduces generalized AIC and generalized BIC for high-dimensional misspecified models, building the effect of model misspecification directly into the selection criteria.
Findings
Generalized BIC with prior probability, whose complexity penalty involves a logarithmic factor of the dimensionality, performs well in simulations.
The covariance contrast matrix estimator is shown to be consistent.
Asymptotic expansions show that the effect of model misspecification is crucial and should be accounted for in model selection criteria.
Abstract
Model selection is indispensable to high-dimensional sparse modeling in selecting the best set of covariates among a sequence of candidate models. Most existing work assumes implicitly that the model is correctly specified or of fixed dimensions. Yet model misspecification and high dimensionality are common in real applications. In this paper, we investigate two classical Kullback-Leibler divergence and Bayesian principles of model selection in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability which involves a logarithmic factor of the dimensionality in penalizing model complexity.…
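To make the role of misspecification concrete, the following is a minimal sketch, not the paper's estimator. It assumes a working Gaussian linear model fitted by least squares and compares the classical AIC and BIC with a Takeuchi-style criterion whose penalty is 2 tr(H), where H = A^{-1}B contrasts the model-based information A with the outer product of scores B. The function names, the simulated data, and the `tic_style` label are all illustrative assumptions; the exact generalized AIC and BIC formulas, including the logarithmic dimensionality factor, should be taken from the paper itself.

```python
import numpy as np

def gaussian_fit(y, X):
    """Least-squares fit under a working Gaussian model; returns the
    maximized log-likelihood, residuals, and the variance estimate RSS/n."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik, resid, sigma2

def contrast_matrix(X, resid, sigma2):
    """H = A^{-1} B for the regression coefficients (illustrative form).
    A: negative Hessian of the working log-likelihood, X'X / sigma2.
    B: sum of outer products of per-observation scores."""
    A = X.T @ X / sigma2
    B = (X * resid[:, None] ** 2).T @ X / sigma2 ** 2
    return np.linalg.solve(A, B)

def criteria(y, X):
    """Classical AIC/BIC plus a Takeuchi-style penalty 2 * tr(H)."""
    n, d = X.shape
    loglik, resid, sigma2 = gaussian_fit(y, X)
    H = contrast_matrix(X, resid, sigma2)
    return {
        "aic": -2 * loglik + 2 * d,
        "bic": -2 * loglik + np.log(n) * d,
        "tic_style": -2 * loglik + 2 * np.trace(H),
        "trace_H": np.trace(H),
    }

# Hypothetical simulated data: only the first two covariates matter, and the
# noise is heteroscedastic, so the working Gaussian model is misspecified.
rng = np.random.default_rng(0)
n, p = 200, 5
X_full = rng.standard_normal((n, p))
noise = rng.standard_normal(n) * (1.0 + 0.5 * np.abs(X_full[:, 0]))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + noise

# Evaluate a nested sequence of candidate models (first k covariates).
results = {k: criteria(y, X_full[:, :k]) for k in range(1, p + 1)}
for k, res in results.items():
    print(k, {name: round(float(val), 2) for name, val in res.items()})
```

When the working model is correctly specified, tr(H) is close to the number of parameters and the Takeuchi-style penalty roughly matches AIC; under misspecification the two can differ, which is the kind of effect the abstract says the generalized criteria account for.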
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Soil Geostatistics and Mapping
