Nonparametric Variable Selection, Clustering and Prediction for High-Dimensional Regression
Subharup Guha, Veerabhadran Baladandayuthapani

TL;DR
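This paper introduces VariScan, a nonparametric framework for simultaneous variable selection, clustering, and prediction in high-dimensional regression with continuous or discrete outcomes. It targets settings with relatively small sample sizes and complex interaction patterns among many covariates, and it is reported to come with theoretical guarantees and better predictive accuracy than existing techniques.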
Contribution
The paper presents VariScan, a novel nonparametric method that combines Poisson-Dirichlet processes with adaptive mixture models for high-dimensional regression analysis.
Findings
VariScan accurately clusters covariates with theoretical guarantees.
The method outperforms existing techniques in prediction accuracy.
Theoretical results support model selection and clustering consistency.
Abstract
The development of parsimonious models for reliable inference and prediction of responses in high-dimensional regression settings is often challenging due to relatively small sample sizes and the presence of complex interaction patterns between a large number of covariates. We propose an efficient, nonparametric framework for simultaneous variable selection, clustering and prediction in high-throughput regression settings with continuous or discrete outcomes, called VariScan. The VariScan model utilizes the sparsity induced by Poisson-Dirichlet processes (PDPs) to group the covariates into lower-dimensional latent clusters consisting of covariates with similar patterns among the samples. The data are permitted to direct the choice of a suitable cluster allocation scheme, choosing between PDPs and their special case, a Dirichlet process. Subsequently, the latent clusters are used to…
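To make the cluster allocation idea concrete, the following is a minimal sketch of drawing a random partition of covariates from the two-parameter Poisson-Dirichlet process prior through its generalized Chinese restaurant representation. It is not the authors' implementation and performs no posterior inference; the function name `sample_pdp_partition` and the parameter names `discount` and `concentration` are assumptions made for illustration. Setting `discount` to 0 recovers the Dirichlet process, the special case mentioned in the abstract.

```python
import random


def sample_pdp_partition(n_items, discount, concentration, seed=0):
    """Draw cluster labels for n_items from a PDP prior (illustrative sketch).

    discount in [0, 1) and concentration > -discount; discount == 0 gives
    the Dirichlet process. Names and defaults are hypothetical.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    labels = []  # cluster label of each item, in order of arrival
    sizes = []   # current number of items in each cluster

    for _ in range(n_items):
        if not sizes:
            # The first item always opens the first cluster.
            k = 0
        else:
            # Existing cluster j is chosen with weight (size_j - discount);
            # a new cluster with weight (concentration + discount * K).
            weights = [s - discount for s in sizes]
            weights.append(concentration + discount * len(sizes))
            k = rng.choices(range(len(weights)), weights=weights)[0]

        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)

    return labels, sizes


if __name__ == "__main__":
    # Compare the number of clusters under a DP (discount 0) and a PDP.
    for d in (0.0, 0.5):
        _, sizes = sample_pdp_partition(500, discount=d, concentration=1.0, seed=1)
        print(f"discount={d}: {len(sizes)} clusters")
```

Under this prior, a positive discount tends to produce more clusters, and more small ones, than the Dirichlet process as the number of items grows, which is one reason a model might let the data choose between the two allocation schemes.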
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Advanced Clustering Algorithms Research
