TL;DR
Nash introduces a neural network-based framework for adaptive, structure-aware sparse regression that improves accuracy and efficiency without cross-validation, especially in high-dimensional biomedical data.
Contribution
It presents Nash, a novel neural adaptive shrinkage method that incorporates covariate structure into sparse regression and employs a scalable variational Bayes algorithm.
Findings
Achieves a 74 to 106x wall-clock speedup over previously proposed methods.
Improves regression accuracy and adaptability on real biomedical data.
Effectively integrates covariate structure into regularization.
Abstract
Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph. We introduce Neural Adaptive Shrinkage (Nash), a unified framework that integrates covariate-specific side information into sparse regression via neural networks. Nash adaptively modulates penalties on a per-covariate basis, learning to tailor regularization without cross-validation. We use a split variational empirical Bayes algorithm that decouples prior learning from posterior inference, reducing the M-step from neural-network passes per sweep to a single batched pass, a 74 to 106x wall-clock speedup over previously proposed coordinate…
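To make the idea of per-covariate shrinkage driven by side information concrete, here is a minimal sketch. It is not the authors' implementation. It assumes, beyond what the abstract states, that each coefficient has a scale-mixture-of-normals prior whose mixture weights are produced by a small network applied to that covariate's side information, and that all p covariates are evaluated in one batched forward pass. The function names, the variance grid, and the random, untrained weights are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prior_weights(D, W1, b1, W2, b2):
    """Hypothetical prior network: side information D (p, q) -> mixture weights (p, K).

    A single matrix product handles all p covariates at once (one batched pass).
    """
    H = np.tanh(D @ W1 + b1)
    return softmax(H @ W2 + b2, axis=1)

def posterior_mean(bhat, s2, pi, grid):
    """Posterior mean of b_j when bhat_j ~ N(b_j, s2) and
    b_j ~ sum_k pi[j, k] * N(0, grid[k]) (assumed prior form)."""
    marg_var = grid[None, :] + s2                      # variance of bhat_j under component k
    loglik = -0.5 * (np.log(2 * np.pi * marg_var) + bhat[:, None] ** 2 / marg_var)
    resp = softmax(np.log(pi + 1e-300) + loglik, axis=1)  # posterior component weights
    shrink = grid[None, :] / marg_var                  # per-component shrinkage factor in [0, 1)
    return (resp * shrink).sum(axis=1) * bhat

# Toy example with random (untrained) network weights.
rng = np.random.default_rng(0)
p, q, hidden = 6, 2, 8
grid = np.array([0.0, 0.1, 1.0, 10.0])                 # assumed prior variance grid
D = rng.normal(size=(p, q))                            # side information, one row per covariate
W1, b1 = rng.normal(size=(q, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, grid.size)), np.zeros(grid.size)

pi = prior_weights(D, W1, b1, W2, b2)                  # covariate-specific prior weights
bhat = rng.normal(size=p)                              # noisy coefficient estimates
post = posterior_mean(bhat, 0.5, pi, grid)             # shrunken estimates
```

Each covariate gets its own row of `pi`, so two covariates with the same noisy estimate can be shrunk by different amounts depending on their side information. In an empirical Bayes scheme the network weights would be fitted to the data rather than drawn at random as they are here.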
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
S1. The split VEB approach addresses a real computational bottleneck of the mr.ash variational formulation. By decoupling the updates, Nash requires only one neural network update per coordinate ascent iteration. Theorem A.1 provides the lower-bound relationship to mr.ash. S2. The authors successfully demonstrate how Nash can encompass various structured penalties (group lasso, fused lasso, IPF-lasso) within a single framework. S3. The authors clearly explain the vari
W1. The authors claim Nash is "the first work to propose the use of a neural network to incorporate covariate side information when learning the penalty function," but fail to demonstrate why neural networks are necessary. They could compare against more classical baselines, including kernel-based methods (e.g., RBF kernels on side information). W2. The authors use only 4 real datasets and no synthetic data demonstrating when or why NNs help. It would be helpful if they could provide scenarios with complex no
1. Regression with structural information about the coefficients is a central problem in high-dimensional statistics with countless applications. Any progress on this problem is welcome. 2. The proposed framework is very general. 3. Simulations demonstrate some promising results.
1. The prior construction seems a straightforward extension of Wang & Stephens (2021) and Kim et al. (2024). The main innovation is the introduction of the "side information" d_j. While this is helpful, especially for graph-based tasks considered here, it is a very natural idea. 2. The variational inference algorithm is an application of standard methodology.
The topic is interesting and the paper is rather easy to read.
Weaknesses: - Clarity: - I am not sure I understood the proposed algorithm. Would it be possible to encapsulate it in an environment, as done in [1]? That might also be a good opportunity to highlight the difference with [1]. - In Figure 1, bottom left, where does the GNN-based prior come from? Is it a pre-trained GNN? On other data? Could the authors provide more details on this specific figure? - Novelty: I am not sure I understood the difference with [1]. Could the authors comment on that? Is
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Statistical Methods and Inference
Methods: Variational Inference · Linear Regression
