Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression

William R.P. Denault

arXiv:2505.11143·stat.ML·May 19, 2026

TL;DR

Nash introduces a neural network-based framework for adaptive, structure-aware sparse regression that improves accuracy and efficiency without cross-validation, especially in high-dimensional biomedical data.

Contribution

It presents Nash, a novel neural adaptive shrinkage method that incorporates covariate structure into sparse regression and employs a scalable variational Bayes algorithm.

Findings

01

Achieves a 74 to 106x wall-clock speedup over previously proposed methods by replacing $O(p)$ neural-network passes per sweep with a single batched pass.

02

Improves regression accuracy and adaptability on real biomedical data.

03

Effectively integrates covariate structure into regularization.
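
As a rough illustration of what per-covariate, side-information-driven regularization can look like, the sketch below maps each covariate's side information to its own penalty and solves a generalized ridge problem. This is not the Nash algorithm: the ridge penalty, the one-layer map standing in for the neural network, the fixed (unlearned) weights `w` and `b`, and all function names are assumptions made for the example.

```python
import numpy as np

def softplus(z):
    # Smooth map to positive values so every penalty is > 0.
    return np.log1p(np.exp(z))

def penalties_from_side_info(D, w, b):
    # D: (p, q) side information, one row d_j per covariate (hypothetical layout).
    # w, b: weights of a one-layer stand-in for the neural network
    # (assumed fixed here; a real method would learn them).
    return softplus(D @ w + b)

def generalized_ridge(X, y, lam):
    # Solve (X^T X + diag(lam)) beta = X^T y, with one penalty per covariate.
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Side information: one binary feature flagging covariates believed to be null.
D = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
lam = penalties_from_side_info(D, w=np.array([10.0]), b=-3.0)
beta_hat = generalized_ridge(X, y, lam)
```

Here the flagged covariates receive a much larger penalty than the unflagged ones, so the amount of shrinkage is driven by the side information rather than by a single global tuning parameter.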

Abstract

Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph. We introduce \textit{Neural Adaptive Shrinkage} (Nash), a unified framework that integrates covariate-specific side information into sparse regression via neural networks. Nash adaptively modulates penalties on a per-covariate basis, learning to tailor regularization without cross-validation. We use a \textit{split variational empirical Bayes} algorithm that decouples prior learning from posterior inference, reducing the M-step from $O(p)$ neural-network passes per sweep to a single batched pass, a \textit{74 to 106x wall-clock speedup} over previously proposed coordinate…
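
The difference between $O(p)$ network passes and a single batched pass can be sketched with plain forward passes. The example below is only an illustration under assumptions: the tiny two-layer network, its random weights, and the choice of a softmax output over `K` hypothetical prior components are not taken from the paper, and it shows forward evaluation only, not the M-step updates themselves.

```python
import numpy as np

def mlp(D, W1, b1, W2, b2):
    # Tiny two-layer network: side information -> softmax over K outputs.
    # The output meaning (K prior component weights) is an assumption.
    H = np.tanh(D @ W1 + b1)
    logits = H @ W2 + b2
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    E = np.exp(logits)
    return E / E.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
p, q, h, K = 1000, 4, 8, 3  # covariates, side-info dim, hidden units, outputs
D = rng.normal(size=(p, q))
W1, b1 = rng.normal(size=(q, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, K)), np.zeros(K)

# Coordinate-wise style: p separate forward passes, one per covariate.
looped = np.stack([mlp(D[j:j + 1], W1, b1, W2, b2)[0] for j in range(p)])

# Batched style: one forward pass over all p rows at once.
batched = mlp(D, W1, b1, W2, b2)
```

Both routes produce the same `(p, K)` array; the batched version replaces `p` small matrix products with one large one, which is the kind of restructuring that makes a single pass per sweep cheaper in practice.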

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 2

Strengths

S1. The split VEB approach addresses a real computational bottleneck of the mr.ash variational formulation. By decoupling the updates of the subproblems, Nash requires only one neural network update per coordinate ascent iteration. Theorem A.1 provides the lower-bound relationship to mr.ash.

S2. The authors successfully demonstrate how Nash can encompass various structured penalties (group lasso, fused lasso, IPF-lasso) within a single framework.

S3. The authors clearly explain the vari

Weaknesses

W1. The authors claim Nash is "the first work to propose the use of a neural network to incorporate covariate side information when learning the penalty function," but fail to demonstrate why neural networks are necessary. They could compare against more classical baselines, including kernel-based methods (e.g., RBF kernels on side information).

W2. The authors use only four real datasets and no synthetic data demonstrating when or why NNs help. It would be helpful if they could provide scenarios with complex no

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Regression with structural information about the coefficients is a central problem in high-dimensional statistics with countless applications. Any progress on this problem is welcome.

2. The proposed framework is very general.

3. Simulations demonstrate some promising results.

Weaknesses

1. The prior construction seems to be a straightforward extension of Wang & Stephens (2021) and Kim et al. (2024). The main innovation is the introduction of the "side information" d_j. While this is helpful, especially for the graph-based tasks considered here, it is a very natural idea.

2. The variational inference algorithm is an application of standard methodology.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The topic is interesting and the paper is rather easy to read.

Weaknesses

- Clarity:
  - I am not sure I understood the proposed algorithm. Would it be possible to encapsulate it in an environment, as done in [1]? That may also be a good opportunity to highlight the difference with [1].
  - In Figure 1, bottom left, where does the GNN-based prior come from? Is it a pre-trained GNN? On other data? Could the authors provide more details on this specific figure?
- Novelty: I am not sure I understood the difference with [1]. Could the authors comment on that? Is

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Statistical Methods and Inference

Methods: Variational Inference · Linear Regression