# In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference

**Authors:** Sen Zhao, Daniela Witten, Ali Shojaie

arXiv: 1705.05543 · 2020-07-02

## TL;DR

This paper demonstrates that a simple two-step procedure (lasso selection followed by least squares on the selected variables) can yield valid high-dimensional inference under certain assumptions. This challenges the conventional wisdom that peeking at the data twice invalidates standard inference tools.
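The two steps can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes NumPy, a hand-rolled coordinate-descent lasso (`lasso_cd`, a hypothetical helper) minimizing $(1/2n)\|y - Xb\|_2^2 + \lambda\|b\|_1$, a fixed hand-picked `lam`, and textbook Wald intervals from the least squares refit using the usual residual-variance estimate. The paper's choice of λ and its estimator of σ may differ.

```python
import numpy as np
from statistics import NormalDist


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Hypothetical helper: lasso via cyclic coordinate descent for
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # invariant: resid == y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def naive_two_step(X, y, lam, level=0.95):
    """Step (i): lasso selection. Step (ii): least squares on the selected
    columns, with intervals computed as if the selected set were fixed."""
    selected = np.flatnonzero(lasso_cd(X, y, lam))
    if selected.size == 0:
        return selected, np.empty(0), np.empty((0, 2))
    XA = X[:, selected]
    n, k = XA.shape
    coef, *_ = np.linalg.lstsq(XA, y, rcond=None)
    resid = y - XA @ coef
    sigma2 = resid @ resid / (n - k)  # usual OLS noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XA.T @ XA)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    ci = np.column_stack([coef - z * se, coef + z * se])
    return selected, coef, ci


# Simulated data: 3 nonzero coefficients out of 50 (illustrative values).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)

selected, coef, ci = naive_two_step(X, y, lam=0.2)
for j, b, (lo, hi) in zip(selected, coef, ci):
    print(f"x{j}: {b:+.3f}  [{lo:+.3f}, {hi:+.3f}]")
```

The refit deliberately ignores the selection step. The paper's argument is that, under its assumptions, this is asymptotically justified because the selected set is deterministic with high probability.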

## Contribution

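It shows that, under a set of assumptions and with high probability, the variable set selected by the lasso coincides with the set selected by the noiseless lasso and is therefore deterministic. This enables asymptotically valid inference with the naïve confidence interval and the naïve score test.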
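The "deterministic" claim can be probed on simulated data. The sketch below assumes the noiseless lasso is the lasso fit to the noiseless response $X\beta$ (that is, $\mathbb{E}[y]$) in place of $y$, and compares its support with that of the ordinary lasso at the same λ. The design, coefficients, λ, seed, and the helper `lasso_cd` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Hypothetical helper: coordinate-descent lasso for
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # invariant: resid == y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
mean_y = X @ beta                          # noiseless response E[y]
y = mean_y + rng.standard_normal(n)        # observed response

lam = 0.25
support_noisy = np.flatnonzero(lasso_cd(X, y, lam))           # ordinary lasso
support_noiseless = np.flatnonzero(lasso_cd(X, mean_y, lam))  # noiseless lasso
print("lasso:          ", support_noisy)
print("noiseless lasso:", support_noiseless)
print("identical:", np.array_equal(support_noisy, support_noiseless))
```

The noiseless-lasso support depends only on $X$, $\beta$, and λ, not on the noise. Shrinking the nonzero coefficients or lowering `lam` toward the noise level tends to make the two supports disagree more often in simulations like this one.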

## Key findings

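- Under the stated assumptions, the lasso-selected set equals the noiseless-lasso set with high probability, so it is effectively deterministic.
- Naïve confidence intervals are asymptotically valid for the coefficients of the lasso-selected model.
- Naïve score tests can be used to test hypotheses about the full-model regression coefficients.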
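A score-type test for $H_0: \beta_j = 0$ can be sketched on top of the lasso-selected set. This is a minimal illustration under loudly stated assumptions: the noise level σ is treated as known, and the statistic takes the classical form for adding $x_j$ to the model spanned by the selected columns other than $j$, namely $T = x_j^\top (I - P_M) y \,/\, (\sigma \sqrt{x_j^\top (I - P_M) x_j})$ compared with $N(0, 1)$. The paper's naïve score test may differ in its exact standardization and variance estimator. The helpers `lasso_cd`, `residualize`, and `naive_score_test` are hypothetical names.

```python
import numpy as np
from statistics import NormalDist


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Hypothetical helper: coordinate-descent lasso for
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # invariant: resid == y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def residualize(M, v):
    """Return (I - P_M) v, where P_M projects onto the column span of M."""
    if M.shape[1] == 0:
        return v.copy()
    coef, *_ = np.linalg.lstsq(M, v, rcond=None)
    return v - M @ coef


def naive_score_test(X, y, selected, j, sigma):
    """Score-type z-statistic and two-sided p-value for H0: beta_j = 0,
    treating the lasso-selected columns (minus j) as fixed; sigma assumed known."""
    others = np.array([k for k in selected if k != j], dtype=int)
    XM = X[:, others]
    xj = X[:, j]
    t = (xj @ residualize(XM, y)) / (sigma * np.sqrt(xj @ residualize(XM, xj)))
    return t, 2 * (1 - NormalDist().cdf(abs(t)))


rng = np.random.default_rng(2)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)  # noise sd is 1, matching sigma below

selected = np.flatnonzero(lasso_cd(X, y, lam=0.25))
for j in (0, 10):  # one nonzero and one null coefficient
    t, pval = naive_score_test(X, y, selected, j, sigma=1.0)
    print(f"H0: beta_{j} = 0  ->  T = {t:+.2f}, p = {pval:.3g}")
```

Because the test conditions on the selected set as if it were chosen in advance, it is "naïve" in the same sense as the confidence interval; its validity rests on the selected set being deterministic with high probability.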

## Abstract

A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and $p$-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the *naïve confidence interval*, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the *naïve score test*, which can be used to test the hypotheses regarding the full-model regression coefficients.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.05543/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1705.05543/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/1705.05543/full.md

---
Source: https://tomesphere.com/paper/1705.05543