Improving variable selection properties with data integration and transfer learning
Paul Rognon-Vael, David Rossell, Piotr Zwiernik

TL;DR
This paper introduces an approach to variable selection in high-dimensional linear regression that uses external information, such as variables selected in source datasets (structural transfer learning), to improve selection consistency.
Contribution
It develops likelihood penalties that incorporate external information and shows that they attain variable selection consistency under weaker conditions, and at faster rates, than methods that ignore that information.
Findings
The proposed penalties achieve variable selection consistency in regimes where any method ignoring external information fails.
Empirical Bayes procedures learn penalties from data, enhancing performance.
Application to genomics shows practical benefits in real-world data.
Abstract
We study variable selection (also called support recovery) in high-dimensional sparse linear regression when one has external information on which variables are likely to be associated with the response. Consistent recovery is only possible under somewhat restrictive conditions on sample size, dimension, signal strength, and sparsity. We investigate how these conditions can be relaxed by incorporating said external information. A key application that we consider is structural transfer learning, where variables selected in one or more source datasets are used to guide variable selection in a target dataset. We introduce a family of likelihood penalties that depend on the external information, motivated by connections to Bayesian variable selection. We show that these methods achieve variable selection consistency in regimes where any method ignoring external information fails, and that…
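To make the idea of an external-information penalty concrete, the following is a minimal sketch and not the paper's actual penalty family. It assumes a simple L0-type criterion, n/2 · log(RSS/n) plus a per-variable penalty, in which variables flagged by external information (z_j = 1) pay a smaller penalty than the rest. The function names `rss` and `select_support`, the penalty values, and the toy data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the given columns."""
    if len(subset) == 0:
        return float(y @ y)
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)


def select_support(X, y, z, kappa_supported, kappa_unsupported):
    """Exhaustive search minimising n/2 * log(RSS/n) + sum of per-variable penalties.

    z[j] == 1 marks variable j as supported by external information
    (hypothetical encoding); such variables pay kappa_supported,
    all others pay kappa_unsupported.
    """
    n, p = X.shape
    kappa = [kappa_supported if zj == 1 else kappa_unsupported for zj in z]
    best, best_score = (), np.inf
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            fit = 0.5 * n * np.log(max(rss(X, y, subset), 1e-12) / n)
            score = fit + sum(kappa[j] for j in subset)
            if score < best_score:
                best, best_score = subset, score
    return best


# Toy data (an assumption for illustration): orthogonal +/-1 design columns
# taken from an 8 x 8 Sylvester Hadamard matrix, one true signal on variable 0,
# and a "noise" vector orthogonal to every design column.
H2 = np.array([[1, 1], [1, -1]])
H8 = np.kron(np.kron(H2, H2), H2)
X = H8[:, 1:4].astype(float)
y = 3.0 * X[:, 0] + H8[:, 7]

# Same penalty for every variable (no external information used).
without_external = select_support(X, y, z=[0, 0, 0],
                                  kappa_supported=2.0, kappa_unsupported=12.0)

# Variable 0 is flagged by external information and pays a smaller penalty.
with_external = select_support(X, y, z=[1, 0, 0],
                               kappa_supported=2.0, kappa_unsupported=12.0)
```

In this toy setup the uniform penalty is large enough that the empty model is selected, while lowering the penalty on the externally supported variable lets the search recover variable 0. The exhaustive search is only feasible for a handful of variables and is used here purely to keep the sketch short.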
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Gaussian Processes and Bayesian Inference
