Replica Analysis for Ensemble Techniques in Variable Selection

Takashi Takahashi

arXiv:2408.16799·math.ST·February 27, 2025

Open Access

TL;DR

This paper uses the replica method from statistical mechanics to analyze the performance of ensemble techniques like stability selection and knockoffs in high-dimensional variable selection, revealing their relative strengths.

Contribution

It introduces a systematic analytical framework for evaluating ensemble methods in high-dimensional settings using the replica approach.

Findings

01 dKO outperforms vanilla knockoff and SS

02 Increasing bootstrap resampling in SS can improve detection power

03 Analytical insights into ensemble method performance in high dimensions

Abstract

Variable selection is a problem in statistics that aims to find the subset of the $N$-dimensional possible explanatory variables that are truly related to the generation process of the response variable. In high-dimensional setups, where the input dimension $N$ is comparable to the data size $M$, it is difficult to use classic methods based on $p$-values. Therefore, methods based on ensemble learning are often used. In this review article, we introduce how the performance of these ensemble-based methods can be systematically analyzed using the replica method from statistical mechanics when $N$ and $M$ diverge at the same rate, i.e., $N, M \to \infty$ with $M / N \to \alpha \in (0, \infty)$. As a concrete application, we analyze the power of stability selection (SS) and the derandomized knockoff (dKO) with the $\ell_{1}$-regularized statistics in the high-dimensional linear model. The result indicates…
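To make the stability selection procedure named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes bootstrap resampling with replacement, an $\ell_1$-regularized least-squares (lasso) fit solved by plain coordinate descent, and selection probabilities estimated as the fraction of resamples in which a coefficient is nonzero. The function names (`lasso_cd`, `stability_selection`, `make_toy_data`), the objective scaling, and all parameter values are illustrative assumptions.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/(2m)) * ||y - X w||^2 + lam * ||w||_1.

    X is a list of m rows, each a list of n floats (assumed layout).
    """
    m, n = len(X), len(X[0])
    w = [0.0] * n
    resid = list(y)  # equals y - X w, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (w_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(m)) / m + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(m):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def stability_selection(X, y, lam, n_boot=20, seed=0):
    """Estimate per-variable selection probabilities over bootstrap resamples."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    counts = [0] * n
    for _ in range(n_boot):
        # Bootstrap: draw m row indices with replacement (an assumption here).
        idx = [rng.randrange(m) for _ in range(m)]
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(n):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(m, n, beta, noise_std, seed=0):
    """Hypothetical helper: Gaussian design with a linear response plus noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [sum(X[i][j] * beta[j] for j in range(n)) + rng.gauss(0.0, noise_std)
         for i in range(m)]
    return X, y
```

A variable would then be declared selected when its estimated probability exceeds a user-chosen threshold. For example, `stability_selection(*make_toy_data(60, 10, [3.0, -2.0] + [0.0] * 8, 0.5), lam=0.3)` returns one probability per variable. The threshold and the regularization strength `lam` are tuning choices that this sketch does not address.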

Taxonomy

Topics: Machine Learning and Data Classification · Fault Detection and Control Systems