Random-projection ensemble dimension reduction

Wenxing Zhou; Timothy I. Cannings

arXiv:2410.04922 · stat.ME · October 8, 2024

TL;DR

This paper proposes an ensemble-based dimension reduction method for high-dimensional regression: random projections are selected by their empirical regression performance and then aggregated into a single low-dimensional projection.

Contribution

It introduces a new framework combining random projections with empirical performance-based selection and aggregation, including a double-application strategy for dimension reduction.

Findings

01. The estimation error stabilizes as the number of projection groups increases.

02. In the reported experiments, the method performs well on both simulated and real data.

03. The singular values indicate the relative importance of the corresponding projection directions.
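
One way to read the third finding: if the singular values of the aggregated matrix drop sharply after the first few, the directions before the drop carry most of the signal. The sketch below picks the dimension at the largest gap between consecutive singular values. The gap rule and the names `choose_dimension` and `max_dim` are illustrative assumptions, not the selection rule used in the paper.

```python
import numpy as np

def choose_dimension(singular_values, max_dim=None):
    """Hypothetical rule: return the dimension just before the largest drop."""
    # Sort in descending order so that s[0] is the most important direction.
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    if max_dim is None:
        max_dim = len(s) - 1
    # gaps[k] = s[k] - s[k + 1]; a large gap after index k suggests dimension k + 1.
    gaps = s[:max_dim] - s[1:max_dim + 1]
    return int(np.argmax(gaps)) + 1

# Made-up singular values for illustration only.
example_values = [0.48, 0.41, 0.03, 0.02, 0.02]
chosen = choose_dimension(example_values)
```

With these made-up values the largest drop is between the second and third entries, so the rule selects dimension 2.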

Abstract

We introduce a new framework for dimension reduction in the context of high-dimensional regression. Our proposal is to aggregate an ensemble of random projections, which have been carefully chosen based on the empirical regression performance after being applied to the covariates. More precisely, we consider disjoint groups of independent random projections, apply a base regression method after each projection, and retain the projection in each group based on the empirical performance. We aggregate the selected projections by taking the singular value decomposition of their empirical average and then output the leading order singular vectors. A particularly appealing aspect of our approach is that the singular values provide a measure of the relative importance of the corresponding projection directions, which can be used to select the final projection dimension. We investigate in…
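
The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rpe_reduce`, the Gaussian projections with orthonormalized columns, least squares as the base regression, the single 50/50 train/validation split, and the choice to average the matrices `A @ A.T` are all assumptions made here for concreteness.

```python
import numpy as np

def rpe_reduce(X, y, d=1, L=100, M=30, seed=0):
    """Sketch of random-projection ensemble dimension reduction.

    Assumptions (not taken from the source): Gaussian projections with
    orthonormalized columns, least squares as the base regression, one
    50/50 train/validation split, and averaging of A @ A.T.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    train, valid = perm[: n // 2], perm[n // 2:]

    def validation_error(A):
        # Fit least squares (with intercept) on the projected training data,
        # then score the fit on the held-out validation data.
        Z_train = np.column_stack([np.ones(len(train)), X[train] @ A])
        Z_valid = np.column_stack([np.ones(len(valid)), X[valid] @ A])
        coef, *_ = np.linalg.lstsq(Z_train, y[train], rcond=None)
        return np.mean((y[valid] - Z_valid @ coef) ** 2)

    average = np.zeros((p, p))
    for _ in range(L):  # L groups of projections
        # M independent random p x d projections with orthonormal columns.
        candidates = [
            np.linalg.qr(rng.standard_normal((p, d)))[0] for _ in range(M)
        ]
        # Retain the projection with the best empirical performance.
        best = min(candidates, key=validation_error)
        average += best @ best.T / L

    # Leading singular vectors give the estimated directions; the singular
    # values measure their relative importance.
    U, s, _ = np.linalg.svd(average)
    return U, s

# Toy data (made up): the response depends only on the first covariate.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 10))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(400)
U, s = rpe_reduce(X, y)
```

In this toy example the response depends only on the first coordinate, so the leading column of `U` should align closely with the first coordinate axis and the first singular value should dominate the rest.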

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

* Proposes a novel, flexible, and theoretically principled framework for dimension reduction in high-dimensional regression via random-projection ensembles.
* Adaptively selects informative projection directions, delivering more stable estimates.
* Provides theoretical guarantees showing that the expected estimation error decreases monotonically as the number of projection groups increases.

Weaknesses

I am not deeply familiar with the SDR literature; my expertise lies more in random-projection methods for computational efficiency. Consequently, I cannot fully determine whether the comparisons with existing SDR approaches are sufficiently comprehensive or representative of the current state of the art. The following comments are therefore offered as provisional observations:

* Theorem 1 relies on the assumption $L = \infty$, which limits its practical value for guiding the selection of $L$ in

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The proposed method is parallelizable, making it applicable to distributed systems.
2. The paper not only shows empirical results but also provides a theoretical bound on the estimation error.

Weaknesses

1. The superiority of the proposed method is not convincing. According to the experimental results in Tables 2-7, drMARS sometimes performs better and requires less runtime, whereas the proposed RPE2 and RPE are not always optimal and consume much more runtime. This weakens the contribution of this work.
2. The framework has multiple parameters, including $L$, $M$ and $d$, each of which needs separate tuning, which makes the framework less practical.
3. The computational cost of the proposed method may bri

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The method cleanly combines random projections, held-out selection, and SVD-based aggregation, making it plug-and-play with different projection families and base regressors while yielding a stable subspace estimator (with an optional double-pass refinement and a built-in dimension-selection procedure).
2. The finite-sample analysis isolates an “infinite-simulation” term and a sampling error that shrinks as $L^{-1/2}$, clarifying how performance improves with the number of projection groups.
3

Weaknesses

1. The Contributions section is overly long and diffuses the main claims. The current text mixes method mechanics into the contributions. The Contributions should state only the new ideas.
2. The paper discusses how to choose $L$, $M$, and $d$, but provides no guidance for $n_1$ (the within-group training split used for projection selection).
3. Some arguments in the proof are confusing.

Code & Models

Repositories

Wenxing99/RPEDR · Official

Taxonomy

Topics: Image Processing Techniques and Applications · Image and Signal Denoising Methods · Face and Expression Recognition

Methods: Balanced Selection