Random Subspace with Trees for Feature Selection Under Memory Constraints
Antonio Sutera, Célia Châtel, Gilles Louppe, Louis Wehenkel, Pierre Geurts

TL;DR
This paper introduces a novel tree-based feature selection method designed for high-dimensional datasets with memory constraints, combining theoretical analysis and preliminary empirical results.
Contribution
It presents a new randomized tree approach for feature selection under memory limits and provides a comprehensive theoretical analysis of its convergence and relevance detection.
Findings
The method is theoretically sound for feature relevance detection.
Convergence speed depends on the variable dependence scenario.
Preliminary results show potential effectiveness.
Abstract
Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing variables already identified as relevant by previous models with variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
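The sequential procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `keep_fraction` parameter, and the `find_relevant` callback are hypothetical, and the tree-building step is replaced by a toy oracle that mimics an idealized infinite-sample model. In the oracle, feature 0 is relevant on its own, while features 1 and 2 are only detected when both are in memory together, which is one way a dependence scenario could slow convergence.

```python
import random


def sequential_random_subspace(n_features, memory_size, n_iterations,
                               find_relevant, keep_fraction=0.5, seed=0):
    """Sketch of the loop: each iteration loads at most `memory_size` features.

    `find_relevant(subset)` is a hypothetical stand-in for fitting a
    randomized tree model on `subset` and returning the features it
    identifies as relevant.
    """
    rng = random.Random(seed)
    relevant = set()
    for _ in range(n_iterations):
        # Reserve part of the memory for features already found relevant.
        n_keep = min(len(relevant), int(keep_fraction * memory_size))
        kept = rng.sample(sorted(relevant), n_keep)
        # Fill the remaining slots with features drawn from the others.
        others = [f for f in range(n_features) if f not in relevant]
        n_new = min(memory_size - n_keep, len(others))
        subset = kept + rng.sample(others, n_new)
        relevant |= set(find_relevant(subset))
    return relevant


def toy_oracle(subset):
    """Assumed toy relevance model, not a real tree ensemble."""
    in_memory = set(subset)
    found = set()
    if 0 in in_memory:
        found.add(0)          # relevant on its own
    if {1, 2} <= in_memory:
        found |= {1, 2}       # only detectable jointly (XOR-like)
    return found


if __name__ == "__main__":
    selected = sequential_random_subspace(
        n_features=50, memory_size=10, n_iterations=2000,
        find_relevant=toy_oracle)
    print(sorted(selected))
```

In a real setting, `find_relevant` would fit a tree ensemble on the loaded columns only and keep the features whose importance is non-zero; how the memory is split between kept and newly drawn features is a design choice that this sketch fixes arbitrarily at one half.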
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Algorithms · Algorithms and Data Compression
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
