Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes
Sigrun May, Sven Hartmann, Frank Klawonn

TL;DR
This paper introduces a combined pruning strategy that accelerates hyperparameter optimization in embedded feature selection for high-dimensional, small-sample datasets, reducing computation time significantly while maintaining performance.
Contribution
It develops a novel combination of pruning methods, pairing an asynchronous successive halving pruner with complementary strategies based on domain knowledge and extrapolation, to make hyperparameter tuning within nested cross-validation more efficient.
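To illustrate how a domain-knowledge rule and an extrapolation rule can be combined into one early-stopping decision, here is a minimal sketch. It is not the authors' method: the function name `should_prune`, the threshold `min_acceptable` (for example, a score no better than random guessing) and the optimistic bound of 1.0 per remaining fold are hypothetical assumptions, and scores are assumed to lie in [0, 1] with higher being better.

```python
from statistics import mean


def should_prune(fold_scores, n_folds, best_mean, min_acceptable):
    """Hypothetical combined pruning rule for one hyperparameter trial.

    fold_scores:    validation scores of the inner folds evaluated so far
    n_folds:        total number of inner folds for this trial
    best_mean:      best mean score of any completed trial so far, or None
    min_acceptable: hypothetical domain-knowledge threshold
    """
    if not fold_scores:
        return False  # nothing observed yet, so no basis for a decision

    # Rule 1 (domain knowledge, assumed form): stop trials whose current
    # mean score falls below a minimum level considered meaningful.
    if mean(fold_scores) < min_acceptable:
        return True

    # Rule 2 (extrapolation, assumed form): optimistically assume every
    # remaining fold reaches the maximum score of 1.0. If even this best
    # case cannot beat the best completed trial, stop evaluating.
    if best_mean is not None:
        remaining = n_folds - len(fold_scores)
        optimistic_mean = (sum(fold_scores) + remaining * 1.0) / n_folds
        if optimistic_mean <= best_mean:
            return True

    return False
```

Because the extrapolation rule uses an upper bound rather than a point estimate, it only discards trials that provably cannot win under the stated assumption, which is one way to avoid reacting to a single noisy fold.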
Findings
Up to 81.3% fewer models trained while achieving the same results
Significant reduction in computation time and resources
Enables more extensive hyperparameter searches within limited time
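The number of trained models is a natural cost measure here because, without pruning, every trial is fitted on every inner fold of every outer fold. The sketch below shows this arithmetic; the function names are hypothetical, final refits per outer fold are ignored, and the counts in the example are illustrative only, not results from the paper.

```python
def models_full_nested_cv(outer_folds, trials, inner_folds):
    """Models trained during hyperparameter search in nested CV without
    pruning (final refits on each outer training set are not counted)."""
    return outer_folds * trials * inner_folds


def reduction_percent(trained_with_pruning, trained_without):
    """Relative saving in trained models, in percent."""
    return 100.0 * (1 - trained_with_pruning / trained_without)


# Illustrative numbers only: 10 outer folds, 50 trials, 10 inner folds.
full = models_full_nested_cv(10, 50, 10)
saving = reduction_percent(1250, full)
```

With these hypothetical settings a full search trains 5,000 models; a pruned run that trained 1,250 of them would correspond to a 75% reduction.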
Abstract
Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training on high-dimensional data leads to very long computation times. Moreover, the individual performance evaluation metrics are likely to show high variance caused by outliers in tiny validation sets. Therefore, using standard pruning algorithms for early stopping to save time risks discarding promising hyperparameter sets. Result: To speed up feature selection for high-dimensional data with tiny sample sizes, we adapt a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or…
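Asynchronous successive halving treats evaluation effort as a resource and, at fixed milestones ("rungs"), lets a trial continue only if it ranks in the top 1/η of all trials that have reached that rung so far; "asynchronous" means each decision is made immediately instead of waiting for a full cohort of trials. The minimal sketch below assumes the resource is the number of inner cross-validation folds evaluated and that higher scores are better. The class and parameter names are illustrative, not the paper's adapted pruner; Optuna ships a production implementation as `optuna.pruners.SuccessiveHalvingPruner`.

```python
class AsyncSuccessiveHalvingPruner:
    """Illustrative ASHA-style pruner; resource = inner folds evaluated."""

    def __init__(self, min_resource=1, reduction_factor=3):
        self.min_resource = min_resource
        self.eta = reduction_factor
        self.rung_scores = {}  # rung index -> scores reported at that rung

    def _rung_index(self, step):
        """Return the rung index if `step` is a milestone
        (min_resource * eta**i), otherwise None."""
        resource, index = self.min_resource, 0
        while resource <= step:
            if resource == step:
                return index
            resource *= self.eta
            index += 1
        return None

    def should_continue(self, step, score):
        """Report `score` after `step` folds; return False to prune."""
        rung = self._rung_index(step)
        if rung is None:
            return True  # not at a milestone, so no decision is made
        scores = self.rung_scores.setdefault(rung, [])
        scores.append(score)
        # Keep the top 1/eta of trials seen at this rung (at least one).
        n_keep = max(1, len(scores) // self.eta)
        cutoff = sorted(scores, reverse=True)[n_keep - 1]
        return score >= cutoff
```

With `reduction_factor=3`, decisions happen after 1, 3, 9, ... folds. Because an early rung may rest on a single tiny validation fold, one outlier score can stop a good trial, which is the risk the abstract describes for standard pruners.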
Taxonomy
Topics: Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research · Advanced Multi-Objective Optimization Algorithms
Methods: Pruning · Feature Selection · Early Stopping
