Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection
Xiaolin Jiang, Guanqi Liu, Jiaying Xie, Zhenpeng Hu

TL;DR
This paper introduces RF-SISSO, a hybrid method that combines Random Forests with SISSO to improve feature selection and predictive accuracy on small-sample datasets in materials science.
Contribution
The paper proposes RF-SISSO, a novel prescreening approach that enhances SISSO's performance and efficiency on small datasets by integrating Random Forests for complex feature selection.
Findings
RF-SISSO maintains over 0.9 accuracy across various training sizes
Significantly improves regression efficiency, especially with small samples
Outperforms original SISSO in accuracy and computational speed
Abstract
In materials science, data-driven methods accelerate material discovery and optimization while reducing costs and improving success rates. Symbolic regression, in particular the Sure Independence Screening and Sparsifying Operator (SISSO) method, is key to extracting material descriptors from large datasets. However, SISSO must store the entire expression space, which imposes heavy memory demands and limits its performance on complex problems. To address this issue, we propose the RF-SISSO algorithm, which combines Random Forests (RF) with SISSO. In this algorithm, RF is used for prescreening: it captures non-linear relationships and improves feature selection, which may enhance the quality of the input data and boost accuracy and efficiency on regression and classification tasks. In a test on SISSO's verification problem for 299 materials, RF-SISSO…
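The prescreening idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` as the ranking criterion, and the function name `rf_prescreen`, the parameter `n_keep`, and the synthetic data are all hypothetical. The reduced feature matrix stands in for what would then be handed to a SISSO run; the SISSO step itself is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_prescreen(X, y, feature_names, n_keep, random_state=0):
    """Keep the n_keep features ranked highest by Random Forest importance.

    Hypothetical helper: the ranking criterion (impurity-based importance)
    is an assumption, not something the abstract specifies.
    """
    forest = RandomForestRegressor(n_estimators=200, random_state=random_state)
    forest.fit(X, y)
    # Indices of the n_keep largest importances, restored to column order.
    top = np.argsort(forest.feature_importances_)[::-1][:n_keep]
    kept = sorted(top.tolist())
    return [feature_names[i] for i in kept], X[:, kept]


# Synthetic small-sample data (illustrative only): 60 samples, 8 features,
# with the target driven mainly by f1 and non-linearly by f4.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 4.0 * X[:, 1] + X[:, 4] ** 2 + 0.1 * rng.normal(size=60)
names = [f"f{i}" for i in range(8)]

kept_names, X_reduced = rf_prescreen(X, y, names, n_keep=3)
print(kept_names, X_reduced.shape)
```

Shrinking the primary feature set before descriptor construction is what would cut the size of the expression space SISSO has to store, since that space grows combinatorially with the number of input features.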
Taxonomy
Topics: Face and Expression Recognition
