BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification

Bich-Chung Phan; Thanh Ma; Huu-Hoa Nguyen; Thanh-Nghi Do

arXiv:2502.13080 · cs.LG · February 27, 2025

TL;DR

BOLIMES is a novel feature selection method that combines Boruta and LIME to improve gene expression classification by reducing dimensionality while maintaining interpretability and high accuracy.

Contribution

The paper introduces a hybrid feature selection algorithm that integrates Boruta and LIME to select genes more effectively from high-dimensional data.

Findings

01. Enhanced classification accuracy with fewer genes.
02. Effective reduction of irrelevant features.
03. Improved interpretability of selected genes.

Abstract

Gene expression classification is a pivotal yet challenging task in bioinformatics, primarily because genomic data are high-dimensional and classifiers trained on them are prone to overfitting. To address this, we propose BOLIMES, a novel feature selection algorithm designed to enhance gene expression classification by systematically refining the feature subset. Unlike conventional methods that rely solely on statistical ranking or classifier-specific selection, BOLIMES integrates the robustness of Boruta with the interpretability of LIME, ensuring that only the most relevant and influential genes are retained. BOLIMES first employs Boruta to filter out non-informative genes by comparing each feature against its randomized counterpart, so that informative genes are preserved. It then uses LIME to rank the remaining genes by their local importance to the classifier. Finally, an iterative classification…
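The two stages named in the abstract (a Boruta-style comparison against randomized "shadow" features, then a LIME-style local ranking) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: absolute correlation stands in for Boruta's random-forest importance, a simple majority vote replaces Boruta's statistical test, the Gaussian perturbations and kernel width for the local surrogate are chosen ad hoc, and averaging absolute local weights over all samples is a hypothetical aggregation rule. All function names (`importance`, `shadow_filter`, `lime_weights`, `rank_features`) and the toy `model` are hypothetical. The truncated final iterative-classification step is not sketched.

```python
import math
import random


def importance(column, labels):
    # Hypothetical stand-in score: |Pearson correlation| with the 0/1 label.
    # (Boruta normally uses random-forest importances here.)
    n = len(column)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(column, labels))
    vx = sum((a - mx) ** 2 for a in column)
    vy = sum((b - my) ** 2 for b in labels)
    return 0.0 if vx == 0 or vy == 0 else abs(cov) / math.sqrt(vx * vy)


def shadow_filter(X, y, n_iter=20, seed=0):
    # Boruta-style filter: each round, shuffle every column into a "shadow"
    # copy and count a hit for each real feature that beats the best shadow.
    # Simplification: keep features with hits in more than half of the rounds.
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*X)]
    real = [importance(col, y) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_iter):
        best_shadow = max(importance(rng.sample(c, len(c)), y) for c in columns)
        for j, score in enumerate(real):
            if score > best_shadow:
                hits[j] += 1
    return [j for j, h in enumerate(hits) if h > n_iter / 2]


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def lime_weights(predict, x, n_samples=300, scale=1.0, ridge=1e-3, seed=0):
    # LIME-style local surrogate around x: Gaussian perturbations, an
    # exponential distance kernel, then a weighted ridge regression whose
    # slopes serve as local importances.
    rng = random.Random(seed)
    d1 = len(x) + 1  # intercept + one slope per feature
    width2 = 0.5625 * len(x)  # (0.75 * sqrt(d)) ** 2, assumed kernel width
    AtA = [[0.0] * d1 for _ in range(d1)]
    Atb = [0.0] * d1
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in x]
        w = math.exp(-sum(v * v for v in delta) / width2)
        row = [1.0] + delta
        t = predict([xi + di for xi, di in zip(x, delta)])
        for i in range(d1):
            Atb[i] += w * row[i] * t
            for j in range(d1):
                AtA[i][j] += w * row[i] * row[j]
    for i in range(1, d1):
        AtA[i][i] += ridge  # the intercept is not penalised
    return solve(AtA, Atb)[1:]


def rank_features(predict, X, kept):
    # Hypothetical aggregation: mean |local weight| over all samples,
    # perturbing only the features that survived the shadow filter.
    if not kept:
        return []
    totals = [0.0] * len(kept)
    for row in X:
        def local_predict(z, row=row):
            full = list(row)
            for j, v in zip(kept, z):
                full[j] = v
            return predict(full)
        sub = [row[j] for j in kept]
        for i, wgt in enumerate(lime_weights(local_predict, sub)):
            totals[i] += abs(wgt) / len(X)
    return [j for _, j in sorted(zip(totals, kept), reverse=True)]


def model(r):
    # Hypothetical "trained" classifier: depends only on features 0 and 1.
    return 1 / (1 + math.exp(-(2 * r[0] + r[1])))


# Toy usage: only features 0 and 1 drive the label.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [1 if r[0] + 0.5 * r[1] > 0 else 0 for r in X]
kept = shadow_filter(X, y)
print(kept, rank_features(model, X, kept))
```

Keeping the shadow comparison and the local surrogate in separate functions mirrors the filter-then-rank order described in the abstract: the cheap global filter shrinks the gene set before the more expensive per-sample local explanations are computed.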

Taxonomy

Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics

Methods: Feature Selection · Local Interpretable Model-Agnostic Explanations