Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection
Luca Cattelani, Vittorio Fortino

TL;DR
This paper introduces DOSA-MO, a novel multi-objective optimization algorithm that reduces overestimation bias during genetic algorithm-based biomarker selection, leading to more accurate models in cancer subtype and survival prediction.
Contribution
DOSA-MO is the first algorithm to reduce overestimation during multi-objective optimization, enhancing model selection accuracy in biomarker discovery from omics data.
Findings
DOSA-MO improves model performance on external datasets.
It reduces overestimation bias in genetic algorithms.
It is effective in both cancer subtype prediction and survival prediction.
Abstract
The challenge in biomarker discovery using machine learning from omics data lies in the abundance of molecular features but scarcity of samples. Most feature selection methods in machine learning require evaluating various sets of features (models) to determine the most effective combination. This process, typically conducted using a validation dataset, involves testing different feature sets to optimize the model's performance. Evaluations have performance estimation error, and when the selection involves many models, the best ones are almost certainly overestimated. Biomarker identification with feature selection methods can be addressed as a multi-objective problem with trade-offs between predictive ability and parsimony in the number of features. Genetic algorithms are a popular tool for multi-objective optimization, but they evolve numerous solutions and are thus prone to overestimation.…
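The overestimation effect described in the abstract can be illustrated with a small simulation. The sketch below is not DOSA-MO and is not taken from the paper. It only assumes that every candidate model has the same true accuracy and that each validation estimate is a noisy measurement on a small sample. The function name `simulate_selection_bias` and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_selection_bias(n_models, n_samples, true_accuracy=0.7,
                            n_trials=200, seed=0):
    """Average gap between the best validation estimate and the true accuracy.

    Assumption (illustrative only): all candidate models share the same true
    accuracy, so any gap comes purely from picking the luckiest estimate.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Each estimate is the fraction of correct predictions on a small
        # validation set, simulated as independent Bernoulli draws.
        estimates = [
            sum(rng.random() < true_accuracy for _ in range(n_samples)) / n_samples
            for _ in range(n_models)
        ]
        # Selecting the best-looking model keeps the most optimistic estimate.
        gaps.append(max(estimates) - true_accuracy)
    return statistics.mean(gaps)


# Evaluating a single model gives a roughly unbiased estimate, while choosing
# the best of many models yields a systematically optimistic one.
bias_single = simulate_selection_bias(n_models=1, n_samples=50)
bias_many = simulate_selection_bias(n_models=200, n_samples=50)
print(f"mean overestimation, 1 model:    {bias_single:+.3f}")
print(f"mean overestimation, 200 models: {bias_many:+.3f}")
```

Under these assumptions, the gap grows with the number of candidates and shrinks as the validation set grows, which is why a population-based search evaluated on few samples is particularly exposed to this bias.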
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Gene expression and cancer classification
Methods: Sparse Evolutionary Training · Feature Selection
