Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection

Luca Cattelani; Vittorio Fortino

arXiv:2312.16624 · q-bio.QM · January 3, 2025 · 1 citation

Open Access

TL;DR

This paper introduces DOSA-MO, a novel multi-objective optimization algorithm that reduces overestimation bias during genetic algorithm-based biomarker selection, leading to more accurate models in cancer subtype and survival prediction.

Contribution

DOSA-MO is the first algorithm to reduce overestimation during multi-objective optimization, enhancing model selection accuracy in biomarker discovery from omics data.

Findings

01. DOSA-MO improves model performance on external datasets.

02. It reduces overestimation bias in genetic algorithms.

03. It is effective in cancer subtype and survival prediction.

Abstract

The challenge in biomarker discovery using machine learning from omics data lies in the abundance of molecular features combined with a scarcity of samples. Most feature selection methods in machine learning require evaluating various sets of features (models) to determine the most effective combination. This process, typically conducted using a validation dataset, involves testing different feature sets to optimize the model's performance. Every evaluation carries a performance estimation error, and when the selection involves many models, the best ones are almost certainly overestimated. Biomarker identification with feature selection methods can be addressed as a multi-objective problem with trade-offs between predictive ability and parsimony in the number of features. Genetic algorithms are a popular tool for multi-objective optimization, but they evolve numerous solutions and are thus prone to overestimation.…
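The overestimation effect described in the abstract can be illustrated with a toy simulation. The sketch below is not DOSA-MO and is not taken from the paper. It assumes that every candidate model has the same true score and that each validation estimate is that score plus Gaussian noise. The function name `simulate_selection_bias` and all numeric values (`true_score`, `noise_sd`, the model counts) are hypothetical choices made for illustration only.

```python
import random
import statistics


def simulate_selection_bias(n_models, n_repeats=500, true_score=0.70,
                            noise_sd=0.05, seed=0):
    """Mean gap between the best validation estimate and the true score.

    Assumption (illustrative only): all candidate models share the same
    true score, so any gap is caused purely by picking the maximum of
    several noisy estimates.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_repeats):
        # One noisy validation estimate per candidate model.
        estimates = [rng.gauss(true_score, noise_sd) for _ in range(n_models)]
        # Selecting the best-looking model also selects its favorable noise.
        gaps.append(max(estimates) - true_score)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        gap = simulate_selection_bias(n)
        print(f"{n:>5} candidate models: mean overestimation = {gap:.3f}")
```

Under these assumptions the gap is close to zero when a single model is evaluated and grows as more candidates are compared. This matches the abstract's point that methods evolving many solutions, such as genetic algorithms, are prone to overestimating their best ones.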

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Gene expression and cancer classification

Methods: Sparse Evolutionary Training · Feature Selection